CN115691486A - Voice instruction execution method, electronic device and medium - Google Patents
- Publication number
- CN115691486A (application CN202110859300.5A)
- Authority
- CN
- China
- Prior art keywords
- text
- control
- voice instruction
- icon
- information
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The application provides a voice instruction execution method, an electronic device, and a medium. The voice instruction execution method includes the following steps: after detecting a voice instruction of a user, the electronic device parses the current display interface to obtain the parsed content; the electronic device obtains the matching text of each control in the current display interface according to the parsed content and the correspondence between each piece of control information in each display interface of the electronic device and the text information and the icon information; the electronic device matches the text in the voice instruction against the matching text of each control in the current display interface to obtain the target control corresponding to the voice instruction; and the electronic device performs the corresponding operation on the target control according to the operation in the voice instruction. With this technical scheme, the target control corresponding to the user's voice instruction can be found accurately, so the execution accuracy of voice instructions is effectively improved.
Description
Technical Field
The present disclosure relates to the field of voice control technologies, and in particular, to a voice instruction execution method, an electronic device, and a medium.
Background
With the development of artificial intelligence technology, voice intelligent platforms and voice assistants can recognize a user's voice input and, under certain conditions, generate corresponding operation instructions. This makes it very convenient for users to control electronic devices such as mobile phones, tablet computers, and in-vehicle computers by voice, so such assistants are widely used.
As shown in fig. 1 (a), the electronic device, a mobile phone, displays a Bluetooth setting interface. If the user wants to connect to the device named "Beats Solo", the user can instruct the mobile phone to establish a communication connection with the Bluetooth device named "Beats Solo" by issuing a corresponding voice instruction, for example "click on Beats Solo" or "connect to Beats Solo". After the mobile phone executes the voice instruction "click on Beats Solo", the Bluetooth interface is in the state shown in fig. 1 (b), in which the electronic device is performing a Bluetooth pairing connection with the device named "Beats Solo".
In the prior art, a mobile phone typically executes a user's voice instruction by parsing the current interface to obtain its content, and then matching the voice instruction against that content to find the control corresponding to the instruction, so that the corresponding operation can be performed on the control according to the instruction.
Disclosure of Invention
In the prior art, interface parsing is often not comprehensive enough, or the parsed content is not processed further, so the electronic device sometimes cannot find the control corresponding to a voice instruction and therefore cannot execute the user's voice instruction, or cannot execute it accurately. To solve this technical problem, a first aspect of an embodiment of the present application provides a voice instruction execution method, applicable to an electronic device, and the method includes:
the electronic device detects a voice instruction of a user; the voice instruction includes an operation and a text.
The electronic device parses the current display interface to obtain the parsed content;
the electronic device obtains the matching text of each control in the current display interface according to the parsed content and the correspondence between each piece of control information in each display interface of the electronic device and the text information and the icon information;
the electronic device matches the text in the voice instruction against the matching text of each control in the current display interface, and takes the control whose matching text matches the text in the voice instruction as the target control corresponding to the voice instruction;
and the electronic device performs the corresponding operation on the target control according to the operation in the voice instruction.
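For illustration only, the steps above can be sketched in code. The following minimal Kotlin sketch is not part of the claimed method; every name in it (the data classes, the parsing stub, the similarity measure) is an assumption.

```kotlin
// Minimal sketch of the steps above; every name here is an illustrative assumption.
data class VoiceInstruction(val operation: String, val text: String)
data class Control(val id: String, val matchingText: String)

// Stand-in for interface parsing plus matching-text construction (steps 2 and 3).
fun parseAndBuildMatchingTexts(): List<Control> = emptyList()

// Placeholder matching degree (step 4); any text-similarity measure could be used.
fun matchScore(a: String, b: String): Double = when {
    a.equals(b, ignoreCase = true) -> 1.0
    b.contains(a, ignoreCase = true) -> 0.5
    else -> 0.0
}

fun execute(instruction: VoiceInstruction) {
    val controls = parseAndBuildMatchingTexts()
    val target = controls.maxByOrNull { matchScore(instruction.text, it.matchingText) }
    if (target != null) println("perform '${instruction.operation}' on ${target.id}") // step 5
}
```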
In the voice instruction execution method described above, after obtaining the parsed content of the current interface, the electronic device processes it further by matching the controls of the current interface with texts to obtain the matching text of each control. As a result, after receiving the user's voice instruction, the electronic device can accurately find the matching text that matches the text in the instruction, locate the corresponding control, and perform the corresponding operation on it according to the operation in the instruction. This method can effectively improve the accuracy of voice instruction execution.
It can be understood that, in the embodiments of the present application, the electronic device may recognize the user's voice instruction to obtain the operation and the text it contains. The operation in a voice instruction may be a specific action, for example "click" or "open", and the text in the voice instruction may be the operation object corresponding to that operation. For example, for the voice instruction "open bluetooth", the operation is "open" and the operation object (text) corresponding to the operation "open" is "bluetooth".
In some embodiments, the electronic device may recognize the user's voice instruction by parsing it; specifically, the voice instruction may be parsed by slot extraction, as mentioned in the following embodiments, to obtain the operation and the text contained in the voice instruction.
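The embodiments do not fix a particular slot-extraction algorithm. As one hedged illustration, a naive rule-based extractor might split an utterance into a known operation verb and the remaining operation object; the verb list and the prefix rule below are assumptions.

```kotlin
// Naive rule-based slot extraction; the verb list and the prefix rule are assumptions.
val knownOperations = listOf("connect to", "turn on", "turn off", "click", "open")

fun extractSlots(utterance: String): Pair<String, String>? {
    val operation = knownOperations.firstOrNull { utterance.startsWith(it, ignoreCase = true) }
        ?: return null
    val objectText = utterance.substring(operation.length).trim() // remainder is the operation object
    return operation to objectText
}

// extractSlots("open bluetooth") == ("open" to "bluetooth")
```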
It can be understood that, in the embodiment of the present application, the currently displayed interface of the electronic device may be an interface that is currently displayed when the electronic device receives a voice instruction of a user.
It can be understood that the "parsed content" of the current display interface mentioned in the embodiments of the present application is the same as the "parsing result" of the current display interface; the two terms are used interchangeably.
In a possible implementation of the first aspect, the parsed content of the current display interface includes the text information, icon information, and control information of the current display interface;
the text information includes a text and the position information corresponding to the text;
the icon information includes the converted text of an icon and the position information corresponding to the icon;
the control information includes the type of a control and the position information corresponding to the control.
It can be understood that the position information corresponding to the text may be a position or an area where the text is located, the position information corresponding to the icon may be a position or an area where the icon is located, and the position information corresponding to the control may be a position or an area where the control is located.
It is to be understood that, in some embodiments, the positions or areas of the texts, icons, and controls mentioned above may all be expressed as coordinates. Taking the position or area of a text as an example: if the display interface of the mobile phone is treated as a coordinate system, the position corresponding to the text is the coordinate point at the center of the text, and the area of the text is the region enclosed by several coordinate points around the text, for example a rectangular area containing the text.
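One possible container for this parsed content is sketched below, with each item carrying its rectangular screen area; the field and type names are assumptions, not terms of the present application.

```kotlin
import android.graphics.Rect

// Assumed container for the parsed content; each item carries its on-screen rectangle.
data class TextInfo(val text: String, val bounds: Rect)           // a text and its position/area
data class IconInfo(val convertedText: String, val bounds: Rect)  // an icon's converted text and area
data class ControlInfo(val type: String, val bounds: Rect)        // a control's category and area

data class ParsedContent(
    val texts: List<TextInfo>,
    val icons: List<IconInfo>,
    val controls: List<ControlInfo>
)
```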
In a possible implementation of the first aspect, the correspondence between each piece of control information in each display interface and the text information and the icon information is obtained by labeling the correspondence between the control information and the text information and icon information of all interfaces of the electronic device.
In the embodiments of the application, each display interface of the electronic device can be obtained by manually capturing the display interfaces of the various application programs of the electronic device.
It is understood that each of the display interfaces of the various applications of the electronic device may be captured once, or several times at different moments. Most display interfaces change in real time; for a Bluetooth interface, for example, the nearby devices that can be discovered differ from place to place, so the display interface differs as well. The correspondence between control information and text information can therefore be labeled on captures of the interface taken at several moments, and then analyzed and summarized so that the correspondence between the control information and the text information of that interface is obtained more accurately.
In one implementable scheme of the embodiments of the present application, after the correspondence between each piece of control information, text information, and icon information in the display interfaces of the electronic device is obtained, it may be stored in an artificial intelligence module of the electronic device, such as a voice recognition module. After the electronic device obtains the parsed content, it can send the parsed content to the voice recognition module, which directly performs text matching on the controls of the current display interface according to the stored labeled data to obtain the matching text of each control in the current display interface.
In some embodiments, the voice recognition module may be disposed in a processor of an electronic device, and may be configured to execute the voice instruction execution method mentioned in the embodiments of the present application.
In another implementable scheme of the embodiments of the present application, after the correspondence between the control information, text information, and icon information of all interfaces of the electronic device is labeled, all of the labeled data may be trained to obtain a fusion model, so that the correspondence between the control information, text information, and icon information of each interface of the electronic device is obtained more accurately. For a display interface that changes in real time, for example, the correspondence between the control information and the text information and icon information can be labeled on captures of the interface taken at several moments, and continuous machine learning and training then makes the correspondence between the control information and the text information of that interface more accurate.
It can be understood that the fusion model may include the trained correspondence between the control information, text information, and icon information of all interfaces of the electronic device. When the parsed content of the current display interface is input into the fusion model, the model outputs, after matching, the correspondence between the controls of the current display page and all texts (including the original texts and the texts converted from icons).
It can be understood that the correspondence between the control information and the text information of each interface of the electronic device may include the correspondence between a control and a text, and between the position of the control and the position of the text; the correspondence between the control information and the icon information may include the correspondence between a control and the converted text of an icon, and between the position of the control and the position of the icon.
In a possible implementation of the first aspect, the electronic device obtains the matching text of each control in the current display interface according to the parsed content and the correspondence between each control in each display interface of the electronic device and the text information, which includes:
the electronic device obtains the text and/or the converted text of the icon corresponding to each control in the current display interface according to the parsed content and the correspondence between each control in each display interface of the electronic device and the text information;
and combining the texts and/or converted texts of icons that correspond to each control in the current display interface to obtain the matching text of each control in the current display interface.
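Assuming the correspondence lookup has already produced the list of texts for a control, the combining step can be as simple as concatenation; the space-joining rule below is an assumption.

```kotlin
// Combine all texts (original or icon-converted) that correspond to one control
// into its matching text; space-joining is an assumed combining rule.
fun buildMatchingText(correspondingTexts: List<String>): String =
    correspondingTexts.joinToString(" ")

// buildMatchingText(listOf("Beats Solo", "settings")) == "Beats Solo settings"
```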
It can be understood that, in the embodiments of the present application, one text may correspond to one control, to several controls, or to no control at all. For example, the "Beats Solo" text shown in fig. 9 may correspond both to the "Beats Solo" control and to the "settings" icon control, while the "paired devices" text corresponds to no control.
In a possible implementation of the first aspect, the text corresponding to each control in the current display interface exists in the current display interface and/or a previous interface of the current display interface.
It is understood that in some embodiments, the text having correspondence with each control in the current display interface may exist in the current display interface, for example, the text corresponding to the "Beats Solo" control in fig. 9 is the "Beats Solo" text, and the "Beats Solo" text exists in the current display interface.
It will be appreciated that, in other embodiments, some texts corresponding to some icon controls may not be texts in the current interface; for example, the title text of the parent interface of the current interface may be used.
For example, consider the "back" icon control in the Bluetooth interface in fig. 8: clicking it returns to the Settings interface, the parent interface of the Bluetooth interface. Therefore, when the back icon is labeled, in addition to the converted text of the back icon labeled in the Bluetooth interface, the back icon also corresponds to the title text "Settings" of the parent Settings interface.
Therefore, if the parsing result of the Bluetooth interface, that is, all controls and their positions, all texts and their positions, and all icon-converted texts and their positions, is input into the fusion model, the texts output by the fusion model as corresponding to the back icon control include the converted text of the back icon of the current interface and the title text "Settings" of the parent interface of the Bluetooth interface.
In a possible implementation of the first aspect, the electronic device matches the text in the voice instruction against the matching text of each control in the current display interface and takes the control whose matching text matches the text in the voice instruction as the target control corresponding to the voice instruction, which includes:
the electronic device matches the text in the voice instruction against the matching text of each control in the current display interface to obtain the matching degree between the text in the voice instruction and the matching text of each control in the current display interface;
determining, according to these matching degrees, the matching text that matches the text of the voice instruction;
and taking the control corresponding to that matching text as the target control corresponding to the voice instruction.
In the embodiments of the present application, the consistency between the text in the voice instruction and the matching text of each control of the current interface can be evaluated separately: the higher the consistency between the instruction text and the text corresponding to a control of the current interface, the higher the matching degree between them, and the control with the highest matching degree is selected as the target control of the voice instruction.
In the embodiments of the present application, by evaluating the consistency between the text in the voice instruction and the texts corresponding to all controls of the current interface, the control in the current interface that corresponds to the text in the user's voice instruction can be found accurately, so the user's voice instruction can be executed accurately.
In a possible implementation of the first aspect, the method further includes: if the matching degree between the text of the voice instruction and the matching text of every control in the current display interface is lower than a first set value, no target control exists.
It is to be understood that, in some embodiments, a first set value, that is, a matching-degree threshold, may be set; when the matching degree between the text in the voice instruction and the texts of all controls of the current interface is below this threshold, there is no operable control, that is, no target control, and no operation is performed.
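The embodiments leave the matching-degree metric open. The sketch below uses a normalized edit distance as one plausible choice; the metric, the default threshold, and the names are all assumptions.

```kotlin
// Matching degree as normalized edit distance; the metric and threshold are assumptions.
fun editDistance(a: String, b: String): Int {
    val dp = Array(a.length + 1) { IntArray(b.length + 1) }
    for (i in 0..a.length) dp[i][0] = i
    for (j in 0..b.length) dp[0][j] = j
    for (i in 1..a.length) for (j in 1..b.length) {
        val cost = if (a[i - 1] == b[j - 1]) 0 else 1
        dp[i][j] = minOf(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    }
    return dp[a.length][b.length]
}

fun matchingDegree(a: String, b: String): Double =
    1.0 - editDistance(a, b).toDouble() / maxOf(a.length, b.length, 1)

// Returns the id of the control with the highest matching degree, or null when
// every degree is below the first set value: no target control exists.
fun findTargetControl(
    instructionText: String,
    matchingTexts: Map<String, String>, // control id -> matching text
    firstSetValue: Double = 0.6
): String? = matchingTexts.entries
    .maxByOrNull { matchingDegree(instructionText, it.value) }
    ?.takeIf { matchingDegree(instructionText, it.value) >= firstSetValue }
    ?.key
```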
In a possible implementation of the first aspect, the electronic device is a mobile phone.
In the voice instruction execution method provided by the first aspect of the embodiments of the present application, after obtaining the parsed content of the current interface, the electronic device processes it further by matching the controls of the current interface with texts to obtain the matching text of each control. After receiving the user's voice instruction, the electronic device can then accurately find the matching text that matches the text in the instruction, obtain the corresponding control, and perform the corresponding operation on it according to the operation in the instruction. This method can effectively improve the accuracy of voice instruction execution.
A second aspect of embodiments of the present application provides an electronic device, including:
a memory for storing instructions to be executed by one or more processors of the electronic device, and
a processor, being one of the one or more processors of the electronic device, for executing the voice instruction execution method described above.
A third aspect of the embodiments of the present application provides a computer-readable medium having instructions stored thereon which, when executed on a machine, cause the machine to execute the above voice instruction execution method.
A fourth aspect of the embodiments of the present application provides a computer program product, where the computer program product includes instructions for implementing the above-mentioned voice instruction execution method.
A fifth aspect of an embodiment of the present application provides a chip apparatus, including:
a communication interface for inputting and/or outputting information;
and a processor, configured to execute a computer-executable program, so that a device equipped with the chip apparatus executes the above voice instruction execution method.
Drawings
Fig. 1 is a schematic view of an application scenario of a voice instruction execution method according to an embodiment of the present application;
FIG. 2 is a diagram illustrating a Bluetooth interface according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram illustrating a method for performing subsequent processing on an analysis result according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating a method for performing subsequent processing on an analysis result according to an embodiment of the present application;
fig. 5 is a schematic diagram illustrating a correspondence relationship between a part of controls of a bluetooth interface and a text according to an embodiment of the present application;
fig. 6 (a) is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 6 (b) is a software block diagram of an electronic device according to an embodiment of the present application;
FIG. 7 is a flowchart illustrating a method for executing a voice command according to an embodiment of the present application;
fig. 8 is a schematic diagram illustrating an analysis result of a bluetooth interface according to an embodiment of the present application;
fig. 9 is a schematic diagram illustrating a correspondence relationship between a part of controls of a bluetooth interface and a text according to an embodiment of the present application;
fig. 10 is a schematic diagram illustrating a correspondence relationship between a part of controls of a bluetooth interface and a text according to an embodiment of the present application;
fig. 11 is a schematic view of a display interface after the "set" icon control is clicked in the embodiment of the present application.
Detailed Description
The embodiment of the application discloses a voice instruction execution method, electronic equipment and a medium.
As mentioned above, a user can control the controls in the interface displayed on the electronic device through voice instructions. When the user does so, the success rate of the electronic device in executing a voice instruction depends mainly on how the electronic device parses the screen interface and how the parsing result is subsequently processed in the voice instruction execution method.
It can be understood that the parsing result of an interface may specifically include the text information, icon information, control information, and so on in the interface, where the text information may include a text and the position corresponding to the text, the icon information may include an icon, the text corresponding to the icon, and the position corresponding to the icon, and the control information may include the control category and the control position. For example, as shown in fig. 2, the Bluetooth interface includes texts such as the "Beats Solo" text 201, the "HUAWEI" text 203, and the "Bluetooth" text 211; controls such as the "Beats Solo" control 202, the "HUAWEI" control 204, the "switch" icon control 208, the "settings" icon control 210, and the "back" icon control 206; and icons such as the "back" icon 205, the "switch" icon 207, the "settings" icon 209, and the "mobile phone" icon 213.
It is understood that, in the embodiments of the present application, a control is an object with which the user 001 can interact to input or operate data. Controls may include text controls and icon controls. A text control is a control that can be operated by clicking its text; the text controls shown in fig. 2 may include the "Beats Solo" control 202, the "HUAWEI" control 204, and so on. An icon control is a control that can be operated by clicking its icon; the icon controls shown in fig. 2 may include the "switch" icon control 208, the "settings" icon control 210, the "back" icon control 206, and so on.
It can be understood that, in the Bluetooth interface shown in fig. 2, the content circled by a dashed square frame is a text, the content circled by a dashed oval frame is an icon, and the content circled by a bold solid square frame is a control.
In addition, it is understood that the subsequent processing of the parsing result generally includes matching the controls in the parsing result with texts, so that the electronic device can compare the text in the user's voice instruction against the matching text of a control for consistency, find the control corresponding to the text in the user's instruction, and perform the specific operation on that control. For the Bluetooth interface, for example, the subsequent processing of the parsing result is generally to match the controls on the interface with texts.
Therefore, if the current display interface of the electronic device is not parsed accurately and comprehensively, or the text matching of the controls in the parsing result is inaccurate or missing, the electronic device can hardly compare the text in the user's voice instruction with the matching text of the corresponding control, and thus can hardly find the control corresponding to the text in the user's instruction. As a result, the electronic device fails to execute the user's voice instruction, or fails to execute it accurately.
For example, in some embodiments, after the electronic device parses the current interface in the voice instruction execution method, the subsequent processing of the parsing result is to number the texts, icons, and controls of the current interface numerically, so that the user can operate a control of the display interface of the electronic device by issuing a voice instruction that clicks a specific number.
Specifically, as shown in fig. 3, the controls of the electronic device may be numbered: for example, the "back" icon control 206 is numbered (1), the "switch" icon control 208 is numbered (2), the "Beats Solo" control 202 is numbered (3), the "HUAWEI" control 204 is numbered (4), and so on. If the user 001 wants the electronic device 002 to click on "Beats Solo", the user 001 finds the number (3) corresponding to the "Beats Solo" text 201 and issues the voice instruction "click 3". The electronic device 002 then performs the operation of clicking the "Beats Solo" control 202 corresponding to the number (3).
With this way of processing the parsing result, the user 001 must issue an instruction that clicks a specific number to make the electronic device execute the corresponding operation, and therefore must spend time mapping texts to numbers one by one before issuing the instruction. The process is cumbersome, and the experience of the user 001 is poor.
To solve this problem, an embodiment of the present application provides another voice instruction execution method, in which the electronic device processes the parsing result by matching the text controls and icon controls of the current interface with texts, so that the user can operate a control of the display interface of the electronic device by issuing a voice instruction that clicks a specific text.
On the one hand, the text controls in the parsing result are matched with texts: the matching text of a text control is the text located within the area of that text control.
Specifically, as shown in fig. 4, the "Beats Solo" control 202 is a text control, so the text matched to it is the text "Beats Solo" located within the area of the "Beats Solo" control 202. The "HUAWEI" control 204 is a text control, so the text matched to it is the text "HUAWEI" located within the area of the "HUAWEI" control 204.
On the other hand, for the icon controls on the current interface, the associated text of an icon control is found according to rules set for some common icons; the subject name of the icon located within the area of the icon control is taken as the initial text of the icon control, and the associated text of the icon control is combined with its initial text to obtain the composite text of the icon.
For example, for the commonly used "switch" icon control 208, a rule is set that the text to the left of the "switch" icon control 208 is its associated text. The initial text of the "switch" icon control is the subject name "switch" of the "switch" icon 207 located within the area of the "switch" icon control 208. Combining the initial text of the "switch" icon 207 with its associated text yields the composite text "Bluetooth switch".
Now, if the user 001 issues an instruction to turn on the Bluetooth switch, the electronic device can match it to the "switch" icon control 208 and perform the operation of turning on the Bluetooth switch.
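A fixed rule of this kind can be sketched as a simple geometric test: take the nearest text on the same row, to the icon's left, as the associated text. The rectangle type and the overlap test below are assumptions.

```kotlin
// Fixed-rule association for a known icon: nearest same-row text on the left.
data class Box(val left: Int, val top: Int, val right: Int, val bottom: Int)

fun leftNeighborText(icon: Box, texts: Map<String, Box>): String? =
    texts.entries
        .filter { (_, b) ->
            b.right <= icon.left &&                     // strictly to the left of the icon
            b.top < icon.bottom && b.bottom > icon.top  // vertically overlapping (same row)
        }
        .maxByOrNull { it.value.right }                 // nearest text on the left
        ?.key

// With the "Bluetooth" text to the left of the switch icon, the associated text is
// "Bluetooth", and combining it with the initial text "switch" gives "Bluetooth switch".
```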
The interface parsing approach shown in fig. 4 can parse the text controls on the current interface, so the user 001 can voice-control text controls in most situations. However, rules are set for only a few icons, and since icons vary widely in type and position, this scheme cannot set uniform rules for most icon controls. For an icon control without a rule, the scheme cannot obtain its associated text and can only obtain its initial text.
For example, the "settings" icon control 210 shown in fig. 4 has no corresponding rule, so it has no associated text, only the initial text, namely the subject name "settings" of the "settings" icon 209 located within the area of the "settings" icon control 210. Therefore, when the user 001 issues the voice instruction "click the settings of Beats Solo", the electronic device cannot determine whether the text of the instruction corresponds to the text "settings" of the "settings" icon control 210 or to the text "Beats Solo" of the "Beats Solo" control 202, and hence cannot determine whether the user wants to click the "Beats Solo" control 202 or the "settings" icon control 210, so it cannot perform the corresponding operation according to the instruction of the user 001.
To solve this problem, an embodiment of the present application further provides a third voice instruction execution method, in which the parsing result is matched against a fusion model that holds the correspondence between the texts and controls of all interfaces of the electronic device, and the correspondence between all controls and texts on the current display interface is found, so that all controls on the interface are accurately matched with texts. The electronic device can then accurately find the control matching the voice instruction of the user 001 and perform the corresponding operation on that control.
For example, the parsing result of the Bluetooth interface is input into the fusion model, and part of the output correspondence between the controls of the Bluetooth interface and texts is illustrated in fig. 5. As the arrows in fig. 5 show, the texts corresponding to the "settings" icon control 210 are the "Beats Solo" text 201 and the converted text of the "settings" icon 209, where the converted text of the "settings" icon 209 is "settings". The electronic device combines the "Beats Solo" text 201 and the text "settings" that correspond to the "settings" icon control 210, obtaining "Beats Solo settings" as the matching text of the "settings" icon control 210.
Now, when the user 001 issues the voice instruction "click the settings of Beats Solo", the electronic device 002 matches the text of the instruction to the matching text "Beats Solo settings" of the "settings" icon control 210, and therefore clicks the "settings" icon control 210 according to the instruction of the user 001.
Before describing another voice instruction execution method provided by the embodiment of the present application in detail, the electronic device provided by the embodiment of the present application is first described below.
It is to be appreciated that the electronic devices referred to in the embodiments of the present application include, but are not limited to, laptop computers, desktop computers, tablet computers, smartphones, servers, wearable devices, head-mounted displays, mobile email devices, portable game consoles, portable music players, reader devices, televisions with one or more processors embedded or coupled therein, and other electronic devices 002 with computing capability.
For convenience of description, the mobile phone 002 is used below as an example of the electronic device.
As shown in fig. 6 (a), the mobile phone 002 may include a processor 110, a power module 140, a memory 180, a mobile communication module 130, a wireless communication module 120, a sensor module 190, an audio module 150, a camera 170, an interface module 160, keys 101, a display screen 102, and the like.
It is to be understood that the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the mobile phone 002. In other embodiments of the present application, the mobile phone 002 may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, for example processing modules or processing circuits such as a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a neural-network processing unit (NPU), a micro-controller unit (MCU), an artificial intelligence (AI) processor, or a field-programmable gate array (FPGA). The different processing units may be separate devices or may be integrated into one or more processors. A storage unit may be provided in the processor 110 for storing instructions and data; in some embodiments, this storage unit is a cache. The processing units can execute the voice instruction execution method provided by the embodiments of the present application.
The NPU is a neural-network (NN) computing processor that processes input information quickly by drawing on the structure of biological neural networks, for example the transfer mode between neurons of the human brain, and can also learn by itself continuously. Applications such as intelligent cognition of the mobile phone 002, for example image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
The power module 140 may include a power supply, power management components, and the like. The power source may be a battery. The power management component is used for managing the charging of the power supply and the power supply of the power supply to other modules. In some embodiments, the power management component includes a charge management module and a power management module. The charging management module is used for receiving charging input from the charger; the power management module is used for connecting a power supply, the charging management module and the processor 110. The power management module receives power and/or charge management module input and provides power to the processor 110, the display 102, the camera 170, and the wireless communication module 120.
The mobile communication module 130 may include, but is not limited to, an antenna, a power amplifier, a filter, an LNA (Low noise amplifier), and the like. The mobile communication module 130 may provide a solution for wireless communication including 2G/3G/4G/5G, etc. applied to the cellular phone 002.
The wireless communication module 120 may include an antenna, and implements transceiving of electromagnetic waves via the antenna. The wireless communication module 120 may provide solutions for wireless communication applied to the mobile phone 002, including wireless local area network (WLAN) (such as wireless fidelity (Wi-Fi)) networks, Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The mobile phone 002 can communicate with networks and other devices via wireless communication technology.
In some embodiments, the mobile communication module 130 and the wireless communication module 120 of the handset 002 may also be located in the same module.
The display screen 102 is used to display human-computer interaction interfaces, images, videos, and the like. The display screen 102 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum-dot light-emitting diode (QLED), or the like. In this embodiment, the display screen 102 may be used to display the various application interfaces of the mobile phone 002.
The sensor module 190 may include a proximity light sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
The audio module 150 is used to convert digital audio information into an analog audio signal output or convert an analog audio input into a digital audio signal. The audio module 150 may also be used to encode and decode audio signals. In some embodiments, the audio module 150 may be disposed in the processor 110, or some functional modules of the audio module 150 may be disposed in the processor 110. In some embodiments, audio module 150 may include speakers, earphones, a microphone, and a headphone interface. In this embodiment, the audio module 150 may be configured to receive a voice instruction of a user.
The camera 170 is used to capture still images or video. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element converts the optical signal into an electrical signal and transmits it to the ISP (image signal processor) to be converted into a digital image signal. The mobile phone 002 can implement shooting functions through the ISP, the camera 170, the video codec, the GPU (graphics processing unit), the display screen 102, the application processor, and the like.
The interface module 160 includes an external memory interface, a universal serial bus (USB) interface, a subscriber identity module (SIM) card interface, and the like. The external memory interface may be used to connect an external memory card, such as a Micro SD card, to expand the storage capability of the mobile phone 002. The external memory card communicates with the processor 110 through the external memory interface to implement data storage functions. The USB interface is used for communication between the mobile phone 002 and other electronic devices. The SIM card interface is used to communicate with a SIM card installed in the mobile phone 002, for example to read a phone number stored in the SIM card or to write a phone number into it.
In some embodiments, the handset 002 also includes keys 101, motors, indicators, and the like. The keys 101 may include a volume key, an on/off key, and the like. The motor is used to generate a vibration effect to the mobile phone 002, for example, when the mobile phone 002 of the user 001 is called, to prompt the user 001 to answer the incoming call of the mobile phone 002. The indicators may include laser indicators, radio frequency indicators, LED indicators, and the like.
The software system of the mobile phone 002 may adopt a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. The embodiment of the present invention takes an Android system with a layered architecture as an example, and exemplifies a software structure of the electronic device 002.
Fig. 6 (b) is a block diagram of a software configuration of a mobile phone 002 according to the embodiment of the present invention.
The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, which are an application layer, an application framework layer, a system library, and a kernel layer from top to bottom. The application layer may include a series of application packages.
As shown in fig. 6 (b), the application package may include voice assistant, bluetooth, settings, etc. applications.
The application framework layer provides an application programming interface (API) and a programming framework for the application programs of the application layer, and includes a number of predefined functions. After receiving the voice instruction of the user 001, the voice assistant may obtain the current display interface of the mobile phone 002.
As shown in fig. 6 (b), the application framework layer may include a content provider, a view system, an explorer, and the like.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
In this embodiment, the voice assistant may control the mobile phone 002 to obtain, through a standard interface of an operating system installed thereon, an interface object corresponding to the currently displayed interface, such as a text and a control on the interface, from the view system, and obtain a view structure corresponding to the interface object.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include interface content, video, images, audio, calls made and answered, browsing history and bookmarks, phone books, etc. of the various applications. In an embodiment of the application, the content provider may be accessed by a voice assistant for obtaining instant interface content of each application stored in the content provider.
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application layer and the application framework layer as binary files. The virtual machine is used for performing the functions of object life cycle management, stack management, thread management, safety and exception management, garbage collection and the like.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), media Libraries (Media Libraries), and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports a variety of commonly used audio, video format playback and recording, and still image files, among others. The media library may support a variety of audio-video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, and the like.
The kernel layer is a layer between hardware and software, and at least comprises a display driver, a camera driver, an audio driver and the like.
The voice assistant in the application package may be a system-level application. A voice assistant may also be called a human-computer interaction robot, a human-computer conversation robot, or a chat robot (ChatBOT), and the voice assistant application may also be referred to as a smart assistant application. Voice assistants are now widely used in electronic devices such as mobile phones, tablet computers, smart speakers, and smart televisions, providing the user 001 with an intelligent voice interaction mode; the voice assistant is one of the cores of human-computer interaction.
The third voice instruction execution method of the embodiments of the present application is described in detail below, taking the mobile phone 002 as an example. Fig. 7 shows a schematic diagram of the voice instruction execution method, which may be executed by the voice assistant application of the mobile phone 002 through controlling the processor 110 of the mobile phone 002. As shown in fig. 7, the voice instruction execution method includes:
s701, receiving a voice instruction of a user 001;
in the embodiments of the present application, the user 001 speaks a voice instruction; for example, as shown in fig. 5, the user 001 issues the voice instruction "click the settings of Beats Solo". The electronic device 002 receives this voice instruction.
It is understood that the voice instruction may be issued after the user 001 observes the current interface of the electronic device 002, and is intended to perform an operation on a certain control on the current interface of the electronic device 002.
In some embodiments, the user 001 needs to speak a wake-up word before speaking the voice instruction to wake the electronic device 002 so that it begins receiving the voice instruction of the user 001; for example, after the user 001 speaks the wake-up word "hi, xiao", the voice assistant of the electronic device 002 turns on and can receive the voice instruction of the user 001.
In some embodiments, the electronic device 002 may display a prompt message on the current interface in the process of receiving the voice instruction of the user 001 to prompt the user 001 to use the voice recognition function of the electronic device 002. The prompt message may be in the form of a text "in speech recognition," or may be in the form of an icon corresponding to the text "in speech recognition," or the like.
S702, acquiring a current display interface of the electronic equipment 002;
it is to be understood that, in the embodiment of the present application, the currently displayed interface of the electronic device 002 may be an interface currently displayed when the electronic device 002 receives a voice instruction of the user 001, as shown in fig. 8, the currently displayed interface may be a bluetooth interface.
In some embodiments, if the system of the electronic device 002 is an Android system, the voice assistant may obtain the current display interface of the electronic device 002 through a standard Android interface.
S703, parsing the current interface to obtain the parsed content. The parsed content may include the text information, icon information, and control information of the current display interface.
The text information may include text and position information corresponding to the text; it is understood that the position information corresponding to the text may be a position or an area where the text is located.
The icon information may include the converted text of an icon and the position information corresponding to the icon; the position information corresponding to an icon may be the position or area where the icon is located. The converted text of an icon may be the subject name corresponding to the icon: for example, the subject name corresponding to a switch icon is "switch", and the subject name corresponding to a settings icon is "settings".
The control information may include a category of the control and position information corresponding to the control. It is understood that the position information corresponding to the control may be a position or an area where the control is located.
In some embodiments, the positions or areas of the texts, icons, and controls mentioned above may all be expressed as coordinates. Taking the position or area of a text as an example, with the display interface of the mobile phone as a coordinate system, the position corresponding to the text is the coordinate point at the center of the text, and the area of the text is the region enclosed by several coordinate points around the text, for example a rectangular area containing the text.
It can be understood that, in the embodiment of the present application, a result of analyzing the bluetooth interface may be as shown in fig. 8, and texts included in the obtained bluetooth interface may include a "Beats Solo" text 201, a "HUAWEI" text 203, a "bluetooth" text 211, and the like; the included controls can be a "Beats Solo" control 202, a "HUAWEI" control 204, a "switch" icon control 208, a "set" icon control 210, a "return" icon control 206 and the like; included icons may be a "back" icon 205, a "switch" icon 207, a "set" icon 209, a "cell phone" icon 213, and so on;
the above-mentioned controls can be divided into text controls and icon controls as before. For example, the text controls shown in FIG. 8 may include a "Beats Solo" control 202, a "HUAWEI" control 204, and so forth; the icon controls shown in fig. 8 may include an "on-off" icon control 208, a "set" icon control 210, and a "back" icon control 206, among others.
It can be understood that, in fig. 8, the content circled by a dashed square frame is a text, the content circled by a dashed oval frame is an icon, and the content circled by a bold solid square frame is a control.
In the embodiments of the present application, the electronic device 002 may parse the current interface to obtain the parsed content in the following specific manner:
traverse each element of the current interface; if an element is a text, extract it directly in character-string form to obtain the specific text; if an element is an icon, extract it in bitmap form to obtain the image corresponding to the icon, and then process that image to obtain the text corresponding to it, which is the text finally obtained for the icon element; if an element is a control, obtain the category of the control directly. In addition, the position information corresponding to each element in the current display interface is obtained.
The elements of the current interface may be traversed as follows: obtain the root view of the current interface, obtain the top-level view of the current interface from the root view, and then traverse every sub-view of the current interface, that is, every element in the current interface, starting from the top-level view.
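On Android, such a traversal can be sketched as a walk over the view tree; this assumes the interface is reachable as ordinary views, and the three-way classification of elements below is a simplification.

```kotlin
import android.view.View
import android.view.ViewGroup
import android.widget.ImageView
import android.widget.TextView

// Walk the view tree from the top-level view, collecting each element's kind,
// content, and on-screen position. Simplified; a real parser needs more cases.
fun traverse(view: View, out: MutableList<Triple<String, String, IntArray>>) {
    val location = IntArray(2)
    view.getLocationOnScreen(location)  // position of this element in screen coordinates
    when (view) {
        is TextView -> out.add(Triple("text", view.text.toString(), location))              // character-string form
        is ImageView -> out.add(Triple("icon", view.drawable?.toString() ?: "", location))  // bitmap goes to the icon classifier
        else -> out.add(Triple("control", view.javaClass.simpleName, location))             // control category
    }
    if (view is ViewGroup) {
        for (i in 0 until view.childCount) traverse(view.getChildAt(i), out)
    }
}
```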
In some embodiments, the image corresponding to an icon may be converted into a specific text as follows:
after the image corresponding to the icon is obtained, the image is preprocessed, for example cropped or scaled, to meet the requirements of the image-processing algorithm. The preprocessed image is then classified by a classification model, the text corresponding to the icon is obtained according to the classification, and that text is taken as the converted text of the icon.
The preprocessed image may be classified by the classification model as follows: the preprocessed image is input into the classification model, which extracts the features of the input image and clusters them to obtain several feature sets containing different features; the feature sets are input into the trained classifier in the classification model to obtain the maximum-probability class of the image, and the text corresponding to the image is then obtained from that class.
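This icon-to-text step can be sketched as follows; the input size, the classifier itself, and the label set are all assumptions.

```kotlin
import android.graphics.Bitmap

// Map an icon bitmap to its subject-name text via a classifier; the label set is assumed.
val labelToText = mapOf(0 to "switch", 1 to "settings", 2 to "back")

fun iconToText(icon: Bitmap, classify: (Bitmap) -> Int): String? {
    val preprocessed = Bitmap.createScaledBitmap(icon, 224, 224, true) // scale to the assumed model input size
    val maxProbabilityClass = classify(preprocessed)                   // classifier returns the top class
    return labelToText[maxProbabilityClass]                            // class index -> converted text
}
```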
S704, obtaining the matching text of each control in the current display interface according to the parsed content and the correspondence between each piece of control information in each display interface of the electronic device and the text information and the icon information.
It is to be understood that each of the display interfaces of the electronic device may be captured once, or several times at different moments. Most display interfaces change in real time; for a Bluetooth interface, for example, the nearby devices that can be discovered differ from place to place, so the display interface differs as well. The correspondence between control information and text information can therefore be labeled on captures of the interface taken at several moments, and then analyzed and summarized so that the correspondence between the control information and the text information of that interface is obtained more accurately.
It can be understood that the correspondence between the control information and the text information of each interface of the electronic device may include the correspondence between a control and a text, and between the position of the control and the position of the text; the correspondence between the control information and the icon information may include the correspondence between a control and the converted text of an icon, and between the position of the control and the position of the icon.
The correspondence between the control information, text information, and icon information in each display interface can be obtained in several ways. In some embodiments, the labeled data may be obtained by capturing each display interface of the electronic device and then labeling the correspondence between each piece of control information, text information, and icon information on each display interface. The labeled data is stored in artificial intelligence software of the electronic device, such as a voice recognition module. After the electronic device obtains the parsed content, it can send the parsed content to the voice recognition module, which directly matches the controls of the current display interface with texts according to the stored labeled data to obtain the matching text of each control in the current display interface.
It is understood that the speech recognition module can be disposed in a processor of the electronic device, and can be used to execute the voice instruction execution method shown in fig. 7 in the embodiment of the present application.
Each display interface of the electronic equipment can be acquired by manually intercepting interfaces in various application programs of the electronic equipment.
In other embodiments, the correspondence between each control information, the text information, and the icon information in each display interface of the electronic device may be obtained through a fusion model.
For example, in the embodiment of the present application, interfaces in various application programs of the electronic device may be manually captured, and the icon information, text information, and control information that have correspondence relationships on each interface may be labeled. A fusion model is then trained on all of the labeled data, so that the correspondence among the control information, the text information, and the icon information of each interface of the electronic device is acquired more accurately. For a display interface that changes in real time, the interface may be captured at a plurality of moments, the correspondence between the control information and the text information and the correspondence between the icon information and the control information may be labeled for each capture, and continuous machine learning and training then allow the correspondence between the control information and the text information of the interface to be acquired more accurately.
It can be understood that the fusion model may include the correspondence among the control information, the text information, and the icon information of all interfaces of the electronic device. When the analysis content of the current display interface is input into the fusion model, the correspondence between the controls of the current display page and all texts (including the original texts and the converted texts of the icons) can be output after matching by the fusion model.
Further, the electronic device may obtain, from the correspondence between the controls of the current display page and all texts, the text or texts having a correspondence relationship with each control; one control may correspond to one text or to a plurality of texts. The corresponding texts of each control on the current display interface are then combined to obtain the matching text of each control.
In labeling texts and controls that have a correspondence relationship on an interface, a text and a control have a correspondence relationship when the text and the control are related in some way. Taking the labeling of the Bluetooth interface as an example, as shown in fig. 8, clicking the "Beats Solo" text 201 operates the "Beats Solo" control 202, so the "Beats Solo" text 201 and the "Beats Solo" control 202 have a correspondence relationship. Accordingly, during manual labeling, the "Beats Solo" text 201 and the "Beats Solo" control 202 on the Bluetooth interface can be labeled as having a correspondence relationship.
For another example, the "switch" icon control 208 on the bluetooth interface shown in fig. 8 mainly controls the bluetooth to be turned on and off, that is, the bluetooth and the switch icon control have a corresponding relationship, and when manually labeled, the "bluetooth" text 211 on the bluetooth interface and the "switch" icon control 208 may be labeled as having a corresponding relationship.
For another example, the "setting" icon control 210 on the bluetooth interface shown in fig. 8 mainly displays the detailed information of the electronic device corresponding to the left "Beats Solo" text 201, and when the "setting" icon control 210 and the "Beats Solo" text 201 are labeled to have a corresponding relationship through manual labeling.
It is understood that the above labeling of the Bluetooth interface is only an example; training the fusion model requires labeling and fusing all interfaces of the applications. After the analysis content of an interface, such as all controls and their position information, all texts and their position information, and the converted texts of the icons together with the position information of the corresponding icons, is input into the fusion model, the correspondence between all controls on the interface and the texts or the converted texts of the icons can be output directly. A text on the interface may correspond to one control or to a plurality of controls, and one control may correspond to one text or to a plurality of texts.
For example, when the analysis content of the Bluetooth interface shown in fig. 8 is input into the fusion model, part of the output correspondence between the controls and the texts of the Bluetooth interface is illustrated in fig. 9.
As can be seen from the arrows in fig. 9, the texts corresponding to the "setting" icon control 210 are the "Beats Solo" text 201 and the converted text of the "setting" icon 209, where the converted text of the "setting" icon 209 may be "setting".
After the texts having a correspondence relationship with each control are acquired, the electronic device may combine, for each control, the texts corresponding to it, so as to acquire the matching text of the control.
For example, taking the above "setting" icon control 210 as an example, the electronic device may combine the "Beats Solo" text 201 and the "setting" text that have a correspondence relationship with the "setting" icon control 210, and may obtain the matching text of the "setting" icon control 210 as "Beats Solo setting".
It is understood that, in the embodiment of the present application, the plurality of texts may be combined in any concatenation order; for example, the text obtained by combining the "Beats Solo" text 201 and the "setting" text may be "Beats Solo setting" or "setting Beats Solo".
For another example, taking the "Beats Solo" control 202 as an example, as can be seen from the arrows in fig. 9, the texts corresponding to the "Beats Solo" control 202 are the "Beats Solo" text 201 and the text converted from the "mobile phone" icon 213, where the converted text of the "mobile phone" icon 213 is "mobile phone". The electronic device combines the "Beats Solo" text 201 with the corresponding "mobile phone" text, and can obtain the matching text of the "Beats Solo" control 202 as "Beats Solo mobile phone".
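A minimal sketch of this combination step, assuming the texts corresponding to each control have already been retrieved (the helper names are illustrative):

```python
from itertools import permutations

def combine_texts(texts):
    # Combine all texts corresponding to one control into its matching
    # text; one concatenation order is returned here, although any order
    # is acceptable under the combination rule described above.
    return " ".join(texts)

def all_combinations(texts):
    # All admissible orders, e.g. "Beats Solo setting" / "setting Beats Solo".
    return [" ".join(p) for p in permutations(texts)]

print(combine_texts(["Beats Solo", "mobile phone"]))  # -> "Beats Solo mobile phone"
print(all_combinations(["Beats Solo", "setting"]))
```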
In the embodiment of the present application, the trained fusion model can be installed in a voice assistant. After all controls and their positions, all texts and their positions, and the converted texts of the icons and their positions on the current interface are obtained, they are input into the fusion model for matching. The correspondence between all texts and controls on the current interface can be obtained through the fusion model, and accurate text matching is then performed for all controls on the current interface according to this correspondence.
In a display interface of the electronic device, the corresponding texts of some icon controls are fixed, and the trained fusion model can directly output the matching text of such an icon control. For example, the texts corresponding to the "switch" icon control 208 in the Bluetooth interface are always the "Bluetooth" text 211 and the text converted from the "switch" icon 207, so the matching text of the "switch" icon control 208 can be directly fixed as "Bluetooth switch". The fusion model may directly output the matching text "Bluetooth switch" of the "switch" icon control 208 according to the position of the "switch" icon control 208, without outputting the correspondence between the "switch" icon control 208 and the texts in the interface. This removes the step in which the electronic device would otherwise have to combine all texts corresponding to the "switch" icon control 208 again, and improves the execution efficiency of the voice instruction.
In some embodiments, the corresponding text of some icon controls is not fixed. For example, the corresponding text of the "setting" icon control 210 changes when the name of the device changes or the device itself changes: when the electronic device named "Beats Solo" is renamed, or is replaced by another device, the corresponding text of the "setting" icon control 210 changes accordingly. For instance, if the electronic device named "Beats Solo" is renamed "BS", the "Beats Solo" portion of the corresponding text of the "setting" icon control 210 becomes the "BS" text. In this case the fusion model cannot directly train a fixed matching text for the icon control; instead, in the embodiment of the present application, the fusion model can constrain the corresponding text of the icon control through regions.
For example, as shown in FIG. 10, it is defined in the fusion model that the icon controls located in region A301 correspond to the text located in region B302 and the converted text of the icons located in region A301.
Then, after the analysis results of the Bluetooth interface, that is, all controls and their positions, all texts and their positions, and the converted texts of the icons and their positions, are input into the fusion model, it is determined that the "setting" icon control 210 is located in region A301, the "Beats Solo" text 201 is located in region B302, and the "setting" icon 209 is located in region A301. The fusion model then outputs, as the texts having a correspondence relationship with the "setting" icon control 210, the "Beats Solo" text 201 and the converted text of the "setting" icon 209.
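A minimal sketch of such a region constraint, assuming rectangular regions and purely illustrative coordinates (none of these values are specified by this application):

```python
# Illustrative bounding boxes for region A (icon column) and region B
# (text column); real coordinates would come from the labeled layout.
REGION_A = (600, 0, 720, 1280)   # (x_min, y_min, x_max, y_max)
REGION_B = (0, 0, 600, 1280)

def in_region(pos, region):
    x, y = pos
    x0, y0, x1, y1 = region
    return x0 <= x <= x1 and y0 <= y <= y1

def region_correspondence(icon_controls, texts, row_tolerance=40):
    # Pair each icon control located in region A with each text located in
    # region B lying in the same row band, mirroring the rule that icon
    # controls in region A correspond to texts in region B.
    return [(c_name, t_name)
            for c_name, c_pos in icon_controls
            for t_name, t_pos in texts
            if in_region(c_pos, REGION_A) and in_region(t_pos, REGION_B)
            and abs(c_pos[1] - t_pos[1]) <= row_tolerance]

# E.g. the "setting" icon control and the "Beats Solo" text on the same row:
print(region_correspondence([("setting_icon_control", (660, 400))],
                            [("Beats Solo", (80, 410))]))
```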
It can be understood that when the user issues the instruction "click the Beats Solo setting", the electronic device can match the text content of the voice instruction of the user 001 with the matching text of the "setting" icon control 210, so that the electronic device performs the operation of clicking the "setting" icon control 210 according to the instruction of the user 001.
It will be appreciated that, in some embodiments, part of the corresponding text of an icon control may not be text in the current interface at all; for example, the theme text of the interface one level above the current interface may be used.
For example, for the "return" icon control 206 in the Bluetooth interface, clicking the "return" icon control 206 returns to the settings interface, which is the interface one level above the Bluetooth interface. Therefore, when the "return" icon control 206 is labeled manually, in addition to the converted text of the "return" icon 205 in the Bluetooth interface, it is also labeled as corresponding to the theme text "setting" of the settings interface one level above.
Therefore, after the analysis results of the Bluetooth interface, that is, all controls and their positions, all texts and their positions, and the converted texts of the icons and their positions, are input into the fusion model, the texts output by the fusion model as having a correspondence relationship with the "return" icon control 206 include the converted text of the "return" icon 205 of the current interface and the theme text "setting" of the interface one level above the Bluetooth interface. That is, by combining all the corresponding texts of the "return" icon control 206 output by the fusion model, the matching text of the "return" icon control 206 is obtained as "return setting".
As can be seen from the above, one text on the interface may match one control; for example, as shown in fig. 9, the "Bluetooth" text 211 corresponds to the "switch" icon control 208. One text on the interface may also match a plurality of controls; for example, the "Beats Solo" text 201 may correspond to both the "Beats Solo" control 202 and the "setting" icon control 210. A text on the interface may also correspond to no control at all; for example, the "paired device" text 212 has no corresponding control.
In the interface analysis manner provided by the embodiment of the present application, the fusion model containing the correspondence between the texts and the controls of all interfaces of the electronic device 002 is obtained first; after the voice instruction of the user 001 is received, all texts, icons, and controls of the current interface of the electronic device 002 are analyzed, and the analysis result is matched against the fusion model, so that accurate text matching can be performed for all controls on the interface. The electronic device 002 can thereby accurately find the control matching the voice instruction of the user 001, improving the accuracy of voice instruction execution.
S705, analyzing the voice command of the user 001, and acquiring the operation and the text in the voice command.
In the embodiment of the present application, the voice instruction of the user 001 may be analyzed by performing slot extraction on the voice instruction, converting it into an operation and a text. For example, the voice instruction of the user 001 may be converted into an operation and a text according to a Natural Language Processing (NLP) slot-filling model. Specifically, when the user 001 issues the voice instruction "click the Beats Solo setting", the operation obtained after slot extraction according to the NLP slot-filling model is "click", and the text is "Beats Solo setting".
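As a rough illustration of the output shape of this slot-extraction step, the following sketch substitutes a regular expression for the trained NLP slot-filling model; the operation list and the pattern are assumptions made for the example only:

```python
import re

# Toy stand-in for the NLP slot-filling step: a trained slot model would
# normally produce the operation and text slots.
OPERATIONS = ("click", "connect", "open", "close")

def extract_slots(command):
    for op in OPERATIONS:
        m = re.match(rf"{op}\s+(?:on\s+|to\s+)?(?:the\s+)?(.+)",
                     command, re.IGNORECASE)
        if m:
            return {"operation": op, "text": m.group(1)}
    return None

print(extract_slots("click the Beats Solo setting"))
# -> {'operation': 'click', 'text': 'Beats Solo setting'}
```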
It can be understood that, in the embodiment of the present application, the step may be located after S701, that is, after receiving the voice instruction of the user, the voice instruction of the user is parsed.
S706, the electronic equipment matches the text in the voice instruction with the matching text of each control in the current display interface, takes the control corresponding to the matching text matched with the text identification result in the voice instruction as the target control corresponding to the voice instruction, and performs corresponding operation on the target control according to the operation in the voice instruction.
In the embodiment of the present application, the executable operation of a control can be determined according to the category of the control. For example, if the control is a text control or an icon control, the operation may be a click operation; if the control is a selection control, the operation may be a click operation, a slide operation, or the like.
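A minimal sketch of this category-to-operation rule, with hypothetical category names:

```python
# Hypothetical mapping from control category to its executable operations.
EXECUTABLE_OPERATIONS = {
    "text_control": {"click"},
    "icon_control": {"click"},
    "selection_control": {"click", "slide"},
}

def is_executable(control_category, operation):
    return operation in EXECUTABLE_OPERATIONS.get(control_category, set())

print(is_executable("selection_control", "slide"))  # -> True
print(is_executable("text_control", "slide"))       # -> False
```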
In the embodiment of the present application, the consistency between the slot-extracted text of the voice instruction and the matching texts of all controls of the current interface can be evaluated separately. The higher the consistency between the slot-extracted text of the voice instruction and the matching text of a control of the current interface, the higher the matching degree between the two, and the control with the highest matching degree is selected as the target control of the operation.
In some embodiments, a matching-degree threshold may also be set; when the matching degree between the slot-extracted text of the voice instruction and the matching text of every control of the current interface is lower than the matching-degree threshold, there is no target control that can be operated, and no operation is performed.
The consistency between the slot-extracted text of the voice instruction and the matching texts of all controls of the current interface may be evaluated by converting the slot-extracted text and the matching texts of all controls into sentence vectors and ranking them, where a smaller sentence-vector distance represents a higher consistency between the slot-extracted text of the voice instruction and the matching text of the control. The control whose sentence vector is closest is selected as the target control of the operation, and the executable operation is performed on it. In addition, if all sentence-vector distances are larger than a set sentence-vector distance threshold, there is no operable target control, and no operation is performed.
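A minimal sketch of this ranking-with-threshold step; a bag-of-words count vector stands in for a trained sentence-embedding model, and the distance threshold is an illustrative assumption:

```python
import math
from collections import Counter

def embed(text):
    # Stand-in sentence vector: a bag-of-words count vector. A production
    # system would use a trained sentence-embedding model instead.
    return Counter(text.lower().split())

def distance(a, b):
    # Euclidean distance between two sparse count vectors; a smaller
    # distance represents a higher consistency between the two texts.
    keys = set(a) | set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

def find_target_control(command_text, control_matching_texts, max_distance=2.0):
    # Rank controls by sentence-vector distance to the slot-extracted text;
    # if even the closest control exceeds the distance threshold, there is
    # no operable target control and None is returned.
    cmd_vec = embed(command_text)
    best = min(control_matching_texts.items(),
               key=lambda kv: distance(cmd_vec, embed(kv[1])))
    return best[0] if distance(cmd_vec, embed(best[1])) <= max_distance else None

controls = {"setting_icon_control": "Beats Solo setting",
            "switch_icon_control": "Bluetooth switch"}
print(find_target_control("Beats Solo setting", controls))
# -> "setting_icon_control"
```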
In the embodiment of the present application, by evaluating the consistency between the slot-extracted text of the voice instruction and the matching texts of all controls of the current interface, the target control of the current interface corresponding to the slot-extracted text of the voice instruction of the user 001 can be found accurately, so that the voice instruction of the user 001 can be executed accurately.
S707, controlling the display screen 102 to display the interface after the operation is performed.
In this embodiment, after the processor of the mobile phone 002 performs corresponding operation on the corresponding control on the current interface according to the voice instruction of the user 001, the display screen is controlled to display the operated interface.
For example, as described above, when the user 001 issues the voice instruction "click the Beats Solo setting", the mobile phone 002 evaluates the consistency between the slot-extracted text "Beats Solo setting" of the voice instruction and the matching texts of all controls of the current interface, and can thereby accurately determine that the control corresponding to the slot-extracted text "Beats Solo setting" of the voice instruction of the user 001 is the "setting" icon control 210, whose matching text is "Beats Solo setting". The mobile phone 002 then performs the operation of clicking the "setting" icon control 210. The display interface of the mobile phone 002 after the "setting" icon control 210 is clicked is shown in fig. 11; on this interface, the user can rename the Bluetooth electronic device named "Beats Solo", and can enable or disable the function of accessing the Internet through that device.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or tangible machine-readable storage used in transmitting information over the Internet via electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
The embodiments of the present application also provide a computer program, or a computer program product including a computer program, which, when executed on a computer, causes the computer to implement the voice instruction execution method described above. In one implementation, the computer program product may include instructions for implementing the voice instruction execution method described above.
An embodiment of the present application further provides a chip apparatus, where the chip apparatus may include: a communication interface for inputting and/or outputting information; and a processor for executing a computer-executable program, so that a device provided with the chip apparatus executes the voice instruction execution method described above. The chip apparatus may further include: an interconnect unit coupled to an application processor; a system agent unit; an integrated memory controller unit; a set of one or more coprocessors, which may include integrated graphics logic, an image processor, an audio processor, and a video processor; a static random access memory (SRAM) unit; and a direct memory access (DMA) unit. In one embodiment, the coprocessors include a special-purpose processor such as, for example, a network or communication processor, a compression engine, a GPGPU, a high-throughput MIC processor, or an embedded processor.
In the drawings, some features of structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
It should be noted that, in the device embodiments of the present application, each unit/module is a logical unit/module. Physically, one logical unit/module may be one physical unit/module, may be part of one physical unit/module, or may be implemented by a combination of a plurality of physical units/modules; the physical implementation of the logical units/modules is not itself essential, and it is the combination of functions implemented by these logical units/modules that solves the technical problem addressed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above device embodiments do not introduce units/modules that are less closely related to solving the technical problem addressed by the present application; this does not indicate that no other units/modules exist in the above device embodiments.
Claims (10)
1. A voice instruction execution method for an electronic device, the method comprising:
the electronic equipment detects a voice instruction of a user; the voice instruction comprises an operation and a text;
the electronic equipment analyzes the current display interface to obtain analysis content;
the electronic equipment acquires the matching text of each control in the current display interface according to the analysis content and the corresponding relation between each control information in each display interface of the electronic equipment and the text information and the icon information respectively;
the electronic equipment matches the text in the voice instruction with the matching text of each control in the current display interface, and takes the control corresponding to the matching text matched with the text in the voice instruction as the target control corresponding to the voice instruction;
and the electronic equipment performs corresponding operation on the target control according to the operation in the voice instruction.
2. The method for executing the voice instruction according to claim 1, wherein the parsed content of the current display interface comprises text information, icon information and control information of the current display interface;
the text information comprises a text and position information corresponding to the text;
the icon information comprises a conversion text of an icon and position information corresponding to the icon;
the control information comprises the category of the control and the position information corresponding to the control.
3. The voice instruction execution method of claim 1,
and the corresponding relation between the control information in each display interface and the text information and the icon information is obtained by marking the corresponding relation between the control information of all the interfaces of the electronic equipment and the text information and the icon information respectively.
4. The method according to claim 2, wherein the electronic device obtains a matching text of each control in the current display interface according to the analysis content and a corresponding relationship between each control information in each display interface of the electronic device and the text information and the icon information, respectively; the method comprises the following steps:
the electronic equipment acquires a text and/or a converted text of an icon corresponding to each control in the current display interface according to the analysis content and through the corresponding relationship between each control information in each display interface of the electronic equipment and the text information and the icon information respectively;
and combining the converted texts of the texts and/or icons which have corresponding relations with the controls in the current display interface to obtain the matched texts of the controls in the current display interface.
5. The method for executing the voice instruction according to claim 4, wherein the text having the correspondence relationship with each control in the current display interface exists in the current display interface and/or a higher-level interface of the current display interface.
6. The method for executing the voice instruction according to claim 1, wherein the electronic device matches a text in the voice instruction with a matching text of each control in the current display interface, and takes a control corresponding to the matching text matching with the text in the voice instruction as a target control corresponding to the voice instruction; the method comprises the following steps:
the electronic equipment matches the text in the voice instruction with the matching text of each control in the current display interface to obtain the matching degree of the text in the voice instruction and the matching text of each control in the current display interface;
determining a matching text matched with the text corresponding to the voice instruction according to the matching degree of the text in the voice instruction and the matching texts of all controls in the current display interface;
and taking a control corresponding to the matched text matched with the text in the voice instruction as a target control corresponding to the voice instruction.
7. The method of claim 6, further comprising: and if the matching degree of the text corresponding to the voice instruction and the matching text of each control in the current display interface is lower than a first set value, the target control does not exist.
8. The method of claim 1, wherein the electronic device is a mobile phone.
9. An electronic device, comprising:
a memory for storing instructions for execution by one or more processors of the electronic device, an
a processor, as one of the one or more processors of the electronic device, configured to perform the voice instruction execution method of any one of claims 1 to 8.
10. A computer-readable storage medium having stored thereon instructions that, when executed, cause a computer to perform the method of any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110859300.5A CN115691486A (en) | 2021-07-28 | 2021-07-28 | Voice instruction execution method, electronic device and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115691486A true CN115691486A (en) | 2023-02-03 |
Family
ID=85059083
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110859300.5A Pending CN115691486A (en) | 2021-07-28 | 2021-07-28 | Voice instruction execution method, electronic device and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115691486A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118197305A (en) * | 2024-03-14 | 2024-06-14 | 阿波罗智联(北京)科技有限公司 | Voice control method and device and electronic equipment |
WO2024217044A1 (en) * | 2023-04-19 | 2024-10-24 | 珠海格力电器股份有限公司 | Device control method and apparatus, and storage medium and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |