CN112908323B - Voice control method and apparatus for an application interface, and smart device - Google Patents

Voice control method and apparatus for an application interface, and smart device

Info

Publication number
CN112908323B
CN112908323B (application CN202110067144.9A)
Authority
CN
China
Prior art keywords
interface
state
control
interface state
application
Prior art date
Legal status
Active
Application number
CN202110067144.9A
Other languages
Chinese (zh)
Other versions
CN112908323A (en)
Inventor
申立明
谢志栋
孙宇
方华
Current Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung Electronics China R&D Center and Samsung Electronics Co Ltd
Priority: CN202110067144.9A
Publication of CN112908323A
Application granted
Publication of CN112908323B


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0487: Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F 3/0488: Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser, using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44: Arrangements for executing specific programs
    • G06F 9/451: Execution arrangements for user interfaces
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command

Abstract

Embodiments of the invention disclose a voice control method and apparatus for an application interface, and a smart device. The method comprises: determining a current interface state; parsing a target interface state from a voice instruction; and retrieving a navigation path table to determine a navigation path for switching from the current interface state to the target interface state, wherein the navigation path comprises a sequence of navigation key events, and the navigation path table is generated from an application interface state machine whose nodes are interface states and whose edges are the navigation key events between them. By realizing voice control with an application interface state machine, the embodiments reduce the complexity of voice control. The embodiments also describe each interface state with a state matrix, which converts the control-navigation problem into a path-search problem and shortens the time needed to compute a navigation path.

Description

Voice control method and apparatus for an application interface, and smart device
Technical Field
The present invention relates to the field of voice control technologies, and in particular, to a voice control method and apparatus for an application interface, and a smart device.
Background
Voice control has become a dominant way of operating smart devices. Voice control is generally triggered in one of two ways:
(1) Push To Talk (PTT) mode, a near-field mode of voice control: after pressing a record key, the user speaks a voice instruction to the recording device;
(2) Wake Up Word (WUW) mode, a far-field mode of voice control: without touching the device, the user first speaks a wake-up word to put the device into an activated state, and then speaks the voice instruction.
After the smart device receives a recording containing the user's voice instruction, the audio stream is processed by Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) to recognize the user's operational intent, which is then dispatched to the application. The application registers a voice-control Application Programming Interface (API) to intercept and respond to operation events.
In prior-art voice control, therefore, each application is required to implement an API that registers callback functions for voice control events.
However, given the typically large number of applications, having every application implement the API significantly increases the complexity of voice control. Moreover, applications often pursue cross-platform portability, and implementing the API separately for each platform also increases the difficulty of maintaining the application.
Disclosure of Invention
The invention provides a voice control method and apparatus for an application interface, and a smart device, so as to reduce the complexity of voice control.
The technical solution of embodiments of the invention is as follows:
a voice control method of an application interface, comprising:
determining a current interface state;
analyzing a target interface state from the voice instruction;
retrieving a navigation path table to determine a navigation path for switching from the current interface state to the target interface state, wherein the navigation path comprises a sequence of navigation key events, and the navigation path table is generated based on an application interface state machine that includes interface states and the navigation key events between interface states.
In one embodiment, the determining of the current interface state includes:
identifying each control in the current interface state;
establishing a vector of each control, wherein the vector comprises a control position, a control size and a control focus state;
establishing a state matrix of the current interface state based on the vector of each control;
and determining a preset interface state matched with the state matrix as the current interface state.
In one embodiment, before the retrieving of the navigation path table, the method further includes:
acquiring the application interface state machine;
determining, for each interface state in the application interface state machine and using an all-pairs shortest-path algorithm, each navigation path that switches from that interface state to each of the remaining interface states;
and establishing a navigation path table containing each interface state, each of the remaining interface states, and each navigation path for switching between them.
In one embodiment, the method further comprises:
traversing all interface states by utilizing a preset navigation key event to generate the application interface state machine.
In one embodiment, traversing all interface states using a predetermined navigation key event to generate the application interface state machine comprises:
starting from an initial interface state of an initial application interface, entering each interface state of each application interface using predetermined navigation key events;
and representing each interface state of each application interface as a node of the application interface state machine, and representing the predetermined navigation key events as edges between the nodes.
In one embodiment, the method further comprises:
identifying each control in the entered interface state;
establishing a vector of each control, wherein the vector comprises a control position, a control size and a control focus state;
establishing a state matrix of the entered interface state based on the vector of each control;
and storing the state matrix of the entered interface state.
In one embodiment, the method further comprises:
setting an identifier on each control of each application interface;
wherein the parsing of the target interface state from the voice instruction comprises: parsing, from the voice instruction, an identifier corresponding to the target interface state; and the determining of a navigation path to switch from the current interface state to the target interface state comprises: determining the target interface state based on the identifier corresponding to the target interface state.
In one embodiment, the application interface is provided by a device that does not support voice control functions; the method further comprises:
sending the navigation path to the device that does not support voice control;
causing the device that does not support voice control to execute the navigation key event sequence.
A voice control apparatus for an application interface, comprising:
the determining module is used for determining the current interface state;
the analysis module is used for analyzing the target interface state from the voice instruction;
a retrieval module for retrieving a navigation path table to determine a navigation path for switching from the current interface state to the target interface state, wherein the navigation path comprises a sequence of navigation key events, and the navigation path table is generated based on an application interface state machine that includes interface states and navigation key events between interface states.
In one embodiment, the determining module is configured to identify each control in the current interface state; establishing a vector of each control, wherein the vector comprises a control position, a control size and a control focus state; establishing a state matrix of the current interface state based on the vector of each control; and determining a preset interface state matched with the state matrix as the current interface state.
In one embodiment, the retrieval module is configured to obtain the application interface state machine before retrieving the navigation path table; determine, for each interface state in the application interface state machine and using an all-pairs shortest-path algorithm, each navigation path that switches from that interface state to each of the remaining interface states; and establish a navigation path table containing each interface state, each of the remaining interface states, and each navigation path for switching between them.
In one embodiment, the retrieval module is configured to traverse all interface states using a predetermined navigation key event to generate the application interface state machine.
In one embodiment, the retrieval module is configured to start from an initial interface state of an initial application interface and enter each interface state of each application interface using predetermined navigation key events; and to represent each interface state of each application interface as a node of the application interface state machine, with the predetermined navigation key events as edges between the nodes.
In one embodiment, the retrieval module is configured to identify each control in the entered interface state; establishing a vector of each control, wherein the vector comprises a control position, a control size and a control focus state; establishing a state matrix of the entered interface state based on the vector of each control; and storing the state matrix of the entered interface state.
In one embodiment, the apparatus further comprises:
a setting module for setting an identifier on each control of each application interface;
the parsing module is used for parsing, from the voice instruction, the identifier corresponding to the target interface state;
the retrieval module is used for determining the target interface state based on the identifier corresponding to the target interface state.
In one embodiment, the application interface is provided by a device that does not support voice control functions; the apparatus further comprises:
a sending module, configured to send the navigation path to the device that does not support the voice control function, causing that device to execute the navigation key event sequence.
An intelligent device comprising a processor and a memory;
the memory stores therein an application executable by the processor for causing the processor to execute the voice control method of the application interface as described in any one of the above.
A computer readable storage medium having stored therein computer readable instructions for performing the voice control method of the application interface as claimed in any one of the above.
From the above technical solution, in embodiments of the present invention, the current interface state is determined; the target interface state is parsed from a voice instruction; and a navigation path table is retrieved to determine a navigation path for switching from the current interface state to the target interface state, wherein the navigation path comprises a sequence of navigation key events and the navigation path table is generated based on an application interface state machine that includes interface states and the navigation key events between them. Embodiments of the invention thus introduce the concept of an application interface state machine and use it to voice-control graphical interfaces that do not implement a voice control API, reducing the complexity of voice control.
In addition, by establishing a navigation path table, embodiments of the invention can quickly determine the navigation path for switching from the current interface state to the target interface state, improving the efficiency of navigation path determination.
In addition, by establishing a state matrix for each interface state, embodiments of the invention define and describe interface states in a way that converts the control-navigation problem into a graph-theoretic path-search problem, shortening the time needed to compute a navigation path.
Drawings
Fig. 1 is a flowchart of a voice control method of an application interface according to an embodiment of the present invention.
Fig. 2 is an exemplary schematic diagram of an application interface according to an embodiment of the present invention.
Fig. 3 is an exemplary schematic diagram of a state matrix according to an embodiment of the present invention.
FIG. 4 is an exemplary schematic diagram of an application interface state machine according to an embodiment of the present invention.
Fig. 5 is an exemplary flow chart of a voice control application according to an embodiment of the present invention.
FIG. 6 is an exemplary diagram of generating an application interface state machine according to an embodiment of the present invention.
Fig. 7 is an exemplary architecture diagram of a computer whose application interface may be voice controlled according to an embodiment of the present invention.
Fig. 8 is a first exemplary processing diagram of a speech control application according to an embodiment of the present invention.
Fig. 9 is a second exemplary processing diagram of a speech control application according to an embodiment of the present invention.
Fig. 10 is a third exemplary processing diagram of a speech control application according to an embodiment of the present invention.
Fig. 11 is a block diagram of a voice control apparatus of an application interface according to an embodiment of the present invention.
Fig. 12 is a block diagram of a smart device having a memory-processor architecture according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
For simplicity and clarity of description, the following sets forth aspects of the invention by describing several exemplary embodiments. The numerous details in the embodiments are provided solely to aid understanding of the invention; it will be apparent that the invention may be practiced without being limited to these specific details. Some embodiments are not described in detail, and only a framework is presented, in order to avoid unnecessarily obscuring aspects of the present invention. Hereinafter, "comprising" means "including but not limited to", and "according to ..." means "according to at least ..., but not limited to only ...". Unless otherwise specified, the terms "a" and "an" do not limit the number of components, which may be one, more than one, or at least one.
The applicant found that, in prior-art voice control, an application must implement the API provided by its host platform and actively register for the platform's voice control instruction events before it can respond to voice control. The applicant also found that the number of third-party applications (applications not provided by the system vendor) on a system provider's platform is enormous, and requiring all of them to fully implement the API is difficult. Moreover, from the perspective of a third-party application, the application itself pursues cross-platform portability, and platform-specific modifications increase its maintenance cost.
Embodiments of the invention therefore provide a technique for voice-controlling an application interface even when the application has not registered for the system platform's voice control events, so that the large number of third-party applications, and applications on external devices that cannot support the system platform's voice functions, can still be voice controlled without modification.
For an application that does not register platform voice control events through an API, to let the user control its interface by voice, embodiments of the invention mainly comprise the following steps:
(1) Recognize the application's graphical interface, and determine both the control that currently holds focus and the control the user wants to select.
(2) Determine a focus-movement navigation path, and drive the application's graphical interface with key events so that focus moves to the user's target control, thereby helping the user select it. To determine the focus-movement navigation path, embodiments of the invention define three concepts: (a) the state matrix of the graphical interface; (b) the finite state machine of the graphical interface; (c) the navigation path.
Fig. 1 is a flowchart of a voice control method of an application interface according to an embodiment of the present invention.
As shown in fig. 1, the method includes:
step 101: the current interface state is determined.
An application typically includes one or more application interfaces, and each application interface typically includes one or more interface states. An application interface usually contains one or more controls, and different interface states can be distinguished by the focus state of each control. The current interface state here is the current interface state of the application's current interface.
For example, assume application interface 1 contains control 1, control 2, control 3, and control 4. The interface states of application interface 1 then include: interface state 1, in which control 1 has focus; interface state 2, in which control 2 has focus; interface state 3, in which control 3 has focus; and interface state 4, in which control 4 has focus. In graphical interface design, a control that acquires focus is highlighted relative to the other controls of the same level, and when the system then receives a confirm or enter key, that control is selected.
A state matrix may be pre-established for each interface state. The state matrix contains one vector per control in the corresponding interface state, where the vector contains the control's position, size, and focus state. While the application's graphical interface waits to respond to user operations, the position, size, and focus information of all controls in the interface together form one graphical interface state. A control's position consists of its horizontal and vertical coordinates in the plane; its size is the area it occupies, expressed as width and height.
For the example above, application interface 1 contains controls 1 through 4 and therefore has four interface states: interface state 1 (control 1 has focus), interface state 2 (control 2 has focus), interface state 3 (control 3 has focus), and interface state 4 (control 4 has focus). A state matrix can be pre-established for each of these four interface states, to facilitate the later matching operation that determines the current interface state.
Fig. 2 is an exemplary schematic diagram of an application interface according to an embodiment of the present invention.
In fig. 2, the application interface contains 6 controls, namely control 1, control 2, control 3, control 4, control 5 and control 6.
Taking control 1 as an example, its vector contains the position, size, and focus state (whether it holds focus) and can be represented by five numbers: [x1, y1, w1, h1, f1], where x1 and y1 are the horizontal and vertical coordinates of the top-left corner of control 1 (taking, for example, the top-left corner of the interface as the coordinate origin); w1 is the width of control 1; h1 is the height of control 1; and f1 is the focus state of control 1: f1 = 0 when control 1 does not have focus, and f1 = 1 when it does.
The vectors of the other controls are defined in the same way: control 2 has vector [x2, y2, w2, h2, f2], control 3 has [x3, y3, w3, h3, f3], control 4 has [x4, y4, w4, h4, f4], control 5 has [x5, y5, w5, h5, f5], and control 6 has [x6, y6, w6, h6, f6], where for each control i, xi and yi locate its top-left corner, wi and hi give its width and height, and fi is 1 if the control has focus and 0 otherwise.
Combining the vectors of controls 1 through 6 yields the state matrix of application interface 1. FIG. 3 is an exemplary schematic diagram of a state matrix according to an embodiment of the present invention.
As shown in FIG. 3, the state matrix contains the vectors of controls 1 through 6, and it can describe the various states of the application interface shown in FIG. 2.
In one embodiment, determining the current interface state in step 101 includes: identifying each control in the current interface state; establishing a vector for each control, the vector containing the control's position, size, and focus state; establishing the state matrix of the current interface state based on these vectors; and determining the predetermined interface state that matches this state matrix as the current interface state. The current interface state can thus be determined conveniently by matching the current state matrix.
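As an illustration of this matching embodiment, the following sketch assembles per-control vectors into a state matrix and compares it against pre-stored matrices. The Control type, the vector() helper, and the known_states registry are illustrative assumptions, not names from the invention; a minimal sketch in Python:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Control:
    x: int  # horizontal coordinate of the top-left corner
    y: int  # vertical coordinate of the top-left corner
    w: int  # control width
    h: int  # control height
    f: int  # focus state: 1 if the control holds focus, else 0

    def vector(self) -> List[int]:
        return [self.x, self.y, self.w, self.h, self.f]

def state_matrix(controls: List[Control]) -> List[List[int]]:
    """Assemble one vector per control into the interface state matrix."""
    return [c.vector() for c in controls]

def match_state(pending: List[List[int]], known_states: dict) -> Optional[str]:
    """Return the name of the stored state whose matrix equals the
    pending matrix, or None if this state has not been seen before."""
    for name, stored in known_states.items():
        if stored == pending:
            return name
    return None
```

Because matching is exact equality over the [x, y, w, h, f] rows, any focus change produces a different matrix and hence a different interface state.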
Step 102: and analyzing the target interface state from the voice command.
Here, the target interface state is the interface state of the target interface to which migration is desired. The sources of voice instructions include:
(1) PTT mode: after pressing the record key, the user speaks a voice instruction to the recording device;
(2) WUW mode: without touching the device, the user first speaks the wake-up word to activate the device, and then speaks the voice instruction.
Preferably, after the voice instruction is acquired, it is processed by ASR and NLU to identify the target interface state.
For example, assume the user speaks: "select control 1 of interface 2". After ASR and NLU processing, the target interface state is determined to be the interface state in which control 1 of interface 2 has focus.
The foregoing describes exemplary sources of voice instructions; those skilled in the art will recognize that this description is exemplary only and does not limit the scope of embodiments of the present invention.
Step 103: a navigation path table is retrieved to determine a navigation path to switch from a current interface state to a target interface state, wherein the navigation path includes a sequence of navigation key events, the navigation path table being generated based on an application interface state machine that includes interface states and navigation key events between the interface states.
Embodiments of the invention propose a state machine for the graphical interface. When the system receives a navigation key event, the control that holds focus changes. This changes the interface state: the interface migrates from one state to another, either within one interface or across interfaces. The interface states of an application are finite, and all of them can be traversed with the navigation keys. Taking all interface states as nodes and the navigation key events between states as edges forms a directed graph, namely the application interface state machine.
In one embodiment, all interface states are traversed using predetermined navigation key events to generate an application interface state machine.
Specifically, the method includes: starting from the initial interface state of the initial application interface, entering each interface state of each application interface using predetermined navigation key events; and representing each interface state of each application interface as a node of the application interface state machine, with the predetermined navigation key events as edges between the nodes. To recognize whether an interface state has already been visited, the traversal also saves the state matrix of each entered interface state.
Specifically, saving the state matrix of an entered interface state includes: identifying each control in the entered interface state; establishing a vector for each control, the vector containing the control's position, size, and focus state; establishing the state matrix of the entered interface state from these vectors; and saving that state matrix. By comparing the state matrix of a newly entered interface state with the saved matrices, it can be determined whether the entered state is new; if so, it is saved, otherwise other interface states are entered, until all interface states have been traversed.
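The traversal just described can be sketched as a breadth-first search over interface states. The hooks send_key(), capture_state_matrix(), and go_to_state() are hypothetical stand-ins for device-specific operations (the last one restoring a saved state, e.g. by replaying a recorded key path); the invention itself does not name them:

```python
from collections import deque

NAV_KEYS = ["up", "down", "left", "right"]

def build_state_machine(send_key, capture_state_matrix, go_to_state):
    """Traverse all interface states reachable via navigation keys."""
    start = capture_state_matrix()
    states = {0: start}                    # node id -> state matrix
    edges = {}                             # (node id, key) -> node id
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for key in NAV_KEYS:
            go_to_state(states[node])      # restore the source state
            send_key(key)
            matrix = capture_state_matrix()
            # Reuse the id of an already-saved state, else record a new node.
            known = next((i for i, m in states.items() if m == matrix), None)
            if known is None:
                known = len(states)
                states[known] = matrix
                queue.append(known)
            edges[(node, key)] = known
    return states, edges
```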
Based on the application interface described in fig. 2, fig. 4 is an exemplary schematic diagram of an application interface state machine according to an embodiment of the present invention.
For example, in interface state 1 (i.e., control 1 has focus): pressing the right navigation key migrates to interface state 2 (control 2 gains focus); pressing the down navigation key migrates to interface state 4 (control 4 gains focus).
For simplicity of description, the example shown in FIG. 4 covers state transitions within application interface 1 only. In practice, an application typically contains multiple application interfaces, so the application-level state machine contains both cross-interface and intra-interface state transitions.
The navigation path is explained next. The navigation path between two interface states is a sequence of navigation key events that migrates one interface state to the other, and it is the shortest of all possible such sequences. By sending the key sequence of a navigation path to the system, the interface state migrates from the start state to the end state. Since the finite state machine of the graphical interface is a directed graph, the navigation path between any two states can be obtained on it with an all-pairs shortest-path algorithm.
In one embodiment, step 103 also includes, prior to retrieving the navigation path table, a process of building the navigation path table.
Specifically, before retrieving the navigation path table, the method further includes: acquiring the application interface state machine; determining, for each interface state in the application interface state machine and using an all-pairs shortest-path algorithm, each navigation path that switches from that interface state to each of the remaining interface states; and establishing a navigation path table containing each interface state, each of the remaining interface states, and each navigation path for switching between them.
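Since every edge of the state machine is a single key press (unit weight), running a breadth-first search from each node is one way to realize the all-pairs shortest-path computation described above; a sketch, reusing the states/edges structures from the previous sketch:

```python
from collections import deque

def navigation_path_table(states, edges):
    """edges: (node, key) -> node, as produced by build_state_machine()."""
    # adjacency list: node -> [(key, neighbour), ...]
    adj = {n: [] for n in states}
    for (u, key), v in edges.items():
        adj[u].append((key, v))
    table = {}                              # (start, end) -> [keys]
    for src in states:
        paths = {src: []}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for key, nxt in adj[node]:
                if nxt not in paths:        # first visit = shortest path
                    paths[nxt] = paths[node] + [key]
                    queue.append(nxt)
        for dst, keys in paths.items():
            if dst != src:
                table[(src, dst)] = keys
    return table
```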
Table 1 is an exemplary navigation path table for the application interface shown in FIG. 2:

Start state | End state | Navigation path
Interface state 1 | Interface state 2 | [right]
Interface state 1 | Interface state 3 | [right, right]
Interface state 1 | Interface state 4 | [right, right]
Interface state 1 | Interface state 5 | [down]
Interface state 1 | Interface state 6 | [down, right]
Interface state 2 | Interface state 1 | [left]

TABLE 1
As Table 1 shows, when the start state is interface state 1 and the end state is interface state 2, the navigation path contains the key-event sequence [right], i.e., one key press moving focus right to the next object; when the end state is interface state 3, the sequence is [right, right], i.e., focus moves right to the next object and then right again; when the end state is interface state 6, the sequence is [down, right], i.e., focus moves down to the next object and then right to the next object; and so on.
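With the table built, determining a navigation path at run time reduces to a single dictionary lookup; a toy illustration using the first rows of Table 1 (state names abbreviated, values as illustrative assumptions):

```python
# Toy lookup over the first rows of Table 1; keys are (start, end) pairs.
table = {
    ("state 1", "state 2"): ["right"],
    ("state 1", "state 5"): ["down"],
    ("state 1", "state 6"): ["down", "right"],
    ("state 2", "state 1"): ["left"],
}
path = table[("state 1", "state 6")]   # -> ["down", "right"]
```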
In one embodiment, the method further comprises: setting an identifier on each control of each application interface. The parsing of the target interface state from the voice instruction in step 102 then comprises: parsing, from the voice instruction, the identifier corresponding to the target interface state; and the determining of a navigation path to switch from the current interface state to the target interface state in step 103 comprises: determining the target interface state based on that identifier.
In one embodiment, the application interface is provided by a device that does not support voice control functions; the method further comprises: sending the navigation path to the device that does not support voice control, causing that device to execute the navigation key event sequence.
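For this external-device embodiment, one plausible realization is to serialize the key-event sequence and push it to the device over a network connection. The JSON payload shape and the raw-socket transport below are assumptions for illustration; the invention only requires that the device receive and execute the sequence:

```python
import json
import socket

def send_path(path, host, port):
    """Serialize the key-event sequence and push it to the device."""
    payload = json.dumps({"key_events": path}).encode("utf-8")
    with socket.create_connection((host, port)) as conn:
        conn.sendall(payload)
```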
Fig. 5 is an exemplary flow chart of a voice control application according to an embodiment of the present invention.
As shown in fig. 5, the process includes:
step one: a graphical interface control is identified.
In the field of deep learning, many algorithms can recognize the controls of an application's graphical interface. For example, the Single Shot MultiBox Detector (SSD), You Only Look Once (YOLO), or Regions with CNN features (R-CNN) algorithms may be selected for control recognition.
For example, the data annotation of a graphical interface control needs to include: (1) the control's horizontal coordinate; (2) its vertical coordinate; (3) its width; (4) its height; (5) whether it holds focus; and so on.
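A sketch of how detector output could be turned into the annotation fields listed above. detect_controls() stands in for a trained SSD/YOLO/R-CNN model and returns fixed example boxes here; the (x, y, w, h, focus_score) tuple shape and the 0.5 focus threshold are illustrative assumptions:

```python
def detect_controls(image):
    """Stand-in for a trained SSD/YOLO/R-CNN detector. A real model would
    return one (x, y, w, h, focus_score) tuple per detected control; a
    fixed example is returned here so the sketch runs."""
    return [(10, 10, 100, 40, 0.92), (120, 10, 100, 40, 0.03)]

def annotate(image):
    """Convert detector output into the annotation fields (1)-(5)."""
    records = []
    for (x, y, w, h, focus_score) in detect_controls(image):
        records.append({
            "x": x, "y": y,                 # top-left corner coordinates
            "w": w, "h": h,                 # control width and height
            "focused": focus_score > 0.5,   # whether the control holds focus
        })
    return records
```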
Step two: a state matrix of the interface is generated.
Given an image of the graphical interface as input, the position, size, and focus state of all controls in the interface can be obtained, and assembling this information forms the state matrix of the current interface.
Step three: and establishing a state machine of the interface.
Here, all interfaces of the application are traversed using navigation key events to build the state machine (typically a finite state machine) of the graphical interface. The state machine consists of nodes and edges: interface states are represented as nodes, and navigation key events as edges. Recording all interface states and the migration events between them yields the state machine of the graphical interface.
FIG. 6 is an exemplary diagram of generating an application interface state machine according to an embodiment of the present invention. In FIG. 6, comparing the state matrix of a newly entered interface state with the stored state matrices determines whether it is a new interface state; if so, it is stored, otherwise other interface states are entered, until all interface states have been traversed.
Step four: a navigation path table is generated.
After the state machine of the interface is established, the shortest path between any two interface states can be obtained with an all-pairs shortest-path algorithm, forming a navigation path table that contains each interface state, each of the remaining interface states, and each navigation path for switching between them.
Step five: the current interface state is determined.
The controls of the current graphical interface are recognized to form a pending state matrix, which is compared with all stored interface state matrices of the application; the state whose matrix has identical values is the current interface state.
Step six: and determining the target interface state.
Here, the target interface state refers to the interface state the user intends to reach. Embodiments of the invention capture the user's intent through voice technology, thereby realizing voice control. The user's voice audio undergoes automatic speech recognition and natural language understanding to obtain a text description of the control the user intends to select; this text is matched against the text of all controls in the current interface, and the successfully matched control is the one the user intends to select. Searching all interface state matrices for the one in which the target control holds focus then determines the target state.
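A sketch of this matching step. The control_texts mapping (control index to on-screen label) and the known_states registry (state name to state matrix, with the focus flag in the fifth column of each row) are illustrative assumptions:

```python
from typing import Dict, List, Optional

def find_target_state(nlu_text: str,
                      control_texts: Dict[int, str],
                      known_states: Dict[str, List[List[int]]]) -> Optional[str]:
    # Match the recognized text against the on-screen control labels.
    target = next((i for i, label in control_texts.items()
                   if label and label in nlu_text), None)
    if target is None:
        return None
    # Find the stored state in which the matched control holds focus;
    # index 4 of each row is the focus flag f.
    for name, matrix in known_states.items():
        if matrix[target][4] == 1:
            return name
    return None
```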
Step seven: a navigation path is determined.
According to the current interface state determined in step five and the target interface state determined in step six, the navigation path table generated in step four is queried to obtain the navigation path from the current interface state to the target interface state.
Step eight: and sending a navigation key sequence.
According to the navigation path obtained in step seven, the navigation key sequence is sent to the application, which executes it, thereby helping the user select the target control.
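A sketch of dispatching the key sequence. inject_key() is a hypothetical input-injection hook (the invention only requires that the key events reach the application in order), and the inter-key delay is an assumption to let each focus change settle:

```python
import time

def execute_navigation(path, inject_key, delay_s=0.05):
    """Send each navigation key of the path, e.g. ["down", "right"]."""
    for key in path:
        inject_key(key)          # hypothetical system input-injection hook
        time.sleep(delay_s)      # allow the focus change to settle
```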
Fig. 7 is an exemplary architecture diagram of a computer whose application interface may be voice controlled according to an embodiment of the present invention.
In FIG. 7, the computer includes a processor, a display terminal, and a memory. The memory holds an instruction set whose functions include: (1) recognizing graphical interface controls; (2) constructing the state machine of the graphical interface.
A typical application scenario of the embodiment of the present invention is described below.
Fig. 8 is a first exemplary processing diagram of a speech control application according to an embodiment of the present invention.
In the application scenario shown in FIG. 8, a third-party application is voice controlled (the application has not registered for system-platform voice control events).
First, based on control recognition and state-matrix matching, the control that currently holds focus in the third-party application is determined to be "film one", i.e., the current interface state is state 1, in which the "film one" control is selected. The user then speaks the voice instruction "select film five". Natural language processing of the instruction determines the target text to be "film five", and hence the target state: state 5, in which the "film five" control is selected. Next, the navigation path table is searched with state 1 as the start state and state 5 as the target state, yielding the navigation path [down, right]. This key sequence is sent to the third-party application, which executes it: focus first moves down to the "film four" control and then right to the "film five" control.
Fig. 9 is a second exemplary processing diagram of a speech control application according to an embodiment of the present invention.
In the application scenario illustrated in FIG. 9, selectable controls are labeled on the application interface. Once the position and size of all controls in the graphical interface have been recognized, the boundary of each control is determined, and an identifier can be drawn next to each control's boundary. The identifier may be a number or of another type. Identifiers make selection easier: for example, when a control is labeled with a number, the user can select it simply by speaking that number.
A numeric identifier is displayed near the boundary of each control: "film one" is labeled 1; "film two" is labeled 2; and so on, with "film six" labeled 6.
First, based on control recognition and state-matrix matching, the control that currently holds focus in the third-party application is determined to be "film one", i.e., the current interface state is state 1, in which the "film one" control is selected. The user then speaks the voice instruction "select 3". Natural language processing of the instruction determines the target text to be "3", and hence the target state: state 3, in which the "film three" control is selected. Next, the navigation path table is searched with state 1 as the start state and state 3 as the target state, yielding the navigation path [right, right]. This key sequence is sent to the third-party application, which executes it: focus first moves right to the "film two" control and then right to the "film three" control.
Fig. 10 is a third exemplary processing diagram of a voice control application according to an embodiment of the present invention. In the application scenario shown in FIG. 10, a graphical interface from an external input source is controlled. When the graphical interface comes from a device external to the system (e.g., a set-top box), the interface cannot rely on the system's voice control events.
In summary, the technique of the invention can voice-control both graphical interfaces that have not registered for the system's voice control events and graphical interfaces from external sources.
Based on the above description, the embodiment of the invention also provides a voice control device of the application interface.
Fig. 11 is a block diagram of a voice control apparatus of an application interface according to an embodiment of the present invention.
As shown in fig. 11, the voice control apparatus of the application interface includes:
a determining module 1101, configured to determine a current interface state;
the parsing module 1102 is configured to parse the target interface state from the voice command;
a retrieving module 1103 for retrieving a navigation path table to determine a navigation path for switching from the current interface state to the target interface state, wherein the navigation path comprises a sequence of navigation key events, the navigation path table being generated based on an application interface state machine comprising interface states and navigation key events between the interface states.
In one embodiment, determination module 1101 is configured to identify each control in the current interface state; establishing a vector of each control, wherein the vector comprises a control position, a control size and a control focus state; establishing a state matrix of the current interface state based on the vector of each control; and determining a preset interface state matched with the state matrix as the current interface state.
In one embodiment, the retrieving module 1103 is configured to obtain the application interface state machine before retrieving the navigation path table; determine, for each interface state in the application interface state machine and using an all-pairs shortest-path algorithm, each navigation path that switches from that interface state to each of the remaining interface states; and establish a navigation path table containing each interface state, each of the remaining interface states, and each navigation path for switching between them.
In one embodiment, the retrieving module 1103 is configured to traverse all interface states using a predetermined navigation key event to generate the application interface state machine.
In one embodiment, the retrieving module 1103 is configured to start from an initial interface state of an initial application interface and enter each interface state of each application interface using predetermined navigation key events; and to represent each interface state of each application interface as a node of the application interface state machine, with the predetermined navigation key events as edges between the nodes.
In one embodiment, the retrieval module 1103 is configured to identify each control in the entered interface state; establishing a vector of each control, wherein the vector comprises a control position, a control size and a control focus state; establishing a state matrix of the entered interface state based on the vector of each control; and storing the state matrix of the entered interface state.
In one embodiment, the apparatus further comprises: a setting module 1104 for setting an identifier on each control of each application interface; the parsing module 1102 is configured to parse, from the voice instruction, the identifier corresponding to the target interface state; and the retrieving module 1103 is configured to determine the target interface state based on that identifier.
In one embodiment, the application interface is provided by a device that does not support voice control functions; the apparatus further comprises: a sending module 1105, configured to send the navigation path to the device that does not support the voice control function; enabling the device which does not support the voice control function to execute the navigation key event sequence.
The embodiment of the invention also provides an intelligent device with a memory-processor architecture.
Fig. 12 is a block diagram of a smart device having a memory-processor architecture in accordance with the present invention.
As shown in fig. 12, a smart device having a memory-processor architecture includes: a processor 1201 and a memory 1202; in which a memory 1202 has stored therein an application executable by a processor 1201 for causing the processor 1201 to execute the speech control method of the application interface as described in any of the above.
The memory 1202 may be implemented as various storage media such as electrically erasable programmable read-only memory (EEPROM), Flash memory, or programmable read-only memory (PROM). The processor 1201 may be a central processing unit (CPU); for example, the processor 1201 includes at least one CPU, a semiconductor-based microprocessor, a programmable logic device (PLD), or the like. Exemplary PLDs include application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), programmable array logic (PAL), complex programmable logic devices (CPLDs), and erasable programmable logic devices (EPLDs). The processor 1201 may include multiple processing elements integrated in a single device or distributed across devices, and may process instructions sequentially, concurrently, or partially concurrently.
It should be noted that not all the steps and modules in the above processes and the structure diagrams are necessary, and some steps or modules may be omitted according to actual needs. The execution sequence of the steps is not fixed and can be adjusted as required. The division of the modules is merely for convenience of description and the division of functions adopted in the embodiments, and in actual implementation, one module may be implemented by a plurality of modules, and functions of a plurality of modules may be implemented by the same module, and the modules may be located in the same device or different devices.
The hardware modules in the various embodiments may be implemented mechanically or electronically. For example, a hardware module may include custom designed permanent circuits or logic devices for performing specific operations. A hardware module may also include programmable logic devices or circuits (e.g., including a general purpose processor or other programmable processor) temporarily configured by software for performing particular operations.
The present invention also provides a machine-readable storage medium storing instructions for causing a machine to perform the method described herein. Specifically, a system or apparatus may be provided with a storage medium on which software program code realizing the functions of any of the above embodiments is stored, and a computer (or CPU or MPU) of the system or apparatus reads out and executes the program code. Further, some or all of the actual operations may be performed by an operating system running on the computer based on instructions of the program code. The program code read from the storage medium may also be written into a memory provided on an expansion board inserted into the computer or in an expansion unit connected to the computer, after which a CPU or the like mounted on the expansion board or expansion unit performs part or all of the actual operations based on the instructions of the program code, thereby realizing the functions of any of the above embodiments.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (14)

1. A voice control method for an application interface, comprising:
determining a current interface state;
analyzing a target interface state from the voice instruction;
retrieving a navigation path table to determine a navigation path for switching from the current interface state to the target interface state, wherein the navigation path comprises a sequence of navigation key events, and the navigation path table is generated based on an application interface state machine that includes interface states and navigation key events between interface states; the method further comprising: traversing all interface states using a predetermined navigation key event to generate the application interface state machine;
wherein the traversing of all interface states using a predetermined navigation key event to generate the application interface state machine comprises: starting from an initial interface state of an initial application interface, entering each interface state of each application interface using predetermined navigation key events; and representing each interface state of each application interface as a node of the application interface state machine, and representing the predetermined navigation key events as edges between the nodes.
2. The method for voice control of an application interface according to claim 1, wherein said determining a current interface state comprises:
identifying each control in the current interface state;
establishing a vector of each control, wherein the vector comprises a control position, a control size and a control focus state;
establishing a state matrix of the current interface state based on the vector of each control;
and determining a preset interface state matched with the state matrix as the current interface state.
3. The voice control method of an application interface according to claim 1, further comprising, prior to said retrieving a navigation path table:
acquiring the application interface state machine;
determining, for each interface state in the application interface state machine and using an all-pairs shortest-path algorithm, each navigation path that switches from that interface state to each of the remaining interface states;
and establishing a navigation path table containing each interface state, each of the remaining interface states, and each navigation path for switching from that interface state to each of the remaining interface states.
4. The voice control method of an application interface according to claim 1, further comprising:
identifying each control in the entered interface state;
establishing a vector for each control, wherein the vector comprises the control position, the control size, and the control focus state;
establishing a state matrix of the entered interface state based on the vectors of the controls;
and storing the state matrix of the entered interface state.
5. The voice control method of an application interface according to claim 1, further comprising:
setting an identifier on each control of each application interface;
wherein the parsing the target interface state from the voice instruction comprises: parsing, from the voice instruction, an identifier corresponding to the target interface state;
and wherein the determining a navigation path for switching from the current interface state to the target interface state comprises: determining the target interface state based on the identifier corresponding to the target interface state.
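Claim 5's identifiers can be pictured as short tags overlaid on the controls of each interface, so the recognized utterance only needs to carry the tag. A lookup sketch under assumed names; the table contents are invented for the example:

```python
# Hypothetical identifier table populated when identifiers are set on the
# controls of each application interface; the tags and state names are invented.
IDENTIFIER_TO_STATE = {
    "1": "home",
    "2": "search_page",
    "3": "settings_page",
}

def resolve_target_state(recognized_text):
    """Return the target interface state named by an identifier in the utterance."""
    for token in recognized_text.split():
        if token in IDENTIFIER_TO_STATE:
            return IDENTIFIER_TO_STATE[token]
    return None

# resolve_target_state("open 3") -> "settings_page"
```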
6. The voice control method of an application interface according to any one of claims 1 to 5, wherein the application interface is provided by a device that does not support a voice control function; the method further comprising:
sending the navigation path to the device that does not support the voice control function;
and causing the device that does not support the voice control function to execute the sequence of navigation key events.
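Claim 6 covers the case where the interface runs on a device without a voice stack: the voice-capable device resolves the navigation path and forwards only the key-event sequence, which the target device replays as ordinary key input. A minimal transport sketch; the wire format, port, and JSON message shape are assumptions, not part of the patent:

```python
import json
import socket

def send_navigation_path(host, port, key_events):
    """Forward a navigation key-event sequence to a device without voice support.

    The receiving device only replays ordinary key events, so no speech
    recognition is needed on its side. The JSON framing here is illustrative.
    """
    payload = json.dumps({"type": "nav_key_sequence", "keys": key_events}).encode()
    with socket.create_connection((host, port)) as conn:
        conn.sendall(payload)

# Example: drive a voice-less TV from its current state to a target state.
# send_navigation_path("192.168.1.20", 9000, ["DOWN", "DOWN", "OK"])
```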
7. A voice control apparatus for an application interface, comprising:
a determining module configured to determine a current interface state;
a parsing module configured to parse a target interface state from a voice instruction;
a retrieval module configured to retrieve a navigation path table to determine a navigation path for switching from the current interface state to the target interface state, wherein the navigation path comprises a sequence of navigation key events, and the navigation path table is generated based on an application interface state machine that includes interface states and the navigation key events between interface states;
wherein the retrieval module is further configured to traverse all interface states using predetermined navigation key events to generate the application interface state machine;
and wherein the traversing all interface states using predetermined navigation key events to generate the application interface state machine comprises: starting from an initial interface state of an initial application interface, entering each interface state of each application interface using the predetermined navigation key events; and modeling each interface state of each application interface as a node of the application interface state machine, and modeling the predetermined navigation key events as edges between the nodes.
8. The voice control apparatus of an application interface of claim 7,
wherein the determining module is configured to: identify each control in the current interface state; establish a vector for each control, wherein the vector comprises the control position, the control size, and the control focus state; establish a state matrix of the current interface state based on the vectors of the controls; and determine a preset interface state that matches the state matrix as the current interface state.
9. The voice control apparatus of an application interface of claim 7,
wherein the retrieval module is configured to: acquire the application interface state machine before retrieving the navigation path table; determine, for each interface state in the application interface state machine, a navigation path for switching from that interface state to each of the remaining interface states, using an all-source shortest path algorithm; and establish a navigation path table containing each interface state, the remaining interface states, and each navigation path for switching from that interface state to each of the remaining interface states.
10. The voice control apparatus of an application interface of claim 7,
wherein the retrieval module is configured to: identify each control in the entered interface state; establish a vector for each control, wherein the vector comprises the control position, the control size, and the control focus state; establish a state matrix of the entered interface state based on the vectors of the controls; and store the state matrix of the entered interface state.
11. The voice control apparatus of an application interface of claim 7, further comprising:
a setting module configured to set an identifier on each control of each application interface;
wherein the parsing module is configured to parse, from the voice instruction, an identifier corresponding to the target interface state;
and the retrieval module is configured to determine the target interface state based on the identifier corresponding to the target interface state.
12. The voice control apparatus of an application interface according to any one of claims 7 to 11, wherein the application interface is provided by a device that does not support a voice control function; the apparatus further comprising:
a sending module configured to send the navigation path to the device that does not support the voice control function, and to cause that device to execute the sequence of navigation key events.
13. An intelligent device, comprising a processor and a memory;
wherein the memory stores an application executable by the processor, the application causing the processor to perform the voice control method of the application interface of any one of claims 1 to 6.
14. A computer-readable storage medium having stored therein computer-readable instructions for performing the voice control method of the application interface of any one of claims 1 to 6.
CN202110067144.9A 2021-01-19 2021-01-19 Voice control method and device of application interface and intelligent equipment Active CN112908323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110067144.9A CN112908323B (en) 2021-01-19 2021-01-19 Voice control method and device of application interface and intelligent equipment

Publications (2)

Publication Number Publication Date
CN112908323A CN112908323A (en) 2021-06-04
CN112908323B (en) 2024-03-08

Family

ID=76115236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110067144.9A Active CN112908323B (en) 2021-01-19 2021-01-19 Voice control method and device of application interface and intelligent equipment

Country Status (1)

Country Link
CN (1) CN112908323B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115857661A (en) * 2021-09-24 2023-03-28 华为技术有限公司 Voice interaction method, electronic device and computer-readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107608652A (en) * 2017-08-28 2018-01-19 三星电子(中国)研发中心 A kind of method and apparatus of Voice command graphical interfaces
CN108293081A (en) * 2015-11-06 2018-07-17 三星电子株式会社 Pass through the program playback deep linking of user interface event to mobile application state
CN108549560A (en) * 2018-02-28 2018-09-18 腾讯科技(成都)有限公司 Switching method and apparatus, storage medium, the electronic device of interface state
CN109960537A (en) * 2019-03-29 2019-07-02 北京金山安全软件有限公司 Interaction method and device and electronic equipment
CN110321497A (en) * 2019-07-09 2019-10-11 网易(杭州)网络有限公司 The method and device of interface navigation, electronic equipment, storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7409344B2 (en) * 2005-03-08 2008-08-05 Sap Aktiengesellschaft XML based architecture for controlling user interfaces with contextual voice commands
US10203866B2 (en) * 2017-05-16 2019-02-12 Apple Inc. Devices, methods, and graphical user interfaces for navigating between user interfaces and interacting with control objects
US10783061B2 (en) * 2018-06-22 2020-09-22 Microsoft Technology Licensing, Llc Reducing likelihood of cycles in user interface testing


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant