CN112908323B - Voice control method and apparatus for an application interface, and smart device - Google Patents

Voice control method and apparatus for an application interface, and smart device

Info

Publication number
CN112908323B
CN112908323B (application CN202110067144.9A)
Authority
CN
China
Prior art keywords
interface
state
control
interface state
application
Prior art date
Legal status
Active
Application number
CN202110067144.9A
Other languages
Chinese (zh)
Other versions
CN112908323A (en)
Inventor
申立明
谢志栋
孙宇
方华
Current Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung Electronics China R&D Center and Samsung Electronics Co Ltd
Priority: CN202110067144.9A
Publication of CN112908323A
Application granted
Publication of CN112908323B


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0487: Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F 3/0488: Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser, using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44: Arrangements for executing specific programs
    • G06F 9/451: Execution arrangements for user interfaces
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command

Abstract

Embodiments of the invention disclose a voice control method and apparatus for an application interface, and a smart device. The method comprises: determining a current interface state; parsing a target interface state from a voice instruction; and retrieving a navigation path table to determine a navigation path for switching from the current interface state to the target interface state, wherein the navigation path comprises a sequence of navigation key events, and the navigation path table is generated from an application interface state machine whose nodes are interface states and whose edges are the navigation key events between them. By realizing voice control with an application interface state machine, the embodiments reduce the complexity of voice control. The embodiments also describe each interface state with a state matrix, which converts the control-navigation problem into a path-search problem and shortens the time needed to compute a navigation path.

Description

Voice control method and apparatus for an application interface, and smart device
Technical Field
The present invention relates to the field of voice control technologies, and in particular, to a voice control method and apparatus for an application interface, and a smart device.
Background
Voice control has become a dominant way of operating smart devices. Voice control is generally triggered in one of two ways:
(1) Push To Talk (PTT) mode, a near-field mode of voice control: after pressing a record key, the user speaks a voice instruction to the recording device;
(2) Wake Up Word (WUW) mode, a far-field mode of voice control: without touching the device, the user first speaks a wake-up word to put the device into an activated state, and then speaks the voice instruction.
After the smart device receives a recording containing the user's voice instruction, the audio stream is processed by Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) to recognize the user's operational intent, which is then dispatched to the application. The application registers a voice-control Application Programming Interface (API) to intercept and respond to operation events.
In prior-art voice control, therefore, each application is required to implement an API that registers callback functions for voice control events.
However, given the typically large number of applications, having every application implement the API significantly increases the complexity of voice control. Moreover, applications often pursue cross-platform portability, and implementing the API separately for each platform also increases the difficulty of maintaining the application.
Disclosure of Invention
The invention provides a voice control method and apparatus for an application interface, and a smart device, so as to reduce the complexity of voice control.
The technical solution of embodiments of the invention is as follows:
a voice control method of an application interface, comprising:
determining a current interface state;
analyzing a target interface state from the voice instruction;
retrieving a navigation path table to determine a navigation path for switching from the current interface state to the target interface state, wherein the navigation path comprises a sequence of navigation key events, and the navigation path table is generated based on an application interface state machine that includes interface states and the navigation key events between interface states.
In one embodiment, the determining of the current interface state includes:
identifying each control in the current interface state;
establishing a vector of each control, wherein the vector comprises a control position, a control size and a control focus state;
establishing a state matrix of the current interface state based on the vector of each control;
and determining a preset interface state matched with the state matrix as the current interface state.
In one embodiment, before the retrieving of the navigation path table, the method further includes:
acquiring the application interface state machine;
determining, for each interface state in the application interface state machine and using an all-pairs shortest-path algorithm, each navigation path that switches from that interface state to each of the remaining interface states;
and establishing a navigation path table containing each interface state, each of the remaining interface states, and each navigation path for switching between them.
In one embodiment, the method further comprises:
traversing all interface states by utilizing a preset navigation key event to generate the application interface state machine.
In one embodiment, traversing all interface states using a predetermined navigation key event to generate the application interface state machine comprises:
starting from an initial interface state of an initial application interface, entering each interface state of each application interface using predetermined navigation key events;
and representing each interface state of each application interface as a node of the application interface state machine, and representing the predetermined navigation key events as edges between the nodes.
In one embodiment, the method further comprises:
identifying each control in the entered interface state;
establishing a vector of each control, wherein the vector comprises a control position, a control size and a control focus state;
establishing a state matrix of the entered interface state based on the vector of each control;
and storing the state matrix of the entered interface state.
In one embodiment, the method further comprises:
setting an identifier on each control of each application interface;
wherein the parsing of the target interface state from the voice instruction comprises: parsing, from the voice instruction, an identifier corresponding to the target interface state; and the determining of a navigation path to switch from the current interface state to the target interface state comprises: determining the target interface state based on the identifier corresponding to the target interface state.
In one embodiment, the application interface is provided by a device that does not support voice control functions; the method further comprises:
sending the navigation path to the device that does not support voice control;
causing the device that does not support voice control to execute the navigation key event sequence.
A voice control apparatus for an application interface, comprising:
the determining module is used for determining the current interface state;
the analysis module is used for analyzing the target interface state from the voice instruction;
a retrieval module for retrieving a navigation path table to determine a navigation path for switching from the current interface state to the target interface state, wherein the navigation path comprises a sequence of navigation key events, and the navigation path table is generated based on an application interface state machine that includes interface states and navigation key events between interface states.
In one embodiment, the determining module is configured to identify each control in the current interface state; establishing a vector of each control, wherein the vector comprises a control position, a control size and a control focus state; establishing a state matrix of the current interface state based on the vector of each control; and determining a preset interface state matched with the state matrix as the current interface state.
In one embodiment, the retrieval module is configured to obtain the application interface state machine before retrieving the navigation path table; determine, for each interface state in the application interface state machine and using an all-pairs shortest-path algorithm, each navigation path that switches from that interface state to each of the remaining interface states; and establish a navigation path table containing each interface state, each of the remaining interface states, and each navigation path for switching between them.
In one embodiment, the retrieval module is configured to traverse all interface states using a predetermined navigation key event to generate the application interface state machine.
In one embodiment, the retrieval module is configured to start from an initial interface state of an initial application interface and enter each interface state of each application interface using predetermined navigation key events; and to represent each interface state of each application interface as a node of the application interface state machine, with the predetermined navigation key events as edges between the nodes.
In one embodiment, the retrieval module is configured to identify each control in the entered interface state; establishing a vector of each control, wherein the vector comprises a control position, a control size and a control focus state; establishing a state matrix of the entered interface state based on the vector of each control; and storing the state matrix of the entered interface state.
In one embodiment, the apparatus further comprises:
a setting module for setting an identifier on each control of each application interface;
the parsing module is used for parsing, from the voice instruction, the identifier corresponding to the target interface state;
the retrieval module is used for determining the target interface state based on the identifier corresponding to the target interface state.
In one embodiment, the application interface is provided by a device that does not support voice control functions; the apparatus further comprises:
a sending module, configured to send the navigation path to the device that does not support the voice control function, causing that device to execute the navigation key event sequence.
An intelligent device comprising a processor and a memory;
the memory stores therein an application executable by the processor for causing the processor to execute the voice control method of the application interface as described in any one of the above.
A computer readable storage medium having stored therein computer readable instructions for performing the voice control method of the application interface as claimed in any one of the above.
From the above technical solution, in embodiments of the present invention, the current interface state is determined; the target interface state is parsed from a voice instruction; and a navigation path table is retrieved to determine a navigation path for switching from the current interface state to the target interface state, wherein the navigation path comprises a sequence of navigation key events and the navigation path table is generated based on an application interface state machine that includes interface states and the navigation key events between them. Embodiments of the invention thus introduce the concept of an application interface state machine and use it to voice-control graphical interfaces that do not implement a voice control API, reducing the complexity of voice control.
In addition, by establishing a navigation path table, embodiments of the invention can quickly determine the navigation path for switching from the current interface state to the target interface state, improving the efficiency of navigation path determination.
In addition, by establishing a state matrix for each interface state, embodiments of the invention define and describe interface states in a way that converts the control-navigation problem into a graph-theoretic path-search problem, shortening the time needed to compute a navigation path.
Drawings
Fig. 1 is a flowchart of a voice control method of an application interface according to an embodiment of the present invention.
Fig. 2 is an exemplary schematic diagram of an application interface according to an embodiment of the present invention.
Fig. 3 is an exemplary schematic diagram of a state matrix according to an embodiment of the present invention.
FIG. 4 is an exemplary schematic diagram of an application interface state machine according to an embodiment of the present invention.
Fig. 5 is an exemplary flow chart of a voice control application according to an embodiment of the present invention.
FIG. 6 is an exemplary diagram of generating an application interface state machine according to an embodiment of the present invention.
Fig. 7 is an exemplary architecture diagram of a computer whose application interface may be voice controlled according to an embodiment of the present invention.
Fig. 8 is a first exemplary processing diagram of a speech control application according to an embodiment of the present invention.
Fig. 9 is a second exemplary processing diagram of a speech control application according to an embodiment of the present invention.
Fig. 10 is a third exemplary processing diagram of a speech control application according to an embodiment of the present invention.
Fig. 11 is a block diagram of a voice control apparatus of an application interface according to an embodiment of the present invention.
Fig. 12 is a block diagram of a smart device having a memory-processor architecture according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
For simplicity and clarity of description, the following sets forth aspects of the invention by describing several exemplary embodiments. The numerous details in the embodiments are provided solely to aid understanding of the invention; it will be apparent that the invention may be practiced without being limited to these specific details. Some embodiments are not described in detail, and only a framework is presented, in order to avoid unnecessarily obscuring aspects of the present invention. Hereinafter, "comprising" means "including but not limited to", and "according to ..." means "according to at least ..., but not limited to only ...". Unless otherwise specified, the terms "a" and "an" do not limit the number of components, which may be one, more than one, or at least one.
The applicant found that, in prior-art voice control, an application must implement the API provided by its host platform and actively register for the platform's voice control instruction events before it can respond to voice control. The applicant also found that the number of third-party applications (applications not provided by the system vendor) on a system provider's platform is enormous, and requiring all of them to fully implement the API is difficult. Moreover, from the perspective of a third-party application, the application itself pursues cross-platform portability, and platform-specific modifications increase its maintenance cost.
Embodiments of the invention therefore provide a technique for voice-controlling an application interface even when the application has not registered for the system platform's voice control events, so that the large number of third-party applications, and applications on external devices that cannot support the system platform's voice functions, can still be voice controlled without modification.
For an application that does not register platform voice control events through an API, to let the user control its interface by voice, embodiments of the invention mainly comprise the following steps:
(1) Recognize the application's graphical interface, and determine both the control that currently holds focus and the control the user wants to select.
(2) Determine a focus-movement navigation path, and drive the application's graphical interface with key events so that focus moves to the user's target control, thereby helping the user select it. To determine the focus-movement navigation path, embodiments of the invention define three concepts: (a) the state matrix of the graphical interface; (b) the finite state machine of the graphical interface; (c) the navigation path.
Fig. 1 is a flowchart of a voice control method of an application interface according to an embodiment of the present invention.
As shown in fig. 1, the method includes:
step 101: the current interface state is determined.
An application typically includes one or more application interfaces, and each application interface typically includes one or more interface states. An application interface usually contains one or more controls, and different interface states can be distinguished by the focus state of each control. The current interface state here is the current interface state of the application's current interface.
For example, assume application interface 1 contains control 1, control 2, control 3, and control 4. The interface states of application interface 1 then include: interface state 1, in which control 1 has focus; interface state 2, in which control 2 has focus; interface state 3, in which control 3 has focus; and interface state 4, in which control 4 has focus. In graphical interface design, a control that acquires focus is highlighted relative to the other controls of the same level, and when the system then receives a confirm or enter key, that control is selected.
A state matrix may be pre-established for each interface state. The state matrix contains one vector per control in the corresponding interface state, where the vector contains the control's position, size, and focus state. While the application's graphical interface waits to respond to user operations, the position, size, and focus information of all controls in the interface together form one graphical interface state. A control's position consists of its horizontal and vertical coordinates in the plane; its size is the area it occupies, expressed as width and height.
For the example above, application interface 1 contains controls 1 through 4 and therefore has four interface states: interface state 1 (control 1 has focus), interface state 2 (control 2 has focus), interface state 3 (control 3 has focus), and interface state 4 (control 4 has focus). A state matrix can be pre-established for each of these four interface states, to facilitate the later matching operation that determines the current interface state.
Fig. 2 is an exemplary schematic diagram of an application interface according to an embodiment of the present invention.
In fig. 2, the application interface contains 6 controls, namely control 1, control 2, control 3, control 4, control 5 and control 6.
Taking control 1 as an example, its vector contains the position, size, and focus state (whether it holds focus) and can be represented by five numbers: [x1, y1, w1, h1, f1], where x1 and y1 are the horizontal and vertical coordinates of the top-left corner of control 1 (taking, for example, the top-left corner of the interface as the coordinate origin); w1 is the width of control 1; h1 is the height of control 1; and f1 is the focus state of control 1: f1 = 0 when control 1 does not have focus, and f1 = 1 when it does.
The vectors of the other controls are defined in the same way: control 2 has vector [x2, y2, w2, h2, f2], control 3 has [x3, y3, w3, h3, f3], control 4 has [x4, y4, w4, h4, f4], control 5 has [x5, y5, w5, h5, f5], and control 6 has [x6, y6, w6, h6, f6], where for each control i, xi and yi locate its top-left corner, wi and hi give its width and height, and fi is 1 if the control has focus and 0 otherwise.
Combining the vectors of controls 1 through 6 yields the state matrix of application interface 1. FIG. 3 is an exemplary schematic diagram of a state matrix according to an embodiment of the present invention.
As shown in FIG. 3, the state matrix contains the vectors of controls 1 through 6, and it can describe the various states of the application interface shown in FIG. 2.
In one embodiment, determining the current interface state in step 101 includes: identifying each control in the current interface state; establishing a vector for each control, the vector containing the control's position, size, and focus state; establishing the state matrix of the current interface state based on these vectors; and determining the predetermined interface state that matches this state matrix as the current interface state. The current interface state can thus be determined conveniently by matching the current state matrix.
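As an illustration of this matching embodiment, the following sketch assembles per-control vectors into a state matrix and compares it against pre-stored matrices. The Control type, the vector() helper, and the known_states registry are illustrative assumptions, not names from the invention; a minimal sketch in Python:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Control:
    x: int  # horizontal coordinate of the top-left corner
    y: int  # vertical coordinate of the top-left corner
    w: int  # control width
    h: int  # control height
    f: int  # focus state: 1 if the control holds focus, else 0

    def vector(self) -> List[int]:
        return [self.x, self.y, self.w, self.h, self.f]

def state_matrix(controls: List[Control]) -> List[List[int]]:
    """Assemble one vector per control into the interface state matrix."""
    return [c.vector() for c in controls]

def match_state(pending: List[List[int]], known_states: dict) -> Optional[str]:
    """Return the name of the stored state whose matrix equals the
    pending matrix, or None if this state has not been seen before."""
    for name, stored in known_states.items():
        if stored == pending:
            return name
    return None
```

Because matching is exact equality over the [x, y, w, h, f] rows, any focus change produces a different matrix and hence a different interface state.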
Step 102: and analyzing the target interface state from the voice command.
Here, the target interface state is the interface state of the target interface to which migration is desired. The sources of voice instructions include:
(1) PTT mode: after pressing the record key, the user speaks a voice instruction to the recording device;
(2) WUW mode: without touching the device, the user first speaks the wake-up word to activate the device, and then speaks the voice instruction.
Preferably, after the voice instruction is acquired, it is processed by ASR and NLU to identify the target interface state.
For example, assume the user speaks: "select control 1 of interface 2". After ASR and NLU processing, the target interface state is determined to be the interface state in which control 1 of interface 2 has focus.
The foregoing describes exemplary sources of voice instructions; those skilled in the art will recognize that this description is exemplary only and does not limit the scope of embodiments of the present invention.
Step 103: a navigation path table is retrieved to determine a navigation path to switch from a current interface state to a target interface state, wherein the navigation path includes a sequence of navigation key events, the navigation path table being generated based on an application interface state machine that includes interface states and navigation key events between the interface states.
Embodiments of the invention propose a state machine for the graphical interface. When the system receives a navigation key event, the control that holds focus changes. This changes the interface state: the interface migrates from one state to another, either within one interface or across interfaces. The interface states of an application are finite, and all of them can be traversed with the navigation keys. Taking all interface states as nodes and the navigation key events between states as edges forms a directed graph, namely the application interface state machine.
In one embodiment, all interface states are traversed using predetermined navigation key events to generate an application interface state machine.
Specifically, the method includes: starting from the initial interface state of the initial application interface, entering each interface state of each application interface using predetermined navigation key events; and representing each interface state of each application interface as a node of the application interface state machine, with the predetermined navigation key events as edges between the nodes. To recognize whether an interface state has already been visited, the traversal also saves the state matrix of each entered interface state.
Specifically, saving the state matrix of an entered interface state includes: identifying each control in the entered interface state; establishing a vector for each control, the vector containing the control's position, size, and focus state; establishing the state matrix of the entered interface state from these vectors; and saving that state matrix. By comparing the state matrix of a newly entered interface state with the saved matrices, it can be determined whether the entered state is new; if so, it is saved, otherwise other interface states are entered, until all interface states have been traversed.
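The traversal just described can be sketched as a breadth-first search over interface states. The hooks send_key(), capture_state_matrix(), and go_to_state() are hypothetical stand-ins for device-specific operations (the last one restoring a saved state, e.g. by replaying a recorded key path); the invention itself does not name them:

```python
from collections import deque

NAV_KEYS = ["up", "down", "left", "right"]

def build_state_machine(send_key, capture_state_matrix, go_to_state):
    """Traverse all interface states reachable via navigation keys."""
    start = capture_state_matrix()
    states = {0: start}                    # node id -> state matrix
    edges = {}                             # (node id, key) -> node id
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for key in NAV_KEYS:
            go_to_state(states[node])      # restore the source state
            send_key(key)
            matrix = capture_state_matrix()
            # Reuse the id of an already-saved state, else record a new node.
            known = next((i for i, m in states.items() if m == matrix), None)
            if known is None:
                known = len(states)
                states[known] = matrix
                queue.append(known)
            edges[(node, key)] = known
    return states, edges
```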
Based on the application interface described in fig. 2, fig. 4 is an exemplary schematic diagram of an application interface state machine according to an embodiment of the present invention.
For example, in interface state 1 (i.e., control 1 has focus): pressing the right navigation key migrates to interface state 2 (control 2 gains focus); pressing the down navigation key migrates to interface state 4 (control 4 gains focus).
For simplicity of description, the example shown in FIG. 4 covers state transitions within application interface 1 only. In practice, an application typically contains multiple application interfaces, so the application-level state machine contains both cross-interface and intra-interface state transitions.
The navigation path is explained next. The navigation path between two interface states is a sequence of navigation key events that migrates one interface state to the other, and it is the shortest of all possible such sequences. By sending the key sequence of a navigation path to the system, the interface state migrates from the start state to the end state. Since the finite state machine of the graphical interface is a directed graph, the navigation path between any two states can be obtained on it with an all-pairs shortest-path algorithm.
In one embodiment, step 103 also includes, prior to retrieving the navigation path table, a process of building the navigation path table.
Specifically, before retrieving the navigation path table, the method further includes: acquiring the application interface state machine; determining, for each interface state in the application interface state machine and using an all-pairs shortest-path algorithm, each navigation path that switches from that interface state to each of the remaining interface states; and establishing a navigation path table containing each interface state, each of the remaining interface states, and each navigation path for switching between them.
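Since every edge of the state machine is a single key press (unit weight), running a breadth-first search from each node is one way to realize the all-pairs shortest-path computation described above; a sketch, reusing the states/edges structures from the previous sketch:

```python
from collections import deque

def navigation_path_table(states, edges):
    """edges: (node, key) -> node, as produced by build_state_machine()."""
    # adjacency list: node -> [(key, neighbour), ...]
    adj = {n: [] for n in states}
    for (u, key), v in edges.items():
        adj[u].append((key, v))
    table = {}                              # (start, end) -> [keys]
    for src in states:
        paths = {src: []}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for key, nxt in adj[node]:
                if nxt not in paths:        # first visit = shortest path
                    paths[nxt] = paths[node] + [key]
                    queue.append(nxt)
        for dst, keys in paths.items():
            if dst != src:
                table[(src, dst)] = keys
    return table
```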
Table 1 is an exemplary navigation path table for the application interface shown in FIG. 2:

Start state | End state | Navigation path
Interface state 1 | Interface state 2 | [right]
Interface state 1 | Interface state 3 | [right, right]
Interface state 1 | Interface state 4 | [right, right]
Interface state 1 | Interface state 5 | [down]
Interface state 1 | Interface state 6 | [down, right]
Interface state 2 | Interface state 1 | [left]

TABLE 1
As Table 1 shows, when the start state is interface state 1 and the end state is interface state 2, the navigation path contains the key-event sequence [right], i.e., one key press moving focus right to the next object; when the end state is interface state 3, the sequence is [right, right], i.e., focus moves right to the next object and then right again; when the end state is interface state 6, the sequence is [down, right], i.e., focus moves down to the next object and then right to the next object; and so on.
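With the table built, determining a navigation path at run time reduces to a single dictionary lookup; a toy illustration using the first rows of Table 1 (state names abbreviated, values as illustrative assumptions):

```python
# Toy lookup over the first rows of Table 1; keys are (start, end) pairs.
table = {
    ("state 1", "state 2"): ["right"],
    ("state 1", "state 5"): ["down"],
    ("state 1", "state 6"): ["down", "right"],
    ("state 2", "state 1"): ["left"],
}
path = table[("state 1", "state 6")]   # -> ["down", "right"]
```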
In one embodiment, the method further comprises: setting an identifier on each control of each application interface. The parsing of the target interface state from the voice instruction in step 102 then comprises: parsing, from the voice instruction, the identifier corresponding to the target interface state; and the determining of a navigation path to switch from the current interface state to the target interface state in step 103 comprises: determining the target interface state based on that identifier.
In one embodiment, the application interface is provided by a device that does not support voice control functions; the method further comprises: sending the navigation path to the device that does not support voice control, causing that device to execute the navigation key event sequence.
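For this external-device embodiment, one plausible realization is to serialize the key-event sequence and push it to the device over a network connection. The JSON payload shape and the raw-socket transport below are assumptions for illustration; the invention only requires that the device receive and execute the sequence:

```python
import json
import socket

def send_path(path, host, port):
    """Serialize the key-event sequence and push it to the device."""
    payload = json.dumps({"key_events": path}).encode("utf-8")
    with socket.create_connection((host, port)) as conn:
        conn.sendall(payload)
```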
Fig. 5 is an exemplary flow chart of a voice control application according to an embodiment of the present invention.
As shown in fig. 5, the process includes:
step one: a graphical interface control is identified.
In the field of deep learning, many algorithms can recognize the controls of an application's graphical interface. For example, the Single Shot MultiBox Detector (SSD), You Only Look Once (YOLO), or Regions with CNN features (R-CNN) algorithms may be selected for control recognition.
For example, the data annotation of a graphical interface control needs to include: (1) the control's horizontal coordinate; (2) its vertical coordinate; (3) its width; (4) its height; (5) whether it holds focus; and so on.
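A sketch of how detector output could be turned into the annotation fields listed above. detect_controls() stands in for a trained SSD/YOLO/R-CNN model and returns fixed example boxes here; the (x, y, w, h, focus_score) tuple shape and the 0.5 focus threshold are illustrative assumptions:

```python
def detect_controls(image):
    """Stand-in for a trained SSD/YOLO/R-CNN detector. A real model would
    return one (x, y, w, h, focus_score) tuple per detected control; a
    fixed example is returned here so the sketch runs."""
    return [(10, 10, 100, 40, 0.92), (120, 10, 100, 40, 0.03)]

def annotate(image):
    """Convert detector output into the annotation fields (1)-(5)."""
    records = []
    for (x, y, w, h, focus_score) in detect_controls(image):
        records.append({
            "x": x, "y": y,                 # top-left corner coordinates
            "w": w, "h": h,                 # control width and height
            "focused": focus_score > 0.5,   # whether the control holds focus
        })
    return records
```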
Step two: a state matrix of the interface is generated.
Given an image of the graphical interface as input, the position, size, and focus state of all controls in the interface can be obtained, and assembling this information forms the state matrix of the current interface.
Step three: and establishing a state machine of the interface.
Here, all interfaces of the application are traversed using navigation key events to build the state machine (typically a finite state machine) of the graphical interface. The state machine consists of nodes and edges: interface states are represented as nodes, and navigation key events as edges. Recording all interface states and the migration events between them yields the state machine of the graphical interface.
FIG. 6 is an exemplary diagram of generating an application interface state machine according to an embodiment of the present invention. In FIG. 6, comparing the state matrix of a newly entered interface state with the stored state matrices determines whether it is a new interface state; if so, it is stored, otherwise other interface states are entered, until all interface states have been traversed.
Step four: a navigation path table is generated.
After the state machine of the interface is established, the shortest path between any two interface states can be obtained with an all-pairs shortest-path algorithm, forming a navigation path table that contains each interface state, each of the remaining interface states, and each navigation path for switching between them.
Step five: the current interface state is determined.
The controls of the current graphical interface are recognized to form a pending state matrix, which is compared with all stored interface state matrices of the application; the state whose matrix has identical values is the current interface state.
Step six: and determining the target interface state.
Here, the target interface state refers to the interface state the user intends to reach. Embodiments of the invention capture the user's intent through voice technology, thereby realizing voice control. The user's voice audio undergoes automatic speech recognition and natural language understanding to obtain a text description of the control the user intends to select; this text is matched against the text of all controls in the current interface, and the successfully matched control is the one the user intends to select. Searching all interface state matrices for the one in which the target control holds focus then determines the target state.
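A sketch of this matching step. The control_texts mapping (control index to on-screen label) and the known_states registry (state name to state matrix, with the focus flag in the fifth column of each row) are illustrative assumptions:

```python
from typing import Dict, List, Optional

def find_target_state(nlu_text: str,
                      control_texts: Dict[int, str],
                      known_states: Dict[str, List[List[int]]]) -> Optional[str]:
    # Match the recognized text against the on-screen control labels.
    target = next((i for i, label in control_texts.items()
                   if label and label in nlu_text), None)
    if target is None:
        return None
    # Find the stored state in which the matched control holds focus;
    # index 4 of each row is the focus flag f.
    for name, matrix in known_states.items():
        if matrix[target][4] == 1:
            return name
    return None
```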
Step seven: a navigation path is determined.
According to the current interface state determined in step five and the target interface state determined in step six, the navigation path table generated in step four is queried to obtain the navigation path from the current interface state to the target interface state.
Step eight: and sending a navigation key sequence.
According to the navigation path obtained in step seven, the navigation key sequence is sent to the application, which executes it, thereby helping the user select the target control.
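A sketch of dispatching the key sequence. inject_key() is a hypothetical input-injection hook (the invention only requires that the key events reach the application in order), and the inter-key delay is an assumption to let each focus change settle:

```python
import time

def execute_navigation(path, inject_key, delay_s=0.05):
    """Send each navigation key of the path, e.g. ["down", "right"]."""
    for key in path:
        inject_key(key)          # hypothetical system input-injection hook
        time.sleep(delay_s)      # allow the focus change to settle
```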
Fig. 7 is an exemplary architecture diagram of a computer whose application interface may be voice controlled according to an embodiment of the present invention.
In FIG. 7, the computer includes a processor, a display terminal, and a memory. The memory holds an instruction set whose functions include: (1) recognizing graphical interface controls; (2) constructing the state machine of the graphical interface.
A typical application scenario of the embodiment of the present invention is described below.
Fig. 8 is a first exemplary processing diagram of a speech control application according to an embodiment of the present invention.
In the application scenario shown in FIG. 8, a third-party application is voice controlled (the application has not registered for system-platform voice control events).
First, based on control recognition and state-matrix matching, the control that currently holds focus in the third-party application is determined to be "film one", i.e., the current interface state is state 1, in which the "film one" control is selected. The user then speaks the voice instruction "select film five". Natural language processing of the instruction determines the target text to be "film five", and hence the target state: state 5, in which the "film five" control is selected. Next, the navigation path table is searched with state 1 as the start state and state 5 as the target state, yielding the navigation path [down, right]. This key sequence is sent to the third-party application, which executes it: focus first moves down to the "film four" control and then right to the "film five" control.
Fig. 9 is a second exemplary processing diagram of a speech control application according to an embodiment of the present invention.
In the application scenario illustrated in FIG. 9, selectable controls are labeled on the application interface. Once the position and size of all controls in the graphical interface have been recognized, the boundary of each control is determined, and an identifier can be drawn next to each control's boundary. The identifier may be a number or of another type. Identifiers make selection easier: for example, when a control is labeled with a number, the user can select it simply by speaking that number.
A numeric identifier is displayed near the boundary of each control: "film one" is labeled 1; "film two" is labeled 2; and so on, with "film six" labeled 6.
First, based on control recognition and state-matrix matching, the control that currently holds focus in the third-party application is determined to be "film one", i.e., the current interface state is state 1, in which the "film one" control is selected. The user then speaks the voice instruction "select 3". Natural language processing of the instruction determines the target text to be "3", and hence the target state: state 3, in which the "film three" control is selected. Next, the navigation path table is searched with state 1 as the start state and state 3 as the target state, yielding the navigation path [right, right]. This key sequence is sent to the third-party application, which executes it: focus first moves right to the "film two" control and then right to the "film three" control.
Fig. 10 is a third exemplary processing diagram of a voice control application according to an embodiment of the present invention. In the application scenario shown in FIG. 10, a graphical interface from an external input source is controlled. When the graphical interface comes from a device external to the system (e.g., a set-top box), the interface cannot rely on the system's voice control events.
In summary, the technique of the invention can voice-control both graphical interfaces that have not registered for the system's voice control events and graphical interfaces from external sources.
Based on the above description, the embodiment of the invention also provides a voice control device of the application interface.
Fig. 11 is a block diagram of a voice control apparatus of an application interface according to an embodiment of the present invention.
As shown in fig. 11, the voice control apparatus of the application interface includes:
a determining module 1101, configured to determine a current interface state;
the parsing module 1102 is configured to parse the target interface state from the voice command;
a retrieving module 1103 for retrieving a navigation path table to determine a navigation path for switching from the current interface state to the target interface state, wherein the navigation path comprises a sequence of navigation key events, the navigation path table being generated based on an application interface state machine comprising interface states and navigation key events between the interface states.
In one embodiment, determination module 1101 is configured to identify each control in the current interface state; establishing a vector of each control, wherein the vector comprises a control position, a control size and a control focus state; establishing a state matrix of the current interface state based on the vector of each control; and determining a preset interface state matched with the state matrix as the current interface state.
In one embodiment, the retrieving module 1103 is configured to obtain the application interface state machine before retrieving the navigation path table; determine, for each interface state in the application interface state machine and using an all-pairs shortest-path algorithm, each navigation path that switches from that interface state to each of the remaining interface states; and establish a navigation path table containing each interface state, each of the remaining interface states, and each navigation path for switching between them.
In one embodiment, the retrieving module 1103 is configured to traverse all interface states using a predetermined navigation key event to generate the application interface state machine.
In one embodiment, the retrieving module 1103 is configured to start from an initial interface state of an initial application interface and enter each interface state of each application interface using predetermined navigation key events; and to represent each interface state of each application interface as a node of the application interface state machine, with the predetermined navigation key events as edges between the nodes.
In one embodiment, the retrieval module 1103 is configured to identify each control in the entered interface state; establishing a vector of each control, wherein the vector comprises a control position, a control size and a control focus state; establishing a state matrix of the entered interface state based on the vector of each control; and storing the state matrix of the entered interface state.
In one embodiment, the apparatus further comprises: a setting module 1104 for setting an identifier on each control of each application interface; the parsing module 1102 is configured to parse, from the voice instruction, the identifier corresponding to the target interface state; and the retrieving module 1103 is configured to determine the target interface state based on that identifier.
In one embodiment, the application interface is provided by a device that does not support voice control functions; the apparatus further comprises: a sending module 1105, configured to send the navigation path to the device that does not support the voice control function; enabling the device which does not support the voice control function to execute the navigation key event sequence.
The embodiment of the invention also provides an intelligent device with a memory-processor architecture.
Fig. 12 is a block diagram of a smart device having a memory-processor architecture in accordance with the present invention.
As shown in fig. 12, a smart device having a memory-processor architecture includes: a processor 1201 and a memory 1202; in which a memory 1202 has stored therein an application executable by a processor 1201 for causing the processor 1201 to execute the speech control method of the application interface as described in any of the above.
The memory 1202 may be implemented as various storage media such as electrically erasable programmable read-only memory (EEPROM), Flash memory, or programmable read-only memory (PROM). The processor 1201 may be a central processing unit (CPU); for example, the processor 1201 includes at least one CPU, a semiconductor-based microprocessor, a programmable logic device (PLD), or the like. Exemplary PLDs include application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), programmable array logic (PAL), complex programmable logic devices (CPLDs), and erasable programmable logic devices (EPLDs). The processor 1201 may include multiple processing elements integrated in a single device or distributed across devices, and may process instructions sequentially, concurrently, or partially concurrently.
It should be noted that not all the steps and modules in the above processes and the structure diagrams are necessary, and some steps or modules may be omitted according to actual needs. The execution sequence of the steps is not fixed and can be adjusted as required. The division of the modules is merely for convenience of description and the division of functions adopted in the embodiments, and in actual implementation, one module may be implemented by a plurality of modules, and functions of a plurality of modules may be implemented by the same module, and the modules may be located in the same device or different devices.
The hardware modules in the various embodiments may be implemented mechanically or electronically. For example, a hardware module may include custom designed permanent circuits or logic devices for performing specific operations. A hardware module may also include programmable logic devices or circuits (e.g., including a general purpose processor or other programmable processor) temporarily configured by software for performing particular operations.
The present invention also provides a machine-readable storage medium storing instructions for causing a machine to perform the method described herein. Specifically, a system or apparatus may be provided with a storage medium on which software program code realizing the functions of any of the above embodiments is stored, and a computer (or CPU or MPU) of the system or apparatus reads out and executes the program code. Further, some or all of the actual operations may be performed by an operating system running on the computer based on instructions of the program code. The program code read from the storage medium may also be written into a memory provided on an expansion board inserted into the computer or in an expansion unit connected to the computer, after which a CPU or the like mounted on the expansion board or expansion unit performs part or all of the actual operations based on the instructions of the program code, thereby realizing the functions of any of the above embodiments.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (14)

1. A voice control method for an application interface, comprising:
determining a current interface state;
analyzing a target interface state from the voice instruction;
retrieving a navigation path table to determine a navigation path for switching from the current interface state to the target interface state, wherein the navigation path comprises a sequence of navigation key events, and the navigation path table is generated based on an application interface state machine that includes interface states and navigation key events between interface states; the method further comprising: traversing all interface states using a predetermined navigation key event to generate the application interface state machine;
wherein the traversing of all interface states using a predetermined navigation key event to generate the application interface state machine comprises: starting from an initial interface state of an initial application interface, entering each interface state of each application interface using predetermined navigation key events; and representing each interface state of each application interface as a node of the application interface state machine, and representing the predetermined navigation key events as edges between the nodes.
2. The method for voice control of an application interface according to claim 1, wherein said determining a current interface state comprises:
identifying each control in the current interface state;
establishing a vector of each control, wherein the vector comprises a control position, a control size and a control focus state;
establishing a state matrix of the current interface state based on the vector of each control;
and determining a preset interface state matched with the state matrix as the current interface state.
3. The voice control method of an application interface according to claim 1, further comprising, prior to said retrieving a navigation path table:
acquiring the application interface state machine;
determining, for each interface state in the application interface state machine and using an all-pairs shortest-path algorithm, each navigation path that switches from that interface state to each of the remaining interface states;
and establishing a navigation path table containing each interface state, each of the remaining interface states, and each navigation path for switching from that interface state to each of the remaining interface states.
4. The voice control method of an application interface according to claim 1, further comprising:
identifying each control in the entered interface state;
establishing a vector for each control, wherein the vector comprises the control position, the control size, and the control focus state;
establishing a state matrix of the entered interface state based on the vectors of the controls;
and storing the state matrix of the entered interface state.
5. The voice control method of an application interface according to claim 1, further comprising:
setting an identifier on each control of each application interface;
wherein the parsing the target interface state from the voice instruction comprises: parsing, from the voice instruction, an identifier corresponding to the target interface state;
and wherein the determining a navigation path for switching from the current interface state to the target interface state comprises: determining the target interface state based on the identifier corresponding to the target interface state.
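Claim 5's identifiers can be pictured as short tags overlaid on the controls of each interface, so the recognized utterance only needs to carry the tag. A lookup sketch under assumed names; the table contents are invented for the example:

```python
# Hypothetical identifier table populated when identifiers are set on the
# controls of each application interface; the tags and state names are invented.
IDENTIFIER_TO_STATE = {
    "1": "home",
    "2": "search_page",
    "3": "settings_page",
}

def resolve_target_state(recognized_text):
    """Return the target interface state named by an identifier in the utterance."""
    for token in recognized_text.split():
        if token in IDENTIFIER_TO_STATE:
            return IDENTIFIER_TO_STATE[token]
    return None

# resolve_target_state("open 3") -> "settings_page"
```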
6. The voice control method of an application interface according to any one of claims 1 to 5, wherein the application interface is provided by a device that does not support a voice control function; the method further comprising:
sending the navigation path to the device that does not support the voice control function;
and causing the device that does not support the voice control function to execute the sequence of navigation key events.
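Claim 6 covers the case where the interface runs on a device without a voice stack: the voice-capable device resolves the navigation path and forwards only the key-event sequence, which the target device replays as ordinary key input. A minimal transport sketch; the wire format, port, and JSON message shape are assumptions, not part of the patent:

```python
import json
import socket

def send_navigation_path(host, port, key_events):
    """Forward a navigation key-event sequence to a device without voice support.

    The receiving device only replays ordinary key events, so no speech
    recognition is needed on its side. The JSON framing here is illustrative.
    """
    payload = json.dumps({"type": "nav_key_sequence", "keys": key_events}).encode()
    with socket.create_connection((host, port)) as conn:
        conn.sendall(payload)

# Example: drive a voice-less TV from its current state to a target state.
# send_navigation_path("192.168.1.20", 9000, ["DOWN", "DOWN", "OK"])
```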
7. A voice control apparatus for an application interface, comprising:
a determining module configured to determine a current interface state;
a parsing module configured to parse a target interface state from a voice instruction;
a retrieval module configured to retrieve a navigation path table to determine a navigation path for switching from the current interface state to the target interface state, wherein the navigation path comprises a sequence of navigation key events, and the navigation path table is generated based on an application interface state machine that includes interface states and the navigation key events between interface states;
wherein the retrieval module is further configured to traverse all interface states using predetermined navigation key events to generate the application interface state machine;
and wherein the traversing all interface states using predetermined navigation key events to generate the application interface state machine comprises: starting from an initial interface state of an initial application interface, entering each interface state of each application interface using the predetermined navigation key events; and modeling each interface state of each application interface as a node of the application interface state machine, and modeling the predetermined navigation key events as edges between the nodes.
8. The voice control apparatus of an application interface of claim 7,
wherein the determining module is configured to: identify each control in the current interface state; establish a vector for each control, wherein the vector comprises the control position, the control size, and the control focus state; establish a state matrix of the current interface state based on the vectors of the controls; and determine a preset interface state that matches the state matrix as the current interface state.
9. The voice control apparatus of an application interface of claim 7,
wherein the retrieval module is configured to: acquire the application interface state machine before retrieving the navigation path table; determine, for each interface state in the application interface state machine, a navigation path for switching from that interface state to each of the remaining interface states, using an all-source shortest path algorithm; and establish a navigation path table containing each interface state, the remaining interface states, and each navigation path for switching from that interface state to each of the remaining interface states.
10. The voice control apparatus of an application interface of claim 7,
wherein the retrieval module is configured to: identify each control in the entered interface state; establish a vector for each control, wherein the vector comprises the control position, the control size, and the control focus state; establish a state matrix of the entered interface state based on the vectors of the controls; and store the state matrix of the entered interface state.
11. The voice control apparatus of an application interface of claim 7, further comprising:
a setting module configured to set an identifier on each control of each application interface;
wherein the parsing module is configured to parse, from the voice instruction, an identifier corresponding to the target interface state;
and the retrieval module is configured to determine the target interface state based on the identifier corresponding to the target interface state.
12. The voice control apparatus of an application interface according to any one of claims 7 to 11, wherein the application interface is provided by a device that does not support a voice control function; the apparatus further comprising:
a sending module configured to send the navigation path to the device that does not support the voice control function, and to cause that device to execute the sequence of navigation key events.
13. An intelligent device, comprising a processor and a memory;
wherein the memory stores an application executable by the processor, the application causing the processor to perform the voice control method of the application interface of any one of claims 1 to 6.
14. A computer-readable storage medium having stored therein computer-readable instructions for performing the voice control method of the application interface of any one of claims 1 to 6.
CN202110067144.9A 2021-01-19 2021-01-19 Voice control method and device of application interface and intelligent equipment Active CN112908323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110067144.9A CN112908323B (en) 2021-01-19 2021-01-19 Voice control method and device of application interface and intelligent equipment

Publications (2)

Publication Number Publication Date
CN112908323A CN112908323A (en) 2021-06-04
CN112908323B (en) 2024-03-08

Family

ID=76115236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110067144.9A Active CN112908323B (en) 2021-01-19 2021-01-19 Voice control method and device of application interface and intelligent equipment

Country Status (1)

Country Link
CN (1) CN112908323B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115857661A (en) * 2021-09-24 2023-03-28 华为技术有限公司 Voice interaction method, electronic device and computer-readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107608652A (en) * 2017-08-28 2018-01-19 三星电子(中国)研发中心 A kind of method and apparatus of Voice command graphical interfaces
CN108293081A (en) * 2015-11-06 2018-07-17 三星电子株式会社 Pass through the program playback deep linking of user interface event to mobile application state
CN108549560A (en) * 2018-02-28 2018-09-18 腾讯科技(成都)有限公司 Switching method and apparatus, storage medium, the electronic device of interface state
CN109960537A (en) * 2019-03-29 2019-07-02 北京金山安全软件有限公司 Interaction method and device and electronic equipment
CN110321497A (en) * 2019-07-09 2019-10-11 网易(杭州)网络有限公司 The method and device of interface navigation, electronic equipment, storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7409344B2 (en) * 2005-03-08 2008-08-05 Sap Aktiengesellschaft XML based architecture for controlling user interfaces with contextual voice commands
US10203866B2 (en) * 2017-05-16 2019-02-12 Apple Inc. Devices, methods, and graphical user interfaces for navigating between user interfaces and interacting with control objects
US10783061B2 (en) * 2018-06-22 2020-09-22 Microsoft Technology Licensing, Llc Reducing likelihood of cycles in user interface testing


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant