CN111061452A - Voice control method and device of user interface

Voice control method and device of user interface

Info

Publication number
CN111061452A
Authority
CN
China
Prior art keywords
voice, user interface, operable object, index key, determining
Prior art date: 2019-12-17
Legal status: Pending
Application number
CN201911300685.0A
Other languages
Chinese (zh)
Inventor
方彦彬 (Fang Yanbin)
Current Assignee
Beijing Xiaomi Intelligent Technology Co Ltd
Original Assignee
Beijing Xiaomi Intelligent Technology Co Ltd
Priority date: 2019-12-17
Filing date: 2019-12-17
Publication date: 2020-04-24
Application filed by Beijing Xiaomi Intelligent Technology Co Ltd
Priority to CN201911300685.0A
Publication of CN111061452A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G06F 3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F 3/0487 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 - Execution procedure of a spoken command

Abstract

The disclosure relates to a voice control method and device for a user interface. The method concerns intelligent-device control technology and solves the complex configuration, heavy system resource consumption, and narrow application range of existing voice assistant implementations. The method comprises the following steps: enabling a voice control mode of the current user interface based on a predetermined instruction; traversing the current user interface to obtain at least one operable object; generating an index keyword for each of the at least one operable object; and receiving a user voice instruction, determining the index keyword matched with the user voice instruction, and operating the operable object corresponding to the index keyword. The technical scheme provided by the disclosure is suitable for voice control of intelligent devices and realizes efficient, accurate, and easy-to-use cross-App universal voice control.

Description

Voice control method and device of user interface
Technical Field
The present disclosure relates to intelligent device control technologies, and in particular, to a voice control method and apparatus for a user interface.
Background
With the development of Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) technologies, more and more voice assistant software has appeared, such as Apple's Siri on the iPhone and Microsoft's XiaoIce. Voice assistant functionality has also grown from simple dialogue at the start into deeper combination with terminal device functionality, for example opening a certain App, checking the weather, or playing a song through the voice assistant. Even so, voice alone still cannot completely free the user's hands or unlock every capability of the mobile phone.
One common implementation of a voice assistant is to record the User Interface (UI) layout of the App, convert the voice command into a simulated gesture, and operate the designated control by simulating the user's gesture operation.
However, the layout of an App may change. Because voice control positioning is determined from the recorded App layout, misoperation may occur if the correspondence between the App layout and the voice operation information is not updated in time. Moreover, each App needs its own customized simulated-gesture operation scheme, and customizing simulated gestures App by App causes excessive system resource consumption.
Some Apps instead provide a special interface for the voice assistant to call, so that the assistant can invoke the App's voice support component and control the App through voice. In this scheme, however, the App must customize an interface for the voice assistant, which adds extra workload and operational pressure for the App; if the App provides interfaces for only part of its main functions, the voice assistant's ability to operate the App is limited. And since not all Apps will provide such an interface, the number of Apps that can be operated with the voice assistant again remains small.
In conclusion, the voice assistant has no uniform way of controlling the user interface within Apps, so the accuracy and efficiency of voice control are low.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a voice control method and apparatus for a user interface.
According to a first aspect of the embodiments of the present disclosure, there is provided a method for controlling a user interface with voice, the method comprising:
enabling a voice control mode of the current user interface based on a predetermined instruction;
traversing the current user interface to obtain at least one operable object;
generating an index key for each of the at least one actionable object;
receiving a user voice instruction, determining an index key word matched with the user voice instruction, and operating an operable object corresponding to the index key word.
Preferably, the step of traversing the current user interface to obtain at least one operable object includes:
acquiring content to be displayed on the current user interface from a server;
and determining at least one operable object in the content to be displayed.
Preferably, the step of generating the index key of each of the at least one operable object includes:
generating a unique index key for each operable object, the index key comprising any one or any number of the following forms:
all the character description information of the operable object, partial character description information of the operable object and the number of the operable object.
Preferably, after the step of generating the index key of each of the at least one operable object, the method further includes:
and if the index key is the number of the operable object, marking the number at the display position of the operable object under the current user interface.
Preferably, the step of receiving a user voice instruction, determining an index keyword matched with the user voice instruction, and operating an operable object corresponding to the index keyword includes:
recognizing a user voice instruction, wherein the user voice instruction comprises voice operation information and/or voice object information, the voice operation information indicates an operation, and the voice object information indicates an object pointed by the operation;
determining an operable object pointed by the user voice instruction according to the voice object information;
determining the operation executed on the operable object according to the voice operation information;
and executing the user voice instruction according to the operable object and/or the operation.
Preferably, the step of determining an operable object pointed by the user voice instruction according to the voice object information includes:
inquiring the index key words and determining the index key words matched with the voice object information;
and determining the operable object corresponding to the index key as the operable object pointed by the voice object information.
Preferably, the step of determining the operation performed on the operable object according to the voice operation information includes:
inquiring a preset operation list, wherein the operation list comprises a plurality of operations;
determining an operation matched with the voice operation information, the operation being taken as the operation executed on the operable object.
According to a second aspect of embodiments of the present disclosure, there is provided a voice control apparatus of a user interface, including:
the mode starting module is used for starting a voice control mode of the current user interface based on a preset instruction;
the operation object acquisition module is used for traversing the current user interface to acquire at least one operable object;
an index generation module, configured to generate an index key for each of the at least one actionable object;
and the instruction execution module is used for receiving a user voice instruction, determining an index key word matched with the user voice instruction and operating an operable object corresponding to the index key word.
Preferably, the operation object obtaining module includes:
a content to be displayed acquisition submodule, configured to acquire, from a server, content to be displayed on the current user interface;
and the operation object determining submodule is used for determining at least one operable object in the content to be displayed.
Preferably, the index generating module includes:
a key generating submodule, configured to generate a unique index key for each operable object determined by traversal, where the index key includes any one or any multiple of the following forms:
all the character description information of the operable object, partial character description information of the operable object and the number of the operable object.
Preferably, the index generating module further includes:
and the labeling sub-module is used for labeling the number at the display position of the operable object in the current user interface under the condition that the index key word is the number of the operable object.
Preferably, the instruction execution module includes:
the voice recognition sub-module is used for recognizing a user voice instruction, the user voice instruction comprises voice operation information and/or voice object information, the voice operation information indicates operation, and the voice object information indicates an object pointed by the operation;
the object determining submodule is used for determining an operable object pointed by the user voice instruction according to the voice object information;
the operation determining submodule is used for determining the operation executed on the operable object according to the voice operation information;
and the instruction execution submodule is used for executing the user voice instruction according to the operable object and/or the operation.
Preferably, the object determination submodule includes:
the index query unit is used for querying the index keywords and determining the index keywords matched with the voice object information;
and the pointing object determining unit is used for determining that the operable object corresponding to the index key is the operable object pointed by the voice object information.
Preferably, the operation determination submodule includes:
a list query unit, configured to query a preset operation list, wherein the operation list comprises a plurality of operations;
an operation determination unit configured to determine an operation matching the voice operation information as an operation performed on the operable object.
According to a third aspect of embodiments of the present disclosure, there is provided a computer apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
enabling a voice control mode of the current user interface based on a predetermined instruction;
traversing the current user interface to obtain at least one operable object;
generating an index key for each of the at least one actionable object;
receiving a user voice instruction, determining an index key word matched with the user voice instruction, and operating an operable object corresponding to the index key word.
According to a fourth aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having instructions therein, which when executed by a processor of a mobile terminal, enable the mobile terminal to perform a method of voice control of a user interface, the method comprising:
enabling a voice control mode of the current user interface based on a predetermined instruction;
traversing the current user interface to obtain at least one operable object;
generating an index key for each of the at least one actionable object;
receiving a user voice instruction, determining an index key word matched with the user voice instruction, and operating an operable object corresponding to the index key word.
The technical solutions provided by the embodiments of the disclosure can have the following beneficial effects. When voice control needs to be started, the voice control mode of the current user interface is enabled based on a predetermined instruction; the current user interface is traversed to obtain at least one operable object; index keywords for the at least one operable object are then generated; and a user voice instruction is received, the index keyword matched with it is determined, and the operable object corresponding to that keyword is operated. Voice control is thus completed on the basis of index keywords generated in real time, which exactly match the current application environment. This realizes efficient, accurate, and easy-to-use cross-App universal voice control and solves the complex configuration, heavy system resource consumption, and narrow application range of existing voice assistant implementations.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow chart illustrating a method of voice control according to an exemplary embodiment.
FIG. 2 is a flow chart illustrating a method of voice control according to an exemplary embodiment.
FIG. 3 is a flow chart illustrating a method of voice control according to an exemplary embodiment.
FIG. 4 is a flow chart illustrating a method of voice control according to an example embodiment.
FIG. 5 is a flow diagram illustrating the determination of an actionable object in accordance with an exemplary embodiment.
FIG. 6 is a flowchart illustrating operations for determining execution, according to an example embodiment.
FIG. 7 is a block diagram illustrating a voice-controlled apparatus according to an exemplary embodiment.
FIG. 8 is a block diagram illustrating a voice-controlled apparatus according to an exemplary embodiment.
Fig. 9 is a block diagram illustrating the index generation module 703 according to an example embodiment.
Fig. 10 is a block diagram illustrating the index generation module 703 according to an example embodiment.
FIG. 11 is a block diagram illustrating an instruction execution module 704 according to an example embodiment.
FIG. 12 is a block diagram illustrating an object determination submodule 1102 according to an example embodiment.
Fig. 13 is a block diagram illustrating an operation determination sub-module 1103 according to an example embodiment.
Fig. 14 is a block diagram illustrating an apparatus (a general structure of a mobile terminal) according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
To solve the above problems, exemplary embodiments of the present disclosure provide a voice control method and apparatus. Voice indexes for Apps are generated through dynamic traversal, achieving real-time voice control configuration matched to the actual application environment and solving the complex configuration, heavy system resource consumption, and narrow application range of existing voice assistant implementations.
An exemplary embodiment of the present disclosure provides a voice control method for a user interface. The flow of implementing voice control with the method is shown in fig. 1 and includes:
step 101, based on a predetermined instruction, a voice control mode of a current user interface is enabled.
In an embodiment of the present disclosure, a voice control mode may be established within the system in which voice control is enabled.
The voice control mode can be started through a predetermined instruction. The instruction may take many forms, such as a key or key combination, a specific voice command (e.g., "start voice control"), or a specific gesture. These instructions may be fixed in the system or user-defined.
In the embodiment of the present disclosure, the current user interface includes any one or any multiple of the following interfaces:
a system interface, a lock screen interface, and an in-application interface.
The system interface is an interface of the operating system; the embodiments of the disclosure can be applied to any of the various interfaces the operating system provides.
The screen locking interface is the interface displayed while the screen is locked.
The in-application interface is the visual interface of an application, piece of software, component, or the like installed in the system, and is activated when the application is started.
And 102, traversing the current user interface to obtain at least one operable object.
In the embodiment of the disclosure, an operable object may be an icon, a button, or the like under the interface, and may be anywhere on the screen. Specifically, all clickable objects found by traversing the current terminal page are taken as operable objects.
In this step, when voice control is required, real-time traversal is started to determine the operable object, and a configuration basis is provided for accurate voice control.
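As an illustration of this traversal step, the following minimal sketch assumes an implementation inside an Android AccessibilityService (consistent with the Android terminals mentioned later in this description); the function name collectActionableObjects and the depth-first strategy are illustrative assumptions, not part of the disclosure.
```kotlin
import android.view.accessibility.AccessibilityNodeInfo

// Hypothetical sketch: depth-first traversal of the current page's accessibility
// tree, collecting every clickable node as an operable object.
fun collectActionableObjects(root: AccessibilityNodeInfo?): List<AccessibilityNodeInfo> {
    val found = mutableListOf<AccessibilityNodeInfo>()
    fun visit(node: AccessibilityNodeInfo?) {
        if (node == null) return
        if (node.isClickable) found.add(node) // clickable objects become operable objects
        for (i in 0 until node.childCount) visit(node.getChild(i))
    }
    visit(root) // e.g. the service's rootInActiveWindow when the mode is enabled
    return found
}
```
Running such a traversal when the voice control mode is enabled would yield the operable-object set over which the index keywords of step 103 are generated.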
And 103, generating an index key of each operable object.
In the disclosed embodiment, each operable object has an index keyword, from which the corresponding operable object can be determined. The user voice instruction contains a part indicating the index keyword; once the index keyword is determined from the recognition result of that part, the corresponding operable object can be determined and the user voice instruction executed.
And 104, receiving a user voice instruction, determining an index key word matched with the user voice instruction, and operating an operable object corresponding to the index key word.
In this step, the user voice instruction is executed to perform operation control on the current user interface. Specifically, recognizing the user voice instruction yields a recognition result containing information on both the object and the operation. After the object is determined from the index keyword, the user voice instruction is executed.
An exemplary embodiment of the present disclosure also provides a voice control method of a user interface for the case where, with the voice control mode activated, the content currently displayed on the user interface changes due to a user operation. The specific process of acquiring an operable object in this scenario is shown in fig. 2 and includes:
step 201, obtaining the content to be displayed on the current user interface from the server.
In this step, the content to be displayed on the current user interface is acquired according to the user operation.
Specifically, the content to be displayed on the current user interface may be obtained according to a user operation, such as sliding a screen, clicking a certain icon to activate a new page, and the like.
Step 202, determining at least one operable object in the content to be displayed.
In this step, the content to be displayed is traversed to obtain at least one operable object.
And after the content to be displayed is loaded to the current user interface for display, the generated index key words can be applied to the current user interface, and then the voice instruction of the user is executed.
An exemplary embodiment of the present disclosure further provides a voice control method for a user interface, where index keywords of each determined operable object are established based on related information of the operable object, and a specific flow is shown in fig. 3, where the specific flow includes:
step 301, generating a unique index keyword for each operable object determined through traversal.
In the embodiment of the present disclosure, the index key includes any one or any multiple of the following forms:
all the character description information of the operable object, partial character description information of the operable object and the number of the operable object.
Actionable objects typically have a textual description (e.g., an App name) of variable length. The full textual description of an operable object can be used as its index keyword, such as "short message".
When the textual description is long, part of it can be extracted as the index keyword. Alternatively, the operable object can be numbered, the number (for example a numeral) serving as the object's index keyword.
When an operable object has no corresponding textual description, it can be numbered and the number used as its index keyword.
Preferably, when an operable object has a long textual description, both the description and the object's number may be set as index keywords at the same time. The object can then be indexed whether the user's subsequent voice instruction refers to the description or to the number.
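As a sketch of this keyword-generation rule (number always; full text when present; a partial prefix when the text is long), the following assumes a hypothetical threshold MAX_KEY_LEN and keeps keywords unique by letting the first object win; both choices are assumptions for illustration.
```kotlin
import android.view.accessibility.AccessibilityNodeInfo

const val MAX_KEY_LEN = 8 // assumed boundary between "short" and "long" descriptions

fun buildIndex(objects: List<AccessibilityNodeInfo>): Map<String, AccessibilityNodeInfo> {
    val index = LinkedHashMap<String, AccessibilityNodeInfo>()
    fun put(key: String, node: AccessibilityNodeInfo) {
        index.putIfAbsent(key, node) // keep each index keyword unique
    }
    objects.forEachIndexed { i, node ->
        put((i + 1).toString(), node) // the number is always a valid index keyword
        val text = (node.text ?: node.contentDescription)?.toString()?.trim()
        if (!text.isNullOrEmpty()) {
            put(text, node) // full text description information
            if (text.length > MAX_KEY_LEN) put(text.take(MAX_KEY_LEN), node) // partial description
        }
    }
    return index
}
```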
Step 302, under the condition that the index key word is the number of the operable object, marking the number at the display position of the operable object under the current user interface.
This step is optional. In the case where a number is generated as an operable object's index keyword, the number is marked at the position of the operable object on the current user interface, i.e., the number is displayed at the object's position; during voice control, the number is the index keyword of that operable object. The user can directly see the number corresponding to the operable object and can therefore issue an accurate voice instruction.
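One possible way to realize this marking is sketched below, assuming the badge is drawn from an AccessibilityService through a TYPE_ACCESSIBILITY_OVERLAY window; the helper showNumberBadge is hypothetical, not an API of the disclosure.
```kotlin
import android.accessibilityservice.AccessibilityService
import android.content.Context
import android.graphics.PixelFormat
import android.graphics.Rect
import android.view.Gravity
import android.view.WindowManager
import android.view.accessibility.AccessibilityNodeInfo
import android.widget.TextView

// Hypothetical sketch: display the number at the operable object's on-screen position.
fun AccessibilityService.showNumberBadge(node: AccessibilityNodeInfo, number: String) {
    val bounds = Rect()
    node.getBoundsInScreen(bounds) // display position of the operable object
    val badge = TextView(this).apply { text = number }
    val params = WindowManager.LayoutParams(
        WindowManager.LayoutParams.WRAP_CONTENT,
        WindowManager.LayoutParams.WRAP_CONTENT,
        WindowManager.LayoutParams.TYPE_ACCESSIBILITY_OVERLAY,
        WindowManager.LayoutParams.FLAG_NOT_FOCUSABLE or
            WindowManager.LayoutParams.FLAG_NOT_TOUCHABLE, // badge must not intercept input
        PixelFormat.TRANSLUCENT
    ).apply {
        gravity = Gravity.TOP or Gravity.START
        x = bounds.left // anchor the badge at the object's top-left corner
        y = bounds.top
    }
    (getSystemService(Context.WINDOW_SERVICE) as WindowManager).addView(badge, params)
}
```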
In the embodiment of the disclosure, considering the diversity of spoken descriptions for the same operable object in user voice instructions, a voice index can be established on the basis of the index keywords; the voice index records the association between index keywords and voice object information, and is typically applied to speech recognition results.
For example, the index keyword "short message" can be associated with the corresponding speech, so that when the text converted from the speech recognition result matches, the index keyword "short message" is determined from the voice index and the operable object is determined in turn; or the index keyword "favorites" can be associated with recognition results such as "my favorites", allowing the same operable object to be selected by one or more speech recognition results that are not identical to the index keyword.
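A compact sketch of such a voice index follows; the synonym entries and the helper resolveKeyword are illustrative assumptions layered on top of the index built in the previous sketch.
```kotlin
import android.view.accessibility.AccessibilityNodeInfo

// Hypothetical synonym table: recognized phrase -> index keyword.
val voiceIndex = mapOf(
    "my favorites" to "favorites",
    "the favorites" to "favorites"
)

// Resolve a recognized phrase to an index keyword, preferring an exact match.
fun resolveKeyword(recognized: String, index: Map<String, AccessibilityNodeInfo>): String? =
    if (index.containsKey(recognized)) recognized
    else voiceIndex[recognized]?.takeIf { index.containsKey(it) }
```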
An exemplary embodiment of the present disclosure further provides a voice control method for a user interface in which, after the index keywords are generated in real time, a user voice instruction is executed on the basis of them and the operable object corresponding to the instruction is operated. The specific flow is shown in fig. 4 and includes:
step 401, recognizing a user voice instruction.
In this step, after the voice control mode (or another scene in which voice control is started) has been entered, the user voice instruction is recognized.
The user voice instruction comprises voice operation information and/or voice object information, the voice operation information indicates operation, and the voice object information indicates an object to which the operation points.
Step 402, determining an operable object pointed by the user voice instruction according to the voice object information.
As shown in fig. 5, the present step includes:
step 501, inquiring the index key words, and determining the index key words matched with the voice object information.
In this step, the corresponding index keyword is first obtained by matching against the recognition result of the voice object information.
Step 502, determining that the operable object corresponding to the index keyword is an operable object pointed by the voice object information.
In this step, after the index keyword is determined, the voice object information is finally associated with the operable object through the keyword's correspondence to that object, and the operable object is selected.
Step 403, determining the operation performed on the operable object according to the voice operation information.
As shown in fig. 6, the present step includes:
step 601, inquiring a preset operation list, wherein the operation list comprises a plurality of operations.
In the embodiment of the disclosure, an operation list may be maintained that records the virtual operations which voice operation information can trigger, for example: point (short press), press (long press), browse (slow slide), return (return to the upper-level interface), desktop (return to the desktop), sound adjustment (volume operation), input (input-method input), and the like.
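A minimal sketch of such a preset operation list, assuming English command words mapped onto standard accessibility actions; the Op enum, the chosen words, and the subset of operations shown are assumptions.
```kotlin
import android.accessibilityservice.AccessibilityService
import android.view.accessibility.AccessibilityNodeInfo

enum class Op { CLICK, LONG_PRESS, BACK, HOME }

// Preset operation list: spoken word -> virtual operation (subset shown).
val operationList = mapOf(
    "point" to Op.CLICK,      // short press
    "press" to Op.LONG_PRESS, // long press
    "return" to Op.BACK,      // return to the upper-level interface
    "desktop" to Op.HOME      // return to the desktop
)

fun AccessibilityService.execute(op: Op, target: AccessibilityNodeInfo?) {
    when (op) {
        Op.CLICK -> target?.performAction(AccessibilityNodeInfo.ACTION_CLICK)
        Op.LONG_PRESS -> target?.performAction(AccessibilityNodeInfo.ACTION_LONG_CLICK)
        Op.BACK -> performGlobalAction(AccessibilityService.GLOBAL_ACTION_BACK) // needs no object
        Op.HOME -> performGlobalAction(AccessibilityService.GLOBAL_ACTION_HOME)
    }
}
```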
Step 602, determining an operation matched with the voice operation information, and taking the operation as the operation executed on the operable object.
In this step, according to the recognition result of the voice operation information, the corresponding operation is matched, and the operation executed on the operable object is further determined.
It should be noted that some voice operation information may be used alone, without forming a voice instruction together with voice object information; that is, some operations point to no specific object, for example "return to the previous level".
Steps 402 and 403 are optional: step 402 needs to be executed only when the voice instruction includes voice object information, and step 403 only when it includes voice operation information. Steps 402 and 403 may be performed by different components or by the same component, and there is no strict timing relationship between them.
And step 404, executing the user voice instruction according to the operable object and/or the operation.
In this step, the voice command is executed according to the operable object and/or operation obtained by analyzing the voice command.
Specifically, in the case that the voice command includes voice operation information and voice object information, the result of the operable object + operation can be obtained after the voice command is parsed, that is, the corresponding operation is performed on the corresponding operable object.
In the case where the voice command includes only the voice operation information, the voice command may be analyzed to obtain an operation, and only the operation may be executed.
In the case where the voice instruction includes only voice object information, if the system has a previously configured default voice operation, the default voice operation is performed on the operable object determined from the voice object information. For example, if the default voice operation is "open" and the voice content of the voice object information is "WeChat", the operation of opening WeChat is executed.
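Putting the pieces together, the following hedged sketch of the dispatch of step 404 reuses the helpers sketched above; DEFAULT_OP stands in for the system-configured default voice operation and is an assumption.
```kotlin
import android.accessibilityservice.AccessibilityService
import android.view.accessibility.AccessibilityNodeInfo

val DEFAULT_OP = Op.CLICK // assumed default, e.g. "open" realized as a click

fun AccessibilityService.handleInstruction(
    opWord: String?,  // recognized voice operation information, if any
    objWord: String?, // recognized voice object information, if any
    index: Map<String, AccessibilityNodeInfo>
) {
    val op = opWord?.let { operationList[it] }
    val target = objWord
        ?.let { resolveKeyword(it, index) }
        ?.let { index[it] }
    when {
        op != null && target != null -> execute(op, target) // operation + object
        op != null -> execute(op, null)                     // object-free operation, e.g. "return"
        target != null -> execute(DEFAULT_OP, target)       // default operation on the object
    }
}
```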
When the voice instruction is executed, because the application environment is generated by traversing the current user interface in real time, the voice instruction relates only to the current user interface. For example, if the current foreground application is Taobao, the result of the voice instruction "open shopping cart" is to open the Taobao shopping cart rather than that of another application such as JD.com or Amazon.
In the embodiment of the disclosure, the user can also customize special voice commands, and the direct voice services supported by the current App can be displayed; such services are open to the App and customized by it. When the App is started, the corresponding direct voice service can be applied.
An exemplary embodiment of the present disclosure also provides a voice control apparatus of a user interface, whose structure is shown in fig. 7, including:
a mode starting module 701, configured to start a voice control mode of a current user interface based on a predetermined instruction;
an operation object obtaining module 702, configured to traverse the current user interface to obtain at least one operable object;
an index generating module 703, configured to generate an index key for each of the at least one operable object;
the instruction execution module 704 is configured to receive a user voice instruction, determine an index keyword matched with the user voice instruction, and operate an operable object corresponding to the index keyword.
Preferably, the structure of the operation object obtaining module is as shown in fig. 8, and includes:
a to-be-displayed content obtaining sub-module 801, configured to obtain, from a server, content to be displayed on the current user interface;
an operation object determining sub-module 802, configured to determine at least one operable object in the content to be displayed.
Preferably, the structure of the index generating module 703 is as shown in fig. 9, and includes:
the keyword generation submodule 901 is configured to generate a unique index keyword for each operable object determined by traversal, where the index keyword includes any one or more of the following forms:
all the character description information of the operable object, partial character description information of the operable object and the number of the operable object.
Preferably, the structure of the index generating module 703 is as shown in fig. 10, and further includes:
and a labeling sub-module 902, configured to label, in the case that the index key is the number of the operable object, the number at the display position of the operable object in the current user interface.
Preferably, the structure of the instruction execution module 704 is shown in fig. 11, and includes:
a voice recognition submodule 1101, configured to recognize a user voice instruction, where the user voice instruction includes voice operation information and/or voice object information, the voice operation information indicates an operation, and the voice object information indicates an object to which the operation is directed;
the object determination submodule 1102 is configured to determine, according to the voice object information, an operable object to which the user voice instruction points;
an operation determining sub-module 1103, configured to determine, according to the voice operation information, an operation performed on the operable object;
and the instruction execution sub-module 1104 is used for executing the user voice instruction according to the operable object and/or the operation.
Preferably, the object determination submodule 1102 is configured as shown in fig. 12, and includes:
an index querying unit 1201, configured to query the index keyword, and determine the index keyword matched with the voice object information;
a pointed object determining unit 1202, configured to determine that the operable object corresponding to the index key is the operable object pointed to by the voice object information.
Preferably, the structure of the operation determining sub-module 1103 is shown in fig. 13, and includes:
a list query unit 1301, configured to query a preset operation list, where the operation list includes multiple operations;
an operation determining unit 1302, configured to determine an operation matching the voice operation information, with the operation as an operation performed on the operable object.
An exemplary embodiment of the present disclosure also provides a computer apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
enabling a voice control mode of the current user interface based on a predetermined instruction;
traversing the current user interface to obtain at least one operable object;
generating an index key for each of the at least one actionable object;
receiving a user voice instruction, determining an index key word matched with the user voice instruction, and operating an operable object corresponding to the index key word.
Fig. 14 is a block diagram illustrating an apparatus 1400 for speech control according to an example embodiment. For example, the apparatus 1400 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 14, apparatus 1400 may include one or more of the following components: a processing component 1402, a memory 1404, a power component 1406, a multimedia component 1408, an audio component 1410, an input/output (I/O) interface 1412, a sensor component 1414, and a communication component 1416.
The processing component 1402 generally controls the overall operation of the device 1400, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. Processing component 1402 may include one or more processors 1420 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 1402 can include one or more modules that facilitate interaction between processing component 1402 and other components. For example, the processing component 1402 can include a multimedia module to facilitate interaction between the multimedia component 1408 and the processing component 1402.
The memory 1404 is configured to store various types of data to support operation at the device 1400. Examples of such data include instructions for any application or method operating on device 1400, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1404 may be implemented by any type of volatile or non-volatile storage device or combination of devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 1406 provide power to the various components of device 1400. Power components 1406 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for device 1400.
The multimedia component 1408 includes a screen that provides an output interface between the device 1400 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1408 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 1400 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 1410 is configured to output and/or input audio signals. For example, the audio component 1410 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 1400 is in operating modes, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 1404 or transmitted via the communication component 1416. In some embodiments, audio component 1410 further includes a speaker for outputting audio signals.
I/O interface 1412 provides an interface between processing component 1402 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 1414 includes one or more sensors for providing various aspects of state assessment for the apparatus 1400. For example, the sensor component 1414 may detect an open/closed state of the device 1400, a relative positioning of components, such as a display and keypad of the apparatus 1400, a change in position of the apparatus 1400 or a component of the apparatus 1400, the presence or absence of user contact with the apparatus 1400, an orientation or acceleration/deceleration of the apparatus 1400, and a change in temperature of the apparatus 1400. The sensor assembly 1414 may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly 1414 may also include a photosensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1416 is configured to facilitate wired or wireless communication between the apparatus 1400 and other devices. The device 1400 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 1416 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1416 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 1400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as the memory 1404 that includes instructions executable by the processor 1420 of the apparatus 1400 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium having instructions therein, which when executed by a processor of a mobile terminal, enable the mobile terminal to perform a method of voice control of a user interface, the method comprising:
enabling a voice control mode of the current user interface based on a predetermined instruction;
traversing the current user interface to obtain at least one operable object;
generating an index key for each of the at least one actionable object;
receiving a user voice instruction, determining an index key word matched with the user voice instruction, and operating an operable object corresponding to the index key word.
When voice control needs to be started, a voice control mode of a current user interface is started based on a preset instruction, the current user interface is traversed to obtain at least one operable object, then index keywords of the at least one operable object are generated, a user voice instruction is received, index keywords matched with the user voice instruction are determined, and the operable object corresponding to the index keywords is operated. The voice control is completed based on the index keywords generated in real time, the index keywords are completely matched with the current application environment, the efficient, accurate and easy-to-use cross-App universal voice control is realized, and the problems of complicated configuration, large system resource consumption and narrow application range of a voice assistant implementation scheme are solved.
The method can be applied to mobile terminals such as Android mobile phones, on which most operations can then be completed through voice control.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (16)

1. A method for voice control of a user interface, comprising:
enabling a voice control mode of the current user interface based on a predetermined instruction;
traversing the current user interface to obtain at least one operable object;
generating an index key for each of the at least one actionable object;
receiving a user voice instruction, determining an index key word matched with the user voice instruction, and operating an operable object corresponding to the index key word.
2. The method of claim 1, wherein traversing the current user interface to obtain at least one actionable object comprises:
acquiring content to be displayed on the current user interface from a server;
and determining at least one operable object in the content to be displayed.
3. The method of claim 1, wherein the step of generating an index key for each of the at least one actionable object comprises:
generating a unique index key for each operable object, the index key comprising any one or any number of the following forms:
all the character description information of the operable object, partial character description information of the operable object and the number of the operable object.
4. The method of claim 3, wherein the step of generating an index key for each of the at least one actionable object is further followed by:
and if the index key is the number of the operable object, marking the number at the display position of the operable object under the current user interface.
5. The method of claim 3, wherein the receiving a user voice command, determining an index key matching the user voice command, and operating an operable object corresponding to the index key comprises:
recognizing the user voice instruction, and extracting voice operation information and/or voice object information in the user voice instruction, wherein the voice operation information indicates operation, and the voice object information indicates an object pointed by the operation;
determining an operable object pointed by the user voice instruction according to the voice object information;
determining the operation executed on the operable object according to the voice operation information;
and executing the user voice instruction according to the operable object and/or the operation.
6. The method of claim 5, wherein the step of determining an operable object pointed to by the user voice command according to the voice object information comprises:
determining an index keyword matched with the voice object information;
and determining the operable object corresponding to the index key as the operable object pointed by the voice object information.
7. The method of claim 5, wherein the step of determining the operation performed on the operable object according to the voice operation information comprises:
inquiring a preset operation list, wherein the operation list comprises a plurality of operations;
and determining the operation matched with the voice operation information, wherein the operation is taken as the operation executed on the operable object.
8. A voice control apparatus for a user interface, comprising:
the mode starting module is used for starting a voice control mode of the current user interface based on a preset instruction;
the operation object acquisition module is used for traversing the current user interface to acquire at least one operable object;
an index generation module, configured to generate an index key for each of the at least one actionable object;
and the instruction execution module is used for receiving a user voice instruction, determining an index key word matched with the user voice instruction and operating an operable object corresponding to the index key word.
9. The voice control apparatus of a user interface according to claim 8, wherein the operation object obtaining module includes:
a content to be displayed acquisition submodule, configured to acquire, from a server, content to be displayed on the current user interface;
and the operation object determining submodule is used for determining at least one operable object in the content to be displayed.
10. The voice-controlled apparatus of a user interface of claim 8, wherein the index generation module comprises:
a key generating submodule, configured to generate a unique index key for each operable object determined by traversal, where the index key includes any one or any multiple of the following forms:
all the character description information of the operable object, partial character description information of the operable object and the number of the operable object.
11. The voice-controlled apparatus of a user interface of claim 10, wherein the index generation module further comprises:
and the labeling sub-module is used for labeling the number at the display position of the operable object in the current user interface under the condition that the index key word is the number of the operable object.
12. The voice-controlled apparatus of a user interface of claim 10, wherein the instruction execution module comprises:
the voice recognition sub-module is used for recognizing the user voice instruction, the user voice instruction comprises voice operation information and/or voice object information, the voice operation information indicates operation, and the voice object information indicates an object to which the operation points;
the object determining submodule is used for determining an operable object pointed by the user voice instruction according to the voice object information;
the operation determining submodule is used for determining the operation executed on the operable object according to the voice operation information;
and the instruction execution submodule is used for executing the user voice instruction according to the operable object and/or the operation.
13. The voice control apparatus of user interface of claim 12, wherein the object determination submodule comprises:
the index query unit is used for querying the index keywords and determining the index keywords matched with the voice object information;
and the pointing object determining unit is used for determining that the operable object corresponding to the index key is the operable object pointed by the voice object information.
14. The voice-controlled apparatus of a user interface of claim 12, wherein the operation-determining sub-module comprises:
a list query unit, configured to query a preset operation list, wherein the operation list comprises a plurality of operations;
an operation determination unit configured to determine an operation matching the voice operation information as an operation performed on the operable object.
15. A computer device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
enabling a voice control mode of the current user interface based on a predetermined instruction;
traversing the current user interface to obtain at least one operable object;
generating an index key for each of the at least one actionable object;
receiving a user voice instruction, determining an index key word matched with the user voice instruction, and operating an operable object corresponding to the index key word.
16. A non-transitory computer readable storage medium having instructions therein, which when executed by a processor of a mobile terminal, enable the mobile terminal to perform a method of voice control of a user interface, the method comprising:
enabling a voice control mode of the current user interface based on a predetermined instruction;
traversing the current user interface to obtain at least one operable object;
generating an index key for each of the at least one actionable object;
receiving a user voice instruction, determining an index key word matched with the user voice instruction, and operating an operable object corresponding to the index key word.
CN201911300685.0A, filed 2019-12-17 (priority 2019-12-17): Voice control method and device of user interface. Status: Pending. Published as CN111061452A.

Priority Applications (1)

Application Number: CN201911300685.0A
Priority Date: 2019-12-17
Filing Date: 2019-12-17
Title: Voice control method and device of user interface


Publications (1)

Publication Number: CN111061452A
Publication Date: 2020-04-24

Family

ID=70301698

Family Applications (1)

Application Number: CN201911300685.0A
Title: Voice control method and device of user interface
Status: Pending

Country Status (1)

Country: CN (CN111061452A)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111506245A (en) * 2020-04-27 2020-08-07 北京小米松果电子有限公司 Terminal control method and device
CN112286487A (en) * 2020-12-30 2021-01-29 智道网联科技(北京)有限公司 Voice guidance operation method and device, electronic equipment and storage medium
CN113050845A (en) * 2021-03-31 2021-06-29 联想(北京)有限公司 Processing method and processing device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101188108A (en) * 2007-12-17 2008-05-28 凯立德欣技术(深圳)有限公司 A voice control method, device and mobile terminal
CN103002138A (en) * 2012-11-21 2013-03-27 中兴通讯股份有限公司 Method and system for starting mobile phone applications
CN103869948A (en) * 2012-12-14 2014-06-18 联想(北京)有限公司 Voice command processing method and electronic device
CN104184890A (en) * 2014-08-11 2014-12-03 联想(北京)有限公司 Information processing method and electronic device
CN106504748A (en) * 2016-10-08 2017-03-15 珠海格力电器股份有限公司 A kind of sound control method and device
CN106775814A (en) * 2016-11-18 2017-05-31 上海传英信息技术有限公司 Mobile terminal and its operating method
CN107948698A (en) * 2017-12-14 2018-04-20 深圳市雷鸟信息科技有限公司 Sound control method, system and the smart television of smart television

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination