CN115798469A - Voice control method and device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN115798469A
Authority
CN
China
Prior art keywords
control
function
parent
function control
functionality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211360723.3A
Other languages
Chinese (zh)
Inventor
周文欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Apollo Zhilian Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Apollo Zhilian Beijing Technology Co Ltd filed Critical Apollo Zhilian Beijing Technology Co Ltd
Priority to CN202211360723.3A priority Critical patent/CN115798469A/en
Publication of CN115798469A publication Critical patent/CN115798469A/en
Pending legal-status Critical Current

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure discloses a voice control method and apparatus, an electronic device, and a computer-readable storage medium, and relates to the field of computer technology, in particular to the field of autonomous driving. A specific implementation scheme is as follows: receiving a voice control instruction while a target interface is displayed; in response to the voice control instruction, acquiring an automatic speech recognition result of the voice control instruction; determining, in a preset lexicon, text content corresponding to the automatic speech recognition result, the preset lexicon being obtained by generalizing control information corresponding to each function control of each of a plurality of interfaces including the target interface, and the control information including textual description information of each function control; determining a target function control corresponding to the text content; and executing the voice control instruction based on the target function control. The voice control method can provide voice control with lower labor cost and higher control efficiency.

Description

Voice control method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a voice control method and apparatus, an electronic device, and a computer-readable storage medium.
Background
In the related art, voice control is increasingly widely used in vehicle-mounted scenarios. For example, before voice control can be used, developers of an APP (application) need to register the function controls of a third-party APP that support voice control with a voice engine and associate each function control with a corresponding function implementation interface. When a user performs voice control, the in-vehicle system can then send the user's voice control instruction to the corresponding APP, and the APP calls the corresponding function implementation interface to implement voice control of a function control of the APP.
Disclosure of Invention
The disclosure provides a voice control method, a voice control device, an electronic device and a computer readable storage medium.
According to an aspect of the present disclosure, there is provided a voice control method including:
receiving a voice control instruction while a target interface is displayed;
in response to the voice control instruction, acquiring an automatic speech recognition result of the voice control instruction;
determining, in a preset lexicon, text content corresponding to the automatic speech recognition result, wherein the preset lexicon is obtained by generalizing control information corresponding to each function control of each of a plurality of interfaces including the target interface, and the control information includes textual description information of each function control;
determining a target function control corresponding to the text content; and
executing the voice control instruction based on the target function control.
According to another aspect of the present disclosure, there is provided a voice control apparatus including:
an instruction receiving module configured to receive a voice control instruction while a target interface is displayed;
a first acquisition module configured to acquire, in response to the voice control instruction, an automatic speech recognition result of the voice control instruction;
a first determining module configured to determine, in a preset lexicon, text content corresponding to the automatic speech recognition result, wherein the preset lexicon is obtained by generalizing control information corresponding to each function control of each of a plurality of interfaces including the target interface, and the control information includes textual description information of each function control;
a second determining module configured to determine a target function control corresponding to the text content; and
an execution module configured to execute the voice control instruction based on the target function control.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to any one of the above aspects.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of the aspects.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program/instructions which, when executed by a processor, implement the steps of the method according to any one of the above aspects.
Beneficial effects of the present disclosure:
In the embodiments of the present disclosure, a voice control instruction is received while a target interface is displayed; in response to the voice control instruction, an automatic speech recognition result of the voice control instruction is acquired; text content corresponding to the automatic speech recognition result is determined in a preset lexicon; a target function control corresponding to the text content is determined; and the voice control instruction is executed based on the target function control. In this way, a voice control method with low labor cost and high control efficiency can be provided.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become readily apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow chart diagram of a voice control method according to a first embodiment of the present disclosure;
FIG. 2 is a schematic flow chart diagram of a voice control method according to a second embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a process for determining a target functionality control according to a second embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a voice control apparatus according to the first embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device used to implement the voice control method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments are included to aid understanding and are to be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of users' personal information all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
A voice control method, an apparatus, an electronic device, and a computer-readable storage medium according to embodiments of the present disclosure will be described below with reference to the accompanying drawings, and first, the voice control method according to embodiments of the present disclosure will be described with reference to the accompanying drawings.
Fig. 1 is a flowchart of a voice control method according to an embodiment of the present disclosure. The method is applied to an in-vehicle system and may be executed, for example, by a controller of the in-vehicle system. As shown in Fig. 1, the voice control method according to the embodiment of the present disclosure may include:
and step S110, receiving a voice control instruction under the condition that the target interface is displayed.
In the embodiments of the present disclosure, when a user wants to voice-control a function control of an APP, the user may input a voice control instruction to the in-vehicle system. For example, after the in-vehicle system is started and while its central control display screen is displaying a certain interface (i.e., the target interface), the user may speak the voice control instruction to be executed; the vehicle-mounted microphone then receives the user's voice control instruction and transmits it to the in-vehicle system, so that the in-vehicle system receives the voice control instruction while the target interface is displayed. It can be understood that the target interface may be an interface of a third-party APP, the system desktop interface, an interface of a system APP, or the like.
Step S120, in response to the voice control command, obtaining an automatic voice recognition result of the voice control command.
In the embodiments of the present disclosure, after the voice control instruction is received, an automatic speech recognition result of the voice control instruction may be acquired. For example, a speech recognition engine may perform speech recognition on the voice control instruction using ASR (Automatic Speech Recognition) technology to obtain the ASR result of the voice control instruction, that is, the automatic speech recognition result.
Step S130, determining, in a preset lexicon, the text content corresponding to the automatic speech recognition result.
The preset lexicon may be obtained in advance by generalizing the control information corresponding to each function control of each of a plurality of interfaces, the plurality of interfaces including the target interface, and the control information including textual description information of each function control. The interfaces may include interfaces of any third-party APP, the desktop interface, interfaces of system APPs, and the like; for example, the plurality of interfaces may include all interfaces of all third-party APPs in the in-vehicle system, the desktop interface of the in-vehicle system, and all interfaces of all system APPs. The textual description information may be used to indicate the primary function of a function control; for example, if the primary function of a function control is playing video, its textual description information may be "video".
In the embodiments of the present disclosure, after the automatic speech recognition result of the voice control instruction is obtained, the text content corresponding to it may be determined in the preset lexicon. For example, it may be determined whether the automatic speech recognition result hits a corresponding query result in the preset lexicon; if so, the hit query result may be determined as the text content corresponding to the automatic speech recognition result, that is, the text content corresponding to the automatic speech recognition result in the preset lexicon. It can be understood that, if the automatic speech recognition result does not hit any query result in the preset lexicon, a TTS (Text To Speech) prompt may be broadcast, for example "I did not catch that, please say it again", or another TTS prompt indicating that the current voice control instruction is not supported. Because the preset lexicon is obtained in advance by generalizing the control information corresponding to each function control of each interface, including the target interface, it can better cover the automatic recognition results of users' voice control instructions aimed at different function controls on different interfaces.
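The hit-or-fallback logic of step S130 can be sketched in a few lines. This is an illustrative sketch only, assuming the lexicon is represented as a plain query-to-keyword mapping; the lexicon entries and prompt wording below are hypothetical examples, not taken from the patent.

```python
# Sketch of Step S130: look up the ASR result in the preset lexicon and
# fall back to a TTS prompt on a miss. Entries and prompt text are
# hypothetical examples.

# Generalized lexicon: spoken query -> text content (control keyword).
PRESET_LEXICON = {
    "play video": "video",
    "i want to watch video": "video",
    "open music": "music",
}

def match_text_content(asr_result):
    """Return the text content hit in the lexicon, or None on a miss."""
    return PRESET_LEXICON.get(asr_result.strip().lower())

def handle_asr_result(asr_result):
    """Dispatch a hit to control execution, a miss to a TTS prompt."""
    text = match_text_content(asr_result)
    if text is None:
        return ("tts", "I did not catch that, please say it again")
    return ("hit", text)
```

A hit yields the text content that step S140 then maps to a view node; a miss triggers the broadcast prompt.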
Step S140, determining a target function control corresponding to the text content.
In the embodiments of the present disclosure, after the text content corresponding to the automatic speech recognition result is determined in the preset lexicon, the function control corresponding to that text content, i.e., the target function control, may be determined. Illustratively, the view node corresponding to the text content may be obtained according to the keyword pointed to by the hit query result (i.e., the text content); this view node is in fact a specific function control, namely the target function control. It can be understood that the correspondence between keywords of text content and view nodes may be preset.
Step S150, executing the voice control instruction based on the target function control.
In the embodiments of the present disclosure, after the target function control corresponding to the text content is determined, the voice control instruction input by the user may be executed based on that control. For example, a click command simulating the user clicking the target function control may be input to the target function control, so that the target function control executes the user's voice control instruction.
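Steps S140 and S150 can be sketched together as a lookup in a preset keyword-to-node table followed by a simulated click. This is an illustrative model only; the `ViewNode` class and all names are hypothetical stand-ins for what, on Android, would be an `AccessibilityNodeInfo` and a `performAction(ACTION_CLICK)` call.

```python
# Sketch of Steps S140-S150: resolve text content to its view node via a
# preset correspondence table, then execute by simulating a click.
# ViewNode is a hypothetical stand-in for an accessibility node.

class ViewNode:
    def __init__(self, name):
        self.name = name
        self.clicked = False

    def perform_click(self):
        # Stands in for dispatching a simulated click to the control.
        self.clicked = True
        return True

# Preset correspondence between text-content keywords and view nodes.
KEYWORD_TO_NODE = {"video": ViewNode("video_play_button")}

def execute_instruction(text_content):
    """Determine the target function control and 'click' it; False if absent."""
    node = KEYWORD_TO_NODE.get(text_content)
    return node.perform_click() if node is not None else False
```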
In the embodiments of the present disclosure, a voice control instruction is received while a target interface is displayed; in response to the voice control instruction, an automatic speech recognition result of the instruction is acquired; and text content corresponding to the automatic speech recognition result is determined in a preset lexicon, the preset lexicon being obtained by generalizing the control information corresponding to each function control of each of a plurality of interfaces including the target interface, and the control information including the textual description information of each function control. A target function control corresponding to the text content is then determined, and the voice control instruction is executed based on that control. In this way, voice control of the target function control can be achieved without calling a function implementation interface, so APP developers do not need to register function controls with the voice engine in advance or perform operations such as associating function controls with function interfaces. This effectively reduces labor cost, improves development efficiency, improves voice control efficiency, and improves user experience, thereby providing a voice control method with low labor cost, high development efficiency, and high control efficiency.
In a possible implementation, before step S110 is executed, that is, before the voice control method of the foregoing embodiment is executed, the following processing may also be performed:
acquiring at least one function control in each interface;
acquiring control information corresponding to each function control in each interface; and
generating the preset lexicon based on the textual description information of each function control according to a preset generalization rule.
The control information may include control information of the root function control corresponding to a function control, and control information of at least one child function control of that root function control.
In the embodiments of the present disclosure, before the foregoing method embodiment is executed, the control information of each function control of each interface may be acquired and the preset lexicon generated. For example, after the voice system of the in-vehicle client is started, the accessibility service of the client may be started; the accessibility service may be started automatically, or a prompt message may be sent to the client and the accessibility service started after the user agrees. The started accessibility service may be, for example, a VoiceAccessibilityService implemented by inheriting from AccessibilityService. After the accessibility service is started, the in-vehicle system can capture, through the client, all function controls of the currently displayed interface and the corresponding control information, and can acquire the function controls in each interface. For example, for the currently displayed interface (hereinafter, the current interface), a screen switching event of the current interface may be monitored through the onAccessibilityEvent() method, and the root view node of the current interface, i.e., the root function control, may be obtained through the getRootInActiveWindow() method. It can be understood that each interface usually contains one or more function controls; the foregoing processing may be executed each time a screen switching event is monitored, so as to obtain at least one function control in each interface.
For example, when there is one function control in an interface, that one function control is obtained; when there are eight function controls in an interface, all eight are obtained. This provides a more comprehensive and complete data basis for subsequently acquiring the control information corresponding to each function control.
Then, the control information corresponding to each function control in each interface may be acquired and stored, the control information including at least the textual description information of the function control. For example, the root function control of each interface may be obtained; after a screen switching event is monitored, the root function control of the interface displayed after the event is obtained. Considering that multiple function controls are usually organized in a tree structure with parent-child relationships, after the root function control (root node) of each interface is obtained, all child function controls (child nodes) contained under it may be obtained by recursive traversal, and the control information corresponding to each function control acquired, that is, the control information corresponding to each root function control and each child function control, namely its AccessibilityNodeInfo, obtained for example via the getRootInActiveWindow() method.
It can be understood that, in the process of acquiring the control information corresponding to each function control in each interface, the control information may be acquired each time one or more function controls of an interface are obtained; or after all function controls of one or more interfaces are obtained; or after all function controls of all interfaces are obtained. In this way, a more comprehensive and complete data basis can be provided for subsequently generating the preset lexicon.
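The recursive traversal described above can be modeled in a few lines. This is an illustrative sketch, not Android code: the `Control` class and its fields are hypothetical stand-ins for an `AccessibilityNodeInfo` tree whose root would come from `getRootInActiveWindow()`.

```python
# Sketch of the recursive traversal: starting from a root function
# control, visit every child node and collect each control's textual
# description and operability flag. Control is a hypothetical stand-in
# for AccessibilityNodeInfo.

class Control:
    def __init__(self, description, clickable=False, children=None):
        self.description = description  # textual description information
        self.clickable = clickable      # whether the control is operable
        self.children = children or []

def collect_control_info(node):
    """Recursively gather (description, clickable) pairs for a control tree."""
    info = [(node.description, node.clickable)]
    for child in node.children:
        info.extend(collect_control_info(child))
    return info

# A tiny example interface: a root layout holding two clickable buttons.
root = Control("root", children=[
    Control("video", clickable=True),
    Control("music", clickable=True),
])
```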
Then, the preset lexicon is generated based on the textual description information of each function control according to a preset generalization rule. Illustratively, the textual description information in the acquired control information of each function control may be generalized according to the preset generalization rule, completing the generalization of the key text (i.e., the textual description information) of each function control and producing a generalized fuzzy lexicon of key text, namely the preset lexicon. It can be understood that this processing may be performed in advance, before the voice control method of the foregoing embodiment is executed for the first time, or when an interface is opened for the first time. In this way, a generalized fuzzy preset lexicon can be generated in advance based on the control information corresponding to each function control in each interface. On the one hand, this provides a data basis for the subsequent execution of the voice control method and improves voice control efficiency; on the other hand, the generalized preset lexicon improves the rate at which automatic speech recognition results hit text content in the lexicon, which further improves the execution rate of voice control instructions and the user experience.
In a further possible implementation manner, a specific implementation manner of generating the preset lexicon based on the text description information of each functional control according to the preset generalization rule in the above step may be as follows:
performing generalization processing on the textual description information of each function control based on a first preset word slot and a second preset word slot to obtain the preset lexicon.
The first preset word slot is used to indicate a specific execution action for the textual description information of a function control, and the second preset word slot is used to indicate specific execution content for the textual description information of the function control.
In the embodiments of the present disclosure, when the preset lexicon is generated based on the textual description information of each function control according to the preset generalization rule, the generalization rule may be set first; the preset generalization rule may be, for example, generalization based on the first preset word slot and the second preset word slot. Illustratively, the first preset word slot and the second preset word slot may be obtained. The first preset word slot may be a prefix word slot of the textual description information of a function control, and its content may indicate a specific execution action for that textual description information, such as a generic "open" category including, for example, "I want to watch", "open", "play", and the like. The second preset word slot may be a suffix word slot of the textual description information, and its content may indicate the specific execution content of that textual description information, for example "content", "video", and the like. By setting prefix and suffix word slots for the textual description information of each function control, the textual description information is generalized and can be freely combined into semantic queries of various types, better completing the generalization of the key text (i.e., the textual description information). This can further improve the rate at which automatic speech recognition results hit text content in the preset lexicon, thereby improving the execution rate of voice control instructions and the user experience.
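The free combination of prefix slot, key text, and suffix slot can be sketched as a simple cross product. This is an illustrative sketch: the slot fillers mix examples from the text ("I want to watch", "open", "play", "content", "video") with the hypothetical convention that an empty string means the slot is omitted.

```python
# Sketch of word-slot generalization: the first (prefix) slot carries
# execution actions, the second (suffix) slot carries execution content,
# and the lexicon is their free combination with each control's key text.

PREFIX_SLOT = ["i want to watch", "open", "play"]  # first preset word slot
SUFFIX_SLOT = ["", "content", "video"]             # second slot; "" = omitted

def generalize(descriptions):
    """Build the preset lexicon: every prefix + key text + suffix query
    maps back to the control's textual description (the key text)."""
    lexicon = {}
    for desc in descriptions:
        for pre in PREFIX_SLOT:
            for suf in SUFFIX_SLOT:
                query = " ".join(part for part in (pre, desc, suf) if part)
                lexicon[query] = desc
    return lexicon
```

For a single key text "video", this yields nine distinct queries such as "play video" and "i want to watch video content", each pointing back to the "video" control.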
In a possible implementation, the control information may further include information indicating whether the function control is operable, for example whether the current function control is clickable. This provides a more complete data basis for subsequently determining the target function control. Correspondingly, in the above step, the specific implementation of determining the target function control corresponding to the text content may be:
determining a first function control corresponding to the text content;
acquiring first control information corresponding to a first function control;
determining whether the first function control is operable according to the first control information;
and in response to the first function control being operable, determining the first function control as a target function control corresponding to the text content.
In the embodiments of the present disclosure, considering that the function control corresponding to the text content may be inoperable, for example not clickable, an operable function control corresponding to the text content is determined as the target function control. For example, when determining the target function control corresponding to the text content, the function control corresponding to the text content, i.e., the first function control, may be determined first. Then, the control information corresponding to the first function control, i.e., the first control information, may be acquired; the first control information includes at least the information indicating whether the function control is operable, and whether the first function control is operable is determined according to the first control information. If the first control information indicates that the first function control is operable, the first function control can be considered capable of executing the voice control instruction, and in response to the first function control being operable, the first function control may be determined as the target function control corresponding to the text content. This ensures that the determined target function control can actually execute the voice control instruction and avoids execution failures caused by an inoperable target function control, so the execution rate of voice control instructions and the user experience can be further improved.
In a further possible implementation manner, the embodiment of the present disclosure may further include the following processing:
in response to the first functionality control being inoperable, determining whether a first parent functionality control of the first functionality control is operable;
and in response to the first parent functionality control being operable, determining the first parent functionality control as a target functionality control corresponding to the text content.
A parent function control is the function control one level above a given function control in the tree structure of function controls. In the tree structure, every function control except the root function control has a parent function control, the root function control being the topmost function control in the tree. It can be understood that, for any function control in the tree structure, a function control at the next level down may be regarded as a child function control of that function control, and that function control may be regarded as the parent function control of the child; function controls at levels below the root function control may be regarded as child function controls of the root function control.
In the embodiments of the present disclosure, it is considered that the first function control may be inoperable, while the function of the parent function control of the first function control is generally close to that of the first function control. Therefore, if the first control information indicates that the first function control is inoperable, the first function control can be considered unable to execute the voice control instruction; in response to the first function control being inoperable, the parent function control of the first function control, i.e., the first parent function control, may be determined. Then, the control information of the first parent function control may be acquired; similarly, this control information may also include information indicating whether the function control is operable, and whether the first parent function control is operable is determined according to it. If the control information of the first parent function control indicates that it is operable, the first parent function control can be considered capable of executing the voice control instruction; in response to the first parent function control being operable, it may be determined as the target function control corresponding to the text content, so that the voice control instruction is executed based on the first parent function control. In this way, when the first function control is inoperable, the voice control instruction can still be executed through an operable parent function control of the first function control, better improving the execution rate and success rate of voice control instructions.
In a further possible implementation, the embodiment of the present disclosure may further include the following processing:
in response to the first parent functionality control being inoperable, determining whether a second parent functionality control of the first parent functionality control is operable;
in response to the second parent function control being operable, determining the second parent function control as a target function control corresponding to the text content;
in response to the second parent function control being inoperable, determining, according to a preset traversal manner, whether an operable parent function control exists among the parent function controls of the second parent function control; the preset traversal manner comprises traversing the parent function control of each function control in sequence from the lowest level upward in the tree structure of function controls until the root function control is reached;
and in response to an operable parent function control existing among the parent function controls of the second parent function control, determining the lowest-level operable parent function control among them as the target function control corresponding to the text content.
In embodiments of the present disclosure, it is considered that the first parent function control may also be inoperable, while the functionality of the parent function controls in the same branch as the first parent function control is typically relatively close to that of the first parent function control and the first function control. Thus, in response to the first parent function control being inoperable, it may be determined whether the parent function control of the first parent function control, that is, the second parent function control, is operable. It will be appreciated that the method of determining whether the second parent function control is operable is similar to the method of determining whether the first function control is operable described above, and is not repeated here. In response to the second parent function control being operable, it may be determined as the target function control corresponding to the text content, so that the voice control instruction is executed based on it. Otherwise, in response to the second parent function control being inoperable, it is determined, according to a preset traversal manner, whether an operable parent function control exists among the parent function controls of the second parent function control.
Illustratively, the parent function control of each function control may be traversed in sequence from the lowest level upward in the tree structure until the root function control is reached. For example, if the first function control is not clickable, its first parent function control is obtained and it is determined whether that control is clickable. If the first parent function control is clickable, the click action is executed on it; if not, the second parent function control is traversed recursively: if it supports clicking, the click action is executed, and if not, it is determined whether its parent function control is clickable, and so on, repeating this operation until the top-level root function control is reached. If an operable parent function control exists among the parent function controls of the second parent function control, the lowest-level operable one among them is determined as the target function control corresponding to the text content.
It will be appreciated that whether the parent function control at each level is operable may be determined in turn, the traversal may be terminated as soon as an operable parent function control is found, and that control may be determined as the lowest-level operable target function control. Therefore, when the first parent function control is inoperable, the lowest-level operable parent function control in the tree structure can be determined as the target function control to execute the voice control instruction, which further improves the execution rate and success rate of the voice control instruction and improves the user experience.
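The bottom-up traversal described above can be sketched as follows; the parent map and control names are hypothetical, standing in for the control tree captured from the interface:

```python
# Hypothetical parent map of the control tree ("root" has no parent)
# and an operability flag for each control.
parents = {"button": "panel", "panel": "tab", "tab": "root", "root": None}
operable = {"button": False, "panel": False, "tab": True, "root": True}


def find_operable_ancestor(control):
    """Walk upward from the control's parent toward the root and return
    the lowest-level operable parent control, or None if none exists."""
    node = parents[control]
    while node is not None:
        if operable[node]:
            return node          # terminate the traversal at the first hit
        node = parents[node]
    return None


# "panel" (the first parent) is inoperable, so the traversal continues
# to "tab", the lowest-level operable ancestor.
assert find_operable_ancestor("button") == "tab"
```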
In a further possible embodiment, the control information may further include position information of the function control, for example, the area information of the function control in the interface. In this way, a sufficient data basis can be provided for subsequently determining the target function control according to the position information of the function control.
The embodiment of the present disclosure may further include the following processing:
in response to the parent functional control of the second parent functional control not having an operable parent functional control, obtaining position information of the first functional control;
and determining the function control closest to the position information of the first function control in the target interface as the target function control corresponding to the text content.
In the embodiment of the present disclosure, if no operable parent function control exists among the parent function controls of the second parent function control, that is, all function controls that are above the first function control and in the same branch as it are inoperable, then, in response to this, the position information of the first function control, that is, its position in the target interface, may be obtained. The function control in the target interface closest to the position of the first function control may then be determined as the target function control corresponding to the text content. Since function controls that are closer to each other in the same interface are more likely to have similar functionality, executing the voice control instruction through the closest function control when the entire branch above the first function control is inoperable further improves the execution success rate of the voice control instruction and the user experience.
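A nearest-control fallback of this kind can be sketched as below. The control names, coordinates, and the use of centre-point Euclidean distance are illustrative assumptions (the disclosure's own distance calculation uses Formula 1):

```python
import math

# Hypothetical centre coordinates and operability of controls in the interface.
positions = {"play": (100, 200), "pause": (160, 200), "slider": (100, 280)}
operable = {"play": False, "pause": True, "slider": True}


def nearest_operable(target):
    """Return the operable control closest to the target control's position."""
    tx, ty = positions[target]
    candidates = [c for c in positions if c != target and operable[c]]
    return min(candidates,
               key=lambda c: math.hypot(positions[c][0] - tx,
                                        positions[c][1] - ty))


# "pause" (distance 60) beats "slider" (distance 80)
assert nearest_operable("play") == "pause"
```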
In a further possible implementation manner, the foregoing step of determining the function control in the target interface closest to the position information of the first function control as the target function control corresponding to the text content may be implemented as follows:
determining, in the target interface, a second function control in a preset direction of the first function control that is closest to the position information of the first function control;
acquiring control information corresponding to each second function control;
determining an operable third function control in the second function controls according to the information which is used for indicating whether the function controls are operable in the control information corresponding to each second function control;
and in response to that the number of the third function controls is 1, determining the third function control as a target function control corresponding to the text content.
There is at least one preset direction; for example, the directions above, below, to the left of, and to the right of the position of the first function control may be used. Accordingly, there may be at least one second function control.
In the embodiment of the disclosure, when determining the function control in the target interface closest to the position information of the first function control as the target function control corresponding to the text content, one function control closest to the position of the first function control may be determined in each preset direction of the first function control, that is, a second function control. Taking the preset directions up, down, left, and right as an example, second function controls 1 to 4 may be determined as the controls closest to the position of the first function control above, below, to the left of, and to the right of it in the target interface, respectively. It is to be understood that the distance between a function control in a preset direction and the position of the first function control may be determined according to the position information of the two function controls; the calculation can be as shown in Formula 1.
d = |A·x₀ + B·y₀ + C| / √(A² + B²)    (Formula 1)

where d denotes the distance between the two function controls; A, B, and C are preset parameters defining the straight line, whose specific values can be set as required; and x₀ and y₀ are the coordinates of the second function control.
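Formula 1 is the standard point-to-line distance; a minimal sketch (the function name and the sample line are illustrative):

```python
import math


def point_to_line_distance(x0, y0, a, b, c):
    """Distance d from the point (x0, y0) to the line a*x + b*y + c = 0,
    i.e. d = |a*x0 + b*y0 + c| / sqrt(a**2 + b**2)  (Formula 1)."""
    return abs(a * x0 + b * y0 + c) / math.hypot(a, b)


# Distance from (3, 4) to the horizontal line y = 1 (0*x + 1*y - 1 = 0)
assert point_to_line_distance(3, 4, 0, 1, -1) == 3.0
```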
Then, control information corresponding to each second function control may be obtained; for example, at least the information indicating whether the function control is operable may be obtained from the control information of each second function control, and according to this information, the operable function controls among the second function controls, that is, the third function controls, are determined. Considering that there may be one or more second function controls, there may accordingly be one or more third function controls. If the number of third function controls is 1, that third function control may be determined as the target function control corresponding to the text content, so that the voice control instruction is executed based on it. This further improves the execution rate and success rate of the voice control instruction and further improves the user experience.
In a further possible implementation manner, the method provided by the embodiment of the present disclosure may further include the following processing:
and in response to the fact that the number of the third function controls is larger than 1, determining the third function control with the shortest distance to the position information of the first function control in the third function controls as a target function control corresponding to the text content.
In this embodiment of the disclosure, when the number of third function controls is greater than 1, the third function control among them closest to the position information of the first function control may be determined, and that control is then determined as the target function control corresponding to the text content. Since function controls that are closer together in an interface tend to have more similar functionality, determining the third function control closest to the position of the first function control as the target function control and executing the voice control instruction on it improves the match between the execution result and the voice control instruction, further improving the user experience.
In order to make the voice control method provided by the embodiment of the present disclosure clearer, the following description is made with reference to fig. 2 and 3.
As shown in fig. 2, a voice control method provided by the embodiment of the present disclosure may include the following steps:
step 1, after a Voice system of a client is started, applying for a system to open the barrier-free Service right, and after a user agrees to open the barrier-free Service right, starting Voice Access barrier Service barrier-free Service realized by inheriting the Access barrier Service, so that all function controls of an interface can be captured.
And 2, after the barrier-free service is started, monitoring a picture switching Event by an on Access opportunity Event () method, and acquiring an Access opportunity Node Info of a Root function control of an interface by a get Root In Active Window () method.
Step 3, after the root function control of the interface is obtained, all the sub-function controls contained in the root function control can be obtained in a recursive traversal mode due to the adoption of the tree structure;
step 4, acquiring and storing the character description information, the position information and the control information of whether the functional control can be clicked;
and 5, generalizing and blurring all the acquired character description information to obtain a generalized and blurred preset word bank.
The specific generalization strategy can be as follows: if the current function control has text description information with the meaning "Three Kingdoms", the generalization strategy can process it in the form word slot 1 + keyword (text description information) + word slot 2, where word slot 1 can be a general-purpose slot containing "I want to watch", "open", "play", and the like, and word slot 2 can contain "content", "video", and the like. Through free combination in this form, semantic queries of various types can be produced, completing the generalization of the keyword and generating a generalized, fuzzified lexicon for it.
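The slot-combination strategy above can be sketched as follows. The slot contents and the keyword "Three Kingdoms" follow the example in the text, while the empty-string entry (allowing queries with no suffix, such as "open Three Kingdoms") is an added assumption:

```python
from itertools import product

# Word slot 1 (action), keyword from the control's text description,
# and word slot 2 (content type); "" allows a query with no suffix.
slot1 = ["I want to watch", "open", "play"]
keyword = "Three Kingdoms"
slot2 = ["", "content", "video"]

# Every slot1 x slot2 combination yields one generalized query.
queries = {
    " ".join(part for part in (a, keyword, b) if part)
    for a, b in product(slot1, slot2)
}

assert "play Three Kingdoms video" in queries
assert "open Three Kingdoms" in queries
assert len(queries) == 9    # 3 actions x 3 suffixes
```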
Step 6, when a user inputs a voice control instruction, voice recognition is performed through a voice recognition engine, for example through NLP (Natural Language Processing), to obtain the ASR (automatic speech recognition) result of the voice control instruction, and it is determined whether the ASR result hits a corresponding query in the generalized lexicon. If so, the first function control corresponding to the text content is obtained according to the keyword pointed to by the hit query; if not, a TTS prompt such as "Sorry, I still didn't catch that, please say it again" is broadcast, or a TTS prompt indicating that the current voice command is not supported may be broadcast.
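A hit/miss lookup of the ASR result against the generalized lexicon might look like the sketch below; the lexicon entries and the TTS prompt text are illustrative assumptions:

```python
# Hypothetical generalized lexicon: each query points at the keyword
# (text description) of the function control it targets.
lexicon = {
    "play Three Kingdoms video": "Three Kingdoms",
    "open Three Kingdoms": "Three Kingdoms",
}


def resolve(asr_result):
    """Return (keyword, None) on a hit, or (None, tts_prompt) on a miss."""
    keyword = lexicon.get(asr_result)
    if keyword is None:
        return None, "Sorry, I still didn't catch that, please say it again."
    return keyword, None


assert resolve("open Three Kingdoms") == ("Three Kingdoms", None)
```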
Step 7, after the first function control is obtained, it is determined whether it is clickable. If so, the first function control is determined as the target function control and the click action is executed directly, thereby executing the voice control instruction. If the first function control is not clickable, its parent node information is obtained and it is determined whether its parent function control is clickable. If the parent function control is clickable, the click action is executed on it; if not, the parent function control of that parent function control is traversed recursively, executing the click action as soon as a clickable control is found, and otherwise repeating the recursion until the top-level parent function control is reached. If even that control does not support clicking, the clickable function control closest to the first function control is searched for, as may be specifically shown in fig. 2.
The specific implementation can be as follows: starting from the obtained first function control, second function controls near it are searched for in the four preset directions up, down, left, and right; the clickable third function control whose coordinate distance from the center point of the first function control is shortest is determined, and the click action of that third function control is executed, thereby executing the voice control instruction. The distance between two function controls is calculated as follows: according to the retrieval direction (that is, the preset direction), the closest straight-line distance from the center point of the first function control to the function control in that direction is determined. For example, when retrieving upward, the lower edge of the retrieved function control is taken as the straight line, and the shortest distance is calculated through the point-to-line distance formula; by analogy, when retrieving downward, the opposite edge of the retrieved function control is taken, and the shortest distance is again calculated through the point-to-line distance formula, where the point-to-line formula can refer to Formula 1.
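The edge-based directional distance described in this step can be sketched as follows. The control bounds are hypothetical, and with the horizontal edges used here Formula 1 reduces to a simple coordinate difference:

```python
# Hypothetical control bounds as (left, top, right, bottom);
# screen coordinates grow downward.
bounds = {
    "first": (100, 100, 200, 140),
    "above": (100, 20, 200, 60),
    "below": (100, 200, 200, 240),
}


def centre(name):
    left, top, right, bottom = bounds[name]
    return ((left + right) / 2, (top + bottom) / 2)


def directional_distance(first, other, direction):
    """Distance from the first control's centre to the facing edge of
    `other`: for an upward search the line is the other control's lower
    edge; for a downward search, its upper edge.  For a horizontal line
    y = e, Formula 1 reduces to |cy - e|."""
    cx, cy = centre(first)
    left, top, right, bottom = bounds[other]
    edge_y = bottom if direction == "up" else top
    return abs(cy - edge_y)


# centre of "first" is (150, 120); "above" ends at y=60, "below" starts at y=200
assert directional_distance("first", "above", "up") == 60.0
assert directional_distance("first", "below", "down") == 80.0
```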
Based on the same inventive concept, the embodiment of the disclosure also provides a voice control device. As shown in fig. 4, the voice control apparatus 400 may include:
an instruction receiving module 410, configured to receive a voice control instruction when a target interface is displayed;
a first obtaining module 420, configured to, in response to a voice control instruction, obtain an automatic voice recognition result of the voice control instruction;
a first determining module 430, configured to determine, in a preset lexicon, text content corresponding to the automatic speech recognition result; the preset word stock is obtained by generalizing control information corresponding to each functional control of each interface, the interfaces are multiple and comprise the target interface, and the control information comprises the text description information of each functional control;
a second determining module 440, configured to determine a target function control corresponding to the text content;
the executing module 450 is configured to execute the voice control instruction based on the target function control.
In one possible embodiment, the method further comprises:
the second acquisition module is used for acquiring at least one function control in each interface;
the third obtaining module is used for obtaining control information corresponding to each functional control in each interface; the control information comprises control information of a root function control corresponding to a function control and control information of at least one sub-function control of the root function control information;
and the word stock generating module is used for generating the preset word stock based on the text description information of each functional control according to a preset generalization rule.
In a possible implementation manner, the thesaurus generation module is specifically configured to:
generalizing the text description information of each functional control based on a first preset word slot and a second preset word slot to obtain a preset word library; the first preset word slot is used for indicating specific execution actions on the text description information of the function control, and the second preset word slot is used for indicating specific execution contents on the text description information of the function control.
In one possible implementation, the control information further includes information indicating whether the functionality control is operable.
In a possible implementation manner, the second determining module 440 includes:
the first determining unit is used for determining a first function control corresponding to the text content;
the first obtaining unit is used for obtaining first control information corresponding to the first function control;
a second determining unit, configured to determine whether the first function control is operable according to the first control information;
and a third determining unit, configured to determine, in response to that the first function control is operable, that the first function control is a target function control corresponding to the text content.
In one possible embodiment, the method further comprises:
a third determining module to determine whether a first parent functionality control of the first functionality control is operable in response to the first functionality control being inoperable;
and a fourth determining module, configured to determine, in response to the first parent functionality control being operable, the first parent functionality control as a target functionality control corresponding to the text content.
In one possible embodiment, the method further comprises:
a fifth determining module to determine, in response to the first parent functionality control being inoperable, whether a second parent functionality control of the first parent functionality control is operable;
a sixth determining module, configured to determine, in response to that the second parent functionality control is operable, the second parent functionality control as a target functionality control corresponding to the text content;
a seventh determining module, configured to, in response to the second parent function control being inoperable, determine, according to a preset traversal manner, whether an operable parent function control exists among the parent function controls of the second parent function control; the preset traversal manner comprises traversing the parent function control of each function control in sequence from the lowest level upward in the tree structure until the root function control is reached;
an eighth determining module, configured to determine, in response to an operable parent function control existing among the parent function controls of the second parent function control, the lowest-level operable parent function control among them as the target function control corresponding to the text content.
In one possible implementation, the control information further includes location information of the functionality control.
In one possible embodiment, the method further comprises:
a fourth obtaining module, configured to obtain, in response to that there is no operable parent function control of the second parent function control, location information of the first function control;
and the ninth determining module is configured to determine, as the target function control corresponding to the text content, the function control closest to the position information of the first function control in the target interface.
In a possible implementation manner, the ninth determining module includes:
a fourth determining unit, configured to determine a second function control in the target interface that is in a preset direction of the first function control and closest to the position information of the first function control, wherein there is at least one preset direction;
the second obtaining unit is used for obtaining control information corresponding to each second function control;
a fifth determining unit, configured to determine, according to information that indicates whether a function control is operable in the control information corresponding to each second function control, a third function control that is operable in the second function control;
a sixth determining unit, configured to determine, in response to that the number of the third function controls is 1, the third function control as a target function control corresponding to the text content.
In one possible implementation, the method further includes:
and a tenth determining module, configured to determine, in response to that the number of the third function controls is greater than 1, a third function control, which is closest to the position information of the first function control, in the third function controls as a target function control corresponding to the text content.
The specific implementation manner and technical effect of each module in this embodiment are similar to those of the method embodiment described above, and are not described herein again.
The present disclosure also provides an electronic device, a computer readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 comprises a computing unit 501 which may perform various suitable actions and processes in accordance with a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, computing units running various machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The calculation unit 501 executes the respective methods and processes described above, such as the voice control method. For example, in some embodiments, the voice control method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the speech control method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the voice control method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here can be implemented in digital electronic circuitry, integrated circuitry, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (25)

1. A method of voice control, comprising:
receiving a voice control instruction under the condition that a target interface is displayed;
responding to the voice control instruction, and acquiring an automatic voice recognition result of the voice control instruction;
determining text content corresponding to the automatic voice recognition result in a preset lexicon; wherein the preset lexicon is obtained by generalizing control information corresponding to each function control of each of a plurality of interfaces, the plurality of interfaces comprising the target interface, and the control information comprises text description information of each function control;
determining a target function control corresponding to the text content;
and executing the voice control instruction based on the target function control.
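The overall flow of claim 1 can be illustrated with a minimal sketch. All names here (`PRESET_LEXICON`, `handle_voice_instruction`, the control dictionaries) are hypothetical illustrations, not the patent's actual implementation:

```python
# Minimal sketch of the claimed voice-control flow: ASR result -> text
# content in a preset lexicon -> target function control -> execution.
# All identifiers are hypothetical, for illustration only.

PRESET_LEXICON = {
    "open navigation": "nav_button",   # generalized phrase -> control id
    "turn up volume": "volume_up",
}

def handle_voice_instruction(asr_result: str, controls: dict):
    """Match the ASR result against the preset lexicon, then resolve the
    target function control on which the instruction would be executed."""
    text = asr_result.strip().lower()
    control_id = PRESET_LEXICON.get(text)   # determine text content in lexicon
    if control_id is None:
        return None                         # no matching lexicon entry
    return controls.get(control_id)         # determine the target control

controls = {"nav_button": {"id": "nav_button", "operable": True}}
target = handle_voice_instruction("Open Navigation", controls)
# target is the control entry for "nav_button"
```

In practice the lexicon lookup would involve fuzzier matching than exact string equality; the sketch only shows the claimed sequence of steps.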
2. The method of claim 1, before receiving the voice control instruction in the case that the target interface is displayed, further comprising:
acquiring at least one function control in each interface;
acquiring control information corresponding to each function control in each interface; wherein the control information comprises control information of a root function control corresponding to the function control and control information of at least one child function control of the root function control;
and generating the preset lexicon based on the text description information of each function control according to a preset generalization rule.
3. The method of claim 2, wherein the generating the preset lexicon based on the textual description information of each of the functional controls according to a preset generalization rule comprises:
generalizing the text description information of each function control based on a first preset word slot and a second preset word slot to obtain the preset lexicon; wherein the first preset word slot is used for indicating a specific execution action for the text description information of the function control, and the second preset word slot is used for indicating specific execution content for the text description information of the function control.
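The word-slot generalization of claim 3 can be sketched as a cross product of an action slot with each control's text description. The slot values and function names below are assumptions for illustration; the patent does not enumerate them:

```python
from itertools import product

# Hypothetical sketch of word-slot generalization: a first slot of
# execution actions is combined with each control's text description
# (the execution content) to populate a preset lexicon.

ACTION_SLOT = ["open", "turn on", "click"]      # first preset word slot

def generalize(control_descriptions):
    """Build a preset lexicon mapping each generalized phrase back to the
    control description it was derived from."""
    lexicon = {}
    for action, desc in product(ACTION_SLOT, control_descriptions):
        lexicon[f"{action} {desc}"] = desc      # phrase -> control description
    return lexicon

lexicon = generalize(["bluetooth", "navigation"])
# lexicon now contains phrases such as "open bluetooth" and "click navigation"
```

This is why a single on-screen label like "bluetooth" can match several spoken commands: each action in the slot yields its own lexicon entry.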
4. The method of claim 2, wherein the control information further comprises information indicating whether the functionality control is operable.
5. The method of claim 4, wherein the determining a target functionality control corresponding to the textual content comprises:
determining a first function control corresponding to the text content;
acquiring first control information corresponding to the first function control;
determining whether the first function control is operable according to the first control information;
and in response to the first function control being operable, determining the first function control as the target function control corresponding to the text content.
6. The method of claim 5, further comprising:
in response to the first functionality control being inoperable, determining whether a first parent functionality control of the first functionality control is operable;
and in response to the first parent functionality control being operable, determining the first parent functionality control as a target functionality control corresponding to the text content.
7. The method of claim 6, further comprising:
in response to the first parent functionality control being inoperable, determining whether a second parent functionality control of the first parent functionality control is operable;
in response to the second parent functionality control being operable, determining the second parent functionality control as a target functionality control corresponding to the textual content;
in response to the second parent function control being inoperable, determining, in a preset traversal manner, whether an operable parent function control exists among the parent function controls of the second parent function control; wherein the preset traversal manner comprises traversing the parent function controls of each function control in order from smallest to largest according to the tree structure of the function controls, until the root function control is reached;
and in response to an operable parent function control existing among the parent function controls of the second parent function control, determining the smallest operable parent function control among the parent function controls of the second parent function control as the target function control corresponding to the text content.
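The bottom-up traversal of claims 6 and 7 amounts to walking parent by parent up the control tree until an operable ancestor (or the root) is found. The class and field names below are hypothetical, not from the patent:

```python
from dataclasses import dataclass
from typing import Optional

# Sketch of the claimed traversal: starting from an inoperable function
# control, climb the control tree from the smallest parent toward the
# root, returning the first (i.e. smallest) operable ancestor.

@dataclass
class FunctionControl:
    name: str
    operable: bool
    parent: Optional["FunctionControl"] = None

def find_operable_ancestor(control: FunctionControl) -> Optional[FunctionControl]:
    """Return the nearest operable parent control, or None if no ancestor
    up to and including the root control is operable."""
    node = control.parent
    while node is not None:        # traverse from small to large
        if node.operable:
            return node            # nearest operable parent wins
        node = node.parent         # keep climbing toward the root
    return None

root = FunctionControl("root", operable=True)
panel = FunctionControl("panel", operable=False, parent=root)
button = FunctionControl("button", operable=False, parent=panel)
# find_operable_ancestor(button) climbs past "panel" and returns "root"
```

Returning the first operable ancestor encountered guarantees the "smallest operable parent function control" of claim 7, since the walk visits ancestors in increasing order.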
8. The method of claim 7, the control information further comprising location information of the functionality control.
9. The method of claim 8, further comprising:
in response to no operable parent function control existing among the parent function controls of the second parent function control, acquiring the position information of the first function control;
and determining the function control which is closest to the position information of the first function control in the target interface as the target function control corresponding to the text content.
10. The method of claim 9, wherein the determining a functionality control in the target interface that is closest to the position information of the first functionality control as a target functionality control corresponding to the text content comprises:
determining, in the target interface, a second function control closest to the position information of the first function control in each preset direction of the first function control; wherein there is at least one preset direction;
acquiring control information corresponding to each second function control;
determining an operable third function control in the second function controls according to the information which is used for indicating whether the function controls are operable in the control information corresponding to each second function control;
and in response to the number of the third function controls being 1, determining the third function control as the target function control corresponding to the text content.
11. The method of claim 10, further comprising:
and in response to the number of the third function controls being greater than 1, determining the third function control closest to the position information of the first function control among the third function controls as the target function control corresponding to the text content.
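The position-based fallback of claims 9–11 (pick the operable control nearest to the first control when no operable ancestor exists) can be sketched as below. The coordinate representation and control dictionaries are assumptions for illustration:

```python
import math

# Sketch of the claimed position fallback: filter candidate controls down
# to the operable ones ("third function controls"), then pick the one
# whose position is closest to the first control's position.

def nearest_operable(first_pos, candidates):
    """Return the operable candidate closest (Euclidean distance) to
    first_pos, or None if no candidate is operable."""
    operable = [c for c in candidates if c["operable"]]
    if not operable:
        return None
    return min(operable, key=lambda c: math.dist(first_pos, c["pos"]))

candidates = [
    {"name": "a", "pos": (0, 10), "operable": False},  # skipped: inoperable
    {"name": "b", "pos": (0, 5),  "operable": True},
    {"name": "c", "pos": (0, 20), "operable": True},
]
# nearest_operable((0, 0), candidates) returns control "b"
```

When exactly one operable candidate remains the `min` is trivial (claim 10); with several, the distance tie-break of claim 11 applies.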
12. A voice control apparatus comprising:
the instruction receiving module is used for receiving a voice control instruction under the condition that a target interface is displayed;
the first acquisition module is used for responding to the voice control instruction and acquiring an automatic voice recognition result of the voice control instruction;
the first determining module is used for determining the text content corresponding to the automatic voice recognition result in a preset lexicon; wherein the preset lexicon is obtained by generalizing control information corresponding to each function control of each of a plurality of interfaces, the plurality of interfaces comprising the target interface, and the control information comprises text description information of each function control;
the second determining module is used for determining a target function control corresponding to the text content;
and the execution module is used for executing the voice control instruction based on the target function control.
13. The apparatus of claim 12, further comprising:
the second acquisition module is used for acquiring at least one function control in each interface;
the third obtaining module is used for obtaining control information corresponding to each function control in each interface; wherein the control information comprises control information of a root function control corresponding to the function control and control information of at least one child function control of the root function control;
and the lexicon generating module is used for generating the preset lexicon based on the text description information of each function control according to a preset generalization rule.
14. The apparatus of claim 13, wherein the lexicon generating module is specifically configured to:
generalize the text description information of each function control based on a first preset word slot and a second preset word slot to obtain the preset lexicon; wherein the first preset word slot is used for indicating a specific execution action for the text description information of the function control, and the second preset word slot is used for indicating specific execution content for the text description information of the function control.
15. The apparatus of claim 13, wherein the control information further comprises information indicating whether the functionality control is operable.
16. The apparatus of claim 15, wherein the second determining module comprises:
the first determining unit is used for determining a first function control corresponding to the text content;
the first obtaining unit is used for obtaining first control information corresponding to the first function control;
a second determining unit, configured to determine whether the first function control is operable according to the first control information;
and a third determining unit, configured to determine, in response to the first function control being operable, the first function control as the target function control corresponding to the text content.
17. The apparatus of claim 16, further comprising:
a third determining module to determine whether a first parent functionality control of the first functionality control is operable in response to the first functionality control being inoperable;
and a fourth determining module, configured to determine, in response to the first parent functionality control being operable, the first parent functionality control as a target functionality control corresponding to the text content.
18. The apparatus of claim 17, further comprising:
a fifth determining module, configured to determine, in response to the first parent functionality control being inoperable, whether a second parent functionality control of the first parent functionality control is operable;
a sixth determining module, configured to determine, in response to the second parent function control being operable, the second parent function control as the target function control corresponding to the text content;
a seventh determining module, configured to determine, in response to the second parent function control being inoperable and in a preset traversal manner, whether an operable parent function control exists among the parent function controls of the second parent function control; wherein the preset traversal manner comprises traversing the parent function controls of each function control in order from smallest to largest according to the tree structure of the function controls, until the root function control is reached;
and an eighth determining module, configured to determine, in response to an operable parent functionality control existing in the parent functionality control of the second parent functionality control, a smallest operable parent functionality control in the parent functionality controls of the second parent functionality control as the target functionality control corresponding to the text content.
19. The apparatus of claim 18, the control information further comprising location information of the functionality control.
20. The apparatus of claim 19, further comprising:
a fourth obtaining module, configured to obtain the position information of the first function control in response to no operable parent function control existing among the parent function controls of the second parent function control;
and a ninth determining module, configured to determine, as the target function control corresponding to the text content, the function control closest to the position information of the first function control in the target interface.
21. The apparatus of claim 20, wherein the ninth determining module comprises:
a fourth determining unit, configured to determine, in the target interface, a second function control closest to the position information of the first function control in each preset direction of the first function control; wherein there is at least one preset direction;
the second obtaining unit is used for obtaining control information corresponding to each second function control;
a fifth determining unit, configured to determine, according to information that indicates whether a function control is operable in the control information corresponding to each second function control, a third function control that is operable in the second function controls;
a sixth determining unit, configured to determine, in response to the number of the third function controls being 1, the third function control as the target function control corresponding to the text content.
22. The apparatus of claim 21, further comprising:
and a tenth determining module, configured to determine, in response to the number of the third function controls being greater than 1, the third function control closest to the position information of the first function control among the third function controls as the target function control corresponding to the text content.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
24. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-11.
25. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps in the method according to any one of claims 1-11.
CN202211360723.3A 2022-11-01 2022-11-01 Voice control method and device, electronic equipment and computer readable storage medium Pending CN115798469A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211360723.3A CN115798469A (en) 2022-11-01 2022-11-01 Voice control method and device, electronic equipment and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN115798469A (en) 2023-03-14

Family

ID=85434943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211360723.3A Pending CN115798469A (en) 2022-11-01 2022-11-01 Voice control method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115798469A (en)

Similar Documents

Publication Publication Date Title
US9865264B2 (en) Selective speech recognition for chat and digital personal assistant systems
US10860289B2 (en) Flexible voice-based information retrieval system for virtual assistant
JP7258994B2 (en) Screen projection method, device, equipment and storage medium
JP2021131528A (en) User intention recognition method, device, electronic apparatus, computer readable storage media and computer program
CN112286485B (en) Method and device for controlling application through voice, electronic equipment and storage medium
CN112767916B (en) Voice interaction method, device, equipment, medium and product of intelligent voice equipment
CN111124371A (en) Game-based data processing method, device, equipment and storage medium
US20220068267A1 (en) Method and apparatus for recognizing speech, electronic device and storage medium
US20220293085A1 (en) Method for text to speech, electronic device and storage medium
CN112506854A (en) Method, device, equipment and medium for storing page template file and generating page
EP3961433A2 (en) Data annotation method and apparatus, electronic device and storage medium
CN114661287A (en) Component linkage rendering method and device, electronic equipment, storage medium and product
US20230139397A1 (en) Deep learning techniques for extraction of embedded data from documents
CN115798469A (en) Voice control method and device, electronic equipment and computer readable storage medium
CN113360590B (en) Method and device for updating interest point information, electronic equipment and storage medium
CN115497458A (en) Continuous learning method and device of intelligent voice assistant, electronic equipment and medium
CN114238745A (en) Method and device for providing search result, electronic equipment and medium
CN114118937A (en) Information recommendation method and device based on task, electronic equipment and storage medium
CN113656731A (en) Advertisement page processing method and device, electronic equipment and storage medium
EP4071690A2 (en) Method and apparatus for task information management, device and storage medium
CN116631396A (en) Control display method and device, electronic equipment and medium
CN116009808A (en) Voice interaction method and device, electronic equipment and storage medium
CN114861675A (en) Method and device for semantic recognition and method and device for generating control instruction
KR20240091051A (en) Deep learning techniques for extracting embedded data from documents
CN114895898A (en) Instruction processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination