CN111145747A - Voice control implementation method and device - Google Patents


Info

Publication number
CN111145747A
Authority
CN
China
Prior art keywords
voice
user
display interface
visible
configuration file
Prior art date
Legal status
Pending
Application number
CN201911391721.9A
Other languages
Chinese (zh)
Inventor
佟广力
赵江
秦国梁
沈海寅
Current Assignee
Zhicheauto Technology Beijing Co ltd
Original Assignee
Zhicheauto Technology Beijing Co ltd
Priority date
Application filed by Zhicheauto Technology Beijing Co ltd
Priority to CN201911391721.9A
Publication of CN111145747A
Legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/221 Announcement of recognition results
    • G10L2015/223 Execution procedure of a spoken command
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60K ARRANGEMENT OR MOUNTING OF PROPULSION UNITS OR OF TRANSMISSIONS IN VEHICLES; ARRANGEMENT OR MOUNTING OF PLURAL DIVERSE PRIME-MOVERS IN VEHICLES; AUXILIARY DRIVES FOR VEHICLES; INSTRUMENTATION OR DASHBOARDS FOR VEHICLES; ARRANGEMENTS IN CONNECTION WITH COOLING, AIR INTAKE, GAS EXHAUST OR FUEL SUPPLY OF PROPULSION UNITS IN VEHICLES
    • B60K35/00 Instruments specially adapted for vehicles; Arrangement of instruments in or on vehicles
    • B60R VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
    • B60R16/00 Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
    • B60R16/02 Electric constitutive elements
    • B60R16/037 Electric constitutive elements for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
    • B60R16/0373 Voice control

Landscapes

  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Mechanical Engineering (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Transportation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the invention discloses a method and a device for implementing voice control. The method comprises the following steps: binding a display interface window View in a display page with the voice instructions it supports, to generate a configuration file; performing speech-to-text recognition and semantic analysis on the acquired user voice input data to obtain voice instruction data; and calling the display interface window View corresponding to the configuration file matched with the voice instruction data, to respond to the user's voice input data. By constructing a complete scheme for controlling the automobile's central control unit by voice, receiving the user's voice input, and performing semantic analysis with AI techniques, the invention provides a set of structured configuration files that bind display interface window Views to voice instructions. The driver can thus control the functions of the central control screen, and in turn the automobile, by voice without affecting driving safety, which provides a guarantee for driving safety.

Description

Voice control implementation method and device
Technical Field
The invention relates to the technical field of intelligent automobiles, in particular to a method and a device for realizing voice control.
Background
Advances in computer technology have permeated every aspect of daily life and brought great convenience. With the popularization of in-vehicle intelligent central control units, people often need to operate the central control while driving, for example to activate navigation. However, driving a car demands a high degree of safety and attention. How to control the central control screen without distracting the driver has become a problem the industry urgently needs to solve.
Disclosure of Invention
The embodiment of the invention aims to solve the following technical problem: potential safety hazards in the prior art, such as inconvenient vehicle control. To this end, a method and a device for implementing voice control are provided.
According to an aspect of the present invention, there is provided a method for implementing voice control, the method including:
binding a display interface window View in a display page with the voice instructions it supports, to generate a configuration file;
performing speech-to-text recognition and semantic analysis on the acquired user voice input data to obtain voice instruction data;
and calling the display interface window View corresponding to the configuration file matched with the voice instruction data, to respond to the user's voice input data.
Preferably, binding the display interface window View in the display page with the supported voice instructions into a configuration file includes:
binding the supported voice instructions according to the unique identifier View ID of the display interface window View, and generating configuration files in which display interface window Views and supported voice instructions correspond one to one.
Preferably, the method further comprises:
binding the configuration file with a current display interface unit;
the current display interface includes: a user interface (UI) component provided by the system, or a custom display unit implementing the pageable-container interface IVisiblePageContainer;
the UI components provided by the system include: Activity, Fragment, and Dialog.
Preferably, the method further comprises:
virtualizing a current display interface as a visible page VisiblePage, and setting whether the visible page VisiblePage is visible to the user according to the life cycle of the visible page VisiblePage;
the visible page VisiblePage and its corresponding display interface window Views are bound through the configuration file.
Preferably, the method further comprises:
generating a visible page management class VisiblePageManager according to the visible page VisiblePage and the corresponding configuration file;
the visible page management class VisiblePageManager includes a parameter indicating whether the visible page VisiblePage is visible to the user.
Preferably, the method further comprises:
after receiving voice instruction data input by the user, traversing all visible pages VisiblePage in the visible page management class VisiblePageManager;
when the visible page VisiblePage is visible to a user, traversing the display interface window View corresponding to the visible page VisiblePage;
when the View of the display interface window is visible to a user, acquiring a supported voice command according to the configuration file;
and comparing the voice command with the voice command data input by the user, and determining whether to call the display interface window View.
According to another aspect of the present invention, there is provided a speech control implementing apparatus, including:
the voice instruction binding unit is used for binding the View of the display interface window in the display page and the supported voice instruction into a configuration file;
the user voice input acquisition unit is used for performing speech-to-text recognition and semantic analysis on the acquired user voice input data to obtain voice instruction data;
and the comparison calling unit is used for calling the display interface window View corresponding to the configuration file matched with the voice instruction data and responding to the voice input data of the user.
Preferably, the voice instruction binding unit is further configured to bind the supported voice instructions according to the unique identifier View ID of the display interface window View, and to generate configuration files in which display interface window Views and supported voice instructions correspond one to one.
Preferably, the voice instruction binding unit is further configured to bind the configuration file with a current display interface unit;
the current display interface includes: a user interface (UI) component provided by the system, or a custom display unit implementing the pageable-container interface IVisiblePageContainer;
the UI components provided by the system include: Activity, Fragment, and Dialog.
Preferably, the voice instruction binding unit is further configured to virtualize a current display interface as a visible page VisiblePage, and set whether the current display interface is visible to a user according to a life cycle of the visible page VisiblePage;
and the visible page VisiblePage and the corresponding display interface window View are bound through the configuration file.
Preferably, the voice instruction binding unit is further configured to generate a visible page management class VisiblePageManager according to the visible page VisiblePage and the corresponding configuration file;
the visible page management class VisiblePageManager includes a parameter indicating whether the visible page VisiblePage is visible to the user.
Preferably, the comparison calling unit is further configured to traverse all visible pages VisiblePage in the visible page management class VisiblePageManager;
when the visible page VisiblePage is visible to a user, traversing the display interface window View corresponding to the visible page VisiblePage;
when the View of the display interface window is visible to a user, acquiring a supported voice command according to the configuration file;
and comparing the voice command with the voice command data input by the user, and determining whether to call the display interface window View.
According to another aspect of the present invention, there is provided an electronic apparatus including:
a memory for storing a computer program;
a processor for executing the computer program stored in the memory, and when the computer program is executed, implementing any of the methods described above.
According to another aspect of the invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the methods described above.
The voice control implementation scheme provided by the embodiment of the invention constructs a complete scheme for controlling the automobile's central control unit by voice: the user's voice input is received, semantic analysis is performed on it using AI techniques, and a set of structured configuration files binding display interface window Views to voice instructions is provided. The configuration file is bound to the current display interface window by annotation, and voice instructions are generated and refreshed automatically. With the scheme of this embodiment, the driver can control the functions of the central control screen, and in turn the automobile, by voice without affecting driving safety, which provides a guarantee for driving safety.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Meanwhile, it should be understood that, for convenience of description, the parts shown in the drawings are not drawn to scale.
The invention will be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
fig. 1 is a schematic flow chart of a voice control implementation method according to an embodiment of the present invention.
Fig. 2 is a flowchart of page management and voice command binding according to an embodiment of the present invention.
Fig. 3 is a flowchart of automatically generating voice commands according to an embodiment of the present invention.
Fig. 4 is a flowchart of a voice control transaction method according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a speech control implementation apparatus according to an embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Fig. 1 shows a schematic flow chart of the voice control implementation method provided in this embodiment, wherein:
Step 11: bind a display interface window View in a display page with the voice instructions it supports, to generate a configuration file.
In one embodiment of the invention, a configuration file in JSON format is provided, which binds each View displayed on the interface to the voice instructions that the current View supports.
In an embodiment of the invention, the supported voice instructions are bound according to the unique identifier View ID of the display interface window View, generating configuration files in which display interface window Views and supported voice instructions correspond one to one.
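As a hedged illustration only, such a JSON configuration file might look like the sketch below; the field names `viewId` and `orders` are assumptions for illustration, since the patent does not fix a schema:

```json
[
  {
    "viewId": "btn_navigation",
    "orders": ["open navigation", "navigation"]
  },
  {
    "viewId": "btn_music",
    "orders": ["play music", "music"]
  }
]
```

Each entry pairs one View ID with the list of voice instructions that View supports, giving the one-to-one binding described above.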
In one embodiment of the invention, the configuration file is bound, by annotation, with the display unit the APP shows on screen, i.e. the configuration file is bound with the current display interface unit.
The current display interface includes: a user interface (UI) component provided by the system, or a custom display unit implementing the pageable-container interface IVisiblePageContainer.
The UI components provided by the system include: Activity, Fragment, and Dialog.
In an embodiment of the present invention, each display unit usually contains multiple Views. In this implementation scheme, a View that needs to support the voice-visible function must be given a unique identifier, a View ID, so that voice instructions can be bound to it in the configuration file.
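The annotation-based binding described above can be sketched as follows. This is a minimal illustration, not the patent's actual code: the annotation name `VoiceConfig`, the example page class, and the configuration path are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class VoiceBindingSketch {
    // Hypothetical annotation binding a display unit (Activity/Fragment/Dialog)
    // to the path of its JSON voice instruction configuration file.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface VoiceConfig {
        String value(); // path of the JSON configuration file
    }

    // A display unit declaring its configuration file (example class).
    @VoiceConfig("voice/navigation_page.json")
    public static class NavigationPage { }

    // Reads the configuration file path from the annotation at runtime,
    // as the framework would do when the page is created.
    public static String configPathOf(Class<?> displayUnit) {
        VoiceConfig cfg = displayUnit.getAnnotation(VoiceConfig.class);
        return cfg == null ? null : cfg.value();
    }
}
```

The runtime-retained annotation lets the management layer discover each page's configuration without the page having to call any registration code itself.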
In an embodiment of the present invention, a display unit shown on screen in the APP is virtualized as a VisiblePage, which stores the display unit's relevant information, for example: object ID, whether it is displayed, child Views, configuration information, etc. Whether the current VisiblePage is visible to the user is set according to the display unit's life cycle. Only VisiblePages visible to the user can accept and respond to user voice instructions.
Further, a VisiblePage management class is provided. When the real interface is created, the adding interface is called to put the current page under this class's management, and the path of the page's voice instruction configuration file is obtained and parsed.
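The VisiblePage abstraction and its management class might be sketched as below. This is an assumed minimal shape: the text only says a VisiblePage stores the unit's ID, display state, child Views and configuration, so the field and method names here are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VisiblePageSketch {
    // Virtualized page: holds the display unit's id, its visibility flag
    // (driven by the unit's life cycle), and the parsed configuration as a
    // (View ID -> supported voice instructions) table.
    public static class VisiblePage {
        public final String pageId;
        public boolean visibleToUser; // set from life cycle callbacks
        public final Map<String, List<String>> ordersByViewId = new HashMap<>();
        public VisiblePage(String pageId) { this.pageId = pageId; }
    }

    // Management class: pages are added when created; only pages currently
    // visible to the user may accept and respond to voice instructions.
    public static class VisiblePageManager {
        private final List<VisiblePage> pages = new ArrayList<>();
        public void add(VisiblePage page) { pages.add(page); }
        public List<VisiblePage> visiblePages() {
            List<VisiblePage> visible = new ArrayList<>();
            for (VisiblePage p : pages) {
                if (p.visibleToUser) visible.add(p);
            }
            return visible;
        }
    }
}
```

Keeping the visibility flag on the page object means the matching pass never needs to query the UI toolkit directly; the life cycle callbacks keep it current.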
Step 12: perform speech-to-text recognition and semantic analysis on the acquired user voice input data to obtain voice instruction data.
In one embodiment of the invention, the user only needs to speak. The system acquires the user's voice input and processes it to obtain voice input data. For example, a microphone receives the user's speech data, and speech-to-text techniques convert the utterance spoken by the user into computer-recognizable text. Semantic analysis is then performed on the text using AI techniques to understand the user's intent, and the analysis result is delivered to the currently displayed APP.
Step 13: call the display interface window View corresponding to the configuration file matched with the voice instruction data, and respond to the user's voice input data.
In one embodiment of the invention, when a user voice instruction is received, the management class traverses the VisiblePages visible to the user. For each, the Views visible to the user within the VisiblePage are traversed and their View IDs obtained; the voice instructions supported by each View are then looked up by ID and matched against the user's voice data. If the match succeeds, the View's associated method is called to execute the corresponding operation; if not, the next View is searched, up to the last View.
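The matching pass just described can be sketched as a small function. This is an assumed simplification: each visible page is reduced to its (View ID, supported instructions) table, and matching is plain list containment; the patent does not specify the comparison rule.

```java
import java.util.List;
import java.util.Map;

public class CommandMatcherSketch {
    // Walk the pages visible to the user, then each page's
    // (View ID -> supported instructions) table, and return the ID of the
    // first View whose instruction list contains the recognized instruction.
    // Returns null when no View responds, i.e. the search reached the last
    // View without a match.
    public static String match(List<Map<String, List<String>>> visiblePages,
                               String instruction) {
        for (Map<String, List<String>> page : visiblePages) {
            for (Map.Entry<String, List<String>> entry : page.entrySet()) {
                if (entry.getValue().contains(instruction)) {
                    return entry.getKey(); // this View's method would be invoked
                }
            }
        }
        return null;
    }
}
```

A null result corresponds to the "do not process" branches of the flow: the instruction simply matches nothing currently on screen.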
In one embodiment of the present invention, the display unit may contain a special child View, the ListView. This View presents information on screen in list form, and the information is usually refreshed in real time, so its supported instructions cannot be configured in a static voice instruction configuration file. The solution therefore provides a method of automatically generating voice instructions: a voice instruction generation interface is provided, the data structure bound to the ListView implements this interface, and voice instructions are generated according to the required rules. When the current ListView scrolls or its data updates, the scheme actively obtains the items currently visible to the user and the voice instruction corresponding to each item.
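The description later names an ISpeakOrderText interface for the data objects bound to a ListView; a hedged sketch of that mechanism follows. The single-method shape of the interface, the `SongItem` example class, and the "play <title>" instruction rule are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListOrderSketch {
    // Assumed shape of the voice instruction generation interface: a data
    // object reports the instructions its list row supports.
    public interface ISpeakOrderText {
        List<String> speakOrderTexts();
    }

    // Hypothetical data object for a media-list row.
    public static class SongItem implements ISpeakOrderText {
        public final String title;
        public SongItem(String title) { this.title = title; }
        @Override public List<String> speakOrderTexts() {
            return Collections.singletonList("play " + title);
        }
    }

    // When the list scrolls or refreshes, collect instructions only from the
    // items currently visible on screen; items whose data objects do not
    // implement the interface are skipped, matching the "do not process" branch.
    public static List<String> instructionsOf(List<?> visibleItems) {
        List<String> out = new ArrayList<>();
        for (Object item : visibleItems) {
            if (item instanceof ISpeakOrderText) {
                out.addAll(((ISpeakOrderText) item).speakOrderTexts());
            }
        }
        return out;
    }
}
```

Regenerating instructions from the visible items on each scroll or data change is what lets a live list stay voice-controllable without any entry in the static configuration file.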
In one embodiment of the invention, the implementation can be encapsulated into a common jar package (a Java code-sharing package) for APP developers. In use, a developer configures the ID of each View, writes the voice instruction configuration file of the corresponding page, manages each display unit with the VisiblePage management class according to its life cycle, and, upon receiving the user's input instruction, calls the management class's matching method to match the instruction against the currently displayed page. The method is simple to use, has low development cost, and does not affect the original services.
Fig. 2 shows a flow chart of page management and voice instruction binding in this embodiment, wherein:
Step 21: when the APP starts, a page is created and displayed on screen.
Step 22: after the page is created, the add method of VisiblePageManager is called, and the current page is put under VisiblePageManager's management.
Step 23: determine whether the page is a system-provided UI component; if so, go to step 25; otherwise, go to step 24.
Step 24: determine whether the page is a custom display unit implementing the pageable-container interface IVisiblePageContainer. If so, go to step 25; otherwise, do not respond.
Step 25: virtualize the display unit shown on screen in the APP as a VisiblePage and store its relevant information. When a page is placed into VisiblePageManager, the current page's configuration is parsed and a VisiblePage object is created. The page's voice instruction configuration file is obtained, parsed, and bound to the current page's child Views by View ID.
Step 26: traverse the child Views of the current page and judge whether each is a ListView-type View. If so, go to step 27; otherwise, do not respond.
Step 27: register scrolling and data-change event listeners for that View.
Fig. 3 is a flow chart of automatic voice instruction generation in this embodiment, wherein:
Step 31: when the user scrolls the page or the data refreshes, the relevant interface is called to notify the VisiblePage.
Step 32: the VisiblePage obtains the items currently displayed by the ListView.
Step 33: the data object bound to each item is obtained.
Step 34: judge whether the data object implements the ISpeakOrderText interface; if not, do nothing; if so, go to step 35.
Step 35: obtain the voice instructions supported by the item.
Step 36: after the item's instructions are obtained, format the voice instructions and store them in the current VisiblePage.
Fig. 4 is a flow chart of a voice control processing method according to another embodiment of the present invention, wherein:
Step 41: receive the user's voice input with a microphone and perform semantic analysis on it using AI techniques.
Step 42: deliver the obtained semantic analysis result to the APP displayed in the foreground.
Step 43: after receiving the semantic result, the APP traverses all VisiblePages managed in VisiblePageManager.
Step 44: if a VisiblePage is not visible to the user, it is not processed; if it is visible, go to step 45.
Step 45: traverse the child Views of the visible VisiblePage.
Step 46: judge whether each View is visible to the user; if not, do nothing; if so, go to step 47.
Step 47: obtain the View ID of the visible View.
Step 48: if the View ID is null, do nothing; otherwise, go to step 49.
Step 49: obtain the corresponding voice instructions according to the obtained View ID.
Step 50: if no voice instruction is found, do nothing; otherwise, go to step 51.
Step 51: compare the voice instruction with the semantic result obtained in step 41; if they differ, do nothing; if they match, go to step 52.
Step 52: call the method corresponding to the View and respond to the user's operation.
As shown in fig. 5, a voice control implementation apparatus provided by an embodiment of the present invention comprises:
the voice instruction binding unit 61 is used for binding the View of the display interface window in the display page and the supported voice instruction into a configuration file;
a user voice input acquisition unit 62, configured to perform speech-to-text recognition and semantic analysis on the acquired user voice input data to obtain voice instruction data;
and the comparison calling unit 63 is configured to call a display interface window View corresponding to the configuration file matched with the voice instruction data, and respond to the user voice input data.
In an embodiment of the present invention, the voice instruction binding unit 61 is further configured to bind the supported voice instructions according to the unique identifier View ID of the display interface window View, and to generate configuration files in which display interface window Views and supported voice instructions correspond one to one.
In an embodiment of the present invention, the voice instruction binding unit 61 is further configured to bind the configuration file with a current display interface unit;
the current display interface includes: a user interface (UI) component provided by the system, or a custom display unit implementing the pageable-container interface IVisiblePageContainer;
the UI components provided by the system include: Activity, Fragment, and Dialog.
In an embodiment of the present invention, the voice instruction binding unit 61 is further configured to virtualize a current display interface as a visible page VisiblePage, and set whether the current display interface is visible to a user according to a life cycle of the visible page VisiblePage;
and the visible page VisiblePage and the corresponding display interface window View are bound through the configuration file.
In an embodiment of the present invention, the voice instruction binding unit 61 is further configured to generate a visible page management class VisiblePageManager according to the visible page VisiblePage and the corresponding configuration file;
the visible page management class VisiblePageManager includes a parameter indicating whether the visible page VisiblePage is visible to the user.
The comparison calling unit 63 is further configured to traverse all visible pages VisiblePage in the visible page management class VisiblePageManager;
when the visible page VisiblePage is visible to a user, traversing the display interface window View corresponding to the visible page VisiblePage;
when the View of the display interface window is visible to a user, acquiring a supported voice command according to the configuration file;
and comparing the voice command with the voice command data input by the user, and determining whether to call the display interface window View.
An embodiment of the present invention further provides an electronic device, including:
a memory for storing a computer program;
a processor for executing the computer program stored in the memory, and when the computer program is executed, implementing the method of any of the above embodiments.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the method described in any of the above embodiments.
The voice control implementation scheme provided by the embodiment of the invention constructs a complete scheme for controlling the automobile's central control unit by voice: the user's voice input is received, semantic analysis is performed on it using AI techniques, and a set of structured configuration files binding display interface window Views to voice instructions is provided. The configuration file is bound to the current display interface window by annotation, and voice instructions are generated and refreshed automatically. With the scheme of this embodiment, the driver can control the functions of the central control screen, and in turn the automobile, by voice without affecting driving safety, which provides a guarantee for driving safety.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The method and apparatus of the present invention may be implemented in a number of ways. For example, the methods and apparatus of the present invention may be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustrative purposes only, and the steps of the method of the present invention are not limited to the order specifically described above unless specifically indicated otherwise. Furthermore, in some embodiments, the present invention may also be embodied as a program recorded in a recording medium, the program including machine-readable instructions for implementing a method according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention.
The description of the present invention has been presented for purposes of illustration and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the invention in its various embodiments and with the various modifications suited to the particular use contemplated.

Claims (10)

1. A voice control implementation method, the method comprising:
binding a display interface window in a display page with its supported voice instructions to generate a configuration file;
performing text recognition and semantic analysis on acquired user voice input data to obtain voice instruction data;
and calling the display interface window corresponding to the configuration file matched with the voice instruction data, so as to respond to the user's voice input data.
2. The method of claim 1, wherein binding the display interface window in the display page with its supported voice instructions to generate the configuration file comprises:
binding the supported voice instructions according to the unique identifier of the display interface window, and generating the configuration file in which display interface windows and their supported voice instructions correspond one to one.
3. The method of claim 1, wherein the method further comprises:
binding the configuration file with a current display interface unit;
wherein the current display interface unit comprises: a user interface component provided by the system, or a user-defined display unit implementing a pageable-container interface;
and the user interface component provided by the system comprises: an Activity, a Fragment, or a Dialog.
4. The method of claim 3, wherein the method further comprises:
virtualizing the current display interface as a visible page, and setting whether the current display interface is visible to the user according to the life cycle of the visible page;
wherein the visible page and its corresponding display interface window are bound through the configuration file.
5. The method of claim 4, wherein the method further comprises:
generating a visible page management class according to the visible page and its corresponding configuration file;
wherein the visible page management class includes a parameter indicating whether the visible page is visible to the user.
6. The method of claim 5, wherein the method further comprises:
after receiving the voice instruction data input by the user, traversing all visible pages in the visible page management class;
when a visible page is visible to the user, traversing the display interface windows corresponding to that visible page;
when a display interface window is visible to the user, acquiring its supported voice instructions according to the configuration file;
and comparing the supported voice instructions with the voice instruction data input by the user to determine whether to call the display interface window.
7. A voice control implementation apparatus, comprising:
a voice instruction binding unit, configured to bind a display interface window in a display page with its supported voice instructions to generate a configuration file;
a user voice input acquisition unit, configured to perform text recognition and semantic analysis on acquired user voice input data to obtain voice instruction data;
and a comparison and calling unit, configured to call the display interface window corresponding to the configuration file matched with the voice instruction data, so as to respond to the user's voice input data.
8. The apparatus of claim 7, wherein the voice instruction binding unit is further configured to bind the supported voice instructions according to the unique identifier of the display interface window, and to generate the configuration file in which display interface windows and their supported voice instructions correspond one to one.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program stored in the memory, wherein the computer program, when executed, implements the method of any one of claims 1-6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of the preceding claims 1 to 6.
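The claimed flow — binding each display interface window, by its unique identifier, to the voice instructions it supports (claims 1-2), virtualizing current display interfaces as visible pages with a life-cycle-driven visibility flag (claims 4-5), and traversing visible pages and visible windows to match a recognized instruction (claim 6) — can be sketched as follows. This is an illustrative Python sketch only, not part of the claims: all class, field, and method names are hypothetical, and the "configuration file" is modeled as in-memory bindings rather than an on-disk file.

```python
class WindowBinding:
    """Binds one display interface window, via its unique identifier,
    to the voice instructions it supports (claims 1-2)."""
    def __init__(self, window_id, instructions, on_invoke):
        self.window_id = window_id              # unique window identifier
        self.instructions = set(instructions)   # supported voice instructions
        self.on_invoke = on_invoke              # callback that "calls" the window
        self.visible = False                    # whether the window is shown


class VisiblePage:
    """A current display interface virtualized as a visible page; its
    visibility flag is set from life-cycle events (claim 4)."""
    def __init__(self, name):
        self.name = name
        self.visible = False                    # updated by the page life cycle
        self.bindings = []                      # WindowBinding objects on this page


class VisiblePageManager:
    """Visible page management class holding all visible pages and their
    per-page visibility parameter (claim 5)."""
    def __init__(self):
        self.pages = []

    def dispatch(self, instruction):
        """Claim 6: traverse all visible pages; for each page visible to the
        user, traverse its display interface windows; for each visible window,
        compare its supported instructions with the recognized instruction
        data and call the first matching window."""
        for page in self.pages:
            if not page.visible:                # skip pages hidden by life cycle
                continue
            for binding in page.bindings:
                if binding.visible and instruction in binding.instructions:
                    binding.on_invoke(instruction)
                    return True                 # window called in response
        return False                            # no visible window handles it
```

For example, a window bound to the instruction "turn on AC" is called when that instruction is dispatched while its page is visible, and the same instruction falls through unhandled once the page's life cycle marks the page invisible — which is what lets identical phrases map to different windows depending on what is currently on screen.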
CN201911391721.9A 2019-12-30 2019-12-30 Voice control implementation method and device Pending CN111145747A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911391721.9A CN111145747A (en) 2019-12-30 2019-12-30 Voice control implementation method and device

Publications (1)

Publication Number Publication Date
CN111145747A true CN111145747A (en) 2020-05-12

Family

ID=70521728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911391721.9A Pending CN111145747A (en) 2019-12-30 2019-12-30 Voice control implementation method and device

Country Status (1)

Country Link
CN (1) CN111145747A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120215543A1 (en) * 2011-02-18 2012-08-23 Nuance Communications, Inc. Adding Speech Capabilities to Existing Computer Applications with Complex Graphical User Interfaces
CN104599669A (en) * 2014-12-31 2015-05-06 乐视致新电子科技(天津)有限公司 Voice control method and device
US20150212791A1 (en) * 2014-01-28 2015-07-30 Oracle International Corporation Voice recognition of commands extracted from user interface screen devices
CN106373571A (en) * 2016-09-30 2017-02-01 北京奇虎科技有限公司 Voice control method and device
US20180039477A1 (en) * 2016-08-02 2018-02-08 Google Inc. Component libraries for voice interaction services
CN107909998A (en) * 2017-10-18 2018-04-13 成都市共维科技有限公司 Phonetic order processing method, device, computer equipment and storage medium
CN108279839A (en) * 2017-01-05 2018-07-13 阿里巴巴集团控股有限公司 Voice-based exchange method, device, electronic equipment and operating system
CN108829487A (en) * 2018-06-28 2018-11-16 北京五八信息技术有限公司 A kind of methods of exhibiting of pop-up, device, storage medium and terminal
CN108958844A (en) * 2018-07-13 2018-12-07 京东方科技集团股份有限公司 A kind of control method and terminal of application program
CN109192212A (en) * 2018-11-13 2019-01-11 苏州思必驰信息科技有限公司 Sound control method and device
CN110062288A (en) * 2019-05-21 2019-07-26 广州视源电子科技股份有限公司 Application management method, device, user terminal, multimedia terminal and storage medium
CN110060679A (en) * 2019-04-23 2019-07-26 诚迈科技(南京)股份有限公司 A kind of exchange method and system of whole process voice control
CN110085224A (en) * 2019-04-10 2019-08-02 深圳康佳电子科技有限公司 Intelligent terminal whole process speech control processing method, intelligent terminal and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768777A (en) * 2020-06-28 2020-10-13 广州小鹏车联网科技有限公司 Voice control method, information processing method, vehicle and server
CN111768780A (en) * 2020-06-28 2020-10-13 广州小鹏车联网科技有限公司 Voice control method, information processing method, vehicle and server
CN111768780B (en) * 2020-06-28 2021-12-07 广州小鹏汽车科技有限公司 Voice control method, information processing method, vehicle and server
WO2022116969A1 (en) * 2020-12-01 2022-06-09 杭州灵伴科技有限公司 General voice instruction generating method and apparatus, and augmented reality display device
CN112764799A (en) * 2021-01-08 2021-05-07 重庆创通联智物联网有限公司 Vehicle central control system and configuration method and device thereof
CN112764799B (en) * 2021-01-08 2024-01-19 重庆创通联智物联网有限公司 Vehicle central control system and configuration method and device thereof
CN113553040A (en) * 2021-07-20 2021-10-26 中国第一汽车股份有限公司 Registration realization method, device, equipment and medium for visible and spoken identification function
CN113553040B (en) * 2021-07-20 2022-12-02 中国第一汽车股份有限公司 Registration realization method, device, equipment and medium for visible and spoken identification function

Similar Documents

Publication Publication Date Title
CN111145747A (en) Voice control implementation method and device
US10503470B2 (en) Method for user training of information dialogue system
KR102518543B1 (en) Apparatus for correcting utterance errors of user and method thereof
WO2018126935A1 (en) Voice-based interaction method and apparatus, electronic device, and operating system
CN111767021A (en) Voice interaction method, vehicle, server, system and storage medium
JP5754368B2 (en) Mobile terminal remote operation method using vehicle integrated operation device, and vehicle integrated operation device
US20130297318A1 (en) Speech recognition systems and methods
KR101820291B1 (en) Apparatus and method for voice recognition device in vehicle
CN112242141A (en) Voice control method, intelligent cabin, server, vehicle and medium
KR102138740B1 (en) Method and apparatus for selecting the function of a car's infotainment system
CN104106113A (en) Method for phonetising a data list and speech-controlled user interface
CN107423012B (en) A kind of data display method, device and electronic equipment
CN110767219A (en) Semantic updating method, device, server and storage medium
CN110058916A (en) A kind of phonetic function jump method, device, equipment and computer storage medium
CN111343350A (en) Communication information processing method and vehicle-mounted terminal
US20130179165A1 (en) Dynamic presentation aid
CN112181417A (en) Front-end research and development configuration device and method
CN108881377B (en) Application service calling method, terminal equipment and server
CN113687731B (en) Agent control device, agent control method, and non-transitory recording medium
CN114115790A (en) Voice conversation prompting method, device, equipment and computer readable storage medium
US20210354713A1 (en) Agent control device, agent control method, and storage medium storing agent control program
CN116472516A (en) Automated assistant for detecting and supplementing various vehicle computing device capabilities
CN111093177A (en) Application account synchronization method, vehicle machine and vehicle
CN111583918A (en) Voice control method, vehicle-mounted terminal and vehicle
CN113496035A (en) Information, note information, code detection method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200512