CN112634888A - Voice interaction method, server, voice interaction system and readable storage medium - Google Patents

Voice interaction method, server, voice interaction system and readable storage medium

Info

Publication number
CN112634888A
CN112634888A (application CN202011460470.8A)
Authority
CN
China
Prior art keywords
voice
semantic representation
target
information
representation information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011460470.8A
Other languages
Chinese (zh)
Inventor
易晖
申众
张又亮
赵鹏
史小凯
翁志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaopeng Motors Technology Co Ltd
Guangzhou Chengxingzhidong Automotive Technology Co., Ltd
Original Assignee
Guangzhou Xiaopeng Motors Technology Co Ltd
Guangzhou Chengxingzhidong Automotive Technology Co., Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Motors Technology Co Ltd and Guangzhou Chengxingzhidong Automotive Technology Co., Ltd
Priority to CN202011460470.8A
Publication of CN112634888A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 - Speech to text systems
    • G10L2015/223 - Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a voice interaction method, a server, a voice interaction system and a readable storage medium. The voice interaction method includes the following steps: acquiring a voice instruction, the voice instruction including a target element; performing semantic understanding on the voice instruction according to a preset knowledge graph to generate semantic representation information; determining that the voice instruction is ambiguous when multiple items of semantic representation information are generated; controlling an interactive interface to display the query results corresponding to all the semantic representation information, and broadcasting the query results by voice to obtain a voice identification instruction; determining target semantic representation information according to the acquired voice identification instruction; and controlling the interactive interface to jump, according to the target semantic representation information, to a target page on which the target element is displayed. The voice interaction method realizes complete voice interaction.

Description

Voice interaction method, server, voice interaction system and readable storage medium
Technical Field
The present invention relates to the field of intelligent voice recognition, and in particular, to a voice interaction method, a server, a voice interaction system, and a readable storage medium.
Background
In the related art, voice operation and control may be implemented by an intelligent voice system. However, such systems commonly suffer from insufficient dialogue guidance and lack of support for cross-application jumps. To resolve ambiguity conflicts, they either support direct voice access to only some page elements, or require the ambiguous item to be confirmed through GUI touch-screen operation, making complete voice interaction difficult to support.
Disclosure of Invention
The embodiment of the invention provides a voice interaction method, a server, a voice interaction system and a readable storage medium.
The embodiment of the invention provides a voice interaction method, which is used for a vehicle, wherein the vehicle is provided with an interaction interface, and the voice interaction method comprises the following steps:
acquiring a voice instruction, wherein the voice instruction comprises a target element;
performing semantic understanding on the voice command according to a preset knowledge graph to generate semantic representation information;
determining that the voice instruction is ambiguous when multiple items of semantic representation information are generated;
controlling the interactive interface to display a query result corresponding to all the semantic representation information, and carrying out voice broadcast on the query result to obtain a voice identification instruction;
determining target semantic representation information according to the acquired voice identification instruction;
and controlling the interactive interface to jump to a target page according to the target semantic representation information, wherein the target element is displayed on the target page.
According to this voice interaction method, when multiple items of semantic representation information are found for the voice instruction, all of them are prompted through a voice conversation and the target semantic representation information is confirmed; the interactive interface can then be controlled to jump directly to the corresponding page and display the target element. No manual operation is needed at any point, realizing complete voice interaction.
In some embodiments, the voice interaction method comprises:
determining an application page and generating element information corresponding to the page;
generating an element structure table according to the element information;
and establishing the knowledge graph according to the element structure table.
In some embodiments, the element structure table includes the element information, a category, a parent node, and an anchor, and generating the element structure table according to the element information includes:
determining the category, the parent node, and the anchor corresponding to the element information according to a preset structure;
and generating the element structure table according to the element information and the determined category, parent node, and anchor.
In some embodiments, the voice interaction method comprises:
and under the condition that the page is updated, replacing the element information in the element structure table with the updated element information corresponding to the page.
In some embodiments, semantically understanding the voice command according to a preset knowledge graph, and generating semantic representation information includes:
classifying the voice command to generate a query intention;
identifying an entity corresponding to the voice instruction in the case that the query intent is direct-to-speak;
determining corresponding entity information according to the entity and the knowledge graph;
and generating the semantic representation information according to the entity information and the knowledge graph, wherein the semantic representation information comprises intents, elements, pages, applications and anchor points corresponding to the elements, which correspond to the voice commands.
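As a rough sketch, the semantic representation described above can be modeled as a simple record; the field and value names below are illustrative assumptions, not the patent's actual data format:

```python
from dataclasses import dataclass

@dataclass
class SemanticRepresentation:
    intent: str        # e.g. "direct_reach" for a direct-to-speak query
    element: str       # the target element, e.g. "volume adjustment"
    page: str          # the page that contains the element
    application: str   # the application that contains the page
    anchor: str        # the anchor used to locate the element on the page

rep = SemanticRepresentation("direct_reach", "volume adjustment",
                             "volume", "navigation", "volume_set")
```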
In some embodiments, determining that the voice instruction is ambiguous when multiple items of semantic representation information exist includes:
determining that multiple items of semantic representation information exist when at least one of the elements, pages, and applications found according to the knowledge graph has multiple matches, wherein each item of semantic representation information includes at least one corresponding piece of ambiguity information.
In some embodiments, controlling the interactive interface to display the query result corresponding to all the semantic representation information includes:
extracting ambiguous information in the semantic representation information to generate a plurality of ambiguous text information with corresponding quantity;
and sequencing the ambiguous text information in a preset mode to generate the query result.
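A minimal sketch of extracting and ordering the ambiguous items, assuming alphabetical order as the "preset mode" of sequencing (the patent does not fix a particular ordering):

```python
def build_query_result(ambiguous_items):
    # each item is the ambiguity information extracted from one semantic
    # representation, e.g. the application name plus the element name
    texts = [f"{app} {element}" for app, element in ambiguous_items]
    # the "preset mode" of ordering; alphabetical is used here for illustration
    return sorted(texts)

# e.g. the two readings of a "volume" instruction
result = build_query_result([("navigation", "volume"), ("media", "volume")])
```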
In some embodiments, controlling the interactive interface to jump to the target page according to the target semantic representation information includes:
under the condition that a page jump instruction is determined according to the target semantic representation information, searching in a preset page routing table according to the page jump instruction so as to determine an application corresponding to the target page and an anchor point of the target element;
and positioning the target element according to the anchor point of the target element so as to control the interactive interface to jump to the target page and highlight the target element in the target page.
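The page routing lookup can be sketched as a plain table; the anchor names and address strings below are hypothetical, not the patent's actual routing format:

```python
# hypothetical page routing table: element anchor -> (application, page address)
PAGE_ROUTES = {
    "volume_set":   ("navigation", "app://navigation/volume"),
    "media_volume": ("media", "app://media/volume"),
}

def route(element_anchor):
    """Look up the application and target-page address for a jump; the
    client would then open the page and highlight the element at the anchor."""
    application, page_address = PAGE_ROUTES[element_anchor]
    return application, page_address
```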
The embodiment of the invention provides a server for interactive communication with a vehicle, the vehicle having an interactive interface. The server includes a control module and a voice acquisition module.
The control module is configured to control the voice acquisition module to acquire a voice instruction, the voice instruction including a target element; and
to perform semantic understanding on the voice instruction according to a preset knowledge graph to generate semantic representation information; and
to determine that the voice instruction is ambiguous when multiple items of semantic representation information exist; and
to control the interactive interface to display the query results corresponding to all the semantic representation information and broadcast the query results by voice, controlling the voice acquisition module to acquire a voice identification instruction; and
to determine target semantic representation information according to the acquired voice identification instruction; and
to control the interactive interface to jump, according to the target semantic representation information, to a target page that displays the target element.
With this server, when multiple items of semantic representation information are found for the voice instruction, all of them are prompted through a voice conversation and the target semantic representation information is confirmed; the interactive interface can then be controlled to jump directly to the corresponding page and display the target element. No manual operation is needed at any point, realizing complete voice interaction.
The embodiment of the invention provides a voice interaction system, including:
a vehicle, configured to acquire voice information and voice identification instructions, the vehicle having an interactive interface; and
a server, configured to acquire the voice instruction, the voice instruction including a target element; and
to perform semantic understanding on the voice instruction according to a preset knowledge graph to generate semantic representation information; and
to determine that the voice instruction is ambiguous when multiple items of semantic representation information exist; and
to control the interactive interface to display the query results corresponding to all the semantic representation information and broadcast the query results by voice to acquire the voice identification instruction; and
to confirm target semantic representation information according to the voice identification instruction; and
to control the interactive interface to jump, according to the target semantic representation information, to a target page that displays the target element.
In this voice interaction system, when multiple items of semantic representation information are found for the voice instruction, all of them are prompted through a voice conversation and the target semantic representation information is confirmed; the interaction interface can then be controlled to jump directly to the corresponding page and display the target element. No manual operation is needed at any point, realizing complete voice interaction.
The embodiment of the invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the voice interaction method described in any of the above embodiments.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a voice interaction method of an embodiment of the present invention;
FIG. 2 is a block diagram of a voice interaction system in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of an interaction process of an embodiment of the invention;
FIG. 4 is another flow chart of a voice interaction method of an embodiment of the present invention;
FIG. 5 is a schematic diagram of a knowledge-graph according to an embodiment of the present invention;
FIG. 6 is yet another flow chart of a voice interaction method of an embodiment of the present invention;
FIG. 7 is a schematic diagram of an element structure table of an embodiment of the present invention;
FIG. 8 is yet another flow chart of a voice interaction method of an embodiment of the present invention;
FIG. 9 is a schematic diagram of the steps for generating semantic representation information according to an embodiment of the present invention;
FIG. 10 is yet another flow chart of a voice interaction method of an embodiment of the present invention;
FIG. 11 is a schematic diagram of a plurality of semantic representation information according to an embodiment of the present invention;
FIG. 12 is yet another flow chart of a voice interaction method of an embodiment of the present invention;
FIG. 13 is yet another flow chart of a voice interaction method of an embodiment of the present invention;
FIG. 14 is a schematic view of an interactive interface of an embodiment of the present invention;
FIG. 15 is a schematic diagram of a voice interaction system in accordance with an embodiment of the present invention.
Description of the main element symbols:
a voice interaction system 100;
vehicle 10, interactive interface 11;
server 20, control module 21, and voice acquisition module 23.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, features defined as "first", "second", may explicitly or implicitly include one or more of the described features. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the description of the present invention, it should be noted that, unless otherwise explicitly stated or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly and may be, for example, fixedly connected, detachably connected, or integrally connected; mechanically or electrically connected; and directly connected or indirectly connected through an intermediate medium. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
The disclosure herein provides many different embodiments or examples for implementing different configurations of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Of course, they are merely examples and are not intended to limit the present invention. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples, such repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. In addition, the present invention provides examples of various specific processes and materials, but one of ordinary skill in the art may recognize applications of other processes and/or uses of other materials.
Referring to fig. 1 and fig. 2, a voice interaction method according to an embodiment of the present invention is provided for a vehicle 10. The vehicle 10 has an interactive interface 11. The voice interaction method comprises the following steps:
step S110: acquiring a voice instruction, wherein the voice instruction comprises a target element;
step S120: performing semantic understanding on the voice instruction according to a preset knowledge graph to generate semantic representation information;
step S130: determining that the voice instruction is ambiguous when multiple items of semantic representation information exist;
step S140: controlling the interactive interface 11 to display the query results corresponding to all the semantic representation information, and performing voice broadcast on the query results to acquire a voice identification instruction;
step S150: determining target semantic representation information according to the acquired voice identification instruction;
step S160: and controlling the interactive interface 11 to jump to a target page according to the target semantic representation information, wherein the target page displays target elements.
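The steps above can be sketched as a minimal control loop; the callables and names are illustrative placeholders for the server and vehicle components, not the patent's actual implementation:

```python
def interact(instruction, understand, ask_user, jump):
    """Control loop for steps S110-S160; `understand`, `ask_user`, and
    `jump` stand in for the semantic-understanding, voice-dialog, and
    page-jump components."""
    representations = understand(instruction)      # S120: semantic understanding
    if len(representations) > 1:                   # S130: ambiguity detected
        index = ask_user(representations)          # S140-S150: display, broadcast, identify
        target = representations[index]
    else:
        target = representations[0]
    return jump(target)                            # S160: jump to the target page

result = interact("turn on volume adjustment",
                  lambda s: ["navigation volume", "media volume"],
                  lambda reps: 0,                  # user replies "the first"
                  lambda t: f"jumped to {t}")
```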
The voice interaction method according to the embodiment of the present invention may be implemented by the server 20 and the vehicle 10 according to the embodiment of the present invention. Referring to fig. 2, the server 20 is used with the vehicle 10. The vehicle 10 has an interactive interface 11. The server 20 includes a control module 21 and a voice acquisition module 23. The control module 21 is configured to control the voice acquisition module 23 to acquire a voice instruction, where the voice instruction includes a target element; to perform semantic understanding on the voice instruction according to a preset knowledge graph to generate semantic representation information; to determine that the voice instruction is ambiguous when multiple items of semantic representation information exist; to control the interactive interface 11 to display the query results corresponding to all the semantic representation information and broadcast the query results by voice, controlling the voice acquisition module 23 to acquire a voice identification instruction; and to determine target semantic representation information according to the acquired voice identification instruction. The vehicle 10 is configured to control the interactive interface 11 to jump to a target page according to the target semantic representation information, where the target page displays the target element.
In the voice interaction method and the server 20, when multiple items of semantic representation information are found for the voice instruction, all of them are prompted through a voice conversation and the target semantic representation information is confirmed, so that the interactive interface 11 can be controlled to jump directly to the corresponding page and display the target element. No manual operation is required at any point, realizing complete voice interaction.
In the related art, a vehicle may recognize the user's intention through voice interaction and perform a query according to that intention, so as to jump to the user's search target. In practice, however, dialogue guidance is insufficiently operable when graphic and text display is not supported; cross-application jumps may not be supported when querying an application, so the target application cannot be reached; and when the query is ambiguous, only a small portion of elements can be reached directly by voice, or the user must perform a touch-screen operation to confirm the ambiguous item, which is inconvenient.
To solve the above problem, in an embodiment of the present invention, when a voice instruction including a target element is acquired, the voice instruction may be semantically understood in combination with a preset knowledge graph to generate corresponding semantic representation information. When only one item of semantic representation information exists, the interactive interface 11 can be controlled according to it to jump to a target page that displays the target element. When multiple items exist, ambiguity is determined; a query result is then obtained from the multiple items of semantic representation information and displayed, and the user is queried accordingly. After a voice identification instruction in which the user identifies one of the items is acquired, the corresponding target semantic representation information can be determined, and the interactive interface 11 is then controlled to perform the page jump according to it.
Ambiguity here means that a query for the target element yields multiple results that cannot be filtered down using the voice instruction alone. It can be understood that, in practice, a user may omit information during voice interaction, and with insufficient information the recognized results (i.e., the items of semantic representation information) may be multiple, i.e., ambiguous. When there is exactly one recognized result, it can be confirmed that there is no ambiguity.
In addition, when the knowledge graph is established, all elements in the pages of each application are mapped, so that each element, the page containing it, and the application containing that page are bound into a correspondence. Once the target element is identified, the corresponding target page and application can be obtained from this correspondence in the knowledge graph. It can be understood that, according to this correspondence, the target page can be found even when the application currently displayed on the interactive interface 11 does not contain the target element, without any manual operation.
Specifically, referring to fig. 3, in one embodiment the voice instruction is "turn on volume adjustment". Semantic understanding of "turn on volume adjustment" by the control module 21 according to the knowledge graph yields two items of semantic representation information containing the target element "volume": one includes "navigation" and the other includes "media". A query result containing "navigation volume" and "media volume" is then obtained, and the voice prompt "do you want to turn on navigation volume adjustment or media volume adjustment?" is broadcast. After the voice identification instruction (such as "the first") is acquired, the semantic representation information including "navigation" can be determined as the target semantic representation information, so that the vehicle 10 can control the interactive interface 11 to jump to the navigation page and display the position corresponding to "volume" in order to adjust the navigation volume.
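The identification step in this example, mapping a reply such as "the first" to one of the broadcast options, can be sketched as follows; the ordinal table and matching rules are illustrative assumptions, not the patent's implementation:

```python
ORDINALS = {"first": 0, "second": 1, "third": 2}

def resolve(reply, options):
    """Map a spoken identification instruction to one of the broadcast
    options, either by ordinal ("the first") or by name ("media")."""
    for word, index in ORDINALS.items():
        if word in reply and index < len(options):
            return options[index]
    for option in options:
        if any(token in reply for token in option.split()):
            return option
    return None  # unresolved; the system would re-prompt the user

choice = resolve("the first", ["navigation volume", "media volume"])
```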
In summary, when the query according to the voice instruction yields multiple results, the interactive interface 11 displays the corresponding query result to the user, and the user is informed of it by voice so that it can be identified. Because the user can identify the result directly by uttering corresponding voice information, no manual operation is needed, which realizes a "speak what you see" function for query results. Once the user's voice identification instruction is acquired, the interactive interface 11 can be controlled to jump directly to the target page according to the target semantic representation information, improving efficiency and realizing a "one utterance, direct reach" function. With the knowledge graph, cross-application page jumps can be supported, and a newly added application can be added to the knowledge graph, improving universality and convenience. "One utterance, direct reach" refers to cross-application jumping among multiple applications based on voice information. In one embodiment, the interactive interface 11 is a VUI (Voice User Interface).
Referring to fig. 4, in some embodiments, a voice interaction method includes:
step S210: determining an application page and generating element information of the corresponding page;
step S230: generating an element structure table according to the element information;
step S250: and establishing a knowledge graph according to the element structure table.
The voice interaction method according to the embodiment of the present invention may be implemented by the server 20 according to the embodiment of the present invention. Referring to fig. 2, the server 20 is configured to determine a page of an application, and generate element information of the corresponding page; and is used for generating an element structure table according to the element information; and the knowledge graph is established according to the element structure table.
Therefore, the corresponding application and page can be conveniently and quickly positioned according to the target element.
Specifically, in such an embodiment, after all element information in a page is determined from the page corresponding to an application, the obtained element information may be entered through the NLU operation platform to generate the element structure table; the correspondence among the pieces of element information may be determined from the element structure table, and the knowledge graph is then established according to that correspondence.
Referring to fig. 5, in the embodiment shown in fig. 5, the application includes "navigation" and "media", where the "navigation" application has a "volume" page, an element "volume adjustment" is in the "volume" page, the "media" application has a "volume" page, an element "volume adjustment" is in the "volume" page, and the application, the page, and the element form a corresponding relationship by way of a parent-child relationship. Under the condition that a plurality of applications are determined, the element structure table corresponding to each application is integrated according to the corresponding relation among the applications, the pages and the elements, and finally the knowledge graph corresponding to all the applications can be established. For the embodiments with three or more applications, the specific principle can refer to the principle of the embodiment shown in fig. 5, and will not be described herein.
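The parent-child structure of Fig. 5 can be sketched as a nested mapping; querying it for an element shows how two matches signal ambiguity (the names mirror the figure, while the data structure itself is an assumption):

```python
# nested knowledge graph mirroring Fig. 5: application -> page -> elements
knowledge_graph = {
    "navigation": {"volume": ["volume adjustment"]},
    "media":      {"volume": ["volume adjustment"]},
}

def find(element):
    """Return every (application, page) pair whose page contains the
    element; more than one hit means the voice instruction is ambiguous."""
    return [(application, page)
            for application, pages in knowledge_graph.items()
            for page, elements in pages.items()
            if element in elements]

hits = find("volume adjustment")  # two hits, so the instruction is ambiguous
```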
Referring to FIG. 6, in some embodiments, the element structure table includes element information, categories, parent nodes, and anchors. Step S230, including:
step S231: determining the category, the parent node, and the anchor of the corresponding element information according to a preset structure;
step S233: generating the element structure table according to the element information and the determined category, parent node, and anchor.
The voice interaction method according to the embodiment of the present invention may be implemented by the server 20 according to the embodiment of the present invention. Referring to fig. 2, the server 20 is configured to determine the category, the parent node, and the anchor point of the corresponding element information according to a preset structure; and generating an element structure table according to the element information, the determined category, the father node and the anchor point.
Therefore, the target application and the target page can be conveniently and quickly positioned.
Specifically, referring to fig. 7, in the embodiment shown in fig. 7, the application is "navigation", the corresponding page is the "volume" page, and the element information includes "volume", "volume switch", and "volume adjustment". Referring also to fig. 5, the following relationships can be determined from the mapping in the element structure table: the category of "navigation" is application; the category of "volume" is page, its parent node is "navigation", and its anchor is "volume"; the category of "volume adjustment" is element, its parent node is "volume", and its anchor is "volume set". With this mapping, once the anchor of the target element is determined, the anchor of the target page can be determined rapidly, and the interactive interface 11 is then controlled to jump according to the anchor of the target page.
In addition, in other embodiments, a page address may be generated according to the anchor point corresponding to each page; once the anchor point is determined, the corresponding page address can be directly determined, so that the jump to the target page can be performed according to the page address.
In some embodiments, a voice interaction method comprises:
and under the condition that the page is updated, replacing the element information in the element structure table with the updated element information of the corresponding page.
The voice interaction method according to the embodiment of the present invention may be implemented by the server 20 according to the embodiment of the present invention. Referring to fig. 2, the control module 21 is configured to replace the element information in the element structure table with the updated element information of the corresponding page when the page is updated.
Therefore, the knowledge graph can be updated correspondingly according to the updating of the page in time.
It will be appreciated that, in practice, applications in the vehicle client are updated, and the corresponding pages change accordingly. By replacing the previous element information in the element structure table with the element information of the updated page, the knowledge graph is kept up to date, avoiding the situation where the element information of an updated page cannot be found through the knowledge graph. The update may be an online update of the application, or an offline update produced by the user adjusting the application or page according to personal habits. The knowledge graph may be updated after a voice interaction completes or before one begins. In one embodiment, the server 20 may determine the updated element information from the data information transmitted by the vehicle 10 during voice interaction, so that the knowledge graph can be updated offline.
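The replacement step can be sketched as below; the row layout and field names are assumptions for illustration, not the patent's data format:

```python
# Sketch of the update step: when a page is updated (online or by user
# customization), its element rows in the element structure table are
# replaced with the new list.

def update_element_table(table, page, new_elements):
    """Drop the old element rows of `page` and append the updated ones."""
    kept = [row for row in table if row["parent"] != page]
    return kept + new_elements

table = [
    {"name": "volume adjustment", "parent": "volume"},
    {"name": "volume switch", "parent": "volume"},
    {"name": "route overview", "parent": "map"},
]
updated = update_element_table(
    table, "volume", [{"name": "volume slider", "parent": "volume"}]
)
```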
Referring to fig. 8, in some embodiments, step S120 includes:
step S121: classifying the voice instruction by intent to generate a query intent;
step S123: identifying the entity corresponding to the voice instruction in the case that the query intent is a one-sentence direct reach;
step S125: determining corresponding entity information according to the entity and the knowledge graph;
step S127: and generating semantic representation information according to the entity information and the knowledge graph, wherein the semantic representation information includes the intent, element, page, application, and anchor point of the element corresponding to the voice instruction.
The voice interaction method according to the embodiment of the present invention may be implemented by the server 20 according to the embodiment of the present invention. Referring to fig. 2, the control module 21 is configured to classify the voice instruction by intent to generate a query intent; to identify the entity corresponding to the voice instruction in the case that the query intent is a one-sentence direct reach; to determine the corresponding entity information according to the entity and the knowledge graph; and to generate semantic representation information according to the entity information and the knowledge graph, the semantic representation information including the intent, element, page, application, and anchor point of the element corresponding to the voice instruction.
In this way, the voice instruction can be converted into a computationally executable function without loss of information.
Specifically, referring to fig. 9, in the embodiment shown in fig. 9, the voice message uttered by the user is "jump to navigation volume adjustment". Through intent classification of the voice information, the intent is determined to be a one-sentence direct reach to the query target element. Entity recognition on the voice information yields "navigation" and "volume adjustment"; knowledge inference based on the knowledge graph and the recognized entities yields the corresponding entity information, namely the application "navigation" and the element "volume adjustment", and the page "navigation volume" and its anchor point are determined in turn, so as to generate the semantic representation information corresponding to the voice information. In the embodiment shown in fig. 9, the generated semantic representation information is "intent: one-sentence direct reach; element: volume adjustment; page: navigation volume; application: navigation; anchor point: volumeset". In some embodiments, the intent may be generated by a preset sentence classification model.
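Steps S121 through S127 can be walked through in miniature as below. This is a toy sketch: the keyword matching stands in for the real intent and entity models, which the patent leaves unspecified, and the page/anchor values are hard-coded from the fig. 9 example.

```python
# Toy walk-through of steps S121-S127 for "jump to navigation volume
# adjustment". Keyword matching is a stand-in for real models.

KNOWN_ENTITIES = {"navigation": "application", "volume adjustment": "element"}

def understand(utterance):
    # S121: intent classification (stubbed with a prefix test).
    intent = ("one-sentence direct reach"
              if utterance.startswith("jump to") else "other")
    # S123: entity recognition (stubbed with substring matching).
    entities = {cat: name for name, cat in KNOWN_ENTITIES.items()
                if name in utterance}
    if intent != "one-sentence direct reach" or "element" not in entities:
        return None
    # S125/S127: knowledge inference fills in page and anchor (fixed here).
    return {
        "intent": intent,
        "element": entities["element"],
        "page": "navigation volume",
        "application": entities.get("application", "navigation"),
        "anchor": "volumeset",
    }
```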
In addition, once the semantic representation information is determined, a control instruction for controlling the interactive interface 11 to jump can be generated from the intent in the semantic representation information; the page address of the page to be jumped to can be determined from the page; the application corresponding to that page can be determined by the interactive interface 11 from the application; and the page jump parameters can be generated from the element and the anchor point. Because the semantic representation information completely contains the entities in the voice instruction, the corresponding control instruction can be generated from the voice instruction, avoiding any loss of information that would affect control accuracy.
Referring to fig. 10, in some embodiments, step S130 includes:
step S131: and under the condition that at least one of the elements, the pages and the applications searched according to the knowledge graph is multiple, determining that the number of semantic representation information is multiple, wherein each semantic representation information comprises at least one corresponding ambiguity information.
The voice interaction method according to the embodiment of the present invention may be implemented by the server 20 according to the embodiment of the present invention. Referring to fig. 2, the control module 21 is configured to determine that the number of semantic representation information is multiple when the number of at least one of the elements, pages, and applications found according to the knowledge graph is multiple, where each semantic representation information includes at least one corresponding ambiguity information.
In this manner, all ambiguous semantic representation information can be determined.
Specifically, referring to fig. 3, in the illustrated embodiment, the number of applications found according to the knowledge graph is two ("navigation" and "media"), so that first semantic representation information (i.e., semantic one) and second semantic representation information (i.e., semantic two) can be determined, where the application corresponding to semantic one is "navigation" and the application corresponding to semantic two is "media"; that is, the ambiguity information between semantic one and semantic two is the application. In other embodiments, the ambiguity information can be determined as elements and/or pages and/or applications according to the specific situation, and all semantic representation information having at least one piece of ambiguity information can be generated; the specific principle is similar to that of the above embodiment.
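A sketch of step S131, using the fig. 3 data: when the knowledge graph matches several applications for the same element, one semantic representation is produced per match, each tagged with its ambiguity information. Field names are illustrative assumptions.

```python
# Sketch of step S131: one semantic representation per knowledge-graph
# match, each carrying the ambiguity information (the field that differs).

matches = [
    {"application": "navigation", "page": "navigation volume",
     "element": "volume adjustment"},
    {"application": "media", "page": "media volume",
     "element": "volume adjustment"},
]

def make_semantics(matches, ambiguous_field="application"):
    semantics = []
    for i, m in enumerate(matches, start=1):
        s = dict(m, semantic_id=i)
        if len(matches) > 1:
            s["ambiguity"] = ambiguous_field
        semantics.append(s)
    return semantics

candidates = make_semantics(matches)
```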
Referring to fig. 12, in some embodiments, step S140 includes:
step S141: extracting the ambiguity information in the semantic representation information to generate a corresponding number of pieces of ambiguous text information;
step S143: and sequencing the plurality of ambiguous text information in a preset mode to generate a query result.
The voice interaction method according to the embodiment of the present invention may be implemented by the server 20 according to the embodiment of the present invention. Referring to fig. 2, the control module 21 is configured to extract the ambiguity information in the semantic representation information to generate a corresponding number of pieces of ambiguous text information; and to rank the pieces of ambiguous text information in a preset manner to generate the query result.
Therefore, the ambiguity information can be conveniently prompted by combining voice prompts with graphic and text display.
Specifically, referring to fig. 3 and 11, in the illustrated embodiment, a first ambiguous text message "navigation volume" may be generated based on ambiguous information in semantic one, and a second ambiguous text message "media volume" may be generated based on ambiguous information in semantic two, and semantic IDs of semantic one and semantic two are determined. In fig. 3, the first ambiguous text information and the second ambiguous text information are ranked according to a preset manner, and it is determined that the rank of the first ambiguous text information is the first, and the rank of the second ambiguous text information is the second, so that a query result is generated according to the ranked first ambiguous text information and the second ambiguous text information.
In the embodiment shown in fig. 11, when the semantic identification information is acquired as "first", it is determined that the semantic representation information selected by the user is semantic one, and further, the target semantic representation information can be determined from the semantic ID of semantic one.
It can be understood that, when the query result is displayed through the interactive interface 11, the user can be informed of the number of all pieces of ambiguous text information in the query result and their ranking positions, so that the user can issue semantic identification information according to the corresponding ranking position. In other embodiments, when the number of pieces of ambiguous text information reaches a preset number, a voice broadcast such as "Which one do you want to open?" can be played to prompt the user to confirm the query result. The preset number may be three or more, and is not limited here.
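Steps S141 and S143 together can be sketched as follows; the ordering rule is a stand-in, since the patent only specifies "a preset mode", and the formatting of the numbered list is an assumption.

```python
# Sketch of steps S141/S143: extract the ambiguous text of each candidate
# and rank it into the numbered query result read back to the user.

def build_query_result(semantics):
    texts = [(s["semantic_id"], s["page"]) for s in semantics]
    ranked = sorted(texts)  # "preset mode" stand-in: rank by semantic ID
    return ["%d. %s" % (pos, text)
            for pos, (_sid, text) in enumerate(ranked, start=1)]

semantics = [
    {"semantic_id": 1, "page": "navigation volume"},
    {"semantic_id": 2, "page": "media volume"},
]
result = build_query_result(semantics)
```

The user can then answer with a ranking position ("the first"), which maps back to a semantic ID and thus to the target semantic representation, as in the fig. 11 embodiment.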
Referring to fig. 13, in some embodiments, step S160 includes:
step S161: under the condition of determining a page jump instruction according to the target semantic representation information, searching in a preset page routing table according to the page jump instruction so as to determine an application corresponding to a target page and an anchor point of a target element;
step S163: and positioning the target element according to the anchor point of the target element so as to control the interactive interface 11 to jump to the target page and highlight the target element in the target page.
The voice interaction method of the embodiment of the invention can be implemented by the vehicle 10 of the embodiment of the invention. Referring to fig. 2, the vehicle 10 is configured to search in a preset page routing table according to a page jump instruction when determining the page jump instruction according to the target semantic representation information, so as to determine an application corresponding to the target page and an anchor point of the target element; and the anchor point is used for positioning the target element according to the anchor point of the target element so as to control the interactive interface 11 to jump to the target page and highlight the target element in the target page.
Therefore, the page to be jumped and the target element can be quickly positioned, and the functions of cross-application jump and anchor point positioning are realized.
Referring specifically to fig. 2, in some embodiments, the operating system on the vehicle 10 may register all loadable applications to generate a corresponding page routing table, where the page routing table includes the logical mappings among page addresses, applications, and the elements of the pages to be jumped to in the applications. When the vehicle 10 receives a page jump instruction, the page address, the logical mapping relationship between the corresponding application and element, and the anchor point of the target element are determined by looking up the page routing table. Anchor point parameters are generated by the page jump distributor, and the vehicle 10 locates the target element according to the anchor point parameters, thereby determining the corresponding application, jumping the interactive interface 11 to the target page, and displaying the target element. When the application currently displayed on the interactive interface 11 differs from the application corresponding to the target page, the interactive interface 11 can be directly controlled to jump, so that cross-application jumping is realized, and the target element can be quickly located according to the anchor point, realizing anchor point positioning.
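A hypothetical rendering of the page routing table lookup: the `app://` address scheme and all field names are invented for illustration, but the shape of the mapping (page address, owning application, element anchors) follows the description above.

```python
# Hypothetical page routing table: resolve the page address, owning
# application, and element anchor for a jump instruction in one lookup.

page_routes = {
    "media volume": {
        "address": "app://media/volume",
        "application": "media",
        "anchors": {"volume adjustment": "volumeset"},
    },
}

def handle_jump(target_page, target_element):
    route = page_routes[target_page]
    return {
        "application": route["application"],
        "address": route["address"],
        "anchor": route["anchors"][target_element],
    }
```

Because the table maps pages to their owning applications, a jump resolved this way works even when the currently displayed application differs from the target one, which is the cross-application case.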
In other embodiments, the vehicle 10 includes a system layer and an application layer, and the page routing table is stored in the system layer, and the anchor location of the applications and elements in the application layer can be performed through the page routing table in the system layer. In one embodiment, the vehicle 10 generates the page jump instruction via a voice assistant.
Referring to fig. 14, in the embodiment shown in fig. 14, the target element is "volume adjustment", the target page is "volume", and the corresponding application is "media". Specifically, after the page jump instruction is executed, the interactive interface 11 jumps to the page to be jumped to and adjusts accordingly so that the target element in the page is highlighted. In other embodiments, the highlighting may be adjusting the display area where the target element is located, or scrolling up and down to display the target element at a preset position of the interactive interface 11. The adjustment of the display area may be increasing the brightness or changing the display color. The preset position may be the central region of the interactive interface 11.
In addition, in such an embodiment, the control module 21 has a semantic understanding module, a semantic execution module, an NLU operation platform module, and a knowledge graph module. The semantic understanding module is used for intent classification, entity recognition, knowledge inference, and semantic representation information generation; the semantic execution module is used for ambiguity clarification and for generating and transmitting page jump instructions; and the NLU operation platform module is used for receiving the page information uploaded by the vehicle and generating the element structure table.
Referring to fig. 2 and fig. 15, a voice interaction system 100 according to an embodiment of the present invention includes a vehicle 10 and a server 20. The vehicle 10 is used to collect voice information and has an interactive interface 11. The server 20 is used for acquiring a voice instruction, the voice instruction including a target element; for performing semantic understanding on the voice instruction according to a preset knowledge graph to generate semantic representation information; for determining that the voice instruction is ambiguous when the number of pieces of semantic representation information is multiple; for controlling the interactive interface 11 to display the query results corresponding to all the semantic representation information, and performing voice broadcast of the query results to acquire a voice identification instruction; for determining target semantic representation information according to the voice identification instruction; and for controlling the interactive interface 11 to jump to a target page according to the target semantic representation information, the target page displaying the target element.
In the voice interaction system 100, under the condition that a plurality of semantic representation information are queried according to the voice instruction, all the semantic representation information is prompted and the target semantic representation information is confirmed in a voice conversation mode, and then the interactive interface 11 can be controlled to directly jump to the corresponding page and display the target element, so that the whole process does not need to be manually operated, and the function of complete voice interaction is realized.
Specifically, referring to fig. 15, in the embodiment shown in fig. 15, the knowledge graph may be stored in the server 20. In one embodiment, where voice instructions are collected by the vehicle 10, the voice instructions may be uploaded to the server 20 such that the server 20 derives the semantic representation information from the voice instructions and the knowledge-graph. In the case that the number of the semantic representation information is multiple, it may be determined that there is ambiguity, and then a query result is generated according to the multiple semantic representation information, and the query result is transmitted to the vehicle 10, so that the interactive interface 11 displays the query result, and performs voice broadcast on the query result. Under the condition that the semantic identification information is collected, the vehicle 10 determines target semantic representation information according to the semantic identification information and generates a page jump instruction, so that the target page and the target element can be positioned, and the interactive interface 11 is controlled to jump to the target page to display the target element.
It is understood that the vehicle 10 may display the query results and the target page via the interactive interface 11 so that the results of the voice interaction may be presented. In one embodiment, the vehicle 10 may determine the page address of the target page and the anchor of the target element via the system level page routing table, and may complete the jump of the target page and implement the anchor location logic.
Because the user can directly utter the corresponding voice information for identification according to the query result, no manual operation is needed, which helps realize the function of "say what you see" for the query result. Once the user's voice identification instruction is acquired, the interactive interface 11 can be controlled to jump directly to the target page according to the target semantic representation information, which improves efficiency and realizes the function of "one-sentence direct reach". Combined with the knowledge graph, cross-application page jumps can be supported, and a newly added application can be added into the knowledge graph, improving universality and convenience. In one embodiment, the server 20 is a cloud server.
The embodiment of the invention provides a computer readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the voice interaction method of any of the above embodiments.
For example, when the computer program is executed, the following steps may be implemented:
step S110: acquiring a voice instruction, wherein the voice instruction comprises a target element;
step S120: performing semantic understanding on the voice instruction according to a preset knowledge graph to generate semantic representation information;
step S130: determining that the voice instruction is ambiguous when the number of the semantic representation information is multiple;
step S140: controlling the interactive interface 11 to display the query results corresponding to all the semantic representation information, and performing voice broadcast on the query results to acquire a voice identification instruction;
step S150: determining target semantic representation information according to the acquired voice identification instruction;
step S160: and controlling the interactive interface 11 to jump to a target page according to the target semantic representation information, wherein the target page displays target elements.
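The six steps above can be sketched as one plain control loop, with stub callables standing in for the real ASR/NLU/UI components; this is an illustrative sketch, not the patent's code.

```python
# End-to-end sketch of steps S110-S160 as a control loop over injected
# components (understand = NLU, ask_user = query result + voice reply,
# jump = page routing and anchor positioning).

def voice_interaction(instruction, understand, ask_user, jump):
    semantics = understand(instruction)                 # S120
    if len(semantics) > 1:                              # S130: ambiguous
        idx = ask_user([s["page"] for s in semantics])  # S140/S150
        target = semantics[idx]
    else:
        target = semantics[0]
    return jump(target)                                 # S160

# Stub components exercising the ambiguous branch.
result = voice_interaction(
    "adjust the volume",
    understand=lambda _: [{"page": "navigation volume"},
                          {"page": "media volume"}],
    ask_user=lambda options: 0,   # user answers "the first"
    jump=lambda target: target["page"],
)
```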
The computer-readable storage medium may be provided in the vehicle 10, or may be provided in a terminal such as the server 20, and the vehicle 10 can communicate with the terminal to acquire the corresponding program.
It is understood that the computer-readable storage medium may include: any entity or device capable of carrying the computer program, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a software distribution medium, and the like. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form.
In some embodiments of the present invention, the control module 21 may be a single chip integrating a processor, a memory, a communication module, and the like. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a system including a processing module, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them.
In the description of the specification, references to the terms "one embodiment", "some embodiments", "certain embodiments", "illustrative embodiments", "examples", "specific examples", or "some examples", etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (11)

1. A voice interaction method for a vehicle having an interaction interface, the voice interaction method comprising:
acquiring a voice instruction, wherein the voice instruction comprises a target element;
performing semantic understanding on the voice command according to a preset knowledge graph to generate semantic representation information;
determining that the voice instruction is ambiguous if the number of the semantic representation information is multiple;
controlling the interactive interface to display a query result corresponding to all the semantic representation information, and carrying out voice broadcast on the query result to obtain a voice identification instruction;
determining target semantic representation information according to the acquired voice identification instruction;
and controlling the interactive interface to jump to a target page according to the target semantic representation information, wherein the target element is displayed on the target page.
2. The voice interaction method according to claim 1, wherein the voice interaction method comprises:
determining an application page and generating element information corresponding to the page;
generating an element structure table according to the element information;
and establishing the knowledge graph according to the element structure table.
3. The voice interaction method of claim 2, wherein the element structure table comprises the element information, a category, a parent node, and an anchor point,
generating an element structure table according to the element information, comprising:
determining the category, the parent node, and the anchor point corresponding to the element information according to a preset structure;
and generating the element structure table according to the element information and the determined category, parent node, and anchor point.
4. The voice interaction method according to claim 2, wherein the voice interaction method comprises:
and under the condition that the page is updated, replacing the element information in the element structure table with the updated element information corresponding to the page.
5. The voice interaction method of claim 1,
performing semantic understanding on the voice command according to a preset knowledge graph to generate semantic representation information, wherein the semantic representation information comprises:
classifying the voice command to generate a query intention;
identifying an entity corresponding to the voice instruction in the case that the query intent is a one-sentence direct reach;
determining corresponding entity information according to the entity and the knowledge graph;
and generating the semantic representation information according to the entity information and the knowledge graph, wherein the semantic representation information comprises the intent, element, page, application, and anchor point of the element corresponding to the voice instruction.
6. The voice interaction method of claim 5,
in the case that the number of the semantic representation information is multiple, determining that the voice instruction is ambiguous includes:
determining that the number of the semantic representation information is multiple under the condition that the number of at least one of the elements, the pages and the applications found according to the knowledge graph is multiple, wherein each semantic representation information comprises at least one corresponding ambiguity information.
7. The voice interaction method of claim 6,
controlling the interactive interface to display the query results corresponding to all the semantic representation information, including:
extracting the ambiguity information in the semantic representation information to generate a corresponding number of pieces of ambiguous text information;
and ranking the pieces of ambiguous text information in a preset manner to generate the query result.
8. The voice interaction method of claim 1,
controlling the interactive interface to jump to a target page according to the target semantic representation information, wherein the step of controlling the interactive interface to jump to the target page comprises the following steps:
under the condition that a page jump instruction is determined according to the target semantic representation information, searching in a preset page routing table according to the page jump instruction so as to determine an application corresponding to the target page and an anchor point of the target element;
and positioning the target element according to the anchor point of the target element so as to control the interactive interface to jump to the target page and highlight the target element in the target page.
9. A server for interactive communication with a vehicle, the vehicle having an interactive interface, the server comprising a control module and a voice acquisition module,
the control module is used for controlling the voice acquisition module to acquire a voice instruction, the voice instruction comprising a target element; and
for performing semantic understanding on the voice instruction according to a preset knowledge graph to generate semantic representation information; and
for determining that the voice instruction is ambiguous when the number of pieces of semantic representation information is multiple; and
for controlling the interactive interface to display the query result corresponding to all the semantic representation information, and performing voice broadcast of the query result to control the voice acquisition module to acquire a voice identification instruction; and
for determining target semantic representation information according to the acquired voice identification instruction; and
for controlling the interactive interface to jump to a target page according to the target semantic representation information, the target page displaying the target element.
10. A voice interaction system, comprising:
a vehicle, used for acquiring voice instructions and voice identification instructions, the vehicle having an interactive interface; and
a server, used for acquiring the voice instruction, the voice instruction comprising a target element; and
for performing semantic understanding on the voice instruction according to a preset knowledge graph to generate semantic representation information; and
for determining that the voice instruction is ambiguous when the number of pieces of semantic representation information is multiple; and
for controlling the interactive interface to display the query results corresponding to all the semantic representation information, and performing voice broadcast of the query results to acquire the voice identification instruction; and
for determining target semantic representation information according to the acquired voice identification instruction; and
for controlling the interactive interface to jump to a target page according to the target semantic representation information, the target page displaying the target element.
11. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the voice interaction method according to any one of claims 1 to 8.
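As an illustration only, and not the patented implementation, the disambiguation flow recited in claims 9 and 10 can be sketched as follows: semantic understanding against a knowledge graph may yield several semantic representations for one voice instruction; when it does, the query results for all of them are presented and broadcast, and a follow-up voice recognition instruction selects the target representation, whose page is then displayed. All names below (`understand`, `interact`, the toy knowledge-graph dictionary, the page strings) are hypothetical stand-ins, not terms from the specification:

```python
def understand(instruction, knowledge_graph):
    """Return every semantic representation matching the target element
    (claim language: 'generate semantic representation information')."""
    return list(knowledge_graph.get(instruction["target_element"], []))

def interact(instruction, knowledge_graph, ask_user):
    """Resolve a voice instruction to the page of its target element.

    ask_user stands in for displaying/broadcasting all query results and
    acquiring the voice recognition instruction; it returns the index of
    the representation the user picked.
    """
    representations = understand(instruction, knowledge_graph)
    if len(representations) > 1:
        # A plurality of representations: the voice instruction is ambiguous.
        choice = ask_user(representations)
        target_repr = representations[choice]
    else:
        target_repr = representations[0]
    # 'Jump to a target page' reduces here to returning the page identifier.
    return target_repr["page"]

# Toy knowledge graph: 'bluetooth' is ambiguous, 'map' is not.
kg = {
    "bluetooth": [{"page": "settings/bluetooth"},
                  {"page": "media/bluetooth_audio"}],
    "map": [{"page": "nav/map"}],
}
page = interact({"target_element": "bluetooth"}, kg, ask_user=lambda reps: 0)
# page == "settings/bluetooth"
```

In the claimed system the `ask_user` step is realized by the interactive interface and voice broadcast on the vehicle, with the server performing the semantic understanding and the final page jump.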
CN202011460470.8A 2020-12-11 2020-12-11 Voice interaction method, server, voice interaction system and readable storage medium Pending CN112634888A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011460470.8A CN112634888A (en) 2020-12-11 2020-12-11 Voice interaction method, server, voice interaction system and readable storage medium


Publications (1)

Publication Number Publication Date
CN112634888A true CN112634888A (en) 2021-04-09

Family

ID=75312318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011460470.8A Pending CN112634888A (en) 2020-12-11 2020-12-11 Voice interaction method, server, voice interaction system and readable storage medium

Country Status (1)

Country Link
CN (1) CN112634888A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108346430A (en) * 2017-01-23 2018-07-31 现代自动车株式会社 Conversational system, the vehicle with conversational system and dialog process method
CN108962233A (en) * 2018-07-26 2018-12-07 苏州思必驰信息科技有限公司 Voice dialogue processing method and system for voice dialogue platform
CN109690480A (en) * 2016-09-07 2019-04-26 微软技术许可有限责任公司 The dialogue for solving ambiguity understands system
CN111427529A (en) * 2019-01-09 2020-07-17 阿里巴巴集团控股有限公司 Interaction method, device, equipment and storage medium
CN111736738A (en) * 2020-06-30 2020-10-02 广州小鹏车联网科技有限公司 Control object query method and device of vehicle-mounted system
CN111753100A (en) * 2020-06-30 2020-10-09 广州小鹏车联网科技有限公司 Knowledge graph generation method and server for vehicle-mounted application
CN112685535A (en) * 2020-12-25 2021-04-20 广州橙行智动汽车科技有限公司 Voice interaction method, server, voice interaction system and storage medium
CN112882679A (en) * 2020-12-21 2021-06-01 广州橙行智动汽车科技有限公司 Voice interaction method and device


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360607A (en) * 2021-07-01 2021-09-07 北京汽车集团越野车有限公司 Information query method, device and storage medium
CN113239178A (en) * 2021-07-09 2021-08-10 肇庆小鹏新能源投资有限公司 Intention generation method, server, voice control system and readable storage medium
CN113658585A (en) * 2021-08-13 2021-11-16 北京百度网讯科技有限公司 Training method of voice interaction model, voice interaction method and device
CN113658585B (en) * 2021-08-13 2024-04-09 北京百度网讯科技有限公司 Training method of voice interaction model, voice interaction method and device
CN113961285A (en) * 2021-09-27 2022-01-21 北京三快在线科技有限公司 Page display method, server, client, electronic device and storage medium
CN113763959A (en) * 2021-10-19 2021-12-07 康佳集团股份有限公司 Voice control method, device, terminal and storage medium based on information reorganization
CN113763959B (en) * 2021-10-19 2024-01-26 康佳集团股份有限公司 Voice control method, device, terminal and storage medium based on information recombination
CN113923295A (en) * 2021-11-17 2022-01-11 Oppo广东移动通信有限公司 Voice control method, device, electronic equipment and storage medium
WO2023103918A1 (en) * 2021-12-07 2023-06-15 杭州逗酷软件科技有限公司 Speech control method and apparatus, and electronic device and storage medium
CN114327349B (en) * 2021-12-13 2024-03-22 青岛海尔科技有限公司 Smart card determining method and device, storage medium and electronic device
CN114327349A (en) * 2021-12-13 2022-04-12 青岛海尔科技有限公司 Method and device for determining smart card, storage medium and electronic device
CN114489557A (en) * 2021-12-15 2022-05-13 青岛海尔科技有限公司 Voice interaction method, device, equipment and storage medium
CN114489557B (en) * 2021-12-15 2024-03-22 青岛海尔科技有限公司 Voice interaction method, device, equipment and storage medium
CN115662423A (en) * 2022-10-19 2023-01-31 博泰车联网(南京)有限公司 Voice control method, device, equipment and storage medium
CN115662423B (en) * 2022-10-19 2023-11-03 博泰车联网(南京)有限公司 Voice control method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112634888A (en) Voice interaction method, server, voice interaction system and readable storage medium
CN112685535A (en) Voice interaction method, server, voice interaction system and storage medium
WO2022000859A1 (en) Speech control method, information processing method, vehicle, and server
RU2581840C2 (en) Registration for system level search user interface
US20210192134A1 (en) Natural query completion for a real-time morphing interface
US10248383B2 (en) Dialogue histories to estimate user intention for updating display information
KR101657371B1 (en) Searching method and search engine
CN111052079B (en) Systems/methods and apparatus for providing multi-function links for interacting with assistant agents
CN112882679B (en) Voice interaction method and device
US20060195797A1 (en) Efficient document processing selection
WO2022001682A1 (en) Control object query method and apparatus for vehicle-mounted system
CN105279227B (en) Method and device for processing voice search of homophone
KR101505127B1 (en) Apparatus and Method for executing object using voice command
KR20160124766A (en) Conversation processing method, conversation management system and computer device
CN110018858A (en) A kind of application management method based on voice control, device
JP7124232B2 (en) Name tag display method and device
JP7146961B2 (en) Audio package recommendation method, device, electronic device and storage medium
CN109144356B (en) Device and method for controlling vehicle component
CN111078760A (en) Goods source searching method, device, equipment and storage medium
US11693764B2 (en) Method, apparatus, device and storage medium for map retrieval test
EP4143672A1 (en) Combined local and server context menus
CN109857314B (en) Gesture control method and device for screen equipment
CN112164402A (en) Vehicle voice interaction method and device, server and computer readable storage medium
US20220229834A1 (en) Searching driver information systems of a vehicle
CN113283922A (en) Lost user saving method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210409