CN113239178A - Intention generation method, server, voice control system and readable storage medium - Google Patents


Info

Publication number
CN113239178A
Authority
CN
China
Prior art keywords
entity
information
target
intention
voice request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110775553.4A
Other languages
Chinese (zh)
Inventor
张又亮
张崇宇
申众
杨振东
翁志伟
胡梓垣
Current Assignee
Zhaoqing Xiaopeng New Energy Investment Co Ltd
Original Assignee
Zhaoqing Xiaopeng New Energy Investment Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhaoqing Xiaopeng New Energy Investment Co Ltd filed Critical Zhaoqing Xiaopeng New Energy Investment Co Ltd
Priority to CN202110775553.4A
Publication of CN113239178A

Classifications

    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G06F 16/3343: Query execution using phonetics
    • G06F 16/3344: Query execution using natural language analysis
    • G06F 16/367: Ontology (creation of semantic tools)
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/30: Semantic analysis
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command
    • H04L 12/2803: Home automation networks


Abstract

The invention discloses an intention generation method, a server, a voice control system and a readable storage medium. The intention generation method comprises the following steps: when a voice request is received, performing entity extraction on the voice request to determine entity information; associating the entity information with a preset intention knowledge graph to perform a knowledge query; obtaining a plurality of candidate paths through the knowledge query; determining a target path according to the candidate paths and acquired current environment state information of the vehicle; and processing the target path to generate a target intention. By semantically understanding the voice request issued by the user and combining it with the current environment state information, the method can distinguish the user's different actual intentions in different scenes even when the voice request itself is the same, and therefore has good applicability.

Description

Intention generation method, server, voice control system and readable storage medium
Technical Field
The present invention relates to the field of intelligent voice control, and in particular, to an intention generation method, a server, a voice control system, and a readable storage medium.
Background
With the development of intelligent automobiles, in-vehicle systems offer increasingly rich functions, so in-vehicle intelligent voice dialogue systems must distinguish the actual intention in a user's voice information across increasingly complex real-world scenes.
Disclosure of Invention
Embodiments of the present invention provide an intention generation method, a server, a voice control system, and a computer-readable storage medium.
The embodiment of the invention provides an intention generation method in voice interaction for a vehicle, comprising the following steps: when a voice request is received, performing entity extraction on the voice request to determine entity information; associating the entity information with a preset intention knowledge graph to perform a knowledge query; obtaining a plurality of candidate paths through the knowledge query; determining a target path according to the candidate paths and acquired current environment state information of the vehicle; and processing the target path to generate a target intention.
According to the intention generation method, by semantically understanding the voice request issued by the user and combining it with the current environment state information, the user's different actual intentions can be determined in different scenes even when the voice request is the same, so the method has good applicability.
In some embodiments, the intent generation method comprises: determining a plurality of control entities of the vehicle and containment relationships between the plurality of control entities; determining an action entity corresponding to the control entity and an intention relationship between the action entity and the control entity; and establishing the intention knowledge graph according to the control entity, the containing relation, the action entity and the intention relation.
In some embodiments, performing entity extraction on the voice request when it is received and determining the entity information includes: preprocessing the voice request; and performing entity recognition according to a vocabulary library to determine the entity information. Associating the entity information with the preset intention knowledge graph to perform the knowledge query includes: performing entity linking according to the entity information.
In some embodiments, there are multiple pieces of entity information, and performing entity linking according to the entity information includes: confirming a plurality of features corresponding to each piece of entity information; ranking the entity information according to the features to obtain the entity information with the highest matching degree; and acquiring the candidate paths according to the entity information with the highest matching degree.
In some embodiments, determining the target path according to the plurality of candidate paths and the acquired current environment state information of the vehicle includes: performing a similarity calculation between the candidate paths and the environment state information to obtain a probability value corresponding to each candidate path; and determining the target path according to the highest of the plurality of probability values.
In some embodiments, the environment state information includes at least one of: current page information displayed by a display device on the vehicle, the current page information including control information and control states displayed on the current page; and state information of the entities in the current candidate paths.
In some embodiments, processing the target path to generate the target intention includes: determining combined nodes according to the target path; generating combined entity information according to the combined nodes when the combined nodes meet a preset condition; and generating the target intention from the combined entity information.
In some embodiments, the intent generation method comprises: and sending corresponding prompt information under the condition that the combined node does not meet the preset condition.
The embodiment of the invention provides a server for communicative connection with a vehicle. The server includes a control module and a voice receiving module. The voice receiving module is configured to receive a voice request sent by the vehicle. The control module is configured to: perform entity extraction on the voice request when it is received and determine entity information; associate the entity information with a preset intention knowledge graph to perform a knowledge query; obtain a plurality of candidate paths through the knowledge query; determine a target path according to the candidate paths and acquired current environment state information of the vehicle; and process the target path to generate a target intention.
By semantically understanding the voice request issued by the user and combining it with the current environment state information, the server can determine the user's different actual intentions in different scenes even when the voice request is the same, and thus has good applicability.
The embodiment of the invention provides a voice control system comprising a vehicle and a server. The vehicle is configured to acquire and send a voice request. The server is configured to: perform entity extraction on the voice request when it is received and determine entity information; associate the entity information with a preset intention knowledge graph to perform a knowledge query; obtain a plurality of candidate paths through the knowledge query; determine a target path according to the candidate paths and acquired current environment state information of the vehicle; and process the target path to generate a target intention.
By semantically understanding the voice request issued by the user and combining it with the current environment state information, the voice control system can determine the user's different actual intentions in different scenes even when the voice request is the same, and thus has good applicability.
The embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the intention generation method according to any of the above embodiments.
With the computer-readable storage medium, by semantically understanding the voice request issued by the user and combining it with the current environment state information, the user's different actual intentions can be determined in different scenes even when the voice request is the same, giving good applicability.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of an intent generation method of an embodiment of the present invention;
FIG. 2 is a schematic diagram of a voice control system of an embodiment of the present invention;
FIG. 3 is a block diagram of a server according to an embodiment of the invention;
FIG. 4 is another flow chart of an intent generation method of an embodiment of the present invention;
FIG. 5 is a schematic diagram of an intent knowledge graph in accordance with an embodiment of the present invention;
FIG. 6 is yet another flow chart of an intent generation method of an embodiment of the present invention;
FIG. 7 is a schematic diagram of feature information generation from a voice request according to an embodiment of the present invention;
FIG. 8 is yet another flow chart of an intent generation method in accordance with an embodiment of the present invention;
FIG. 9 is yet another flowchart of an intent generation method in accordance with an embodiment of the present invention;
FIG. 10 is a flow chart illustrating the determination of a target path according to an embodiment of the present invention;
FIG. 11 is a block schematic diagram of a processor of a control module of an embodiment of the present invention;
fig. 12 is still another flowchart of the intention generation method according to the embodiment of the invention.
Description of the main element symbols:
a voice control system 100;
the vehicle 10, the voice processing module 11, the display device 13, the server 20, the control module 21, the voice receiving module 23, the processor 25, the knowledge encoding module 26, the knowledge encoding module 27, the language encoding module 28, and the language encoding module 29.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, features defined as "first", "second", may explicitly or implicitly include one or more of the described features. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the description of the present invention, it should be noted that, unless otherwise explicitly stated or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly: a connection may, for example, be fixed, detachable, or integral; mechanical or electrical; and direct, indirect through an intermediate medium, or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
The disclosure herein provides many different embodiments or examples for implementing different configurations of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Of course, they are merely examples and are not intended to limit the present invention. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples, such repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. In addition, the present invention provides examples of various specific processes and materials, but one of ordinary skill in the art may recognize applications of other processes and/or uses of other materials.
Referring to fig. 1, an embodiment of the invention provides an intention generation method for a vehicle 10. The intention generation method comprises the following steps:
02: under the condition of receiving a voice request, carrying out entity extraction on the voice request to determine entity information;
03: associating the preset intention knowledge graph with entity information so as to inquire knowledge;
04: acquiring a plurality of candidate paths through knowledge query;
05: determining a target path according to the multiple candidate paths and the acquired current environment state information of the vehicle 10;
06: and processing the target path to generate a target intention.
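The five steps above can be strung together as a minimal illustrative sketch. This is not the patented implementation; every function, data structure, and scoring rule here is a hypothetical stand-in:

```python
# Hypothetical end-to-end sketch of steps 02-06; all names and the toy
# scoring rule are illustrative, not taken from the patent.

def extract_entities(request, vocabulary):
    # 02: naive word-list entity extraction from the recognized request text
    return [term for term in vocabulary if term in request]

def query_candidate_paths(graph, entities):
    # 03/04: associate entities with the intention knowledge graph and keep
    # every path (here a (control entity, action entity) pair) they touch
    return [path for path in graph if any(e in path for e in entities)]

def score(path, env_state):
    # 05: toy "similarity" - count path nodes that appear in the current
    # environment state information of the vehicle
    return sum(1 for node in path if node in env_state)

def path_to_intention(path):
    # 06: turn the winning path into a target intention string
    return "->".join(path)

def generate_intention(request, graph, vocabulary, env_state):
    entities = extract_entities(request, vocabulary)
    candidates = query_candidate_paths(graph, entities)
    target_path = max(candidates, key=lambda p: score(p, env_state))
    return path_to_intention(target_path)
```

For example, with a two-entity graph `[("air conditioner", "on"), ("air conditioner", "off"), ("window", "open")]`, the request "turn on the air conditioner" yields candidate paths for the air conditioner only, and the environment state then breaks any remaining tie.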
The intention generation method according to the embodiment of the present invention can be implemented by the server 20 according to the embodiment of the present invention. Specifically, referring to fig. 2 and 3, the server 20 is configured to be communicatively coupled to the vehicle 10. The server 20 includes a control module 21 and a voice receiving module 23. The voice receiving module 23 is configured to receive a voice request sent by the vehicle 10. The control module 21 is configured to: perform entity extraction on the voice request when it is received and determine entity information; associate the entity information with a preset intention knowledge graph to perform a knowledge query; obtain a plurality of candidate paths through the knowledge query; determine a target path according to the plurality of candidate paths and the acquired current environment state information of the vehicle 10; and process the target path to generate a target intention.
With the intention generation method and the server 20, by semantically understanding the voice request issued by the user and combining it with the current environment state information, the user's different actual intentions can be determined in different scenes even when the voice request is the same, giving good applicability.
Specifically, in one embodiment, when the user sends out voice information, the voice information may be confirmed, so that a corresponding voice request may be obtained, and corresponding entity information may be extracted from the voice request. After the preset intention knowledge graph is combined, entity information can be further associated, so that knowledge query of the voice request can be realized, and a candidate path corresponding to the voice request can be obtained.
It can be understood that a user may issue the same voice request in different actual scenes, and in many cases issues (or prefers to issue) a short, elliptical voice request. After the knowledge query, a plurality of candidate paths may therefore be generated, and since each candidate path corresponds to one intention, the actual intention the voice request is meant to express (the target intention) may be unclear. In this case, the environment state information corresponding to the current environment of the vehicle 10 may be acquired, the target path corresponding to the voice request in the current environment of the vehicle 10 may be determined by combining the environment state information with the plurality of candidate paths, and the target intention may be generated according to the target path, so that the vehicle 10 can further determine the control instruction corresponding to the voice request.
Regarding the target path, specifically, in one embodiment, a plurality of corresponding candidate paths may be matched through knowledge query, and in combination with the environmental status information, the matched candidate paths may be subjected to path ranking, so that the target path may be determined according to the candidate path with the highest matching degree after ranking.
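One conventional way to turn raw path-environment similarity scores into the probability values used for path ranking is a softmax normalization. This is only an illustrative assumption; the patent does not name a specific similarity or normalization function:

```python
import math

def rank_candidates(similarities):
    """Convert raw path-environment similarity scores into probability
    values via softmax and return (probabilities, candidate indices
    sorted by descending probability). The softmax choice is an
    assumption, not specified by the patent."""
    exps = [math.exp(s) for s in similarities]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return probs, order
```

The first index in `order` identifies the candidate path with the highest probability value, i.e. the target path.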
In addition, in such an embodiment, when only one candidate path can be matched by the knowledge query, the voice request can be determined to have complete directivity without combining the environment state information. That is, incorporating the environment state information serves to disambiguate an otherwise ambiguous voice request.
Furthermore, in the embodiment shown in fig. 2 and 3, the vehicle 10 may include the voice processing module 11, and specifically, the vehicle 10 may acquire the voice request sent by the user through the voice processing module 11 and may transmit the acquired voice request to the server 20, so that the server 20 may receive the voice request sent by the vehicle 10 through the voice receiving module 23.
In some embodiments, the environmental status information may be current page information displayed by the display device 13 on the vehicle 10. The current page information may include control information and control states corresponding to the current page display. It is understood that the display device 13 may display information (e.g., application information) related to the vehicle 10, and when the current page of the display screen displays the control information and the control state of the vehicle 10, the environment state information may be determined according to the displayed control information and the control state.
Specifically, in such an embodiment, when the acquired voice request is "open account setting", the current page displayed on the display device 13 can be confirmed. When the current page belongs to a music application or a video application, it can be determined that the target intention is to open that application's account settings, so the display device 13 can be controlled to switch from the current page to the account settings page of the application. When the current page is a personal center page, it can be determined that the target intention is to open the personal center's account settings; voice interaction can then be carried out with the user through TTS (text to speech), the user's account-setting intention can be confirmed from the voice information issued during the interaction, and the corresponding account information can be modified accordingly, thereby realizing voice-based modification of account-related information. When the current page is any other page, it can be determined that the target intention is the account page of the personal center, so the display device 13 can be controlled to switch from the current page to the account page of the in-vehicle personal center.
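The page-dependent branching described above can be sketched as a simple dispatch; the page identifiers and action strings below are hypothetical, not taken from the patent:

```python
# Hypothetical page-context disambiguation for the request "open account
# setting"; page names and resulting actions are illustrative only.
def resolve_account_setting(current_page: str) -> str:
    if current_page in ("music_app", "video_app"):
        # an application page: open that application's account settings
        return f"open {current_page} account settings page"
    if current_page == "personal_center":
        # personal center page: confirm the detailed intention via TTS dialogue
        return "start TTS dialogue to modify account information"
    # any other page: fall back to the in-vehicle personal center account page
    return "switch to in-vehicle personal center account page"
```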
That is, when the voice request issued by the user has an ambiguous intention because the corresponding control entity is missing, the user's target intention can be inferred by combining the request with the page currently displayed on the display device 13, and the corresponding control and operation can then be performed.
In other embodiments, the display device 13 may be a display screen of the vehicle 10, or may be a terminal device having a display function and capable of communicating with the vehicle 10. In one embodiment, the terminal device may be a mobile phone.
In some embodiments, the environmental status information may be control status information of the entities in the current candidate path.
Specifically, in such an embodiment, when the obtained voice request is "too hot", the candidate paths can be determined through the intention knowledge graph, and the control entities corresponding to the candidate paths can be determined to be "window" and "air conditioner", so the state information of the window and the air conditioner can be further confirmed. When the window state is open and the air conditioner state is off, the target intention can be determined to be closing the window and turning on the air conditioner, and both operations can be executed. When the window state is closed and the air conditioner state is off, the target intention can be determined to be turning on the air conditioner, and that operation can be executed. When the window state is closed and the air conditioner state is on, the target intention can be determined to be lowering the air conditioner's target cooling temperature, so the air conditioner can be controlled to further reduce the current temperature setting.
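The window/air-conditioner state table above can be sketched as a small rule function. The fourth state combination (window open, air conditioner already on) is not covered by the text, so its handling here is an assumption:

```python
def resolve_too_hot(window_open: bool, ac_on: bool) -> list:
    """Hypothetical state-based disambiguation of the request 'too hot',
    following the three cases described in the text."""
    if window_open and not ac_on:
        return ["close window", "turn on air conditioner"]
    if not window_open and not ac_on:
        return ["turn on air conditioner"]
    if not window_open and ac_on:
        return ["lower air conditioner target temperature"]
    # window open and air conditioner already on: not covered by the text;
    # closing the window to keep the cooled air in is an assumption
    return ["close window"]
```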
That is to say, in the case of determining the corresponding candidate route through the voice request, the state information of the entity corresponding to the candidate route may be further determined, and the environment state information may be determined according to the state information of the entity, so that the target intention of the user may be further determined according to the environment state information, and thus, corresponding control and operation may be performed.
In summary, by determining and incorporating the environment state information, the intention corresponding to the voice request can be made clearer, avoiding situations where a voice request, because of its fuzzy intention and missing information, fails to clearly specify the operation the vehicle 10 should perform. This improves the applicability of the intention generation method of the present invention in different application scenarios.
In addition, in another embodiment, when the acquired voice request is "i want to charge", the current location state of the vehicle 10 can be confirmed. When it is confirmed from the current location state that the vehicle 10 is far from a charging station, it can be determined that the user needs to navigate to a charging station, so the display device 13 can be controlled to display a navigation page to the charging station. When it is confirmed from the current location state that the vehicle 10 is near a charging station, it can be determined that the user needs to open the charging port, so the charging port on the vehicle 10 can be controlled to open to facilitate the user's charging operation.
Of course, the above embodiments only list some cases related to the environmental status information, and it can be understood that, for those skilled in the art, under the teaching of the embodiments of the present invention, different environmental status information can be determined according to the corresponding specific scenarios, so that the actual intention of the user can be correspondingly determined, and finally the same or similar technical effects as those of the above embodiments can be achieved. For avoiding redundancy, the embodiment of the environmental status information in other scenarios is not limited herein.
Referring to fig. 4, in some embodiments, the intent generation method includes:
011: determining a plurality of control entities of the vehicle 10 and containment relationships between the plurality of control entities;
012: determining an action entity corresponding to the control entity and an intention relationship between the action entity and the control entity;
013: and establishing an intention knowledge graph according to the control entity, the inclusion relationship, the action entity and the intention relationship.
The intention generation method according to the embodiment of the present invention can be implemented by the server 20 according to the embodiment of the present invention. Specifically, referring to fig. 2 and 3, the control module 21 is configured to: determine a plurality of control entities of the vehicle 10 and the containment relationships among them; determine the action entities corresponding to the control entities and the intention relationships between the action entities and the control entities; and establish the intention knowledge graph according to the control entities, the containment relationships, the action entities, and the intention relationships.
Therefore, the corresponding candidate path can be conveniently and quickly positioned according to the voice request.
Specifically, in such an embodiment, please refer to fig. 5: the control entities in the illustrated embodiment include "air conditioner", "air conditioner temperature", "air conditioner air volume", "smart mode", and "air conditioner wind direction". After determining the control entities, the relationships between them can be further determined, such as the relationship between "air conditioner" and "air conditioner air volume". From these relationships, the containment relationships among the control entities can be obtained; for example, when the control to be operated is determined to be the air conditioner, the air conditioner air volume to be adjusted can be further determined. In addition, in other embodiments, the app (application) page data may be converted into JSON (JavaScript Object Notation) data; since this JSON can include the layout data of the entire app page, the corresponding control entities (e.g., controls and their attributes) and the containment relationships between them can be extracted from it.
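A sketch of extracting control entities and their containment relations from parsed JSON page-layout data might look as follows; the `name`/`children` keys are an assumed layout schema, since the patent does not specify the JSON format:

```python
def extract_controls(layout, parent=None, out=None):
    """Recursively walk app-page layout data (parsed JSON) and collect each
    control entity together with a (container, contained) containment pair.
    The 'name'/'children' keys are an assumed schema."""
    if out is None:
        out = []
    name = layout.get("name")
    if name:
        out.append((parent, name))  # parent is None for the root control
        parent = name
    for child in layout.get("children", []):
        extract_controls(child, parent, out)
    return out
```

Applied to a layout such as `{"name": "air conditioner", "children": [{"name": "air conditioner air volume"}]}`, the walk yields the root control and one containment pair linking it to the contained control.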
In another such embodiment, please refer to fig. 5 again. Once the control entities are determined, the action entities corresponding to each control entity may be determined; for example, the action entities of "air conditioner" may include "on" and "off", and the action entities of "air conditioner wind direction" may include "up", "down", "left" and "right". After the action entities are determined, the relationships between the action entities and the corresponding control entities may be further determined, so that the intention relationships between them can be obtained. In other embodiments, the relationships between control entities and action entities may also be extracted from a corpus of predetermined intentions based on part-of-speech tagging, dependency parsing, deep learning, and the like.
On this basis, once the control entities, the inclusion relationships, the action entities and the intention relationships are determined, the intention knowledge graph shown in fig. 5 can be established. When a voice request is received, entity information can then be associated through the established intention knowledge graph to determine the corresponding entities and relationships, and the corresponding candidate paths can be determined from those entities and relationships, which facilitates quick positioning of the candidate paths.
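A minimal in-memory sketch of such a graph follows: nodes are control and action entities, and each edge carries either a "contains" relation (control to sub-control) or an "intent" relation (control to action). The entity and action names come from the fig. 5 example; the adjacency-list representation itself is an assumption, not the patented data structure.

```python
def build_intent_graph(contains, intents):
    """Assemble an adjacency list mapping each control entity to its
    labeled outgoing edges."""
    graph = {}
    for head, tail in contains:
        graph.setdefault(head, []).append(("contains", tail))
    for control, action in intents:
        graph.setdefault(control, []).append(("intent", action))
    return graph

contains = [("air conditioner", "air conditioner temperature"),
            ("air conditioner", "air conditioner wind direction")]
intents = [("air conditioner", "on"), ("air conditioner", "off"),
           ("air conditioner wind direction", "up"),
           ("air conditioner wind direction", "down")]

graph = build_intent_graph(contains, intents)
# Candidate paths for a request mentioning "air conditioner" can now be
# enumerated by walking its outgoing edges.
```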
Referring to fig. 6, in some embodiments, in the case of receiving a voice request, performing entity extraction on the voice request to determine entity information includes:
021: preprocessing a voice request;
022: performing entity identification according to the word list library, and determining entity information;
associating the preset intention knowledge graph with the entity information so as to perform knowledge query, which includes:
031: performing entity linking according to the entity information.
The intention generation method according to the embodiment of the present invention can be implemented by the server 20 according to the embodiment of the present invention. Specifically, please refer to fig. 2 and fig. 3: the control module 21 is configured to preprocess the voice request; to perform entity recognition according to the vocabulary library and determine the entity information; and to perform entity linking according to the entity information.
In this manner, the controls and corresponding actions of the corresponding vehicle 10 may be determined to facilitate subsequent knowledge queries.
Specifically, referring to fig. 7, in the illustrated embodiment, when a voice request (query) is acquired, shallow semantic understanding may be performed on the voice request, and word segmentation and keyword extraction may be performed on the voice request to implement preprocessing on the voice request.
After the preprocessing of the voice request is completed, entity recognition is performed on the voice request through dependency syntax analysis and Named Entity Recognition (NER), so that the entity information corresponding to the voice request can be obtained. In one embodiment, entities in the voice request can be identified through a BERT-LSTM-CRF model, sentence components can be determined as entities through dependency syntax analysis, and the ELK stack (Elasticsearch, Logstash and Kibana) can be introduced as an auxiliary means to improve recognition precision.
Additionally, it will be appreciated that, in practice, for some entities of the vehicle 10 that have lengthy, complex, and uncommon names, the user may prefer short, easily understood words, so that the entity name is often not contained in the voice request as issued. In such an embodiment, please refer to fig. 7 again: entity recognition may be performed in combination with a preset vocabulary library, which includes words, near-synonyms, synonyms and other information related to the vehicle-mounted services. The vocabulary library can thus assist understanding of the voice request, making it convenient to determine the entity information corresponding to the voice request.
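The vocabulary-library lookup described above can be sketched as a synonym map applied after word segmentation: colloquial words the user actually says are normalized onto canonical on-board entity names before entity linking. The mappings below are invented for illustration; a production library would be much larger and include near-synonyms.

```python
# Illustrative synonym/near-synonym mappings (not from the patent).
VOCAB_LIBRARY = {
    "AC": "air conditioner",
    "aircon": "air conditioner",
    "temp": "air conditioner temperature",
    "fan speed": "air conditioner air volume",
}

def normalize_mentions(tokens):
    """Replace known colloquial words with canonical entity names,
    leaving unknown tokens untouched."""
    return [VOCAB_LIBRARY.get(tok, tok) for tok in tokens]

print(normalize_mentions(["turn", "on", "the", "AC"]))
# → ['turn', 'on', 'the', 'air conditioner']
```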
When the entity information is acquired, corresponding features can be generated according to the entity information, and entity linking can be performed on the entity information through the features corresponding to the entity information, so that the candidate path can be determined according to the features of the entity information.
In some embodiments, there are a plurality of pieces of entity information. Referring to fig. 8, performing entity linking according to the entity information to generate the feature information includes:
0311: determining a plurality of features corresponding to each piece of entity information;
0312: ranking the entity information according to the features to obtain the entity information with the highest matching degree, and acquiring the candidate path according to the entity information with the highest matching degree.
The intention generation method according to the embodiment of the present invention can be implemented by the server 20 according to the embodiment of the present invention. Specifically, please refer to fig. 2 and fig. 3: the control module 21 is configured to determine a plurality of features corresponding to each piece of entity information; to rank the entity information according to the features to obtain the entity information with the highest matching degree; and to acquire the candidate path according to the entity information with the highest matching degree.
Thus, the matching precision of the candidate paths can be improved.
Specifically, please refer to fig. 7 again. In the embodiment shown in fig. 7, a plurality of corresponding features may be obtained from the determined entity information. In fig. 7, the plurality of features include an entity feature, a link feature, a question feature, and an entity-type feature: the entity feature corresponds to the matching degree between the entity and the candidate, the link feature corresponds to the degree of linkage between pieces of entity information, the question feature corresponds to the matching degree between the subgraph of the entity and the question, and the entity-type feature corresponds to the type of the entity. Once the features of the corresponding entity information are determined, the obtained entity information is comprehensively ranked through a GBDT (Gradient Boosting Decision Tree) algorithm to obtain the entity information with the highest matching degree, from which the corresponding candidate path is obtained, thereby completing the entity linking of the entity information.
Of course, in other embodiments, the corresponding plurality of features may be determined on a case-by-case basis, or calibrated according to actual testing. Various features of other embodiments may be the same or different in whole or in part.
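The ranking step can be sketched as below. The patent uses a trained GBDT model over the four features; training such a model is out of scope here, so a hand-weighted linear score stands in for the learned ranker, and both the feature values and the weights are illustrative only.

```python
# Stand-in weights for the four features named in fig. 7 (assumed values).
WEIGHTS = {"entity": 0.4, "link": 0.2, "question": 0.3, "type": 0.1}

def rank_candidates(candidates):
    """candidates: list of (entity_info, feature_dict).
    Returns the entity info with the highest weighted score,
    playing the role of the GBDT ranking."""
    def score(item):
        _, feats = item
        return sum(WEIGHTS[name] * feats[name] for name in WEIGHTS)
    return max(candidates, key=score)[0]

candidates = [
    ("air conditioner",
     {"entity": 0.9, "link": 0.8, "question": 0.7, "type": 1.0}),
    ("air conditioner temperature",
     {"entity": 0.6, "link": 0.5, "question": 0.9, "type": 1.0}),
]
best = rank_candidates(candidates)
```

The winning entity's candidate path is then taken forward, completing the entity-linking step.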
Referring to fig. 9, in some embodiments, determining the target route according to the plurality of candidate routes and the acquired current environmental status information of the vehicle 10 includes:
051: similarity calculation is carried out according to the candidate paths and the environment state information to obtain probability values of the candidate paths;
052: the target path is determined based on the highest one of the plurality of probability values.
The intention generation method according to the embodiment of the present invention can be implemented by the server 20 according to the embodiment of the present invention. Specifically, please refer to fig. 2 and fig. 3, the control module 21 is configured to perform similarity calculation according to the multiple candidate paths and the environmental status information to obtain multiple probability values sequentially corresponding to the multiple candidate paths; and for determining the target path based on the highest one of the plurality of probability values.
In this manner, disambiguation among the multiple candidate paths may be achieved.
Specifically, please refer to fig. 10. When multiple candidate paths are obtained, language representation (T-Embeddings) may be performed on the candidate paths. According to the type of data, a candidate path may include text data (text) and entity data (entity); according to the kind of information, it may include token information (token), head information (head), tail information (tail), and type information (type). That is, in the embodiment shown in fig. 10, each kind of information has corresponding text data and entity data. The position information of the candidate path can be determined from the head information and the tail information. In other embodiments, the text data in the candidate paths may be encoded using the BERT algorithm (Bidirectional Encoder Representations from Transformers).
Further, in the case that all text data and entity data in the candidate path have been determined, language representation information (text embedding) may be correspondingly generated.
It should be noted that, on the basis, by adding entity data, semantic information of text data can be enriched; by adding and encoding type information, effective association with relevant information (such as control entities) of the intention knowledge graph in subsequent model training can be conveniently formed.
Referring to fig. 10 again, when the environment state information is obtained, knowledge representation (K-Embeddings) can be performed on it. The environment state information includes path data (path) and state data (status). The path data may include a candidate path code, obtained by embedding the individual entities, and the relationships between entities, of the intention knowledge graph into fixed-size vectors through the ComplEx algorithm and stitching these vectors together. The state data may include an environment information state code for the corresponding control in the candidate path.
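The stitching of per-element embeddings into one path code can be sketched as plain concatenation. The 4-dimensional vectors below are toy placeholders, not real ComplEx embeddings (which would be learned from the graph and typically much wider).

```python
# Toy fixed-size embeddings for the entities/relations on one candidate path
# (assumed values; a real system would load trained ComplEx vectors).
EMBED = {
    "air conditioner": [0.1, 0.2, 0.3, 0.4],
    "intent":          [0.0, 1.0, 0.0, 1.0],
    "on":              [0.5, 0.5, 0.5, 0.5],
}

def encode_path(path):
    """Concatenate the embedding of each path element into one
    fixed-size candidate path code."""
    vec = []
    for element in path:
        vec.extend(EMBED[element])
    return vec

path_vec = encode_path(["air conditioner", "intent", "on"])
```

Because every element contributes a vector of the same width, paths of the same length always yield codes of the same dimension.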
Further, when the path data and the state data in the environment state information are determined, knowledge representation information (knowledge embedding) can be generated correspondingly.
In the case of acquiring the language representation information, language coding (T-Encoder) may be performed on the language representation information to obtain language coding information, and the obtained language coding information may be combined with knowledge representation information to perform knowledge coding (K-Encoder), so that text output information (text output) and knowledge output information (knowledge output) may be generated. The language coding can be used for extracting basic lexical and syntactic information of language representation information, and the knowledge coding can be used for fusing data information in the intention knowledge graph into a training model.
It should be noted that, in the embodiment shown in fig. 10, vectorization may be performed on the data through self-attention (a self-attention mechanism), summation and normalization may be performed through Add & LayerNorm, and deep learning processing may be performed through an FFN (feed-forward neural network), so that the text output information and the knowledge output information can finally be obtained.
When the text output information and the knowledge output information are acquired, they can be fused through an MLP (Multi-Layer Perceptron), and the similarity between the language coding information and the knowledge representation information can be calculated, so that the coding-information interaction vector corresponding to the language coding information and the knowledge representation information is obtained, and the probability value (score) corresponding to the candidate path is finally output.
On the basis, as the corresponding probability value can be obtained according to each candidate path, under the condition that the corresponding multiple probability values are obtained by combining all the candidate paths with the environmental state information, the highest probability value can be obtained, and the target path can be determined according to the candidate path with the highest corresponding probability value.
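Steps 051-052 can be sketched end-to-end as follows: each candidate path receives a raw similarity score against the environment state, the scores are softmax-normalized into probabilities, and the path with the highest probability becomes the target path. The raw scores here are placeholders standing in for the model output of fig. 10.

```python
import math

def choose_target_path(scored_paths):
    """scored_paths: list of (path_name, raw_score).
    Softmax-normalize the scores and return (best_path, probability)."""
    exps = [math.exp(score) for _, score in scored_paths]
    total = sum(exps)
    probs = [(name, e / total) for (name, _), e in zip(scored_paths, exps)]
    return max(probs, key=lambda p: p[1])

# Placeholder scores for two candidate paths of one request.
target, prob = choose_target_path([("open air conditioner", 2.1),
                                   ("open window", 0.3)])
```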
In addition, referring to fig. 10 and fig. 11, in the embodiment shown in fig. 11, the control module 21 may include a processor 25, and the processor 25 may have M knowledge encoding modules 26 and N language encoding modules 28. The M knowledge encoding modules 26 are connected in series, so that the result output by one knowledge encoding module 26 serves as the input of the next, together forming a knowledge encoding module 27; the N language encoding modules 28 are likewise connected in series, so that the result output by one language encoding module 28 serves as the input of the next, together forming a language encoding module 29. The language encoding module 29 is used for encoding the language representation information. The knowledge encoding module 27 is used for encoding the language coding information and the knowledge representation information.
On the basis of the foregoing embodiment, as for the language representation information, the language representation information may be encoded sequentially by all the language encoding modules 28, so as to obtain language encoding information; for language coding information and knowledge representation information, the language coding information and the knowledge representation information are fused by inputting the information to the first knowledge coding module 26 for coding, the output data fused by the first knowledge coding module 26 is input to the subsequent knowledge coding modules 26 for coding in sequence, and the final text output information and the final knowledge output information can be output by the last knowledge coding module 26.
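The two serial chains of fig. 11 can be sketched structurally as below. The "modules" here are stand-in functions that merely tag their input so the data flow is visible; real modules would be Transformer-style layers as in fig. 10.

```python
def make_language_module(i):
    # Each language module transforms the text representation; here it
    # just appends a tag marking that it ran (factory avoids late binding).
    return lambda text: text + [f"T{i}"]

def make_knowledge_module(i):
    # Each knowledge module transforms the (text, knowledge) pair jointly.
    return lambda pair: (pair[0] + [f"K{i}"], pair[1] + [f"K{i}"])

N, M = 3, 2  # illustrative module counts (the patent suggests e.g. 16 or 32)
language_encoder = [make_language_module(i) for i in range(N)]
knowledge_encoder = [make_knowledge_module(i) for i in range(M)]

text = ["t-embeddings"]
for module in language_encoder:          # serial language encoding (T-Encoder)
    text = module(text)

text_out, knowledge_out = text, ["k-embeddings"]
for module in knowledge_encoder:         # serial knowledge encoding (K-Encoder)
    text_out, knowledge_out = module((text_out, knowledge_out))
```

The last knowledge module emits the final text output information and knowledge output information, mirroring the description above.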
Additionally, the number M of knowledge encoding modules 26 and the number N of language encoding modules 28 may be determined based on the processing power required for encoding the data. In some embodiments, M may be 16 or 32. In other embodiments, N may be 16 or 32.
Further, regarding the processor 25: in some embodiments, the processor 25 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The specific structure and type of the processor 25 in other embodiments are not limited herein.
Referring to fig. 12, in some embodiments, processing the target path to generate the target intent includes:
061: determining a combined node according to the target path;
062: under the condition that the combined node meets a preset condition, generating a combined entity according to the combined node;
063: a target intent is generated from the combined entities.
The intention generation method according to the embodiment of the present invention can be implemented by the server 20 according to the embodiment of the present invention. Specifically, please refer to fig. 2 and fig. 3, the control module 21 is configured to determine a combination node according to the target path; and generating a combined entity according to the combined node under the condition that the combined node meets the preset condition; and for generating the target intent from the combined entities.
In this way, the generation of the user intent can be achieved.
In some embodiments, the combination nodes may include action nodes, entity object nodes, and constraint nodes. Specifically, in such an embodiment, the voice request made by the user is "turn on the air conditioner and set it to twenty-three degrees"; when the corresponding target paths are acquired, the purpose of the voice request can be determined to be "turn on the air conditioner" (corresponding to the first target path) and "set the air conditioner temperature to twenty-three degrees" (corresponding to the second target path). For the first target path, the action node may be determined to be "OpenAction", the entity object node "air conditioner", and the constraint node the empty set (i.e., "{ }"). For the second target path, the action node may be determined to be "SetAction", the entity object node "air conditioner temperature", and the constraint node "{ air conditioner.…}".
Further, when the action node, the entity object node, and the constraint node of each of the first and second target paths all satisfy the preset condition, combined entities may be generated accordingly: the combined entity corresponding to the first target path is "{ OpenAction, { air conditioner }, { } }", and the combined entity corresponding to the second target path is "{ SetAction, { air conditioner temperature }, { air conditioner.…} }". Once the combined entities corresponding to the two target paths are obtained, the corresponding target intentions can finally be determined, the air conditioner can be controlled to turn on according to the generated target intentions, and the air conditioner temperature can be set to twenty-three degrees.
In addition, the preset condition may be to detect whether each of the action node, the entity object node, and the constraint node has the information obtained from the voice request.
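Steps 061-063 can be sketched as below. The preset condition is interpreted here as "no constraint slot is left unresolved" (an illustrative reading of the source, with False marking an unfilled value, as in the later example); the dictionary layout of the combined entity is likewise an assumption.

```python
def build_combined_entity(action, entity_object, constraints):
    """Assemble a combined entity from the three combination nodes."""
    return {"action": action, "object": entity_object, "constraints": constraints}

def satisfies_preset_condition(combined):
    # Every constraint value must be resolvable from the voice request;
    # False marks a slot that could not be determined.
    return all(value is not False for value in combined["constraints"].values())

# The two target paths for "turn on the air conditioner and set it
# to twenty-three degrees":
first = build_combined_entity("OpenAction", "air conditioner", {})
second = build_combined_entity(
    "SetAction", "air conditioner temperature",
    {"air conditioner temperature value": "twenty-three degrees"})
```

Both combined entities satisfy the condition, so both target intentions can be generated.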
In some embodiments, the intent generation method comprises:
and sending out corresponding prompt information under the condition that the combined node does not meet the preset condition.
The intention generation method according to the embodiment of the present invention can be implemented by the server 20 according to the embodiment of the present invention. Specifically, please refer to fig. 2 and fig. 3, the control module 21 is configured to send out corresponding prompt information when the combination node does not satisfy the preset condition.
Therefore, the problem that the corresponding operation cannot be executed because the purpose intention of the user cannot be recognized can be avoided.
Specifically, in such an embodiment, the voice request issued by the user is "set the air conditioner temperature". According to the foregoing embodiment, the corresponding action node may be determined to be "SetAction" and the entity object node "air conditioner temperature". For the constraint node, however, the name and the data type may be determined, but not the value, so that the generated combined entity is "{ SetAction, { air conditioner temperature }, { air conditioner.…= false } }", where false indicates that the corresponding node cannot be determined from the voice request: the operating temperature to which the air conditioner should be set cannot be obtained. It can then be determined that the constraint node does not meet the preset condition, and further that the corresponding target intention cannot be generated.
Further, when it is determined that the constraint node does not satisfy the preset condition, the server 20 may send corresponding prompt information to the vehicle 10, so that the vehicle 10 can prompt the user accordingly. The user can then supplement the relevant information according to the prompt of the vehicle 10, and the constraint node can be determined from the supplemented information, until the action node, the entity object node, and the constraint node are all determined to satisfy the preset condition. On the basis of the above embodiment, in such an embodiment, the vehicle 10 may issue the voice prompt "to how many degrees should the air conditioner temperature be set" to interact with the user; after the value of the constraint node is determined to be "twenty-three degrees" according to the voice information supplemented by the user, the constraint node can be completed accordingly, so that it can be determined that the constraint node satisfies the preset condition, the target intention can be determined, and the air conditioner temperature can be set to twenty-three degrees according to the target intention. Of course, in other embodiments, the vehicle 10 may also prompt the user through other means (e.g., a light or a buzzer).
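The fallback path above can be sketched as a single dispatch: when a constraint slot cannot be filled from the request, a prompt is emitted instead of an intent. The prompt wording and the combined-entity layout are illustrative assumptions.

```python
def generate_intent_or_prompt(combined):
    """Return ("intent", combined) when all constraint slots are filled,
    else ("prompt", message) asking the user to supplement the missing ones
    (False marks a slot that could not be determined from the request)."""
    missing = [name for name, value in combined["constraints"].items()
               if value is False]
    if missing:
        return ("prompt", f"Please provide a value for: {', '.join(missing)}")
    return ("intent", combined)

# "set the air conditioner temperature" with no temperature value:
incomplete = {"action": "SetAction",
              "object": "air conditioner temperature",
              "constraints": {"air conditioner temperature value": False}}
kind, payload = generate_intent_or_prompt(incomplete)
```

After the user supplies "twenty-three degrees", the slot is filled and the same call returns the target intent.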
Referring to fig. 2, an embodiment of the invention provides a voice control system 100, which includes a vehicle 10 and a server 20. The vehicle 10 is used to obtain and send voice requests. The server 20 is used for performing entity extraction on a voice request when the voice request is received, and determining entity information; for associating a preset intention knowledge graph with the entity information so as to perform knowledge query; for obtaining a plurality of candidate paths through the knowledge query; for determining a target path according to the candidate paths and the acquired current environment state information of the vehicle 10; and for processing the target path to generate a target intention.
Even when voice requests are identical, the voice control system 100 can determine the user's different actual intentions in different scenes by performing semantic understanding on the voice request sent by the user and combining it with the current environment state information, and thus has good applicability.
It is understood that, with the voice control system 100 of the present invention, the server 20 may establish and update the intention knowledge graph while not communicatively connected (offline) with the vehicle 10, and, while the vehicle 10 is communicatively connected (online), may receive a voice request transmitted by the vehicle 10 and process it according to the intention generation method of the present invention to determine the user's target intention, so that the vehicle 10 can perform the corresponding operations and actions according to the user's target intention. The specific flow and principle of processing the voice request by the server 20 have been described in detail in the foregoing embodiments and, to avoid redundancy, are not expanded upon here.
The embodiment of the invention provides a computer readable storage medium on which a computer program is stored. The computer program, when executed by a processor, implements the intent generation method of any of the embodiments described above.
With the computer-readable storage medium, even when voice requests are identical, the user's different actual intentions in different scenes can be determined by semantically understanding the voice request sent by the user and combining it with the current environment state information, giving good applicability.
For example, when the program is executed by a processor, the steps of the following intention generation method are implemented:
02: under the condition of receiving a voice request, carrying out entity extraction on the voice request to determine entity information;
03: associating the preset intention knowledge graph with entity information so as to inquire knowledge;
04: acquiring a plurality of candidate paths through knowledge query;
05: determining a target path according to the multiple candidate paths and the acquired current environment state information of the vehicle 10;
06: and processing the target path to generate a target intention.
The computer-readable storage medium may be provided in the server 20, or may be provided in another terminal device, and the server 20 may be capable of communicating with the other terminal device to obtain the corresponding program.
It is understood that the computer-readable storage medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a software distribution medium, and the like. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processing module-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
In the description of the specification, references to the terms "one embodiment", "some embodiments", "certain embodiments", "illustrative embodiments", "examples", "specific examples", or "some examples", etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (11)

1. An intent generation method in voice interaction for a vehicle, the intent generation method comprising:
under the condition of receiving a voice request, performing entity extraction on the voice request to determine entity information;
associating the preset intention knowledge graph with the entity information so as to inquire knowledge;
acquiring a plurality of candidate paths through knowledge query;
determining a target path according to the candidate paths and the acquired current environment state information of the vehicle;
and processing the target path to generate a target intention.
2. The intent generation method according to claim 1, characterized in that it comprises:
determining a plurality of control entities of the vehicle and containment relationships between the plurality of control entities;
determining an action entity corresponding to the control entity and an intention relationship between the action entity and the control entity;
and establishing the intention knowledge graph according to the control entities, the containment relationships, the action entities and the intention relationships.
3. The intent generation method according to claim 1,
under the condition of receiving a voice request, performing entity extraction on the voice request, and determining entity information, wherein the entity extraction comprises the following steps:
preprocessing the voice request;
performing entity recognition according to a vocabulary library and determining the entity information;
associating the preset intention knowledge graph with the entity information so as to perform knowledge query, which includes:
performing entity linking according to the entity information.
4. The intention generation method according to claim 3, wherein there are a plurality of pieces of entity information,
and performing entity linking according to the entity information comprises:
determining a plurality of features corresponding to each piece of entity information;
and ranking the entity information according to the features to obtain the entity information with the highest matching degree, and acquiring the candidate path according to the entity information with the highest matching degree.
5. The intent generation method according to claim 1,
determining a target path according to the candidate paths and the acquired current environment state information of the vehicle, wherein the determining comprises the following steps:
performing similarity calculation according to the candidate paths and the environment state information to obtain a plurality of probability values corresponding in sequence to the candidate paths;
determining the target path according to a highest one of the plurality of probability values.
6. The intent generation method of claim 1, wherein the environmental status information comprises at least one of:
the current page information displayed by the display device on the vehicle comprises control information and a control state which are displayed corresponding to the current page;
status information of entities in the current candidate path.
7. The intent generation method according to claim 1, wherein processing the target path to generate a target intent comprises:
determining a combined node according to the target path;
under the condition that the combined node meets a preset condition, generating a combined entity according to the combined node;
generating the target intent from the combined entity.
8. The intent generation method according to claim 7, characterized in that it comprises:
and sending corresponding prompt information under the condition that the combined node does not meet the preset condition.
9. A server for communicative connection with a vehicle, characterized by comprising a control module and a voice receiving module, wherein the voice receiving module is configured to receive a voice request sent by the vehicle,
the control module is used for performing entity extraction on the voice request under the condition of receiving the voice request and determining entity information; and
the system comprises a database, a target entity information acquisition module and a target entity information acquisition module, wherein the target entity information acquisition module is used for acquiring target entity information; and
the method comprises the steps of obtaining a plurality of candidate paths through knowledge inquiry; and
the target route is determined according to the candidate routes and the acquired current environment state information of the vehicle; and
and the target path is processed to generate a target intention.
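The server of claim 9 chains the preceding steps into one pipeline: receive the voice request, extract entities, resolve the target entity, query the knowledge base for candidate paths, select a path against the environment state, and emit the intention. The sketch below is an assumed toy realization of that pipeline order only; every function body, the dictionary-based knowledge graph, and the state-mismatch path selection are placeholders, not the patented method.

```python
class Server:
    def __init__(self, knowledge_graph: dict, vehicle_state: dict):
        self.kg = knowledge_graph          # entity -> candidate actions (assumed shape)
        self.vehicle_state = vehicle_state # current environment state of the vehicle

    def receive_voice_request(self, text: str) -> dict:
        # Voice receiving module: accept the request, delegate to the control module.
        return self.control(text)

    def control(self, text: str) -> dict:
        # 1) Entity extraction on the voice request.
        entities = [w for w in text.split() if w in self.kg]
        if not entities:
            return {"target_intention": None}
        # 2) Target entity information (toy rule: first extracted entity).
        target = entities[0]
        # 3) Candidate paths via knowledge query.
        candidates = self.kg[target]
        # 4) Target path: first action not already reflected in the environment state.
        path = next((a for a in candidates if a != self.vehicle_state.get(target)), None)
        # 5) Process the target path into a target intention.
        return {"target_intention": f"{path}_{target}" if path else None}
```

For example, with `kg = {"window": ["open", "close"]}` and a window that is already open, the request "close the window" resolves to the intention `close_window`.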
10. A voice control system, comprising:
a vehicle configured to obtain and send a voice request; and
a server configured to: perform entity extraction on the voice request to determine entity information in a case that the voice request is received;
determine target entity information according to the entity information;
obtain a plurality of candidate paths through a knowledge query;
determine a target path according to the candidate paths and acquired current environment state information of the vehicle; and
process the target path to generate a target intention.
11. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the intention generation method of any one of claims 1 to 8.
CN202110775553.4A 2021-07-09 2021-07-09 Intention generation method, server, voice control system and readable storage medium Pending CN113239178A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110775553.4A CN113239178A (en) 2021-07-09 2021-07-09 Intention generation method, server, voice control system and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110775553.4A CN113239178A (en) 2021-07-09 2021-07-09 Intention generation method, server, voice control system and readable storage medium

Publications (1)

Publication Number Publication Date
CN113239178A true CN113239178A (en) 2021-08-10

Family

ID=77141330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110775553.4A Pending CN113239178A (en) 2021-07-09 2021-07-09 Intention generation method, server, voice control system and readable storage medium

Country Status (1)

Country Link
CN (1) CN113239178A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113971954A (en) * 2021-12-23 2022-01-25 广州小鹏汽车科技有限公司 Voice interaction method and device, vehicle and storage medium
CN113990301A (en) * 2021-12-28 2022-01-28 广州小鹏汽车科技有限公司 Voice interaction method and device, server and readable storage medium thereof
CN113990322A (en) * 2021-11-04 2022-01-28 广州小鹏汽车科技有限公司 Voice interaction method, server, voice interaction system and medium
CN114005449A (en) * 2021-12-29 2022-02-01 广州小鹏汽车科技有限公司 Voice interaction method and device, model training method, vehicle and storage medium
CN114005448A (en) * 2021-12-29 2022-02-01 广州小鹏汽车科技有限公司 Voice interaction method and device, model training method, vehicle and storage medium
WO2023069718A1 (en) * 2021-10-22 2023-04-27 Robert Bosch Gmbh System and method for knowledge-based entity prediction

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160191637A1 (en) * 2014-12-30 2016-06-30 Facebook, Inc. Executing requests for services using shared location data
CN110262273A (en) * 2019-07-12 2019-09-20 珠海格力电器股份有限公司 A kind of home equipment control method, device, storage medium and smart home system
CN111008532A (en) * 2019-12-12 2020-04-14 广州小鹏汽车科技有限公司 Voice interaction method, vehicle and computer-readable storage medium
CN111508482A (en) * 2019-01-11 2020-08-07 阿里巴巴集团控股有限公司 Semantic understanding and voice interaction method, device, equipment and storage medium
CN111753100A (en) * 2020-06-30 2020-10-09 广州小鹏车联网科技有限公司 Knowledge graph generation method and server for vehicle-mounted application
CN112581955A (en) * 2020-11-30 2021-03-30 广州橙行智动汽车科技有限公司 Voice control method, server, voice control system and readable storage medium
CN112634888A (en) * 2020-12-11 2021-04-09 广州橙行智动汽车科技有限公司 Voice interaction method, server, voice interaction system and readable storage medium

Similar Documents

Publication Publication Date Title
CN113239178A (en) Intention generation method, server, voice control system and readable storage medium
CN109841212B (en) Speech recognition system and speech recognition method for analyzing commands with multiple intents
CN111191016A (en) Multi-turn conversation processing method and device and computing equipment
KR20190071527A (en) Electronic device and method for analyzing meaning of speech
CN111522909B (en) Voice interaction method and server
CN113505591A (en) Slot position identification method and electronic equipment
US11514916B2 (en) Server that supports speech recognition of device, and operation method of the server
CN111566728B (en) Dialog system capable of implementing semantic understanding mapping between user intent and machine services
WO2021073179A1 (en) Named entity identification method and device, and computer-readable storage medium
CN112543932A (en) Semantic analysis method, device, equipment and storage medium
US11682388B2 (en) Artificial intelligence apparatus for recognizing speech including multiple languages, and method for the same
CN114254660A (en) Multi-modal translation method and device, electronic equipment and computer-readable storage medium
CN111402894A (en) Voice recognition method and electronic equipment
Gulyaev et al. Goal-oriented multi-task bert-based dialogue state tracker
US11521619B2 (en) System and method for modifying speech recognition result
CN116959433B (en) Text processing method, device, electronic equipment and storage medium
CN112668337B (en) Voice instruction classification method and device
CN116522905A (en) Text error correction method, apparatus, device, readable storage medium, and program product
CN115689603A (en) User feedback information collection method and device and user feedback system
CN111414468A (en) Method and device for selecting dialect and electronic equipment
US20230178071A1 (en) Method for determining a vehicle domain and a speech recognition system for a vehicle
CN116110397B (en) Voice interaction method, server and computer readable storage medium
CN117198289B (en) Voice interaction method, device, equipment, medium and product
US20230360648A1 (en) Electronic device and method for controlling electronic device
CN117235234B (en) Object information acquisition method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination