Detailed Description
To further illustrate the technical means and effects of the present invention adopted to achieve the predetermined objects, the following detailed description will be given to the specific implementation, method, structure, features and effects of the method and system for managing dialog states of an electric vehicle according to the present invention, in conjunction with the accompanying drawings and the preferred embodiments.
Fig. 1 is a schematic flow chart of an embodiment of the dialogue state management method of an electric vehicle of the invention. Referring to Fig. 1, an example of the dialogue state management method for an electric vehicle according to the present invention includes:
Step S11, analyzing the input requirement, and converting the input requirement information into a semantic understanding representation that can be used by the dialogue state management system.
Step S12, obtaining the task to be executed corresponding to the requirement information through a search engine and a machine learning ranking model, according to the semantic understanding representation parsed from the requirement information.
Specifically, a search can be performed in a search library according to the semantic understanding representation, and the several pieces of data whose relevance to the semantic understanding representation ranks highest are obtained from the search library; each piece of retrieved data is defined as a search result. The search results are then reordered by the machine learning ranking model, the single search result most relevant to the requirement is selected from them and defined as the reordering result, and the reordering result is output. It should be noted that each piece of data in the search library includes a preset or past semantic understanding representation and the to-be-executed task information corresponding to it, so that the search result obtained by searching against the semantic understanding representation of each piece of data, and the reordering result obtained by reordering, include the to-be-executed task information.
Fig. 2 is a schematic flow chart of an embodiment of a dialogue state management method for an electric vehicle according to the invention. Fig. 3 is a schematic diagram of an embodiment of the dialogue state management method of the electric vehicle. Referring to fig. 2 and fig. 3, the electric vehicle dialog state management method according to the embodiment of the present invention includes:
Step S21, converting the input requirement information (Query) into the form of a semantic understanding table (Belief table). It should be noted that the input mode is not limited to voice input, and may be an interactive mode such as pictures, text, touch, or gestures, or a combination of multiple interactive modes.
Specifically, the semantic understanding table is composed of multidimensional semantic understanding feature fields, and includes:
a domain field (Domain) for recording the domain of the requirement information, such as navigation, telephone, weather inquiry, music, etc.; in fact, each domain can be regarded as a category of requirement information;
an intention field (Intent) for recording the specific intention in the requirement information, which may in fact be regarded as a more specific description of the information identified by the domain, or as a further classification within each domain of requirement information; e.g., when the domain is telephone, the intention may be dialing a call, answering a call, etc., and when the domain is music, the intention may be searching for and playing music, pausing playback, etc.;
a slot information list field (Slot List) for recording all slot information (Slot) of the requirement information, each piece of slot information recording information about one word extracted from the input requirement information; the word information includes the content of the word and may also include the category of the word, specifically the category of its word sense. In fact, the category of the word may be used as the name of a slot and the content of the word as the specific parameter of that slot, so that an unbounded vocabulary is mapped onto a limited set of slots; in the subsequent search or ranking steps, the search or ranking may then use only the slot name and not the slot parameter, for example when the word is a proper noun. Alternatively, the content of the word may itself be used as the slot name, with no parameter for that slot, for example when the word is part of a verb.
The input requirement information can be completely expressed by a semantic understanding table consisting of these three fields: the domain field, the intention field, and the slot information list field.
Optionally, the semantic understanding table may further include a question type field (Question Type) for recording the question type of the requirement information; for example, y/n may indicate that the requirement information is a yes/no question, where may indicate that it is a question about a location, and how_many may indicate that it is a question about a quantity.
Further, each semantic understanding feature field may include a portion indicating the accuracy of the content recorded in the field. For example, the domain field, the intention field, and the slot information list field may each include a confidence level (Confidence), which is a real number ranging from 0 to 1; a higher confidence level for a feature field indicates higher accuracy of the content recorded in that field.
Each semantic understanding feature field of the requirement information can be obtained through machine learning, and each piece of slot information in the slot information list field can be obtained through syntactic analysis.
In a specific example, if the input requirement information is { Query = "I want to go to a restaurant; navigate me from Wudaokou to Tianjin" }, the resulting semantic understanding table may be:
{domain: "navigation", confidence = 0.8;
intent: "navigate", confidence = 0.9;
slot list = {"tag": "restaurant", confidence = 0.9; "from_loc": "Wudaokou", confidence = 0.6; "dest_loc": "Tianjin", confidence = 0.7}}
where domain: "navigation", confidence = 0.8 is the domain field, indicating that the domain of the requirement is navigation and that the accuracy of this domain judgment is 0.8. intent denotes the intention field and slot list denotes the slot information list field; their format is similar to that of the domain field. In particular, "tag": "restaurant", "from_loc": "Wudaokou", and "dest_loc": "Tianjin" are each a piece of slot information (slot). For example, in "from_loc": "Wudaokou", the word "Wudaokou" in the requirement is recorded in the form of a slot: "from_loc" is the name of the slot, which is in fact the category of "Wudaokou", and "Wudaokou" is the parameter of the slot.
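The semantic understanding table described above can be pictured as a small record type. The following Python sketch is illustrative only: the class names `Slot` and `BeliefTable`, the attribute names, and the rendering of the place names are assumptions made for this example, not structures defined by the invention.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    name: str                      # category of the word, e.g. "from_loc"
    value: Optional[str] = None    # content of the word; may be absent
    confidence: float = 1.0        # real number in [0, 1]


@dataclass
class BeliefTable:
    domain: str
    domain_confidence: float
    intent: str
    intent_confidence: float
    slots: List[Slot] = field(default_factory=list)
    question_type: Optional[str] = None   # optional, e.g. "y/n", "where", "how_many"

    def slot_names(self) -> List[str]:
        """Slot names only, for searches that ignore slot parameters."""
        return [slot.name for slot in self.slots]


# The navigation example from the text, with illustrative place names.
belief = BeliefTable(
    domain="navigation", domain_confidence=0.8,
    intent="navigate", intent_confidence=0.9,
    slots=[
        Slot("tag", "restaurant", 0.9),
        Slot("from_loc", "Wudaokou", 0.6),
        Slot("dest_loc", "Tianjin", 0.7),
    ],
)
```

Keeping the slot name separate from the slot value makes it straightforward to search on names alone, as the description of the slot information list field suggests.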
Step S22, updating and maintaining the semantic understanding over multiple rounds of man-machine dialogue. The semantic understanding of the requirement in the current round, obtained in step S21, is combined with the semantic understanding of the requirements in the previous rounds, so that the semantic understanding is updated, added to, and deleted from round by round, and the semantic understanding information of each round of interaction is maintained so that the requirement gradually becomes clear.
When a user performs the T-th round of interaction with the dialogue state management system, the requirement understanding of the T-th round (Query Belief(T)) and the total requirement understanding of all rounds before the T-th round (Context(T-1)) are combined and updated to generate the total requirement understanding up to the T-th round (Context(T)). The specific formula is: Context(T-1) + Query Belief(T) = Context(T).
It should be noted that the updating and maintaining of the semantic history context can be performed on the semantic understanding table (Belief table): the semantic understanding table of the requirement is updated, added to, and deleted from each time the user interacts with the dialogue state management system.
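The update rule Context(T-1) + Query Belief(T) = Context(T) can be sketched as a merge of two dictionaries. The text does not define the "+" operation in detail, so the conventions below are assumptions for illustration: a later round overrides the domain and intention when it supplies them, slots are merged by name, and a slot whose value is `None` is treated as a deletion. The function name `update_context` is hypothetical.

```python
def update_context(context: dict, query_belief: dict) -> dict:
    """Return Context(T) from Context(T-1) and Query Belief(T) without mutating either."""
    merged = {
        "domain": query_belief.get("domain") or context.get("domain"),
        "intent": query_belief.get("intent") or context.get("intent"),
        "slots": dict(context.get("slots", {})),
    }
    for name, value in query_belief.get("slots", {}).items():
        if value is None:
            merged["slots"].pop(name, None)   # delete (assumed convention)
        else:
            merged["slots"][name] = value     # add or update
    return merged


context = {}
rounds = [
    {"domain": "navigation", "intent": "navigate", "slots": {"dest_loc": "Tianjin"}},
    {"slots": {"from_loc": "Wudaokou"}},   # adds a slot
    {"slots": {"dest_loc": "Sanlitun"}},   # updates a slot
    {"slots": {"from_loc": None}},         # deletes a slot
]
for query_belief in rounds:
    context = update_context(context, query_belief)
```

After the four rounds, the context keeps the domain and intention from the first round and holds only the updated destination slot.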
Step S23, obtaining a reordering result through the search engine and the machine learning ranking model according to the updated and maintained semantic understanding representation obtained in step S22, where the reordering result includes a task to be executed, and sending the task to the corresponding execution module or control system so as to execute the operation and reply to the user according to the task. The task information may include a goal (Goal) field, an action (Action) field, and a reply (Reply) field. The goal is the category of the task and can be used to determine which execution module the task should be sent to; the action is the specific instruction or command to be performed by the task, which may be, for example, a control command to change the vehicle operating state, or an instruction to invoke a service in a vehicle control system; the reply is the information that the task needs to feed back to the user, including but not limited to a text reply and a reply based on Natural Language Generation (NLG), such as a speech reply (TTS, Text to Speech).
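As a rough illustration of how the goal field could route a task to an execution module, consider the following sketch. The `Task` class, the two module functions, and the routing table `EXECUTION_MODULES` are hypothetical names introduced for this example; the invention does not prescribe any particular dispatch mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Task:
    goal: str     # category of the task, used for routing
    action: str   # instruction the execution module should carry out
    reply: str    # information fed back to the user


def navigation_module(task: Task) -> str:
    return f"[navigation] {task.action}"


def media_module(task: Task) -> str:
    return f"[media] {task.action}"


# Hypothetical routing table from goal to execution module.
EXECUTION_MODULES: Dict[str, Callable[[Task], str]] = {
    "navigate": navigation_module,
    "play_song": media_module,
}


def dispatch(task: Task) -> Tuple[str, str]:
    """Route the task by its goal; return (execution result, reply to the user)."""
    handler = EXECUTION_MODULES.get(task.goal)
    if handler is None:
        raise KeyError(f"no execution module registered for goal {task.goal!r}")
    return handler(task), task.reply


result, reply = dispatch(Task("navigate", "start route planning", "Planning your route"))
```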
It is noted that the semantic understanding (Belief) is a combination of many elements, equivalent to a variable X = (x_1, x_2, x_3, …), while the tasks to be executed form a limited set of operation categories, corresponding to a finite set from which the value y is taken. The role of step S23 may be understood as establishing a mapping from the semantic understanding to the task to be executed, y = f(X), and this mapping is implemented by the search engine and the ranking model. In practice, establishing such a mapping may also be understood as classifying the semantic understanding: the elements of the semantic understanding are mapped to a classification set whose elements are the limited categories of tasks to be executed, such as { call, listen to song, navigate }. It should be noted that, although the set of tasks to be executed is limited, after the mapping the task finally obtained carries the information passed down from the semantic understanding; for example, if "White Windmill" by Jay Chou is recorded in the semantic understanding, the obtained task to be executed includes, in addition to the requirement "listen to song", that the singer is Jay Chou and the song is "White Windmill".
The set of tasks to be executed can be split differently according to different specific fields, for example, in a vehicle-mounted system, in the navigation field, the set can be split into a Point of Interest (POI) query, a proximity search query, and the like, and in the media stream field, the set can be split into a query song, a play/pause song, and the like.
Optionally, the reordering result may be recorded in a history log, and in addition, in the case that the reordering result includes semantic understanding information, the reordering result may be used to perform semantic updating and maintaining in step S22, so that in step S22, a next round of updating and maintaining may be performed based on the semantic understanding included in the reordering result.
In one embodiment, step S23 may include the following specific steps:
Step S231, generating a semantic search request from the semantic understanding table and other search information. The type of the other search information is not limited; for example, it may include one or more of vehicle driving state information (e.g., vehicle speed), vehicle location information, vehicle state information, user personal information (e.g., gender, age), user preference information (e.g., whether the user prefers highways or the shortest route), and the like. Optionally, when the feature fields of the semantic understanding table include confidence levels, the semantic search request may also include those confidence levels, although it need not.
Step S232, searching the search library with the semantic search request obtained in step S231, by means of the search engine and in a full-OR mode, to obtain search results and a search relevance score for each search result. Each piece of data in the search library includes semantic understanding representation information and information of a task to be executed, and may also include corresponding other search information; for example, it may be the semantic understanding representation of a preset or past requirement together with the information of the task to be executed corresponding to that requirement. Each search result contains information of a task to be executed and may also contain the corresponding semantic understanding representation information and other search information. The information of the task to be executed may consist of a goal field, an action field, and a reply field. The search relevance score is a commonly used index, produced by the search engine, that represents how accurate a search result is, and can be calculated by known means. In one embodiment, a preset number of search results with the highest search relevance scores, or the search results whose search relevance scores exceed a preset score, can be selected; in addition, the selected search results can be sorted from high to low by search relevance score.
The full-OR mode mentioned above means the following: if the semantic search request is { domain: "call", intent: "search number" }, then a search in full-OR mode returns all results containing { domain: "call" } and also all results containing { intent: "search number" }.
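The OR-style search over all request fields can be sketched as follows. This is a toy in-memory stand-in, not a real search engine: the library rows, the function name `or_search`, and the relevance score (the fraction of request fields that match) are all assumptions made for illustration, and a production engine would compute relevance by its own known means.

```python
from typing import Dict, List, Tuple

# Toy search library; the rows are invented for illustration.
SEARCH_LIBRARY: List[Dict[str, str]] = [
    {"domain": "call", "intent": "search number", "goal": "look up contact"},
    {"domain": "call", "intent": "dial", "goal": "place call"},
    {"domain": "music", "intent": "search number", "goal": "find track number"},
    {"domain": "music", "intent": "play", "goal": "play song"},
]


def or_search(request: Dict[str, str], library: List[Dict[str, str]],
              top_n: int = 3) -> List[Tuple[float, Dict[str, str]]]:
    """Return rows matching at least one request field, best-scoring first."""
    scored = []
    for row in library:
        matches = sum(1 for key, value in request.items() if row.get(key) == value)
        if matches > 0:                                   # OR: one matching field suffices
            scored.append((matches / len(request), row))  # toy relevance score
    scored.sort(key=lambda pair: pair[0], reverse=True)   # stable: ties keep library order
    return scored[:top_n]


results = or_search({"domain": "call", "intent": "search number"}, SEARCH_LIBRARY)
```

Rows that match only the domain or only the intention are still returned, with a lower score than the row that matches both; the row matching neither field is excluded.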
The search library approach is adopted because a semantic understanding state is formed by a combination of domain (Domain), intention (Intent), and slot information (Slot), and the total number of combinations M reaches M = 2^|Domain| × 2^|Intent| × 2^|Slot|, where |Domain|, |Intent|, and |Slot| denote the sizes of the domain, intention, and slot information sets respectively. The number of combinations of semantic understanding states is therefore exponential, and ordinary processing approaches cannot cover all semantic understanding states. By searching the search library, the most relevant data can be obtained from the limited data set in the library, which effectively reduces the complexity of the problem. Moreover, as the data set in the search library continues to expand, the relevance of the search results can improve markedly.
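To give a sense of scale for the combination count M, the short calculation below evaluates the formula for a small, hypothetical vocabulary of 4 domains, 10 intentions, and 20 slot names. These set sizes are invented for the example and do not come from the invention.

```python
def state_count(n_domain: int, n_intent: int, n_slot: int) -> int:
    """M = 2^|Domain| * 2^|Intent| * 2^|Slot|."""
    return 2 ** n_domain * 2 ** n_intent * 2 ** n_slot


# Hypothetical set sizes: 4 domains, 10 intentions, 20 slot names.
small = state_count(4, 10, 20)   # 2^34 = 17,179,869,184 combinations
```

Even at this modest size the count exceeds seventeen billion, which illustrates why enumerating every state is impractical and a search over a limited library is used instead.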
In the search engine, a set of tasks to be executed can be added or modified at any time, some conversation states are added in the search engine platform at any time, and existing data are deleted or modified, so that real-time or semi-real-time data synchronization is realized, and the conversation state management system is easy to maintain.
The dialogue state search engine first builds an index over the semantic understanding tables in the search library for querying. The data to be indexed is derived from history logs or manual annotation, and each piece of data comprises domain, intention, slot information, goal, action, reply, and other relevant fields.
Fig. 4 is a schematic diagram of data stored in the search library. Each row in Fig. 4 represents a piece of data in the search library, and each column represents a parameter included in the data; in fact, if the data in the search library is derived from a man-machine dialogue history log, each piece of data may be the dialogue state of a past requirement. Each piece of data comprises semantic understanding table information, other search information, and information of the task to be executed: the semantic understanding table information specifically comprises domain, intention, and slot information; the other search information comprises gender and age; and the information of the task to be executed comprises goal, action, and reply. In particular, the slot information may include only the category of each word in the input requirement and not the content of the word; for example, the slot information list in the first row of Fig. 4 includes the names of two slots, from_loc and dest_loc, while the specific contents of these two slots, "Sanlitun" and "Tsinghua University", are not recorded.
When the search library is searched, the domain, intention, slot information, gender, and age columns are searched according to the semantic search request, and the returned search result is the whole piece of data, including goal, action, and reply, so that the corresponding information of the task to be executed is obtained by searching the search library. The columns recording the semantic understanding table information and the columns recording the other search information may therefore be collectively referred to as search columns.
It is noted that although the examples in the figures do not include a confidence level, in some examples each search column may have a corresponding confidence level. The search column is not limited to those listed in the drawings, and may include other columns and may be added as needed.
Step S233, reordering the search results received from step S232 by means of the machine learning ranking model to obtain the result most relevant to the semantic understanding (Belief), which may be called the reordering result, and sending the reordering result to the corresponding execution module or control system to execute the task to be executed that it contains.
In a specific example of step S233, step S233 specifically includes the steps of:
step S2331, extracting ranking features from all search results, the semantic search request, and the search relevance scores; the ranking features include, but are not limited to: domain, intention, slot information, confidence, search relevance score, in-vehicle state, vehicle speed, vehicle position, user personal information, user preference, and the like; the specific ranking features to be extracted can be selected manually, and in general, the more ranking features are extracted, the better;
step S2332, extracting ranking data from preset or past dialogue state data, where each piece of ranking data comprises a semantic understanding representation, the semantic search request corresponding to it, the corresponding task to be executed, and annotation information for that piece of ranking data; the ranking data can be extracted, for example, from the dialogue state history log of man-machine dialogues; the specific content of the annotation information may differ according to the annotation strategy: for example, under a user feedback annotation strategy the annotation information may be the user's specific feedback, and under a heuristic annotation strategy it may be information indicating how the task to be executed in the ranking data was carried out, such as whether the task was executed, whether it was completed, or whether it was rejected by the user; optionally, the data may also be cleaned, that is, reviewed and verified, including checking data consistency and handling invalid and missing values, so as to delete incomplete or invalid data;
step S2333, performing machine learning training according to the sorting characteristics and the sorting data to obtain a machine learning sorting model;
step S2334, obtaining a ranking relevance score based on the ranking characteristics of each retrieval result by using the machine learning ranking model after learning training; the ranking relevance score is an index used for indicating the accuracy degree that the input retrieval result corresponds to the requirement information of the current conversation, and the higher the ranking relevance score is, the more the retrieval result corresponds to the requirement information of the current conversation, and the higher the ranking of the retrieval result is; it should be noted that, as the dialog state historical data increases, the learning ability of the ranking model is enhanced, and the accuracy of the ranking relevance score obtained by the ranking model is increased;
step S2335, sending the search result with the highest ranking relevance score (that is, the top-ranked one) among all the search results to the corresponding execution module or the vehicle control system as the reordering result, so as to execute the task in the reordering result.
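Steps S2331 to S2335 can be sketched end to end with a very small pointwise ranking model. The text does not name a specific learning algorithm, so the logistic regression below is a substituted stand-in chosen only because it fits in a few lines of plain Python; the three features and the toy annotations are likewise invented for illustration and are not real log data.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(features: Sequence[Sequence[float]], labels: Sequence[int],
          epochs: int = 2000, lr: float = 0.5) -> List[float]:
    """Fit logistic-regression weights (last weight is the bias) by gradient descent."""
    weights = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(features, labels):
            xb = list(x) + [1.0]
            error = sigmoid(sum(w * v for w, v in zip(weights, xb))) - y
            weights = [w - lr * error * v for w, v in zip(weights, xb)]
    return weights


def rank_score(weights: Sequence[float], x: Sequence[float]) -> float:
    """Ranking relevance score in (0, 1) for one search result."""
    return sigmoid(sum(w * v for w, v in zip(weights, list(x) + [1.0])))


# S2331/S2332: features are [search relevance score, domain match, intent match];
# labels are toy annotations (1 = relevant, 0 = irrelevant).
train_x = [[0.80, 0, 0], [0.75, 1, 1], [0.60, 1, 0],
           [0.90, 0, 1], [0.70, 1, 1], [0.85, 0, 0]]
train_y = [0, 1, 0, 0, 1, 0]

model = train(train_x, train_y)                        # S2333

candidates = {
    "search result 1": [0.80, 0, 0],
    "search result 2": [0.75, 1, 1],
}
scores = {name: rank_score(model, x) for name, x in candidates.items()}   # S2334
best = max(scores, key=scores.get)                     # S2335
```

In this toy setup the candidate with the lower search relevance score can still be ranked first, because the model has learned that matching domain and intention matter more than the raw search score.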
Note that various strategies can be used to annotate the ranking data, including but not limited to the following:
a manual annotation strategy, in which a person marks whether the result obtained by the dialogue state management system meets the input requirement;
a user feedback annotation strategy, in which annotation follows the feedback of a large number of users; specifically, "like" and "dislike" buttons may be provided on the human-computer interaction device so that users can give feedback, with a majority of "likes" regarded as relevant and a majority of "dislikes" regarded as irrelevant;
a heuristic annotation strategy, in which, specifically, the log data records that the dialogue state management system returned a task to be executed to the user; if in the next round of interaction the user rejects the task obtained by the system, or the task is not executed or not completed, the data is marked as irrelevant; if in the next round of interaction the user does not reject the task, or the task is accepted by default, or the task is completed, the data is marked as relevant.
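The user feedback and heuristic annotation strategies listed above can be sketched as two small labelling functions. The log field names (`rejected_next_round`, `executed`, `completed`) are hypothetical, and two tie-breaking choices are assumptions of this sketch: any sign of rejection, non-execution, or non-completion takes precedence and yields "irrelevant", and an equal number of likes and dislikes is treated as irrelevant.

```python
from typing import Dict


def heuristic_label(log_entry: Dict[str, bool]) -> int:
    """Return 1 (relevant) or 0 (irrelevant) for one logged dialogue turn."""
    rejected = log_entry.get("rejected_next_round", False)
    executed = log_entry.get("executed", False)
    completed = log_entry.get("completed", False)
    if rejected or not executed or not completed:
        return 0
    return 1


def feedback_label(likes: int, dislikes: int) -> int:
    """Return 1 when likes outnumber dislikes, otherwise 0 (ties count as irrelevant)."""
    return 1 if likes > dislikes else 0


labels = [
    heuristic_label({"rejected_next_round": False, "executed": True, "completed": True}),
    heuristic_label({"rejected_next_round": True, "executed": True, "completed": True}),
    heuristic_label({"executed": True, "completed": False}),
]
```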
Further, in step S23, relevance threshold detection may be performed to check whether the reordering result passes search relevance threshold detection and/or ranking relevance threshold detection. If the search relevance score of the reordering result reaches a preset first threshold, search relevance threshold detection is passed; if the ranking relevance score of the reordering result reaches a preset second threshold, ranking relevance threshold detection is passed. The reordering result is output if it passes every threshold detection performed; if it fails any of them, the reordering result is not output, and feedback is given instead that the input requirement could not be recognized. Alternatively, only ranking relevance threshold detection may be performed.
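The threshold detection just described can be sketched as a guard placed in front of the output. In this illustrative version a threshold set to `None` means that the corresponding check is not performed, and "reaches" is read as greater than or equal to; the function names, the dictionary keys, the default of 0.7, and the fallback wording are assumptions of the example.

```python
from typing import Optional

FALLBACK = "Sorry, I could not recognize that request."


def passes_thresholds(search_score: float, ranking_score: float,
                      first_threshold: Optional[float] = None,
                      second_threshold: Optional[float] = 0.7) -> bool:
    """Check the performed threshold detections; None disables a check."""
    if first_threshold is not None and search_score < first_threshold:
        return False
    if second_threshold is not None and ranking_score < second_threshold:
        return False
    return True


def respond(result: dict, **thresholds: Optional[float]) -> str:
    """Output the reply of the reordering result, or a fallback if a check fails."""
    if passes_thresholds(result["search_score"], result["ranking_score"], **thresholds):
        return result["reply"]
    return FALLBACK


accepted = respond({"search_score": 0.75, "ranking_score": 0.82, "reply": "Planning your route"})
rejected = respond({"search_score": 0.75, "ranking_score": 0.65, "reply": "Planning your route"})
```

With the defaults shown, only the ranking relevance check is performed, which corresponds to the alternative mentioned at the end of the paragraph above.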
In one specific example of step S23, the input requirement is "I want to go to a restaurant; navigate me from Wudaokou to Tianjin". From this requirement a semantic search request can be obtained, which may be:
{domain: "navigation", confidence = 0.8;
intent: "navigate", confidence = 0.9;
slot list = {"tag": "restaurant", confidence = 0.9; "from_loc": "Wudaokou", confidence = 0.6; "dest_loc": "Tianjin", confidence = 0.7}}.
All the feature fields in the semantic search request, with their corresponding confidence levels (Confidence), are submitted to the search library in full-OR mode, yielding the N search results with the highest search relevance scores. In this example, the search results may be:
search result 1 (search relevance score = 0.8): goal = "find nearby restaurants", action = "call group app, find nearby restaurants", reply = "finding nearby restaurants for you";
search result 2 (search relevance score = 0.75): goal = "navigate", action = "invoke navigation app, pass start and destination parameters to navigation app", reply = "navigating to $dest_loc for you, route is being planned for you";
…
search result N (search relevance score = 0.70): …
The ranking features and the ranking data are then extracted, a ranking model is obtained through machine learning training, and each search result is fed into the ranking model to obtain its ranking relevance score. The extracted features include domain, intention, slot information, in-vehicle state, vehicle speed, vehicle position, user personal information, user preference, the confidence levels of these features, the search relevance scores of the N search results, and the like. Finally, if search result 2 obtains the highest ranking relevance score and that score exceeds the manually set relevance threshold (for example, with scores in the range 0 to 1.0, the threshold may be set to 0.7), search result 2 is selected as the output of the dialogue state management system for the requirement of the current round of dialogue. After multiple rounds of dialogue are completed, the in-vehicle system executes the search result selected in the final round.
Note that in this example, the original requirement information entered contains two requirements, one is to navigate and the other is to find restaurants, and the navigation is the real requirement of the user. However, the search result obtained by the search engine may not have the highest search relevance score for the actual user requirement (search result 2). In order to solve the problem, the invention adds a reordering step, and performs the sequencing based on machine learning on the retrieval result again to obtain the real requirement of the user.
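The reply of search result 2 in the example contains the placeholder `$dest_loc`, which suggests that slot parameters from the semantic understanding are filled into the reply before it is returned to the user. One possible way to do this, shown purely as an assumption about the mechanism, is the `Template` class from Python's standard `string` module; the reply wording and the slot values are illustrative.

```python
from string import Template

# Reply text stored with the task; $dest_loc is filled from the slot list.
reply_template = Template("Navigating to $dest_loc for you; the route is being planned")

# Slot parameters carried over from the semantic understanding (illustrative values).
slots = {"tag": "restaurant", "from_loc": "Wudaokou", "dest_loc": "Tianjin"}

# safe_substitute ignores unused slots and leaves unknown placeholders untouched.
reply = reply_template.safe_substitute(slots)
```

This keeps the set of stored replies limited while still letting each reply carry the specific information passed down from the user's requirement.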
Fig. 5 is a schematic block diagram of an embodiment of the dialogue state management system of the electric vehicle. Referring to Fig. 5, the dialogue state management system of the electric vehicle according to the embodiment of the present invention includes a semantic understanding (Query Belief) module 100 and a dialogue state tracking (DST) module 200. The semantic understanding module 100 is configured to analyze an input requirement and convert the input requirement information into a semantic understanding representation that can be used by the dialogue state management system; the dialogue state tracking module 200 is configured to obtain the task to be executed corresponding to the requirement information through a search engine and a machine learning ranking model, according to the semantic understanding representation parsed from the requirement information.
The dialogue state tracking module 200 includes a retrieval sub-module 210 and a reordering sub-module 220. The retrieval sub-module 210 is configured to search the search library according to the semantic understanding representation, obtain the several pieces of data whose relevance to the semantic understanding representation ranks highest, and define each piece of retrieved data as a search result. The reordering sub-module 220 is configured to reorder the search results by means of the machine learning ranking model, select from them the single search result most relevant to the requirement, define it as the reordering result, and output the reordering result. It should be noted that each piece of data in the search library includes a preset or past semantic understanding representation and the to-be-executed task information corresponding to it, so that the search result obtained by searching against the semantic understanding representation of each piece of data, and the reordering result obtained by reordering, include the to-be-executed task information.
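The module structure described above can be sketched as a simple composition of components. The class names, the toy library, and the three stand-in functions below are hypothetical; real modules would rely on trained models and an actual search engine rather than these placeholders.

```python
from typing import Callable, Dict, List

Belief = Dict[str, str]
Result = Dict[str, object]


class DialogStateTracker:
    """Stands in for module 200: retrieval (210) followed by reordering (220)."""

    def __init__(self, retrieve: Callable[[Belief], List[Result]],
                 rerank: Callable[[List[Result]], Result]) -> None:
        self.retrieve = retrieve
        self.rerank = rerank

    def track(self, belief: Belief) -> Result:
        return self.rerank(self.retrieve(belief))


class DialogStateManager:
    def __init__(self, understand: Callable[[str], Belief],
                 tracker: DialogStateTracker) -> None:
        self.understand = understand   # stands in for module 100
        self.tracker = tracker         # stands in for module 200

    def handle(self, query: str) -> Result:
        return self.tracker.track(self.understand(query))


# Toy search library and stand-in components, invented for illustration.
LIBRARY: List[Result] = [
    {"domain": "navigation", "goal": "navigate", "score": 0.75},
    {"domain": "navigation", "goal": "find nearby restaurants", "score": 0.80},
    {"domain": "music", "goal": "play song", "score": 0.90},
]


def toy_understand(query: str) -> Belief:
    return {"domain": "navigation" if "navigate" in query else "music"}


def toy_retrieve(belief: Belief) -> List[Result]:
    return [row for row in LIBRARY if row["domain"] == belief["domain"]]


def toy_rerank(results: List[Result]) -> Result:
    # Prefers the goal "navigate", mimicking a model that learned the real intention.
    return max(results, key=lambda r: (r["goal"] == "navigate", r["score"]))


manager = DialogStateManager(toy_understand, DialogStateTracker(toy_retrieve, toy_rerank))
task = manager.handle("navigate me from Wudaokou to Tianjin")
```

Injecting the retrieval and reordering steps as separate callables mirrors the split into sub-modules 210 and 220 and lets either one be replaced independently.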
Fig. 6 is a schematic block diagram of an embodiment of the dialogue state management system of the electric vehicle according to the invention. Referring to fig. 6 and 3, the dialog state management system of the electric vehicle according to the example of the present invention includes a semantic understanding (Query Belief) module 100, a semantic Update and maintenance (Update Belief) module 300, and a Dialog State Tracking (DST) module 200.
The semantic understanding module 100 is configured to convert the input requirement (Query) information into a semantic understanding table (Belief table) format, and send the semantic understanding table to the semantic updating and maintaining module 300. Note that the input mode is not limited to voice input, and may be an interactive mode such as a picture, a text, a touch, a gesture, or a combination of multiple interactive modes.
Specifically, the semantic understanding table is composed of multidimensional semantic understanding feature fields, and includes:
a domain field (Domain) for recording the domain of the requirement information, such as navigation, telephone, weather inquiry, music, etc.; in fact, each domain can be regarded as a category of requirement information;
an intention field (Intent) for recording the specific intention in the requirement information, which may be regarded as a more specific description of the information identified by the domain, or as a further classification within each domain of requirement information; for example, when the domain is telephone, the intention may be making a telephone call, etc., and when the domain is music, the intention may be searching for and playing music, pausing playback, etc.;
a slot information list field (Slot List) for recording all slot information (Slot) of the requirement information, each piece of slot information recording information about one word extracted from the input requirement information; the word information includes the content of the word and may also include the category of the word, specifically the category of its word sense. In fact, the category of the word may be used as the name of a slot and the content of the word as the specific parameter of that slot, so that an unbounded vocabulary is mapped onto a limited set of slots; in the dialogue state tracking module 200, the search or ranking may then use only the slot name and not the slot parameter, for example when the word is a proper noun. Alternatively, the content of the word may itself be used as the slot name, with no parameter for that slot, for example when the word is part of a verb.
The input requirement information can be completely expressed by a semantic understanding table consisting of these three fields: the domain field, the intention field, and the slot information list field.
Optionally, the semantic understanding table may further include a question type field (Question Type) for indicating the question type of the requirement information; for example, y/n may indicate that the requirement information is a yes/no question, where may indicate that it is a question about a location, and how_many may indicate that it is a question about a quantity.
Further, each semantic understanding feature field may include a portion indicating the accuracy of the content recorded in the field. For example, the domain field, the intention field, and the slot information list field may each include a confidence level (Confidence), which is a real number ranging from 0 to 1; a higher confidence level for a feature field indicates higher accuracy of the content recorded in that field.
Each semantic understanding feature field of the requirement information can be obtained through machine learning, and each piece of slot information in the slot information list field can be obtained through syntactic analysis.
In a specific example, if the input requirement information is { Query = "I want to go to a restaurant; navigate me from the fifth crossing to Tanjin" }, the semantic understanding table obtained by the semantic understanding module 100 may be:
where domain: "navigation", confidence = 0.8 is the domain field, indicating that the domain of the requirement is navigation and that the confidence that the domain is navigation is 0.8. intent represents the intention field and slot list represents the language slot information list field; their format is similar to that of the domain field. In particular, "tag": "restaurant", "from_loc": "fifth crossing", and "dest_loc": "Tanjin" are each a piece of slot information (slot). For example, in "from_loc": "fifth crossing", the word "fifth crossing" in the requirement is recorded in the form of a slot: "from_loc" is the name of the slot, which is in fact the category of "fifth crossing", and "fifth crossing" is the parameter of the slot.
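The semantic understanding table described above can be pictured as a small data structure. The following is a minimal sketch only: the class names `Slot` and `BeliefTable` and their attribute names are hypothetical and are not taken from the invention; the values mirror the navigation example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    name: str                    # category of the word, e.g. "from_loc"
    value: Optional[str] = None  # content of the word; None for a slot without a parameter
    confidence: float = 1.0      # real number in [0, 1]


@dataclass
class BeliefTable:
    domain: str
    domain_confidence: float
    intent: str
    intent_confidence: float
    slots: List[Slot] = field(default_factory=list)
    question_type: Optional[str] = None  # optional field, e.g. "y/n", "where", "how_many"

    def slot_names(self) -> List[str]:
        # Slot names (rather than slot parameters) are what may be retrieved or ranked.
        return [s.name for s in self.slots]


belief = BeliefTable(
    domain="navigation", domain_confidence=0.8,
    intent="navigate", intent_confidence=0.9,
    slots=[
        Slot("tag", "restaurant", 0.9),
        Slot("from_loc", "fifth crossing", 0.6),
        Slot("dest_loc", "Tanjin", 0.7),
    ],
)
print(belief.slot_names())
```

Keeping the slot name separate from its parameter is what lets an open vocabulary of place names map onto a fixed set of slot categories.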
The semantic update and maintenance (Update Belief) module 300 is configured to update and maintain the semantic understanding across multiple rounds of human-computer dialog: it combines the semantic understanding of the requirement in the current round with that of previous rounds, updates, adds, and deletes semantic understanding from round to round, and maintains the semantic understanding information of each round of interaction so that the requirement gradually becomes clear.
When a user performs the T-th round of interaction with the dialog state management system, the requirement understanding of the T-th round (Query Belief(T)) is combined with the total requirement understanding of all rounds before the T-th round (Context(T-1)) to generate the total requirement understanding of the T-th round (Context(T)). The specific formula is: Context(T-1) + Query Belief(T) = Context(T).
It should be noted that the semantic history context may be updated and maintained on the semantic understanding table (belief table): entries of the semantic understanding table of the requirement are updated, added, and deleted each time the user interacts with the dialog state management system.
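The relation Context(T-1) + Query Belief(T) = Context(T) can be sketched as a merge of two tables. This is an illustrative sketch under stated assumptions: the flat dictionary representation, the function name `update_context`, and the convention that a value of `None` means "delete this entry" are all hypothetical; the invention does not specify the merge rules.

```python
from typing import Dict, Optional

Context = Dict[str, Optional[str]]


def update_context(prev: Context, query_belief: Context) -> Context:
    """Return Context(T) built from Context(T-1) and Query Belief(T)."""
    merged = dict(prev)  # copy, so the previous round's context is left intact
    for name, value in query_belief.items():
        if value is None:
            merged.pop(name, None)  # delete an entry the user has withdrawn
        else:
            merged[name] = value    # add a new entry or update an existing one
    return merged


# Round 1: the user asks for a restaurant near the destination.
context_1 = {"domain": "navigation", "dest_loc": "Tanjin", "tag": "restaurant"}
# Round 2: the user supplies a starting point and drops the restaurant request.
belief_2 = {"from_loc": "fifth crossing", "tag": None}
context_2 = update_context(context_1, belief_2)
print(context_2)
```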
The dialog state tracking module 200 is configured to obtain a reordering result through the search engine and the ranking model according to the updated and maintained semantic understanding table, where the reordering result includes a task to be executed, and to send the task to the corresponding execution module or control system, so that an operation is executed according to the task and a reply is given to the user. The task information may include a goal (Goal) field, an action (Action) field, and a reply field. The goal is the category of the task and can be used to determine which execution module the task should be sent to; the action is the specific instruction or command to be performed by the task, which may be, for example, a control command that changes the vehicle operating state or an instruction that invokes a service in the vehicle control system; the reply is the information that the task needs to feed back to the user, including, but not limited to, a text reply, a natural language generation (NLG) based reply, a speech reply (TTS, Text to Speech), a response of an interactive interface (View), and the like.
It is noted that the semantic understanding (Belief) is a combination of many elements, equivalent to a variable X = (x_1, x_2, x_3, …), while the tasks to be executed form a limited set of operation categories, corresponding to the set y. The role of the dialog state tracking module 200 may be understood as establishing a mapping from semantic understanding to tasks to be executed, y = f(X), which is implemented by the search engine and the ranking model. Establishing this mapping may also be understood as classifying the semantic understanding: the many elements of the semantic understanding are mapped to a classification set whose elements are the limited categories of tasks to be executed, such as {call, listen to song, navigate}. It should be noted that, although the set of tasks to be executed is limited, the task finally obtained after mapping carries the information passed down from the semantic understanding; for example, if the semantic understanding records "White Windmill by Zhou Jielun", the obtained task to be executed includes, in addition to the requirement of "listening to songs", that the singer is Zhou Jielun and the song is White Windmill.
The set of tasks to be executed can be split differently according to different specific fields, for example, in a vehicle-mounted system, in the navigation field, the set can be split into a Point of Interest (POI) query, a proximity search query, and the like, and in the media stream field, the set can be split into a query song, a play/pause song, and the like.
Optionally, the dialog state tracking module 200 may record the reordering result in a history log, and in addition, in a case that the reordering result includes semantic understanding information, may further send the reordering result to the semantic update and maintenance module 300, so that the semantic update and maintenance module 300 may perform the next round of update and maintenance based on the semantic understanding included in the reordering result.
In some embodiments, the dialog state tracking module 200 may specifically include: a retrieval request generation sub-module 230, a retrieval sub-module 210, and a reordering sub-module 220.
The retrieval request generation sub-module 230 is configured to generate a semantic retrieval request from the semantic understanding table and other retrieval information, and to send the semantic retrieval request to the retrieval sub-module 210. The type of the other retrieval information is not limited; for example, it may include one or more of vehicle driving state information (such as vehicle speed), vehicle position information, vehicle-mounted state information, user personal information (such as gender and age), user preference information (such as whether the user prefers highways or the shortest route), and the like. Optionally, when the feature fields of the semantic understanding table include confidence levels, the semantic retrieval request may or may not include those confidence levels.
The retrieval sub-module 210 includes a dialog state retrieval engine, which searches the search library for the semantic retrieval request in an all-or manner, obtains retrieval results together with a retrieval relevance score for each, and sends the retrieval results to the reordering sub-module 220. Each piece of data in the search library includes semantic understanding representation information and information on a task to be executed, and may also include corresponding other retrieval information; for example, a piece of data may be the semantic understanding representation of a preset or past requirement together with the information on the task to be executed corresponding to that requirement. Each retrieval result includes information on a task to be executed and may also include the corresponding semantic understanding representation information and other retrieval information. The information on the task to be executed may consist of a goal field, an action field, and a reply field. The retrieval relevance score is a commonly used index, produced by the retrieval engine, that represents the accuracy of a retrieval result and can be calculated by known means. In one embodiment, a preset number of retrieval results with the highest retrieval relevance scores, or the retrieval results whose retrieval relevance scores exceed a preset score, may be selected and sent to the reordering sub-module 220; in addition, the selected retrieval results may be ranked from high to low by retrieval relevance score.
The all-or manner described above means the following: if the semantic retrieval request is {domain: "call", intent: "search number"}, then a search performed in the all-or manner returns all results containing {domain: "call"} as well as all results containing {intent: "search number"}.
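The all-or manner can be illustrated with a toy in-memory search library. This is a sketch only: the records in `LIBRARY`, the function name `all_or_search`, and the use of "fraction of matched fields" as a stand-in for the retrieval relevance score are assumptions made for illustration; a real retrieval engine computes its own score.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical search library: search columns plus a goal column.
LIBRARY: List[Record] = [
    {"domain": "call", "intent": "search number", "goal": "find number"},
    {"domain": "call", "intent": "call someone", "goal": "call"},
    {"domain": "music", "intent": "search number", "goal": "find track number"},
    {"domain": "navigation", "intent": "navigate", "goal": "navigation"},
]


def all_or_search(request: Record, library: List[Record]) -> List[Tuple[float, Record]]:
    """Return every record matching at least one requested field, best match first."""
    hits = []
    for record in library:
        matched = sum(1 for key, value in request.items() if record.get(key) == value)
        if matched > 0:  # OR semantics: a single matching field is enough
            hits.append((matched / len(request), record))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


results = all_or_search({"domain": "call", "intent": "search number"}, LIBRARY)
for relevance, record in results:
    print(relevance, record["goal"])
```

Records that match only the domain or only the intent are still returned, which is why a later reordering step is useful for picking the single best candidate.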
This way of searching the search library is adopted because a semantic understanding state is formed by a combination of domain (Domain), intention (Intent), and slot information (Slot), and the total number M of combinations reaches M = 2^{|Domain|} × 2^{|Intent|} × 2^{|Slot|}, where |Domain|, |Intent|, and |Slot| denote the sizes of the domain information set, the intention information set, and the slot information set, respectively. The number of semantic understanding states is therefore exponential, and a conventional processing approach cannot cover all of them. Searching the search library obtains the most relevant data from the limited data set in the library, which effectively reduces the complexity of the problem. Moreover, as the data set in the search library continues to expand, the relevance of the retrieval can improve significantly.
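To give a feel for the exponential growth, the count M = 2^|Domain| × 2^|Intent| × 2^|Slot| can be evaluated for small set sizes. The function name `state_count` and the example sizes below are hypothetical and chosen only for illustration.

```python
def state_count(n_domain: int, n_intent: int, n_slot: int) -> int:
    """Number of semantic understanding states for the given set sizes."""
    return 2 ** n_domain * 2 ** n_intent * 2 ** n_slot


# Even 3 domains, 5 intents and 10 slot names already give 2**18 states.
print(state_count(3, 5, 10))
```

Adding a single slot name doubles the count, which is why enumerating every state is impractical and retrieval over a finite library is used instead.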
In the retrieval sub-module 210, the set of tasks to be executed can be added to or modified at any time, and dialog states can be added and existing data deleted or modified at any time on the search engine platform, so that real-time or near-real-time data synchronization is achieved and the dialog state management system is easy to maintain.
The dialog state retrieval engine is also used to first index the semantic understanding tables into the search library for querying. The data to be indexed is derived from history logs or manual annotation, and each piece of data includes domain, intention, slot information, goal, action, reply, and other relevant fields.
Fig. 4 is a schematic diagram of data stored in the search library. Each row in Fig. 4 represents a piece of data in the search library, and each column represents a parameter included in the data; if the data in the search library is derived from a human-computer dialog history log, each piece of data may be the dialog state of a past requirement. Each piece of data includes semantic understanding table information, other retrieval information, and information on the task to be executed: the semantic understanding table information includes the domain, intention, and slot information; the other retrieval information includes gender and age; and the information on the task to be executed includes the goal, action, and reply. In particular, the slot information may include only the category of each word in the input requirement and not the content of the word; for example, the slot information list in the first row of Fig. 4 includes the names of two slots, from_loc and dest_loc, while the specific contents of these two slots, "Sanlitun" and "Qinghua University", are not recorded.
When the search library is searched, the domain, intention, slot information, gender, and age columns are searched according to the semantic retrieval request, and each returned retrieval result is the whole piece of data, including the goal, action, and reply, so that the corresponding information on the task to be executed is obtained by searching the search library. The columns recording the semantic understanding table information and the columns recording the other retrieval information may therefore be collectively referred to as search columns.
It is noted that although the examples in the figures do not include a confidence level, in some examples each search column may have a corresponding confidence level. The search column is not limited to those listed in the drawings, and may include other columns and may be added as needed.
The reordering sub-module 220 reorders the retrieval results received from the retrieval sub-module 210 by using the machine learning ranking model, obtains the result most relevant to the semantic understanding (Belief), which may be called the reordering result, and sends the reordering result to the corresponding execution module or control system to execute the task included in the reordering result.
In an example of the reordering sub-module 220, the reordering sub-module 220 specifically includes:
the ranking feature extraction unit is used for extracting ranking features from all the retrieval results, the semantic retrieval request, and the retrieval relevance scores; the ranking features include, but are not limited to, the domain, intention, slot information, confidence levels, retrieval relevance score, vehicle-mounted state, vehicle speed, vehicle position, user personal information, user preference, and the like; the specific ranking features to be extracted may be selected manually, and in general the more ranking features are extracted, the better;
the ranking data extraction unit is used for extracting ranking data from preset or past dialog state data, where each piece of ranking data includes a semantic understanding representation, the semantic retrieval request corresponding to that representation, the corresponding task to be executed, and annotation information for the piece of ranking data; the ranking data may be extracted from the dialog state history log data of human-computer dialogs; the specific content of the annotation information may differ with the annotation strategy: for example, under a user feedback annotation strategy the annotation information may be the specific feedback of the user, and under a heuristic annotation strategy it may be information indicating how the task to be executed in the ranking data was carried out, such as whether the task was executed, whether it was completed, or whether it was denied by the user; optionally, the data may also be cleaned, that is, reviewed and verified, including checking data consistency and handling invalid and missing values, so as to delete incomplete or invalid data;
the training unit is used for performing machine learning training according to the ranking features and the ranking data to obtain the machine learning ranking model;
the retrieval result ordering unit is used for obtaining an ordering correlation score based on the ordering characteristics of each retrieval result by using the machine learning ordering model after learning training; the ranking relevance score is an index used for indicating the accuracy degree that the input retrieval result corresponds to the requirement information of the current conversation, and the higher the ranking relevance score is, the more the retrieval result corresponds to the requirement information of the current conversation, and the higher the ranking of the retrieval result is; it should be noted that, as the historical data of the dialog states increases, the learning ability of the ranking model is enhanced, and the accuracy of the ranking correlation score obtained by the ranking model is increased;
and the output unit is used for sending one of all the retrieval results with the highest ranking relevance score (with the highest ranking) as a ranking result to the corresponding execution module or the control system of the vehicle so as to execute the tasks in the ranking result.
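The units above (feature extraction, training, scoring, output) can be sketched end to end with a very small pointwise ranking model. This is only an illustration under stated assumptions: the invention does not name a specific learning algorithm, so plain logistic regression trained by gradient descent is swapped in here; the three features (retrieval relevance score, domain match, intent match) and all training rows are hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(w: List[float], x: List[float]) -> float:
    """Ranking relevance score in (0, 1); the last weight is the bias."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train(features: List[List[float]], labels: List[int],
          lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent."""
    w = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, labels):
            err = score(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(features) for wj, gj in zip(w, grad)]
    return w


# Hypothetical ranking data: [retrieval relevance score, domain match, intent match]
train_x = [[0.80, 1, 0], [0.75, 1, 1], [0.90, 0, 0],
           [0.60, 1, 1], [0.70, 0, 1], [0.85, 1, 0]]
train_y = [0, 1, 0, 1, 1, 0]  # annotation: 1 = relevant, 0 = irrelevant
model = train(train_x, train_y)

candidates = {"search result 1": [0.80, 1, 0], "search result 2": [0.75, 1, 1]}
ranked = sorted(candidates, key=lambda name: score(model, candidates[name]), reverse=True)
print(ranked[0])
```

In this toy data the annotation follows the intent match rather than the retrieval relevance score, so the trained model can prefer a candidate that the retrieval engine scored lower.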
Note that a variety of strategies may be used to annotate the ranking data, including but not limited to the following:
a manual annotation strategy, in which a person marks whether a result obtained by the dialog state management system meets the input requirement;
a user feedback annotation strategy, in which annotation is based on feedback from a large number of users; specifically, "like" and "dislike" buttons may be provided on the human-computer interaction device so that users can give feedback, with a result that receives more "likes" regarded as relevant and one that receives more "dislikes" regarded as irrelevant;
a heuristic annotation strategy, in which, specifically, the log data records a task to be executed that the dialog state management system returned to the user; if the user denies that task in the next round of interaction, or the task is not executed or not completed, the data is marked as irrelevant; if the user does not deny the task in the next round of interaction, or the task is accepted by default, or the task is completed, the data is marked as relevant.
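The heuristic annotation strategy lends itself to a short rule-based sketch. The `LogEntry` fields and the function name `heuristic_label` are hypothetical, and the sketch assumes that any negative signal (denial, non-execution, non-completion) takes precedence over the positive ones, which the description above leaves open.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    task: str                # task to be executed that was returned to the user
    executed: bool           # whether the task was executed
    completed: bool          # whether the task ran to completion
    denied_next_round: bool  # whether the user rejected it in the next round


def heuristic_label(entry: LogEntry) -> int:
    """Return 1 (relevant) or 0 (irrelevant) for one history log entry."""
    if entry.denied_next_round or not entry.executed or not entry.completed:
        return 0
    return 1


log = [
    LogEntry("navigate", executed=True, completed=True, denied_next_round=False),
    LogEntry("find nearby restaurants", executed=True, completed=True, denied_next_round=True),
    LogEntry("call", executed=False, completed=False, denied_next_round=False),
]
labels = [heuristic_label(entry) for entry in log]
print(labels)
```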
Further, the dialog state tracking module 200 may further include a relevance threshold detection sub-module 240 for detecting whether the reordering result passes retrieval relevance threshold detection and/or reordering relevance threshold detection. If the retrieval relevance score of the reordering result reaches a preset first threshold, the retrieval relevance threshold detection is passed; if the ranking relevance score of the reordering result reaches a preset second threshold, the reordering relevance threshold detection is passed. If the reordering result passes the threshold detection that is performed, the reordering result is output; if it fails the threshold detection performed by the relevance threshold detection sub-module 240, the reordering result is not output, and feedback is returned indicating that the input requirement cannot be recognized. Alternatively, the relevance threshold detection sub-module 240 may perform only the reordering relevance threshold detection.
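The threshold detection can be sketched as a small gate in front of the output. The function names, the dictionary keys, the fallback wording, and the default second threshold of 0.7 are assumptions for illustration; the sketch also assumes that failing any check that is actually performed blocks the output, and that "reaches" means greater than or equal to.

```python
from typing import Optional

FALLBACK_REPLY = "Sorry, the input requirement could not be recognized."  # hypothetical wording


def passes_thresholds(retrieval_score: float, ranking_score: float,
                      first_threshold: Optional[float] = None,
                      second_threshold: Optional[float] = 0.7) -> bool:
    """Apply whichever threshold checks are enabled (None disables a check)."""
    if first_threshold is not None and retrieval_score < first_threshold:
        return False
    if second_threshold is not None and ranking_score < second_threshold:
        return False
    return True


def respond(result: dict, **thresholds) -> str:
    ok = passes_thresholds(result["retrieval_score"], result["ranking_score"], **thresholds)
    return result["reply"] if ok else FALLBACK_REPLY


result = {"retrieval_score": 0.75, "ranking_score": 0.9, "reply": "Planning your route."}
print(respond(result))                       # only the reordering check is performed
print(respond(result, first_threshold=0.8))  # both checks are performed
```

By default only the reordering relevance check runs, matching the variant in which the sub-module performs only that detection.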
In one specific example, the input requirement is "I want to go to a restaurant; navigate me from the fifth crossing to Tanjin". The retrieval request generation sub-module 230 obtains a semantic retrieval request from the requirement, and the semantic retrieval request may be:
{domain: "navigation", confidence = 0.8;
intent: "navigate", confidence = 0.9;
slot list = {"tag": "restaurant", confidence = 0.9; "from_loc": "fifth crossing", confidence = 0.6; "dest_loc": "Tanjin", confidence = 0.7}}.
The retrieval sub-module 210 submits all the feature fields in the semantic retrieval request, with their corresponding confidence levels (Confidence), to the search library and searches in the all-or manner, obtaining the N retrieval results with the highest retrieval relevance scores. In this example, the retrieval results may be:
search result 1 (search relevance score = 0.8): goal = "find nearby restaurants", action = "call group app, find nearby restaurants", reply = "finding nearby restaurants for you";
search result 2 (search relevance score = 0.75): goal = "navigate", action = "invoke navigation app, pass start and destination parameters to navigation app", reply = "navigating to $dest_loc for you, the route is being planned"; …
…
search result N (search relevance score = 0.70): ….
The reordering sub-module 220 extracts ranking features, extracts ranking data, obtains a ranking model through machine learning training, and feeds each retrieval result into the ranking model to obtain a ranking relevance score. The extracted features include the domain, intention, slot information, vehicle-mounted state, vehicle speed, vehicle position, user personal information, user preference, the confidence levels of these features, the retrieval relevance scores of the N retrieval results, and the like. Finally, if retrieval result 2 obtains the highest ranking relevance score and that score is greater than a manually set relevance threshold (for example, with scores in the range 0 to 1.0, the threshold may be set to 0.7), retrieval result 2 is selected as the output of the dialog state management system for the requirement of the current dialog. After multiple rounds of dialog are completed, the vehicle-mounted system executes the retrieval result selected in the final round.
Note that in this example, the original requirement information entered contains two requirements, one is to navigate and the other is to find restaurants, and the navigation is the real requirement of the user. However, the search result obtained by the search sub-module 210 may not have the highest search relevance score for the true user requirement (search result 2). To solve this problem, the present invention adds a reordering sub-module 220 in the dialog state tracking module 200 for performing a machine learning-based ordering again on the search result obtained by the search sub-module 210 to obtain the real requirement of the user.
To better illustrate the dialogue state management system of the electric vehicle according to the present invention, the following examples are listed, corresponding to different fields:
1. Examples of the navigation field
The input requirement is query = "I want to navigate to Qinghua University";
the semantic understanding representation obtained by the semantic understanding module 100 may be:
{domain: "navigation", intent: "navigate",
slot list: ["navigator", "dest_loc: Qinghua University"]};
the reordering result obtained by the dialog state tracking module 200 may be:
{goal: "navigation",
action: "nav_poi service" (invokes the map navigation service),
TTS: "planning a route to $dest_loc for you"}.
2. Examples of the music field
The input requirement is query = "help me play Zhou Jielun's Blue and White Porcelain";
the semantic understanding representation obtained by the semantic understanding module 100 may be:
{domain: "music", intent: "search and play music",
slot list: ["play_music", "music_artist: Zhou Jielun", "music_name: Blue and White Porcelain"]};
the reordering result obtained by the dialog state tracking module 200 may be:
{goal: "search music",
action: "play music service" (invokes the music service),
TTS: "searching for and playing the song $music_name for you, please wait"}.
3. Examples in the field of telephony
The input requirement is query = "please help me call Li Si's Unicom number";
the semantic understanding representation obtained by the semantic understanding module 100 may be:
{domain: "call", intent: "call someone",
slot list: ["person_name: Li Si", "phone_operator: Unicom", "call"]};
the reordering result obtained by the dialog state tracking module 200 may be:
{goal: "call",
action: "call service",
TTS: "calling the $phone_operator number of $person_name for you"}.
Note that in practice, the reordering results obtained may differ from those in these examples depending on the data in the search library; results similar to these examples can be obtained when the search library is suitable and contains a sufficiently large amount of data.
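The reply (TTS) fields in the examples contain placeholders such as $dest_loc and $music_name that stand for slot parameters. One possible way to fill them, sketched here with Python's standard `string.Template`, is shown below; the function name `render_reply` and the reply wordings are hypothetical, and the invention does not prescribe how the substitution is done.

```python
from string import Template
from typing import Dict


def render_reply(template: str, slots: Dict[str, str]) -> str:
    """Fill $slot_name placeholders with slot parameters; unknown ones are left as-is."""
    return Template(template).safe_substitute(slots)


nav_reply = render_reply("Planning a route to $dest_loc for you",
                         {"dest_loc": "Qinghua University"})
music_reply = render_reply("Searching for and playing $music_name for you",
                           {"music_artist": "unused", "music_name": "Blue and White Porcelain"})
print(nav_reply)
print(music_reply)
```

Using `safe_substitute` means a reply whose slot was never filled is returned with the placeholder intact instead of raising an error, which may be preferable when slot information is incomplete.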
Further, an embodiment of the present invention provides a controller, which includes a memory and a processor, where the memory stores a computer program and the program, when executed by the processor, can implement the steps of any of the foregoing methods for managing dialog states of an electric vehicle. It should be understood that the instructions stored in the memory correspond to the steps of a specific example of the electric vehicle dialog state management method, which the processor can implement when executing the instructions.
Further, an embodiment of the present invention further provides a computer-readable storage medium for storing computer instructions, where the instructions, when executed by a computer or a processor, implement the steps of any one of the above-mentioned methods for managing dialog states of an electric vehicle. It should be understood that the instructions stored in the computer-readable storage medium correspond to steps of a specific example of an electric vehicle dialog state management method that can be implemented when the instructions are executed.
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.