WO2020019686A1 - Session interaction method and apparatus - Google Patents

Session interaction method and apparatus Download PDF

Info

Publication number
WO2020019686A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
user
intent
sentence
intention
Prior art date
Application number
PCT/CN2019/071301
Other languages
French (fr)
Chinese (zh)
Inventor
周建华
武文杰
陈少昂
孙谷飞
丁薛
邓永庆
王德锋
桑聪聪
杨少文
Original Assignee
众安信息技术服务有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 众安信息技术服务有限公司 filed Critical 众安信息技术服务有限公司
Publication of WO2020019686A1 publication Critical patent/WO2020019686A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance

Definitions

  • the present invention relates to the field of computer technology, and in particular, to a method and a device for session interaction.
  • Existing multi-round human-machine conversation solutions map the user's question onto the standard requirements contained in a requirement structure tree and output the standard requirement content of the leaf node that is hit.
  • Such solutions are limited in flexibility and accuracy: they cannot support flexible jumps and calls between multiple conversation flows, or corpus templates that are dynamically updated in real time, which makes interaction difficult to achieve in some scenarios, and the accuracy of the intent matching model is low.
  • Moreover, a conversation contains not only the portrait of the user himself but also the portraits of related users, such as customers, referred persons, and beneficiaries.
  • the technical problem solved by the present invention is to provide a conversation interaction method and device that can more accurately guide customer consultation.
  • One aspect of the present invention provides a session interaction method, which includes the following steps: obtaining a user sentence; determining whether the user sentence contains a conventional question; if so, retrieving a conventional answer corresponding to the conventional question from a database and outputting it; if not, determining whether the user sentence contains an intent, and if so, retrieving and outputting the conversation flow corresponding to the intent from the database.
  • In this technical solution, two rounds of intent judgment are used to identify the intentions in the user sentence and to make the corresponding outputs, so that the user's intent can be accurately identified and the customer can be guided more accurately through the consultation and follow-up services.
  • The two intent judgments are used, respectively, to determine whether the user sentence directly contains a question that already exists in the database, and to determine whether the user sentence implies a specific intent, so that an implied intent type is not missed, the probability of recognition errors is reduced, and the comprehensiveness and accuracy of recognition are improved.
  • the method further includes: if not, inferring the intention according to the user sentence; judging whether the value obtained in the intent guessing is greater than a preset threshold; if yes, calling and outputting a conversation flow corresponding to the intent in the database.
  • In some preferred embodiments, the second intent judgment includes two steps: determining whether the user sentence contains an intent, and intent inference. Determining whether the user sentence contains an intent means judging whether the user sentence directly includes an intent type, such as "car insurance" or "insure", after which the multi-round conversation is guided according to that intent type.
  • If the user sentence does not directly include these intent types, this technical solution further provides an intent inference step to determine whether the user sentence implicitly contains an intent type; for example, "how long can the car be covered" may point to the "car insurance" intent type. A sketch of the overall flow follows.
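To make the two-round judgment concrete, the following is a minimal, self-contained sketch of the dispatch logic described above. The FAQ entries, intent keywords, threshold value, and helper names are illustrative assumptions, not values or identifiers taken from the disclosure; the heuristic inference stands in for the LSTM-based scoring described later.

```python
# Minimal sketch of the two-round intent judgment (assumed names and values).
FAQ = {"how much is car insurance a year": "About 4,000 yuan."}
INTENT_KEYWORDS = {"car insurance": "car_insurance", "insure": "insure"}
INTENT_THRESHOLD = 0.6  # assumed preset threshold for intent inference


def match_faq(sentence: str):
    """First judgment: look up a conventional question in the FAQ data set."""
    return FAQ.get(sentence.lower().rstrip("?"))


def classify_intent(sentence: str):
    """Second judgment, step 1: does the sentence directly name an intent type?"""
    for keyword, intent in INTENT_KEYWORDS.items():
        if keyword in sentence.lower():
            return intent
    return None


def infer_intent(sentence: str):
    """Second judgment, step 2: guess an implicit intent and score it.

    A trivial heuristic stands in for the LSTM-based inference described later.
    """
    score = 0.7 if "car" in sentence.lower() else 0.1
    return "car_insurance", score


def handle(sentence: str) -> str:
    answer = match_faq(sentence)
    if answer is not None:
        return answer
    intent = classify_intent(sentence)
    if intent is None:
        intent, score = infer_intent(sentence)
        if score <= INTENT_THRESHOLD:
            return "Sorry, could you rephrase your question?"
    return f"[start conversation flow for intent: {intent}]"


print(handle("How much is car insurance a year?"))
print(handle("How long can the car be covered?"))
```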
  • determining whether the user sentence contains a conventional question includes: performing text processing on the obtained user sentence, and determining whether the user sentence contains a conventional question according to a result of the text processing.
  • the manner of text processing includes text segmentation.
  • In some preferred embodiments, the pre-processing step for judging the user sentence includes performing text processing on the sentence, which makes subsequent processing and recognition judgments more convenient and improves the efficiency and accuracy of recognition.
  • the user sentence includes entity information; the entity information includes one or more of the following: sentence vector information, used to train and compile the word vector sequence; general entity information, used to represent general information; and industry entity information, used to represent industry-related information.
  • As a preferred embodiment, the user sentence includes entity information for distinction and judgment, and the entity information includes sentence vector information, general entity information, and industry entity information.
  • An example of sentence vector information is "I had a car accident in Shanghai today. Is the off-site auto insurance claims process the same as the local one?";
  • the general entity information may be time and place information such as "today" and "Shanghai";
  • the industry entity information may be industry information such as "car" and "auto insurance".
  • the user sentence further includes user portrait information, which is used to represent the user's personal and social relationship information.
  • the system can obtain the user's social relationships by collecting the user portrait information that appears in user sentences over multiple turns.
  • This can follow the way a character relationship graph is constructed in the prior art, or the acquisition method designed by the inventor as described below.
  • With this step, the system can further improve the session interaction construction of the database based on the user portrait information, so as to determine the user's intent more accurately and carry out subsequent information pushes.
  • the user portrait information includes one or more of personal identification information, personal attribute information, and social relationship information.
  • the method for obtaining user portrait information specifically includes: performing association calculation on the user sentence to obtain association relationships, and obtaining the syntactic dependency relationships and dependency structures in the user sentence;
  • personal identification information, personal attribute information, and social relationship information are then extracted based on the association relationships for triple iterative learning, yielding a user portrait knowledge graph.
  • the resulting data structure has strong network relationships, and other related node attributes need to be obtained or retrieved, so that the user portrait knowledge graph, i.e. the character relationship graph mentioned above, can be obtained more accurately.
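As an illustration of how the extracted triples could be accumulated into such a user portrait knowledge graph, the sketch below stores (subject, relation, value) triples in a plain dictionary-of-dictionaries graph and retrieves related node attributes. The graph representation and the specific triples are assumptions made for illustration, drawn from the "my dad, 66 years old, bladder cancer" example discussed later.

```python
# Sketch: accumulating extracted triples into a user portrait knowledge graph.
# The dictionary-of-dictionaries representation is an illustrative assumption.
from collections import defaultdict


class PortraitGraph:
    def __init__(self):
        self.nodes = defaultdict(dict)  # node -> {relation/attribute -> value}

    def add_triple(self, subject: str, relation: str, value: str) -> None:
        self.nodes[subject][relation] = value

    def query(self, subject: str, relation: str):
        # Retrieve a related node attribute, as described above.
        return self.nodes.get(subject, {}).get(relation)


graph = PortraitGraph()
# Triples that triple iterative learning might extract from
# "My dad is 66 years old this year and recovered from bladder cancer last year."
graph.add_triple("user", "father", "dad")
graph.add_triple("dad", "age", "66")
graph.add_triple("dad", "medical_history", "bladder cancer (recovered)")

print(graph.query("dad", "age"))  # -> 66
```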
  • In some embodiments, the association calculation on the user sentence is performed through the POS-CBOW method and through improved Word2vec.
  • Because this association calculation integrates entity attributes and entity distribution, it achieves the technical effect of extracting entity association relationships.
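The disclosure does not detail the POS-CBOW method or the Word2vec modifications. As a rough stand-in, the sketch below trains a standard CBOW Word2vec model (gensim, sg=0) over tokens augmented with part-of-speech tags and reads off nearest neighbours as candidate entity associations; the corpus, the "word/POS" tagging scheme, and the similarity query are assumptions for illustration only.

```python
# Rough stand-in for the POS-CBOW association calculation: standard CBOW
# Word2vec over POS-tag-augmented tokens (all values are illustrative).
from gensim.models import Word2Vec

corpus = [
    ["我/r", "爸爸/n", "今年/t", "66岁/m"],
    ["爸爸/n", "去年/t", "膀胱癌/n", "康复/v"],
    ["老人/n", "投保/v", "防癌险/n"],
]

model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=0)  # sg=0 -> CBOW
# Candidate associations for an entity token:
print(model.wv.most_similar("爸爸/n", topn=3))
```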
  • the first intent judgment includes: matching a stitching matrix of the sentence vector information, general entity information, and industry entity information against the FAQ data set in the database; if a corresponding conventional question exists in the FAQ data set, the conventional answer corresponding to that question is output.
  • the FAQ data set refers to a database containing conventional questions and the conventional answers corresponding to them.
  • a conventional question is, for example, "How much does car insurance cost per year?", and the corresponding conventional answer might be something like "4,000 yuan".
  • Matching the stitching matrix of the sentence vector information, general entity information, and industry entity information obtained from the user sentence against the FAQ data set allows the database to be matched accurately and improves matching efficiency.
  • matching the stitching matrix of sentence vector information, general entity information, and industry entity information with the FAQ data set includes: replacing the general entity information and industry entity information in the stitching matrix with the encoding of the top-level entity, and then matching against the FAQ data set.
  • Replacing the general entity information and industry entity information with the encoding of the top-level entity allows the content of the data set to be matched more comprehensively; for example, replacing "private car" with "car" replaces it with the encoding of the upper-level entity, which further helps to accurately identify the matching question and the answer corresponding to that question.
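A minimal sketch of this generalisation-then-match step follows: entities in the user question are replaced by their top-level entity before comparing against FAQ templates. The hypernym table, the word-overlap similarity, and the threshold value are assumptions; the disclosure itself refers to a similarity comparison model and a similarity threshold without specifying them.

```python
# Sketch: replace entities with their top-level entity, then match the FAQ
# templates above a similarity threshold (all concrete values are assumed).
TOP_LEVEL = {"private car": "car", "BMW 320Li": "car", "diabetes": "disease"}
SIMILARITY_THRESHOLD = 0.5  # assumed value


def generalize(text: str) -> str:
    for entity, top in TOP_LEVEL.items():
        text = text.replace(entity, top)
    return text


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap as a stand-in for the similarity comparison model."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def match_faq(question: str, faq: dict):
    q = generalize(question)
    best = max(faq, key=lambda template: similarity(q, generalize(template)))
    if similarity(q, generalize(best)) >= SIMILARITY_THRESHOLD:
        return faq[best]
    return None


faq = {"Can the car be insured?":
       "ZhongAn auto insurance can insure vehicles worth under 2 million."}
print(match_faq("Can the BMW 320Li be insured?", faq))
```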
  • determining whether the user sentence includes an intention includes: performing a text classification through a CNN model to obtain the intention according to a stitching matrix of entity information and user portrait information.
  • the specific process is as follows: an independent sentence S ∈ R^(n×k) of the user conversation is represented by the k-dimensional vectors of its n words; the corresponding entity and user portrait information is encoded into the dictionary, and word2vec is used to obtain the word vectors X_i ∈ R^k after word segmentation and stop-word removal. The independent sentence can then be expressed as X_(1:n) = X_1 ⊕ X_2 ⊕ … ⊕ X_n, where ⊕ represents the concatenation of the word vectors X_i.
  • the feature information is mined using the N-gram form of the independent sentence.
  • in the CNN model, the text convolution kernel is defined as W ∈ R^(l×k) (where the convolution length is l); the feature obtained as the kernel slides over each window of the text is f_i = f(W·X_(i:i+l-1) + b), and the resulting feature map is F = [f_1, f_2, …, f_(n-l+1)].
  • in the pooling layer, the maximum element of each feature vector is kept as the feature information; an n-dimensional vector is obtained from the n convolutions and mapped into a global feature vector of fixed length.
  • in the output layer, a fully connected layer is established and mapped to the h-dimensional intent space.
  • the binary cross-entropy loss function is optimized through supervised learning, and the probabilities output by softmax are mapped into the intent-confidence matrix of the h-dimensional intent space.
  • the output result is the intent and its confidence, together with a list of entity sets, which is stored in memory.
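For concreteness, the following is a small PyTorch sketch of a TextCNN intent classifier of the kind described: k-dimensional word vectors, a convolution kernel of length l, max pooling, and a fully connected layer mapped to an h-dimensional intent space with a softmax output. The embedding size, kernel size, number of filters, and number of intents are illustrative assumptions; training against the binary cross-entropy loss mentioned in the text is omitted.

```python
# Sketch of the TextCNN intent classifier (dimensions are assumptions).
import torch
import torch.nn as nn


class IntentCNN(nn.Module):
    def __init__(self, vocab_size: int, k: int = 128, l: int = 3,
                 n_filters: int = 100, h: int = 5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, k)             # word vectors X_i in R^k
        self.conv = nn.Conv1d(k, n_filters, kernel_size=l)   # kernel W in R^(l x k)
        self.fc = nn.Linear(n_filters, h)                    # map to h-dim intent space

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids).transpose(1, 2)     # (batch, k, n)
        f = torch.relu(self.conv(x))                  # feature map F
        pooled = f.max(dim=2).values                  # keep the maximum as feature info
        return torch.softmax(self.fc(pooled), dim=1)  # intent-confidence vector


model = IntentCNN(vocab_size=10000)
probs = model(torch.randint(0, 10000, (1, 20)))  # one sentence of 20 token ids
best = int(probs.argmax(dim=1))
print(best, float(probs[0, best]))               # highest-confidence intent
```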
  • the type of intention includes one or more of an insurance intention, an underwriting intention, a claims intention, a renewal intention, and a surrender intention.
  • retrieving and outputting the conversation flow corresponding to the intent from the database includes: judging the type of the intent; retrieving the required information according to the intent type and obtaining that information; and outputting the corresponding scheme according to the information.
  • the specific way of judging the intent type is to perform a confidence calculation.
  • when the confidence is greater than the set value for a certain intent type, the sentence is judged to belong to that intent type.
  • the confidence calculation can more accurately identify the user's intent type.
  • the confidence is computed as the softmax-layer output of the fully connected layer vector z, i.e. σ(z)_j = exp(z_j) / Σ_(i=1..h) exp(z_i) for j = 1, …, h. Through this confidence calculation, the judgment result for the intent type can be obtained more accurately.
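A short numeric sketch of the confidence check: softmax over the fully connected layer output z, followed by a per-intent threshold comparison. The intent names and threshold values are assumptions for illustration.

```python
# Sketch: softmax confidence over the fully connected layer output z, then a
# per-intent threshold check (intent names and thresholds are assumed).
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()


INTENTS = ["insure", "underwriting", "claims", "renewal", "surrender"]
THRESHOLDS = {name: 0.5 for name in INTENTS}  # assumed per-intent set values

z = np.array([2.1, 0.3, -0.5, 0.0, -1.2])     # fully connected layer output
confidence = softmax(z)
best = int(confidence.argmax())
if confidence[best] > THRESHOLDS[INTENTS[best]]:
    print(f"intent: {INTENTS[best]}, confidence: {confidence[best]:.2f}")
```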
  • the information is obtained from one or both of the entity information and user portrait information, and/or by asking the user for the information and obtaining it.
  • in the second intent judgment step, after the intent type is determined, a scheme needs to be output according to that intent type.
  • Some of the required information can be obtained from one or both of the entity information and user portrait information, which improves the efficiency of information acquisition; in addition, the user can be asked for information, which improves the accuracy of information acquisition and thereby further improves the overall accuracy of the matched output.
  • the information includes one or more of gender, age, license plate number, region, and number of households.
  • the scheme is a recommended insurance scheme.
  • the value used in intent inference is calculated as follows: the question text is first segmented and its words are embedded with the trained word vectors, then converted into a sentence vector; the stitching matrix of the sentence vector information, entity information, and user portrait information is then trained through an LSTM model to extract features.
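The text does not specify the network beyond an LSTM model trained on the stitched matrix; the sketch below shows one plausible shape, where the stitched feature sequence is run through an LSTM and the final state is mapped to a single inference value that can be compared against the preset threshold. All dimensions are assumptions.

```python
# Sketch of an LSTM-based intent inference score (dimensions are assumptions).
import torch
import torch.nn as nn


class IntentInference(nn.Module):
    def __init__(self, feature_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, stitched: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.lstm(stitched)           # final hidden state
        return torch.sigmoid(self.score(h_n[-1]))   # inferred value in [0, 1]


model = IntentInference()
stitched = torch.randn(1, 12, 128)   # 12 steps of stitched sentence/entity/portrait features
value = model(stitched)
print(float(value) > 0.6)            # compare against the preset threshold
```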
  • A conversation interaction device is also provided, including: an acquisition module for acquiring a user sentence; a first judgment module for judging whether the user sentence contains a conventional question; a first output module for retrieving a conventional answer corresponding to the conventional question from the database and outputting it when the judgment result of the first judgment module is yes; a second judgment module for judging whether the user sentence contains an intent when the judgment result of the first judgment module is no; and a second output module for retrieving and outputting the conversation flow corresponding to the intent from the database when the judgment result of the second judgment module is yes.
  • a guessing module is configured to perform intent inference according to the user sentence when the judgment result of the second judgment module is no;
  • a third judgment module is used to judge whether the value obtained in the intent inference is greater than a preset threshold;
  • a third output module is configured to retrieve and output the conversation flow corresponding to the intent from the database when the judgment result of the third judgment module is yes.
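The module structure maps naturally onto a small dispatcher object; the skeleton below wires placeholder callables in the order the modules are described (acquisition, first judgment/output, second judgment/output, guessing, third judgment/output). Module internals and the threshold are placeholders, not the disclosed implementations.

```python
# Skeleton of the apparatus: modules wired in the order described above
# (the callables and the threshold value are placeholders).
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class ConversationDevice:
    acquire: Callable[[], str]                        # acquisition module
    faq_answer: Callable[[str], Optional[str]]        # first judgment + first output
    detect_intent: Callable[[str], Optional[str]]     # second judgment module
    infer_intent: Callable[[str], Tuple[str, float]]  # guessing module
    threshold: float = 0.6                            # preset threshold (assumed)

    def run(self) -> str:
        sentence = self.acquire()
        answer = self.faq_answer(sentence)
        if answer is not None:
            return answer
        intent = self.detect_intent(sentence)
        if intent is None:
            intent, value = self.infer_intent(sentence)  # third judgment
            if value <= self.threshold:
                return "no intent recognized"
        return f"conversation flow for intent: {intent}"


device = ConversationDevice(
    acquire=lambda: "How long can the car be covered?",
    faq_answer=lambda s: None,
    detect_intent=lambda s: None,
    infer_intent=lambda s: ("car_insurance", 0.7),
)
print(device.run())
```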
  • the conversation interaction method of the present invention recognizes the intentions in user sentences through two rounds of intention judgment to make corresponding outputs respectively, so that the user's intentions can be accurately identified, and the customer can be more accurately guided to complete consultation and subsequent services.
  • the second intent judgment includes two steps, namely judging whether the user sentence contains an intent, and intent inference; through this two-step judgment, the user's intent can be identified more accurately, the accuracy of intent recognition is improved, and errors and incompleteness in user sentence recognition are avoided.
  • the pre-processing step of judging a user sentence includes text processing of the user sentence, which makes subsequent processing and recognition judgments more convenient and improves the efficiency and accuracy of recognition.
  • the general entity information and industry entity information are replaced with the encoding of the top-level entity, so that the matching question and the answer corresponding to the question can be further accurately identified.
  • after determining the intent type, the conversation interaction method of the present invention can obtain related information from one or both of the entity information and user portrait information, which improves the efficiency of information acquisition; in addition, it can also query the user and obtain the relevant information, which improves the accuracy of information acquisition and thereby further improves the overall accuracy of the matched output.
  • the conversation interaction method of the present invention realizes the complete extraction of all the information in the context of the user's conversation.
  • through the entity extraction model and relationship extraction, general entities, industry entities, and user portraits are extracted from sentences, and the user's intents and possible intents are learned through deep learning models with higher accuracy.
  • FIG. 1 is a schematic flowchart of an implementation manner of a session interaction method according to the present invention.
  • FIG. 2 is a schematic diagram of a preferred embodiment of a user knowledge map of the conversation interaction method of the present invention.
  • FIG. 3 is a schematic flowchart of a preferred implementation of the second intention judgment of the conversation interaction method of the present invention.
  • FIG. 4 is a schematic flowchart of defining a session flow rule in the present invention.
  • FIG. 5 is a schematic diagram of a preferred process in the step of FIG. 4.
  • FIG. 6 is a schematic structural diagram of a session interaction apparatus according to an embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of an implementation manner of a session interaction method according to the present invention.
  • the method 100 includes the following steps: step 110, obtaining a user sentence, followed by a first intent judgment and a second intent judgment 130.
  • the intentions in the user sentence are identified through two rounds of intention judgments to make corresponding outputs respectively, so that the user's intentions can be accurately identified, and the customer can be more accurately guided to complete the consultation and subsequent services.
  • the two intent judgments are used, respectively, to determine whether the user sentence directly contains a question that already exists in the database, and to determine whether the user sentence implies a specific intent, so that an intent type implied by the user is not missed, the recognition error rate is reduced, and the comprehensiveness and accuracy of recognition are improved.
  • the specific process of the second intent judgment 130 is to determine whether the user sentence contains an intent, and if so, to retrieve and output the conversation flow corresponding to the intent from the database; if not, intent inference is performed according to the user sentence and it is determined whether the value obtained in the intent inference is greater than a preset threshold; if so, it is confirmed that there is an intent, and the conversation flow corresponding to that intent is retrieved from the database and output.
  • the second intent judgment includes two steps: determining whether the user sentence contains an intent, and intent inference. Determining whether the user sentence contains an intent means judging whether the user sentence directly includes an intent type, such as "car insurance" or "insure", after which the multi-round conversation proceeds according to that intent type; if the user sentence does not directly include these intent types, the technical solution further provides an intent inference step to determine whether the sentence implicitly contains an intent type, for example "how long can the car be covered", which may point to the "car insurance" intent type.
  • the user's intention can be identified more accurately, the accuracy of identifying the user's intention can be improved, and the errors and incompleteness of user sentence recognition can be reduced.
  • the obtaining of the sentence in step 110 specifically includes: performing text processing on the obtained user sentence.
  • the text processing manner includes text segmentation.
  • pre-processing the user sentence in this way makes subsequent processing and recognition judgments more convenient and improves the efficiency and accuracy of recognition.
  • the user sentence includes entity information
  • step 110 includes extracting entity information.
  • entity information includes one or more of the following: sentence vector information used to train and compile word vector sequences; general entity information used to represent general information; industry entity information used to represent industry-related information.
  • the user sentence includes entity information for distinction and judgment, and the entity information includes sentence vector information, general entity information, and industry entity information.
  • entity information includes sentence vector information, general entity information, and industry entity information.
  • the acquisition of the word vectors is completed in the text word segmentation step.
  • An example of sentence vector information is "I had a car accident in Shanghai today. Is the off-site auto insurance claims process the same as the local one?"
  • the general entity information can be time and place information such as "today" and "Shanghai"; and the industry entity information can be industry information such as "car" and "auto insurance".
  • the user sentence further includes user portrait information, which is used to represent the personal and social relationship of the user.
  • step 110 also extracts user portrait information.
  • the system can obtain the user's social relationship by obtaining the user portrait information appearing in the user sentence multiple times.
  • This can follow the way a character relationship graph is constructed in the prior art, or the acquisition method designed by the inventor as described below.
  • the system can further improve the session interaction construction of the database based on the user portrait information, so as to further accurately determine the user's intention and subsequent information push.
  • the user portrait information includes one or more of personal identification information, personal attribute information, and social relationship information.
  • the method for obtaining user portrait information specifically includes: performing association calculation on the user sentence to obtain association relationships, obtaining the syntactic dependency relationships and dependency structures in the user sentence, and extracting personal identification information, personal attribute information, and social relationship information based on the association relationships for triple iterative learning, thereby obtaining a user portrait knowledge graph.
  • the obtained data structure has strong network relationships, and the formation process needs to obtain or retrieve the attributes of other related nodes, so that the user portrait knowledge graph, i.e. the character relationship graph described above, can be obtained more accurately.
  • the specific way of performing association calculation on the user sentence is through the POS-CBOW method and through improved Word2vec; because this integrates entity attributes and entity distribution, the technical effect of extracting entity association relationships can be achieved.
  • Figure 2 is a schematic diagram of a preferred embodiment of a user portrait knowledge map composed of user portraits.
  • for example, for the user sentence "My dad is 66 years old this year and recovered from bladder cancer last year. Can an elderly person like him be covered by the Xiaoxin anti-cancer insurance?",
  • entity extraction is performed first, including extraction of general entity information, industry entity information, and user portrait information, and matrix coding is performed; specifically, text segmentation can be carried out on the sentence first.
  • the POS-CBOW method and improved Word2vec are then used to perform the association calculation, the syntactic dependency relationships and dependency structures are obtained, entities, attributes, and relations are extracted through these relationships, and more templates are learned through triple iteration; for example, from the sentence above, the relation between "me" and "dad" is obtained, the age is 66, and the disease is bladder cancer. By training on the ontology association relationships in the insurance-domain corpus, the user portrait knowledge graph 200 shown in FIG. 2 is obtained.
  • in an insurance scenario there are situations such as an agent insuring his customers, a user recommending good insurance products to friends, or a customer insuring himself, his parents, and his children, all of which involve related ontologies; it is therefore necessary to establish a user portrait knowledge graph from the context,
  • which solves the problem of the complex relationships between the many subjects and entities in the conversation. Building a knowledge graph from user portraits handles scenarios such as a user asking about "my client's policy", "what did I recommend to my friends", or "what coverage does my family's insurance cover".
  • the specific method of the first intent judgment includes: a FAQ data set is provided in the database; the stitching matrix of the sentence vector information, general entity information, and industry entity information is matched against the FAQ data set, and if a corresponding conventional question exists in the FAQ data set, the conventional answer corresponding to that conventional question is output.
  • the FAQ data set refers to a database including conventional questions and the conventional answers corresponding to those questions.
  • a conventional question is, for example, "How much does car insurance cost per year?", and the corresponding conventional answer might be something like "4,000 yuan".
  • Matching the stitching matrix of the sentence vector information, general entity information, and industry entity information obtained from the user sentence against the FAQ data set allows the database to be matched accurately and improves matching efficiency.
  • matching the stitching matrix of sentence vector information, general entity information, and industry entity information with the FAQ data set specifically includes the following steps: the general entity information and industry entity information in the stitching matrix are replaced with the encoding of the top-level entity, and the result is then matched against the FAQ data set. Replacing the general entity information and industry entity information with the encoding of the top-level entity allows the content of the data set to be matched more comprehensively; for example, replacing "private car" with "car" replaces it with the encoding of the upper-level entity, which further helps to accurately identify the matching question and the answer corresponding to that question.
  • a stitching matrix composed of the sentence vector, general entities, and industry entities is input, and the entities in the user's question are replaced with the encoding of the top-level entity, for example replacing "diabetes" with "disease" and "BMW 320Li" with "car"; the codes of these top-level entities are then matched against the questions in the QA set, and finally a similarity comparison model is used to find the questions in the QA set whose similarity exceeds a certain threshold.
  • for example, when the user asks "Can the BMW 320Li be insured?", the template "Can the car be insured?" in the QA set has the highest similarity, and different answers can be set through QA conditions, such as "ZhongAn auto insurance can insure vehicles worth under 2 million".
  • the specific method for determining the second intent includes: obtaining the intent through text classification through a CNN model according to a stitching matrix of entity information and user portrait information, and FIG. 3 illustrates one of the processes.
  • an independent sentence S ∈ R^(n×k) of the user conversation is represented by the k-dimensional vectors of its n words, and the corresponding entity and user portrait information is encoded into the dictionary;
  • word2vec is used to obtain the word vectors X_i ∈ R^k after word segmentation and stop-word removal.
  • the independent sentence can then be expressed as X_(1:n) = X_1 ⊕ X_2 ⊕ … ⊕ X_n, where ⊕ represents the concatenation of the word vectors X_i. The feature information is mined using the N-gram form of the independent sentence: the text convolution kernel is defined as W ∈ R^(l×k) (with convolution length l), the feature for each sliding window is f_i = f(W·X_(i:i+l-1) + b), and the feature map is F = [f_1, f_2, …, f_(n-l+1)].
  • in the pooling layer, the maximum element of each feature vector is kept as the feature information; an n-dimensional vector is obtained from the n convolutions and mapped into a global feature vector of fixed length.
  • in the output layer, a fully connected layer is established and mapped to the h-dimensional intent space; the binary cross-entropy loss function is optimized through supervised learning, and the probabilities output by softmax are mapped into the intent-confidence matrix of the h-dimensional intent space.
  • the output result is the intent and its confidence, together with a list of entity sets, which is stored in memory.
  • the type of intention includes one or more of an insurance intention, an underwriting intention, a claims intention, a renewal intention, and a surrender intention.
  • the specific manner of fetching and outputting the conversation process corresponding to the intent in the database includes: judging the type of intent; obtaining the required information and obtaining the information according to the type of intent; according to the Information output corresponding scheme.
  • the specific way of judging the type of intent is to perform a confidence calculation.
  • when the confidence is greater than the set value for a certain intent type, it is determined that the sentence belongs to that intent type; the confidence calculation can more accurately identify the user's intent type.
  • the confidence is computed as the softmax-layer output of the fully connected layer vector z, i.e. σ(z)_j = exp(z_j) / Σ_(i=1..h) exp(z_i) for j = 1, …, h.
  • the information is obtained from one or both of the entity information and user portrait information, and/or by asking the user for the information and obtaining it.
  • in the second intent judgment step, after the intent type is determined, the scheme needs to be output according to that intent type.
  • the information includes one or more of gender, age, license plate number, region, and number of households.
  • the scheme is an insurance recommendation scheme.
  • FIG. 4 is a schematic flowchart of defining a session flow rule in the present invention.
  • the conversation flow rule 400 contains (intent) triggers, nodes, conditions, and actions.
  • node rules include:
  • Node names 420 are defined, and each node includes corresponding conditions and actions.
  • condition 430: logical expressions such as IF/ELSE are used to implement the mapping and alignment of entities.
  • the mapping and alignment of the entity includes the mapping of entities and user portraits and the alignment between entities.
  • for example, a condition is defined as age < 55; from the user sentence "my dad was born in '52", the attribute "born in '52" of "my dad" is obtained in the user portrait, the "age" in the condition definition is mapped to the "age" of "my dad", and "born in '52" is aligned to 66 years old; from the user portrait it is then judged that the condition age < 55 is not satisfied.
  • action 440: three types are defined here, namely cards, jumps, and application programming interface (API) return values. Cards, including selection cards, text cards, graphic cards, graphic lists, pictures, and other cards, interact with the user to obtain and map information; the purpose is to collect structured and unstructured external data as well as response and feedback results. A jump can go to another node, a Uniform Resource Locator (URL), manual service, and so on. An API return value returns the collected user portrait information and acquisition requests, such as an insurance recommendation, to the server through the API.
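The rule elements just described (an intent trigger, named nodes, IF/ELSE-style conditions over the user portrait, and card/jump/API actions) can be modelled with a small data structure; the sketch below is one possible shape with assumed field names, not the disclosed rule format.

```python
# Sketch of a conversation flow rule: trigger, nodes, conditions, and actions
# (field names and the example rule are illustrative assumptions).
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str                              # "card", "jump", or "api"
    payload: dict = field(default_factory=dict)


@dataclass
class Node:
    name: str
    condition: Callable[[Dict], bool]      # IF/ELSE-style check on the user portrait
    action: Action


@dataclass
class FlowRule:
    trigger_intent: str
    nodes: List[Node]

    def run(self, portrait: Dict) -> List[Action]:
        # Execute every node whose condition holds for the current portrait.
        return [n.action for n in self.nodes if n.condition(portrait)]


rule = FlowRule(
    trigger_intent="insure",
    nodes=[
        Node("ask_age", lambda p: "age" not in p,
             Action("card", {"type": "selection", "collect": "age"})),
        Node("age_check", lambda p: p.get("age", 0) >= 55,
             Action("jump", {"to": "manual_service"})),
        Node("recommend", lambda p: "age" in p and p["age"] < 55,
             Action("api", {"request": "insurance_recommendation"})),
    ],
)
print(rule.run({"age": 66}))   # condition age < 55 not met -> jump to manual service
```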
  • each node further includes a memory
  • the definition node rule further includes defining a memory
  • the user's question is processed by the intent recognition model to obtain the highest-confidence intent and the corresponding entities. For example, when the user enters the question "Can a 50-year-old man be insured?", the highest-confidence intent obtained is "insure", and the corresponding entities are "50 years old" and "male".
  • the content of the user input related to the entities corresponding to the intent is mapped onto the entities as part of the context information of the conversation flow, and stored in a storage medium suitable for high-frequency access as one of the data sources for the intent judgments in subsequent steps.
  • a trigger triggers multiple rounds of sessions.
  • Node 1 determines whether identity information is available, obtains it through the API, and reads the user portrait. Node 2 performs the condition judgment: if an ID is obtained, a selection card for gender is presented and the flow jumps to node 3, which presents a selection card for age; if no ID is obtained, the flow jumps directly to node 4.
  • Node 4 requires the license plate to be entered, and node 5 judges whether the region is within the specified region; if so, the user portrait is read and the flow jumps to node 6, otherwise a selection card is presented to select the region. Node 6 selects the number of households, and node 7 recommends auto insurance; if the recommendation API fails, an error message is returned.
  • In multi-round conversation, the logic of the nodes can be judged and the jump of each node completed; the mapping of entities to user portraits and the alignment between entities are supported, as are rich interactive cards such as selection cards, text cards, graphic cards, graphic lists, and pictures.
  • a session flow configuration consists of multiple interactive step nodes, which include at least a start node and an end node.
  • Each node consists of a node body, a trigger, multiple sets of conditional behaviors, and a memory network.
  • the node body is the key value of the content that a node needs to collect.
  • the node body and the user's input to the body are added to the conversation flow context in the form of key-value pairs and stored in the storage medium.
  • the structure of the context is shown in Figure 5.
  • a trigger determines whether the node will be executed. When the condition of the trigger is met, the machine program will push the preset content of the node to the user, and the user will continue to input and complete the user interaction at this step.
  • a trigger consists of a trigger body and a trigger condition. There are three types of trigger bodies, which are intent type (identified with @ symbol), entity type (identified with # symbol), and data type (identified with _ symbol).
  • the data type is defined by the user in advance and stored in the memory medium, and a specific namespace memory in the memory is allocated in advance.
  • the user-defined data x will be stored in memory with memory.x as the key value.
  • the life cycle of the memory.x key-value pair is equivalent to the entire conversation flow, and its application scope is from the machine to the user.
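A minimal sketch of the trigger body convention and the memory namespace follows: @ marks an intent trigger, # an entity trigger, and _ a user-defined data trigger whose value is stored under the memory.x key for the life of the conversation flow. The parsing function and the store class are illustrative assumptions.

```python
# Sketch: trigger body types (@intent, #entity, _data) and the memory.x store
# (the parsing function and store class are assumptions).
class ConversationMemory(dict):
    def store(self, name: str, value) -> None:
        # User-defined data x lives under the key "memory.x" for the whole flow.
        self[f"memory.{name}"] = value


def trigger_kind(body: str) -> str:
    if body.startswith("@"):
        return "intent"
    if body.startswith("#"):
        return "entity"
    if body.startswith("_"):
        return "data"
    raise ValueError(f"unknown trigger body: {body}")


memory = ConversationMemory()
memory.store("license_plate", "沪A12345")
print(trigger_kind("@insure"), trigger_kind("#age"), trigger_kind("_license_plate"))
print(memory["memory.license_plate"])
```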
  • the calculation of the intent inference value is specifically as follows:
  • features are extracted by training through the LSTM model.
  • in this process the current input X_t enters a new memory cell of the LSTM.
  • FIG. 6 is a schematic structural diagram of a session interaction apparatus according to an embodiment of the present invention.
  • the conversation interaction device 600 includes an acquisition module 610, a first determination module 620, a first output module 630, a second determination module 640, and a second output module 650.
  • the obtaining module 610 is used to obtain a user sentence; the first judgment module 620 is used to judge whether the user sentence contains a conventional question; the first output module 630 is used to retrieve a conventional answer corresponding to the conventional question from the database and output it when the judgment result of the first judgment module 620 is yes; the second judgment module 640 is used to judge whether the user sentence contains an intent when the judgment result of the first judgment module 620 is no; and the second output module 650 is used to retrieve and output the conversation flow corresponding to the intent from the database when the judgment result of the second judgment module 640 is yes.
  • the session interaction device 600 further includes a speculation module 660, a third determination module 670, and a third output module 680.
  • the guessing module 660 is used to make an intention inference according to the user sentence when the judgment result of the second judgment module 640 is negative;
  • the third judgment module 670 is used to judge whether the value obtained from the intention guessing is greater than a preset threshold;
  • the third output module 680 is configured to retrieve and output the conversation flow corresponding to the intent from the database when the judgment result of the third judgment module 670 is yes.
  • the session interaction device 600 further includes a processing module, configured to perform text processing on the user sentence acquired by the obtaining module 610.
  • the first determining module 620 is specifically configured to determine whether the user sentence contains a conventional question according to the processing result of the processing module.
  • the text processing here includes text segmentation.
  • the user sentence includes entity information
  • entity information includes one or more of the following: sentence vector information, used to train and compile the word vector sequence; general entity information, used to represent general information; and industry entity information, used to represent industry-related information.
  • the user sentence further includes user portrait information, which is used to represent personal and social relationships of the user.
  • the user portrait information includes one or more of personal identification information, personal attribute information, and social relationship information.
  • the method for obtaining user portrait information includes: performing association calculation on the user sentence to obtain association relationships, obtaining the syntactic dependency relationships and dependency structures in the user sentence, and extracting personal identification information, personal attribute information, and social relationship information based on the association relationships for triple iterative learning to obtain the user portrait knowledge graph.
  • the specific methods used to perform association calculation on user statements include POS-CBOW method and association calculation through improved Word2vec.
  • the first judgment module 620 is specifically configured to match the stitching matrix of sentence vector information, general entity information, and industry entity information with the FAQ data set in the database.
  • Specifically, the general entity information and industry entity information in the stitching matrix are replaced with the encoding of the top-level entity and then matched with the FAQ data set.
  • the first output module 630 is specifically configured to output a conventional answer corresponding to the conventional question when there is a conventional question in the FAQ data set.
  • the second judgment module 640 is specifically configured to perform text classification through a CNN model to obtain an intent according to a stitching matrix of entity information and user portrait information.
  • the second output module 650 is specifically configured to determine the type of the intent, the intent type including one or more of an insurance intention, an underwriting intention, a claims intention, a renewal intention, and a surrender intention; to retrieve the required information according to the intent type and obtain that information, the information including one or more of gender, age, license plate number, region, and number of households; and to output the corresponding scheme according to the information, the scheme including an insurance recommendation scheme.
  • the specific way of judging the type of intent here is to calculate the confidence level.
  • the method for obtaining information includes obtaining from one or two of entity information and user portrait information, and the user may also be asked to obtain the information.
  • the session interaction device shown in FIG. 6 may correspond to the session interaction method provided by any of the foregoing embodiments.
  • the specific descriptions and limitations of the session interaction method described above may be applied to the session interaction device, and details are not described herein again.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed is a session interaction method. The method comprises: obtaining a user statement; determining whether the user statement comprises a conventional question; if yes, calling, in a database, a conventional answer corresponding to the conventional question and outputting the conventional answer; if no, determining whether the user statement comprises an intention, if yes, calling, in the database, a session flow corresponding to the intention and outputting the session flow. By means of the technical solution, a user intention can be effectively identified and information guidance and scheme pushing can be more accurately performed.

Description

Session interaction method and apparatus
This application claims priority to Chinese application No. 201810841590.9, filed on July 27, 2018, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of computer technology, and in particular to a session interaction method and apparatus.
Background of the invention
Most current computer conversation interactions involve multiple rounds of dialogue. Enabling a machine, on the basis of predefined multi-round conversation rules, to understand the user's intent and to pick appropriate response data from the conversation flow as feedback to the user has long been a goal in the field of human-computer interaction.
However, existing multi-round human-machine conversation solutions map the user's question onto the standard requirements contained in a requirement structure tree and output the standard requirement content of the leaf node that is hit. Such solutions are limited in flexibility and accuracy: they cannot support flexible jumps and calls between multiple conversation flows, or corpus templates that are dynamically updated in real time, which makes interaction difficult to achieve in some scenarios, and the accuracy of the intent matching model is low.
For example, in multi-round conversation scenarios in the insurance industry, insurance knowledge is highly specialized: users' questions are often logically unclear or vague, users may not know how to ask, the insurance underwriting network is complex, and the terminology is obscure. These problems make it difficult to understand the user's natural language and to provide multi-round conversational interaction. Moreover, a conversation contains not only the portrait of the user himself but also the portraits of related users, such as customers, referred persons, and beneficiaries.
Therefore, how to recommend corresponding questions based on user information such as intent, so as to achieve better human-computer interaction, has long been one of the technical problems to be solved in the related technical fields.
Summary of the invention
In order to overcome the shortcomings of the prior art, the technical problem solved by the present invention is to provide a session interaction method and apparatus that can guide customer consultation more accurately.
To solve the above technical problem, one aspect of the present invention provides a session interaction method, which includes the following steps: obtaining a user sentence; determining whether the user sentence contains a conventional question; if so, retrieving a conventional answer corresponding to the conventional question from a database and outputting it; if not, determining whether the user sentence contains an intent, and if so, retrieving and outputting the conversation flow corresponding to the intent from the database.
In this technical solution, two rounds of intent judgment are used to identify the intentions in the user sentence and to make the corresponding outputs, so that the user's intent can be accurately identified and the customer can be guided more accurately through the consultation and follow-up services. The two intent judgments are used, respectively, to determine whether the user sentence directly contains a question that already exists in the database, and to determine whether the user sentence implies a specific intent, so that an implied intent type is not missed, the probability of recognition errors is reduced, and the comprehensiveness and accuracy of recognition are improved.
Preferably, the method further includes: if not, performing intent inference according to the user sentence; judging whether the value obtained in the intent inference is greater than a preset threshold; and if so, retrieving and outputting the conversation flow corresponding to the intent from the database.
It should be noted that, in some preferred embodiments, the second intent judgment includes two steps: determining whether the user sentence contains an intent, and intent inference. Determining whether the user sentence contains an intent means judging whether the user sentence directly includes an intent type, such as "car insurance" or "insure", after which the multi-round conversation is guided according to that intent type. If the user sentence does not directly include these intent types, this technical solution further provides an intent inference step to determine whether the sentence implicitly contains an intent type; for example, "how long can the car be covered" may point to the "car insurance" intent type. Through these two steps, the user's intent can be identified more accurately, the accuracy of intent recognition is improved, and errors and incompleteness in user sentence recognition are avoided.
Preferably, determining whether the user sentence contains a conventional question includes: performing text processing on the obtained user sentence, and determining whether the user sentence contains a conventional question according to the result of the text processing.
More preferably, the text processing includes word segmentation.
It should be noted that, in some preferred embodiments, the pre-processing step for judging the user sentence includes performing text processing on the sentence, which makes subsequent processing and recognition judgments more convenient and improves the efficiency and accuracy of recognition.
Preferably, the user sentence includes entity information; the entity information includes one or more of the following: sentence vector information, used to train and compile the word vector sequence; general entity information, used to represent general information; and industry entity information, used to represent industry-related information.
It should be noted that, as a preferred embodiment, the user sentence includes entity information for distinction and judgment, and the entity information includes sentence vector information, general entity information, and industry entity information. An example of sentence vector information is "I had a car accident in Shanghai today. Is the off-site auto insurance claims process the same as the local one?"; the general entity information may be time and place information such as "today" and "Shanghai"; and the industry entity information may be industry information such as "car" and "auto insurance".
More preferably, the user sentence further includes user portrait information, which is used to represent the user's personal and social relationship information.
It should be noted that the system can obtain the user's social relationships by collecting the user portrait information that appears in user sentences over multiple turns; this can follow the way a character relationship graph is constructed in the prior art, or the acquisition method designed by the inventor as described below. With this step, the system can further improve the session interaction construction of the database based on the user portrait information, so as to determine the user's intent more accurately and carry out subsequent information pushes.
Further, the user portrait information includes one or more of personal identification information, personal attribute information, and social relationship information. Obtaining the user portrait information specifically includes: performing association calculation on the user sentence to obtain association relationships, obtaining the syntactic dependency relationships and dependency structures in the user sentence, and extracting personal identification information, personal attribute information, and social relationship information based on the association relationships for triple iterative learning, thereby obtaining a user portrait knowledge graph.
It should be noted that when personal identification information, personal attribute information, and social relationship information are extracted based on the association relationships for triple iterative learning, the resulting data structure has strong network relationships, and other related node attributes need to be obtained or retrieved, so that the user portrait knowledge graph, i.e. the character relationship graph mentioned above, can be obtained more accurately.
In some embodiments, the association calculation on the user sentence is specifically performed through the POS-CBOW method and through improved Word2vec.
It should be noted that, in some more specific embodiments, performing the association calculation through the POS-CBOW method and improved Word2vec integrates entity attributes and entity distribution, thereby achieving the technical effect of extracting entity associations.
Further, determining whether the user sentence contains a conventional question, and if so, retrieving the conventional answer corresponding to the conventional question from the database and outputting it, includes: matching a stitching matrix of the sentence vector information, general entity information, and industry entity information against the FAQ data set in the database, and if a corresponding conventional question exists in the FAQ data set, outputting the conventional answer corresponding to that conventional question.
It should be noted that the FAQ data set refers to a database containing conventional questions and the conventional answers corresponding to them. A conventional question is, for example, "How much does car insurance cost per year?", and the corresponding conventional answer might be something like "4,000 yuan".
Matching the stitching matrix of the sentence vector information, general entity information, and industry entity information obtained from the user sentence against the FAQ data set allows the database to be matched accurately and improves matching efficiency.
In some embodiments, matching the stitching matrix of the sentence vector information, general entity information, and industry entity information with the FAQ data set includes: replacing the general entity information and industry entity information in the stitching matrix with the encoding of the top-level entity, and then matching against the FAQ data set.
It should be noted that, in some preferred embodiments, replacing the general entity information and industry entity information with the encoding of the top-level entity allows the content of the data set to be matched more comprehensively. For example, replacing "private car" with "car" replaces it with the encoding of the upper-level entity, which further helps to accurately identify the matching question and the answer corresponding to that question.
Further, determining whether the user sentence contains an intent includes: obtaining the intent by text classification with a CNN model based on the stitching matrix of the entity information and the user portrait information.
In a more specific embodiment, the above process is as follows: an independent sentence S ∈ R^(n×k) of the user conversation is represented by n k-dimensional word vectors; the corresponding information of the entities and the user portrait is encoded into a dictionary, and the word vectors X_i ∈ R^k are obtained with word2vec after word segmentation and stop-word removal. The independent sentence can then be expressed as
S = X_1 ⊕ X_2 ⊕ … ⊕ X_n,
where ⊕ denotes the concatenation of the word vectors X_i.
The N-Gram form of the independent sentence is used to mine feature information. With the CNN model, a text convolution kernel W ∈ R^(l×k) is defined (where l is the convolution window length); the feature obtained as the kernel slides over each window of the text is f_i = f(W·X_{i:i+l−1} + b), and the resulting feature map is F = [f_1, f_2, …, f_{n−l+1}].
In the pooling layer, the maximum value of each feature map is kept as its feature information, so the n convolution kernels yield an n-dimensional vector, which is mapped into a fixed-length global feature vector.
In the output layer, a fully connected layer is established and mapped to the h-dimensional intent space; a binary cross-entropy loss function is optimized through supervised learning, and the probabilities output by softmax are mapped into the intent-confidence matrix of the h-dimensional intent space. The output is the intent and its confidence, together with the list of entity sets, which are stored in memory.
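The disclosure describes this TextCNN structure only at the level of formulas; the sketch below shows one possible PyTorch rendering of the same convolution, max-over-time pooling, fully connected output, and binary cross-entropy objective. All dimensions, the ReLU nonlinearity, and the use of torch are illustrative assumptions.

```python
# Minimal sketch (assumption): TextCNN intent classifier roughly following the
# convolution / max-pooling / fully connected structure described above.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, k=128, n_kernels=100, l=3, h=5):
        super().__init__()
        # One convolution kernel W ∈ R^(l×k) per output channel, sliding over n words.
        self.conv = nn.Conv1d(in_channels=k, out_channels=n_kernels, kernel_size=l)
        self.fc = nn.Linear(n_kernels, h)  # map the pooled vector to the h intent classes

    def forward(self, x):                  # x: (batch, n_words, k)
        x = x.transpose(1, 2)              # -> (batch, k, n_words)
        feats = torch.relu(self.conv(x))   # -> (batch, n_kernels, n_words - l + 1)
        pooled = feats.max(dim=2).values   # max-over-time pooling, one value per kernel
        return self.fc(pooled)             # logits over the h-dimensional intent space

model = TextCNN()
logits = model(torch.randn(2, 20, 128))                    # two sentences, 20 word vectors each
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(2, 5))   # binary cross-entropy objective
probs = torch.softmax(logits, dim=1)                       # intent-confidence distribution
```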
在一些实施方式中,意图的类型包括投保意图、核保意图、理赔意图、续保意图和退保意图之中的一种或多种。In some embodiments, the type of intention includes one or more of an insurance intention, an underwriting intention, a claims intention, a renewal intention, and a surrender intention.
作为优选,在数据库中调取与意图相应的会话流程并输出包括:判断意图的类型;根据意图的类型调取所需要的信息并获取信息;根据信息输出相应的方案。As a preference, retrieving and outputting a conversation flow corresponding to the intent in the database includes: judging the type of the intent; fetching the required information and obtaining the information according to the type of the intent; and outputting the corresponding scheme according to the information.
Preferably, the type of the intent is judged by performing a confidence calculation: when the confidence is greater than the preset value for a certain intent type, the sentence is judged to belong to that intent type.
It should be noted that the confidence calculation allows the user's intent type to be identified more accurately. Specifically, the confidence is the softmax-layer output computed from the fully connected layer vector z:
softmax(z)_j = exp(z_j) / Σ_k exp(z_k).
Based on this confidence calculation, the judgment of the intent type can be made more accurately.
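A small sketch of this confidence-threshold decision is given below. The intent labels, the per-type thresholds, and the example logits are illustrative assumptions rather than values from the disclosure.

```python
# Minimal sketch (assumption): softmax over the fully connected layer output z, then
# accept an intent type only if its confidence exceeds that type's preset value.
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

INTENTS = ["投保", "核保", "理赔", "续保", "退保"]
THRESHOLDS = {"投保": 0.8, "核保": 0.8, "理赔": 0.7, "续保": 0.7, "退保": 0.7}  # illustrative

def judge_intent_type(z):
    conf = softmax(np.asarray(z, dtype=float))
    best = int(conf.argmax())
    intent = INTENTS[best]
    return (intent, conf[best]) if conf[best] > THRESHOLDS[intent] else (None, conf[best])

print(judge_intent_type([4.2, 0.3, 0.1, 0.2, 0.1]))  # -> ('投保', ≈0.93)
```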
Preferably, the information is obtained from one or both of the entity information and the user portrait information, and/or by asking the user for the information.
It should be noted that, in the second intent judgment step, after the intent type has been determined, a scheme needs to be output according to the intent type. In this process, several kinds of user information, such as age and ID number, may need to be combined. Some of this information can be obtained from one or both of the entity information and the user portrait information, which improves the efficiency of information acquisition; the user can also be asked for the information, which improves the accuracy of information acquisition and thus further improves the overall accuracy of the matched output.
Preferably, the information includes one or more of gender, age, license plate number, region, and number of family members.
As a further preference, the scheme is an insurance product recommendation scheme.
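One way to realize this slot-filling step is sketched below: the fields required by an intent type are taken from the extracted entity information first, then from the user portrait, and only whatever is still missing is asked of the user. The field names, the REQUIRED table, and the ask() helper are illustrative assumptions.

```python
# Minimal sketch (assumption): collect the fields required by an intent type from the
# extracted entity information first, then the user portrait, and only ask the user
# for whatever is still missing.
REQUIRED = {"投保": ["性别", "年龄", "车牌号", "地区", "家庭人数"]}

def ask(field):
    return input(f"请问您的{field}是？")  # placeholder for a card pushed to the user

def collect(intent, entities, portrait):
    info = {}
    for field in REQUIRED.get(intent, []):
        if field in entities:
            info[field] = entities[field]
        elif field in portrait:
            info[field] = portrait[field]
        else:
            info[field] = ask(field)
    return info

# Example: two fields come from the sentence and portrait, the rest are asked for.
plan_input = collect("投保", {"车牌号": "沪A12345"}, {"年龄": "66", "性别": "男"})
```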
Preferably, the value used for intent inference is calculated as follows: word vectors are first trained by segmenting the question text and applying word embedding, and are then converted into a sentence vector; the stitching matrix of the sentence vector information, the entity information, and the user portrait information is fed into an LSTM model for training and feature extraction. Specifically, the current input X_t enters a new memory block and is mapped through activation functions: the input gate is i_t = σ(w_i X_t + W_i h_{t−1} + b_i); the forget gate f_t = σ(w_f X_t + W_f h_{t−1} + b_f) controls the amount of information and the memory block is updated as q_t = tanh(w_q X_t + W_q h_{t−1} + b_q); the output gate o_t = σ(w_o X_t + W_o h_{t−1} + b_o) outputs the information or conversation card; the content to be kept in the new memory block is determined by C̃_t = tanh(w_c X_t + W_c h_{t−1} + b_c) and C_t = f_t * C_{t−1} + i_t * C̃_t, and the current hidden layer is updated and output as h_t = o_t * tanh(C_t). Finally, a Softmax layer is added after the linear fully connected layer of the LSTM model, mapping the LSTM output to the latent intent space to obtain its probability distribution.
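The sketch below shows one possible PyTorch rendering of this intent-inference step: an LSTM over the concatenated sentence/entity/portrait vectors, a linear layer, and a softmax into the latent intent space, with the result compared against the preset threshold. The layer sizes, the threshold value, and the use of torch are illustrative assumptions.

```python
# Minimal sketch (assumption): LSTM over the concatenated sentence/entity/portrait
# vectors, followed by a linear layer and softmax into the latent intent space.
import torch
import torch.nn as nn

class IntentLSTM(nn.Module):
    def __init__(self, input_dim=128, hidden_dim=64, n_intents=5):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, n_intents)

    def forward(self, x):                    # x: (batch, seq_len, input_dim)
        _, (h_t, _) = self.lstm(x)           # h_t: last hidden state, (1, batch, hidden_dim)
        logits = self.fc(h_t.squeeze(0))
        return torch.softmax(logits, dim=1)  # probability distribution over latent intents

probs = IntentLSTM()(torch.randn(1, 30, 128))
accepted = probs.max().item() > 0.6          # compare against the preset threshold
```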
According to a second aspect of the present invention, there is also provided a conversation interaction apparatus, including: an acquisition module for acquiring a user sentence; a first judgment module for judging whether the user sentence contains a conventional question; a first output module for retrieving the conventional answer corresponding to the conventional question from the database and outputting it when the judgment result of the first judgment module is yes; a second judgment module for judging whether the user sentence contains an intent when the judgment result of the first judgment module is no; and a second output module for retrieving the conversation flow corresponding to the intent from the database and outputting it when the judgment result of the second judgment module is yes.
Further, the apparatus also includes: a speculation module for performing intent inference according to the user sentence when the judgment result of the second judgment module is no; a third judgment module for judging whether the value obtained by the intent inference is greater than a preset threshold; and a third output module for retrieving the conversation flow corresponding to the intent from the database and outputting it when the judgment result of the third judgment module is yes.
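The apparatus is defined above only in functional terms; the Python sketch below is one possible arrangement of these modules, with the FAQ lookup, intent classifier, speculation (inference) back-end, and flow store injected as callables. The class name, constructor signature, and threshold default are illustrative assumptions.

```python
# Minimal sketch (assumption): the conversation interaction apparatus as a composition
# of the modules listed above.
class SessionInteractionDevice:
    def __init__(self, faq_lookup, intent_classifier, intent_inferrer, flow_store, threshold=0.6):
        self.faq_lookup = faq_lookup                 # first judgment + first output
        self.intent_classifier = intent_classifier   # second judgment
        self.intent_inferrer = intent_inferrer       # speculation module
        self.flow_store = flow_store                 # conversation flows keyed by intent
        self.threshold = threshold                   # third judgment

    def handle(self, sentence):
        answer = self.faq_lookup(sentence)
        if answer is not None:                       # conventional question hit
            return answer
        intent = self.intent_classifier(sentence)
        if intent is not None:                       # explicit intent found
            return self.flow_store[intent]
        intent, score = self.intent_inferrer(sentence)
        if intent is not None and score > self.threshold:   # inferred intent accepted
            return self.flow_store[intent]
        return None                                  # fall back, e.g. hand over to a human
```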
与现有技术相比,本发明的有益效果在于:Compared with the prior art, the beneficial effects of the present invention are:
1、本发明的会话交互方法,通过两轮的意图判断识别用户语句中的意图,以分别作出相应的输出,从而能够准确识别用户意图,进而更准确地引导客户完成咨询和后续的服务。1. The conversation interaction method of the present invention recognizes the intentions in user sentences through two rounds of intention judgment to make corresponding outputs respectively, so that the user's intentions can be accurately identified, and the customer can be more accurately guided to complete consultation and subsequent services.
2. In the conversation interaction method of the present invention, the second intent judgment includes two steps, namely judging whether the user sentence contains an intent, and intent inference. This two-step judgment identifies the user's intent more accurately, improves the accuracy of intent recognition, and thus avoids errors and incompleteness in recognizing user sentences.
3、本发明的会话交互方法,对用户语句进行判断的预处理步骤包括对用户语句进行文本处理,能够更方便地进行处理和后续的识别判断,提高识别的效率和准确性。3. In the conversation interaction method of the present invention, the preprocessing step of judging a user sentence includes text processing of the user sentence, which can more conveniently perform processing and subsequent recognition judgment, and improve the efficiency and accuracy of recognition.
4、本发明的会话交互方法,在一些优选的实施方式中,将通用实体信息和行业实体信息替换成最上层实体的编码,这样可以进一步准确地识别相匹配问题和与该问题对应的答案。4. In the conversation interaction method of the present invention, in some preferred embodiments, the general entity information and industry entity information are replaced with the encoding of the top-level entity, so that the matching question and the answer corresponding to the question can be further accurately identified.
5. In the conversation interaction method of the present invention, after the intent type has been determined, the relevant information can be obtained from one or both of the entity information and the user portrait information, which improves the efficiency of information acquisition; the user can also be asked for this information, which improves the accuracy of information acquisition and thus further improves the overall accuracy of the matched output.
6. The conversation interaction method of the present invention achieves complete extraction of all contextual information in the user's conversation: general entities, industry entities, and user portraits are extracted from sentences through an entity extraction model and relation extraction, and the user's intents and possible intents are learned through deep learning models with high accuracy.
上述说明仅是本发明技术方案的概述,为了能够更清楚了解本发明的技术手段,而可依照说明书的内容予以实施,并且为了让本发明的上述和其他目的、特征和优点能够更明显易懂,以下特举较佳实施例,并配合附图,详细说明如下。The above description is only an overview of the technical solution of the present invention. In order to understand the technical means of the present invention more clearly, it can be implemented in accordance with the content of the description, and in order to make the above and other objects, features, and advantages of the present invention more comprehensible. The following describes the preferred embodiments and the accompanying drawings in detail as follows.
附图简要说明Brief description of the drawings
图1为本发明会话交互方法的其中一种实施方式的流程示意图。FIG. 1 is a schematic flowchart of an implementation manner of a session interaction method according to the present invention.
图2为本发明会话交互方法的用户知识图谱的一种优选实施方式的示意图。FIG. 2 is a schematic diagram of a preferred embodiment of a user knowledge map of the conversation interaction method of the present invention.
图3为本发明会话交互方法的第二意图判断的一种优选实施方式的流程示意图。FIG. 3 is a schematic flowchart of a preferred implementation of the second intention judgment of the conversation interaction method of the present invention.
图4为本发明中会话流程规则的定义流程示意图。FIG. 4 is a schematic flowchart of defining a session flow rule in the present invention.
图5为图4步骤中的一种优选的流程示意图。FIG. 5 is a schematic diagram of a preferred process in the step of FIG. 4.
图6所示为本发明一实施例提供的会话交互装置的结构示意图。FIG. 6 is a schematic structural diagram of a session interaction apparatus according to an embodiment of the present invention.
实施本发明的方式Mode of Carrying Out the Invention
为使本发明的目的、技术手段和优点更加清楚明白,以下结合附图对本发明作进一步详细说明。In order to make the purpose, technical means, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings.
图1为本发明会话交互方法的其中一种实施方式的流程示意图。如图1所示,该方法100包括以下步骤:110、语句获取:获取用户语句;120第一意图判断:根据用户语句进行第一意图判断,第一意图判断用于判断用户语句是否包含常规问题;若是,则在数据库中调取与常规问题相对应的常规答案并输出;若否,则进行第二意图判断;130第二意图判断,用于判断用户语句中是否包含意图,若是,则在数据库中调取与意图相应会话流程并输出。FIG. 1 is a schematic flowchart of an implementation manner of a session interaction method according to the present invention. As shown in FIG. 1, the method 100 includes the following steps: 110. Sentence acquisition: obtaining user sentences; 120 first intention judgment: making a first intention judgment according to the user sentence, the first intention judgment is used to determine whether the user sentence contains a conventional question ; If yes, then retrieve and output the conventional answer corresponding to the conventional question in the database; if not, perform the second intent judgment; 130 the second intent judgment is used to determine whether the user sentence contains an intent, and if so, then The conversation process corresponding to the intent is retrieved from the database and output.
以上是本发明的其中一种基础实施方式。在本技术方案中,通过两轮的意图判断识别用户语句中的意图,以分别作出相应的输出,从而能够准确识别用户意图,进而更准确地引导客户完成咨询和后续的服务。两次的意图判断分别用于判断用户语句是否直接含有数据库中已有的问题,以及用于判断用户语句是否隐含具体的意图,以防错失用户隐含的具体的意图类型,降低识别错误率,提高识别的全面性和准确率。The above is one of the basic embodiments of the present invention. In this technical solution, the intentions in the user sentence are identified through two rounds of intention judgments to make corresponding outputs respectively, so that the user's intentions can be accurately identified, and the customer can be more accurately guided to complete the consultation and subsequent services. The two intent judgments are used to determine whether the user statement directly contains the existing problems in the database, and to determine whether the user statement implies a specific intention, so as to prevent the specific type of intention implicit from the user from being missed and reduce the recognition error rate. To improve the comprehensiveness and accuracy of identification.
With reference to the above basic embodiment, in the second aspect, the specific process of the second intent judgment 130 is: judging whether the user sentence contains an intent and, if so, retrieving the conversation flow corresponding to the intent from the database and outputting it; if not, performing intent inference according to the user sentence and judging whether the value obtained by the intent inference is greater than a preset threshold; if it is, it is confirmed that there is an intent, and the conversation flow corresponding to the intent is retrieved from the database and output.
In this aspect, the second intent judgment includes two steps: judging whether the user sentence contains an intent, and intent inference. Judging whether the user sentence contains an intent means judging whether the user sentence directly includes an intent type, such as "car insurance" or "taking out a policy", so that a multi-round conversation can then be guided according to the intent type. If the user sentence does not directly contain such intent types, the inventors further provide the intent inference step in this technical solution, to further judge whether an intent type is implied in the user sentence; for example, "how long can the car be covered" may point to the intent type "car insurance". Through these two steps of judgment, the user's intent can be identified more accurately, the accuracy of intent recognition is improved, and errors and incompleteness in recognizing user sentences are reduced.
结合上述基础实施方式,在第三个方面中,步骤110语句获取具体包括:对所获取的用户语句进行文本处理。在更进一步具体的实施方式中,文本处理的方式包括文本分词。对用户语句进行判断的预处理步骤包括对用户语句进行文本处理,能够更方便地进行处理和后续的识别判断,提高识别的效率和准确性。With reference to the above-mentioned basic embodiment, in the third aspect, the obtaining of the sentence in step 110 specifically includes: performing text processing on the obtained user sentence. In a further specific embodiment, the text processing manner includes text segmentation. The pre-processing step for judging a user sentence includes text processing on the user sentence, which can more conveniently perform processing and subsequent recognition judgment, and improve the efficiency and accuracy of recognition.
结合上述基础实施方式,在第四个方面中,用户语句包括实体信息,相应地,步骤110包括提取实体信息。该实体信息包括以下之中的一种或多种:句向量信息,用于将词向量序列训练并编译;通用实体信息,用于表示通用的信息;行业实体信息,用于表示与行业相关的信息。With reference to the above-mentioned basic embodiment, in a fourth aspect, the user sentence includes entity information, and accordingly, step 110 includes extracting entity information. The entity information includes one or more of the following: sentence vector information used to train and compile word vector sequences; general entity information used to represent general information; industry entity information used to represent industry-related information.
用户语句包括实体信息以进行区分和判断,实体信息包括句向量信息、通用实体信息和行业实体信息。词向量的获取在文本分词步骤中完成。句向量信息的例子可以是“我今天在上海出车祸了,请问汽车异地险理赔流程一样吗?”;而通用实体信息,可以是“今天”、“上海”等时间地点信息;而行业实体信息,可以是“汽车”、“车险”等行业信息。User statements include entity information to distinguish and judge, and entity information includes sentence vector information, general entity information, and industry entity information. The acquisition of the word vector is completed in the text word segmentation step. An example of sentence vector information could be "I have a car accident in Shanghai today. May I ask if the auto claims insurance process is the same?" And the general entity information can be "today", "Shanghai" and other time and place information; and industry entity information , Can be "auto", "auto insurance" and other industry information.
结合上述基础实施方式,在第五个方面中,用户语句还包括用户画像信息,用于表示用户个人及社交关系的信息,相应地,步骤110还提取用户画像信息。系统可以通过获取多次在用户语句中出现的用户画像信息,从而获取用户的社交关系,该方法可以参考现有技术中的任务关系图谱的构建方式,也可以参考如下文所述的发明人设计的具有创造性的获取方式。通过此步骤,系统可以进一步根据用户画像信息完善数据库的会话交互建设,从而进一步准确地判断用户意图,以及后续的信息推送。在一些更具体的实施方式中,用户画像信息包括个人识别信息、个人属性信息和社交关系信息之中的一种或多种。With reference to the above-mentioned basic embodiment, in the fifth aspect, the user sentence further includes user portrait information, which is used to represent the personal and social relationship of the user. Accordingly, step 110 also extracts user portrait information. The system can obtain the user's social relationship by obtaining the user portrait information appearing in the user sentence multiple times. This method can refer to the way of constructing the task relationship map in the prior art, or refer to the inventor design as described below. Creative way of getting it. Through this step, the system can further improve the session interaction construction of the database based on the user portrait information, so as to further accurately determine the user's intention and subsequent information push. In some more specific implementation manners, the user portrait information includes one or more of personal identification information, personal attribute information, and social relationship information.
With reference to the above basic embodiment, in the sixth aspect, the user portrait information is obtained as follows: an association calculation is performed on the user sentences to obtain association relationships, the syntactic dependency relations and dependency structures in the user sentences are obtained, and the personal identification information, personal attribute information, and social relationship information are extracted according to the association relationships for iterative triple learning, so as to obtain the user portrait knowledge map.
Extracting the personal identification information, personal attribute information, and social relationship information according to the association relationships for iterative triple learning yields a data structure with strong network relationships, whose construction requires obtaining or retrieving the attributes of other related nodes, so that the user portrait knowledge map, i.e. the relationship map described above, can be obtained more accurately. In some more specific implementations, the association calculation on the user sentences is performed with the POS-CBOW method and an improved Word2vec model; because entity attributes and entity distribution are integrated, the technical effect of extracting entity association relationships can be achieved.
Figure 2 is a schematic diagram of a preferred embodiment of a user portrait knowledge map built from user portraits. For example, given the user sentence "My dad is 66 years old this year and recovered from bladder cancer last year. Can he buy the Xiaoxin cancer prevention insurance for the elderly?", entity extraction is performed first, covering general entity information, industry entity information, and user portrait information, followed by matrix encoding. Specifically, text segmentation is done first, for example: "My | dad | this year | 66 years old |, last year | bladder cancer | recovered |, may I ask | can | buy | Xiaoxin cancer prevention insurance for the elderly |?". Next, the association calculation is performed with the POS-CBOW method and the improved Word2vec to obtain syntactic dependency relations and dependency structures; entities, attributes, and relations are extracted through relation extraction, and more templates are learned through triple iteration. From the sentence above, for instance, we obtain "me" and "dad", the age 66, the disease bladder cancer, and an insurance preference for the Xiaoxin cancer prevention insurance for the elderly. By training ontology association relationships on an insurance-domain corpus, the user portrait knowledge map 200 shown in Figure 2 is obtained. In Internet insurance application scenarios, an agent may insure his clients, a user may recommend good insurance products to friends, and a customer may insure himself, his parents, or his children; these situations involve related ontologies, so the user portrait knowledge map needs to be built from context, which solves the problem of the complex relationships among the many subjects and entities in a conversation. By building a knowledge map from user portraits, this method handles scenarios such as a user asking about "my client's policies", "which friends have I referred", or "what coverage does my family's insurance include".
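The relations extracted from this example can be stored as (subject, relation, object) triples in a small graph, as sketched below. The use of networkx and the exact relation names are illustrative assumptions, not part of the original disclosure.

```python
# Minimal sketch (assumption): store the relations extracted from the example sentence
# ("my dad, 66, recovered from bladder cancer, interested in the Xiaoxin policy")
# as triples in a small directed graph and query it like a user portrait knowledge map.
import networkx as nx

triples = [
    ("我", "父亲", "爸爸"),
    ("爸爸", "年龄", "66岁"),
    ("爸爸", "疾病", "膀胱癌"),
    ("爸爸", "投保偏好", "孝欣老年人防癌险"),
]

graph = nx.DiGraph()
for subj, rel, obj in triples:
    graph.add_edge(subj, obj, relation=rel)

# Querying the portrait graph, e.g. which policies "dad" is interested in:
preferences = [o for _, o, d in graph.out_edges("爸爸", data=True) if d["relation"] == "投保偏好"]
print(preferences)  # ['孝欣老年人防癌险']
```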
With reference to the above basic embodiment, in the seventh aspect, the specific method of the first intent judgment includes: the database is provided with a FAQ data set; the stitching matrix of sentence vector information, general entity information, and industry entity information is matched against the FAQ data set, and if a corresponding conventional question exists in the FAQ data set, the conventional answer corresponding to that conventional question is output. The FAQ data set refers to a database containing conventional questions and the conventional answers corresponding to them. A conventional question is, for example, "How much does car insurance cost per year?", and the corresponding conventional answer may be something like "4,000 yuan". Matching the stitching matrix of sentence vector information, general entity information, and industry entity information obtained from the user sentence against the FAQ data set allows the data to be matched accurately and improves matching efficiency.
In some implementations, matching the stitching matrix of sentence vector information, general entity information, and industry entity information against the FAQ data set specifically includes the following step: the general entity information and industry entity information in the stitching matrix are replaced with the codes of their top-level entities, and the result is then matched against the FAQ data set. Replacing the general entity information and industry entity information with the codes of the top-level entities allows the content of the data set to be matched more comprehensively; for example, replacing "private car" with "car" substitutes the code of the upper-level entity, which further helps to accurately identify the matching question and the answer corresponding to it.
In some specific embodiments, for example in FAQ recognition, the stitching matrix composed of the sentence vector, general entities, and industry entities is input, and the entities in the user's question are replaced with the codes of their top-level entities, for example replacing "diabetes" with "disease" and "BMW 320Li" with "car"; the codes of these top-level entities are then matched against the questions in the QA set, and finally a similarity comparison model is used to find the QA questions whose similarity exceeds a certain threshold. For example, when the user asks "Can a BMW 320Li be insured?", the template "Can a car be insured?" in the QA set has the highest similarity; different answers can also be set through QA conditions, such as "Zhongan car insurance can cover vehicles worth less than 2 million".
With reference to the above basic embodiment, in the eighth aspect, the specific method of the second intent judgment includes: obtaining the intent by text classification with a CNN model based on the stitching matrix of the entity information and the user portrait information. Figure 3 is a schematic flowchart of one embodiment of this process; the specific steps of method 300 are as follows. 310. An independent sentence S ∈ R^(n×k) of the user conversation is represented by n k-dimensional word vectors; the corresponding information of the entities and the user portrait is encoded into a dictionary, and the word vectors X_i ∈ R^k are obtained with word2vec after word segmentation and stop-word removal. The independent sentence can then be expressed as S = X_1 ⊕ X_2 ⊕ … ⊕ X_n, where ⊕ denotes the concatenation of the word vectors X_i.
320. The N-Gram form of the independent sentence is used to mine feature information, and with the CNN model a text convolution kernel W ∈ R^(l×k) is defined (where l is the convolution window length); the feature obtained as the kernel slides over each window of the text is f_i = f(W·X_{i:i+l−1} + b), and the resulting feature map is F = [f_1, f_2, …, f_{n−l+1}].
In the pooling layer, the maximum value of each feature map is kept as its feature information, so the n convolution kernels yield an n-dimensional vector, which is mapped into a fixed-length global feature vector.
330. In the output layer, a fully connected layer is established and mapped to the h-dimensional intent space; a binary cross-entropy loss function is optimized through supervised learning, and the probabilities output by softmax are mapped into the intent-confidence matrix of the h-dimensional intent space. The output is the intent and its confidence, together with the list of entity sets, which are stored in memory.
在一些实施方式中,意图的类型包括投保意图、核保意图、理赔意图、续保意图和退保意图之中的一种或多种。In some embodiments, the type of intention includes one or more of an insurance intention, an underwriting intention, a claims intention, a renewal intention, and a surrender intention.
As a preference, in the second intent judgment step, the specific manner of retrieving the conversation flow corresponding to the intent from the database and outputting it includes: judging the type of the intent; retrieving and obtaining the required information according to the type of the intent; and outputting the corresponding scheme according to that information.
作为优选,判断意图的类型的具体方式是进行置信度计算,当置信度大于某一意图类型的设定值时,则判断属于该意图类型。通过置信度计算可以更为准确地识别用户的意图类型,具体地,置信度计算的方法是全链接层向量z的softmax层输出
Figure PCTCN2019071301-appb-000005
Preferably, the specific way of judging the type of intent is to perform a confidence calculation. When the confidence level is greater than a set value of a certain intent type, it is determined that the intent type belongs. Confidence calculation can more accurately identify the user's intent type. Specifically, the confidence calculation method is the softmax layer output of the full link layer vector z
Figure PCTCN2019071301-appb-000005
通过上述置信度计算的判断,可以更准确地得出意图类型的判断结果。Through the judgment of the above-mentioned confidence calculation, the judgment result of the intent type can be obtained more accurately.
在一些其他优选的实施方式之中,获取信息的方式是从实体信息和用户画像信息之中的一种或两种进行获取;和/或向用户询问该信息并获取。在第二意图判断步骤中,在判断得到意图类型后,需要根据意图类型进行方案输出,而在此过程中,可能需要综合用户的多种信息,例如年龄、身份证等,而这里有些信息是可以从实体信息和用户画像信息之中的一种或两种中获取到的,从而可以提高信息获取的效率;另外也可以向用户进行询问该信息并获取,提高信息获取的准确率,从而进一步提高匹配输出的整体准确度。作为更进一步地优选方案,该信息包括性别、年龄、车牌号、地区、家庭人数之中的一种或多种。在其他的一些优选方案中,所述方案是险种推荐方案。In some other preferred implementation manners, the way to obtain information is to obtain from one or both of entity information and user portrait information; and / or inquire the user about the information and obtain it. In the second intent judgment step, after determining the type of intent, it is necessary to output the solution according to the type of intent. In this process, it may be necessary to synthesize a variety of information of the user, such as age, ID, etc., and some information here is It can be obtained from one or two of entity information and user portrait information, which can improve the efficiency of information acquisition; in addition, users can be asked about this information and obtain it to improve the accuracy of information acquisition, thereby further Improve the overall accuracy of the matching output. As a further preferred solution, the information includes one or more of gender, age, license plate number, region, and number of households. In some other preferred schemes, the scheme is a insurance recommendation scheme.
以下结合一种具体的实施方式以说明上述获取和判断过程:The following describes the above acquisition and determination process in combination with a specific implementation mode:
图4为本发明中会话流程规则的定义流程示意图。如图4所示,该会话流程规则400包含 了(意图)触发器、节点、条件和动作。FIG. 4 is a schematic flowchart of defining a session flow rule in the present invention. As shown in Figure 4, the conversation flow rule 400 contains (intent) triggers, nodes, conditions, and actions.
定义触发条件规则410包括:定义意图名称和其对应的置信度应当满足的条件,比如:intent=核保and confidence_degree>0.8。Defining a trigger condition rule 410 includes: defining a condition that an intent name and a corresponding confidence degree should satisfy, such as: intent = underwriting and confidence_degree> 0.8.
定义节点规则包括:Define node rules include:
定义节点名称420,每个节点包括相应的条件和动作。 Node names 420 are defined, and each node includes corresponding conditions and actions.
定义条件430。其中,采用IF…ELSE IF/ELSE IF/..等逻辑表达式实现实体的映射和对齐,该实体的映射和对齐包括实体与用户画像的映射和实体之间的对齐。比如条件定义为age<55,而用户画像中通过“我爸52年生的”获取“我爸”年龄“52年生”的属性,将条件定义中的age和“我爸”的“年龄”映射,并将“52年生”对齐为66岁,从用户画像中,判断条件不满足age<55。Define condition 430. Among them, IF ... ELSE, IF / ELSE, IF / .. and other logical expressions are used to implement the mapping and alignment of entities. The mapping and alignment of the entity includes the mapping of entities and user portraits and the alignment between entities. For example, the condition is defined as age <55, and in the user portrait, the attribute of "my dad" who is "born 52" is obtained from "my dad was born in 52", and the age in the condition definition is mapped to the "age" of "my dad", And "52-year-old" is aligned to 66 years old. From the user portrait, it is judged that the condition does not meet age <55.
定义动作440。这里定义了3种类型,即卡片、跳转和应用程序编程接口(Application Programming Interface,API)返回值。其中:卡片,包括了选择卡片、文本卡片、图文卡片、图文列表、图片等卡片,通过卡片和用户交互,获取信息,并进行信息映射,目的是用于收集结构化和非结构化数据的外部数据,以及响应并反馈结果;跳转,可以跳转到其他节点或统一资源定位符(Uniform Resource Locator,URL)或人工等;API返回值,通过API返回给服务器所收集到的用户画像信息,获取请求,比如保险推荐等。Define an action 440. There are three types defined here, namely cards, jumps, and application programming interface (API) return values. Among them: cards, including selection cards, text cards, graphic cards, graphic lists, pictures, and other cards, interact with users to obtain information and map information, the purpose is to collect structured and unstructured data External data, as well as response and feedback results; jump, you can jump to other nodes or Uniform Resource Locator (URL) or manual, etc .; API return value, return to the server through the API to the user portrait collected Information, acquisition requests, such as insurance recommendations.
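One way to express such a Figure 4 style rule, with its trigger, nodes, conditions, and actions, is as a plain data structure, as sketched below. All keys, node names, and action identifiers are illustrative assumptions rather than a prescribed schema.

```python
# Minimal sketch (assumption): a trigger/node/condition/action rule expressed as data.
underwriting_flow = {
    "trigger": {"intent": "核保", "confidence_gt": 0.8},
    "nodes": [
        {
            "name": "节点1_身份证",
            "condition": "id_card is None",          # IF ... ELSE IF style expression
            "actions": [{"type": "api", "call": "read_user_portrait"}],
        },
        {
            "name": "节点2_性别",
            "condition": "id_card is not None",
            "actions": [{"type": "card", "card": "选择性别"},
                        {"type": "jump", "target": "节点3_年龄"}],
        },
        {
            "name": "节点7_车险推荐",
            "actions": [{"type": "api", "call": "recommend_insurance",
                         "on_error": "返回错误信息"}],
        },
    ],
}
```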
在一个实施例中,每个节点还包括记忆,则定义节点规则还包括定义记忆。具体来说,对于终端用户输入的问句会做以下两步处理:In one embodiment, each node further includes a memory, and the definition node rule further includes defining a memory. Specifically, the question input by the end user will be processed in the following two steps:
第一步,用户的问句由意图识别模型进行处理,得出置信度最高的意图和对应实体,例如:用户输入问句“50岁的男性能否投保?”,得出置信度最高的意图为“投保”,对应的实体为“50岁”、“男性”。In the first step, the user's question is processed by the intent recognition model to obtain the highest confidence intent and the corresponding entity. For example, the user enters the question "Can a 50-year-old man be insured?" To obtain the highest confidence intent. For "insured", the corresponding entities are "50 years old" and "male".
第二步,将用户输入中与该意图所对应的实体相关的内容,与实体之间做一层映射,作为该会话流程的部分上下文信息,存放于适合高频访问的存储介质中,作为后续步骤中第一意图判断的数据源之一。In the second step, the content related to the entity corresponding to the intent in the user input is mapped to the entity as part of the context information of the conversation process, and stored in a storage medium suitable for high-frequency access as a follow-up One of the data sources for the first intention judgment in the step.
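The second step can be sketched as follows: the recognized intent and entities are mapped to key-value pairs and appended to the conversation-flow context. Here a plain dict stands in for the high-frequency-access storage medium (for example an in-memory cache); the key names are illustrative.

```python
# Minimal sketch (assumption): append recognized intent/entity information to the
# conversation-flow context as key-value pairs for later intent judgment steps.
context_store = {}

def append_context(session_id, intent, entities):
    ctx = context_store.setdefault(session_id, {"intent": intent})
    ctx.update(entities)                 # e.g. {"年龄": "50岁", "性别": "男"}
    return ctx

append_context("session-1", "投保", {"年龄": "50岁", "性别": "男"})
print(context_store["session-1"])        # later used as a data source for the first intent judgment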
具体例子请见图5。比如,触发器触发了多轮会话,节点1判断有没有身份证,通过API获取,读取用户画像。节点2,条件判断,如果获取到身份证,则给卡片选择性别,并跳转到节点3,给卡片选择年龄;如果没有获取到身份证,直接跳转到节点4。节点4需要输入车牌,节点5判断地区是否在指定地区里,如果是,读取用户画像并跳转到节点6,否则,给出选择卡片,选择地区。节点6选择家庭人数,节点7车险推荐,若调用推荐API失败,则返回错误信息。See Figure 5 for a specific example. For example, a trigger triggers multiple rounds of sessions. Node 1 determines whether it has an identity card, obtains it through the API, and reads the user portrait. Node 2. The condition is judged. If an ID is obtained, the card is selected for gender, and jumps to node 3, and the card is selected for age; if no ID is obtained, jumps directly to node 4. Node 4 needs to enter the license plate, node 5 judges whether the region is in the specified region, if so, reads the user portrait and jumps to node 6, otherwise, it gives a selection card and selects the region. Node 6 selects the number of households, and node 7 recommends auto insurance. If the recommendation API fails, an error message is returned.
从上面的例子看,多轮会话可以通过节点的逻辑判断,完成各个节点的跳转,并且可以支 持实体和用户画像的映射和实体的对齐,支持选择卡片、文本卡片、图文卡片、图文列表、图片等丰富的交互卡片。From the example above, multiple rounds of sessions can be judged by the logic of the nodes and complete the jump of each node. It can support the mapping of entities and user portraits and the alignment of entities. It also supports the selection of cards, text cards, graphic cards, and text. Rich interactive cards such as lists and pictures.
对于上述步骤,下面详细说明下图执行模块:For the above steps, the following figure describes the execution module in detail:
根据获取的意图在预置的会话流程中做匹配,查询相应的会话流程配置。一个会话流程配置由多个交互步骤节点构成,其中至少包含一个起始节点和一个结束节点。Match in the preset session flow according to the acquired intent, and query the corresponding session flow configuration. A session flow configuration consists of multiple interactive step nodes, which include at least a start node and an end node.
每个节点由节点主体、一个触发器、多组条件行为以及内存网络组成。Each node consists of a node body, a trigger, multiple sets of conditional behaviors, and a memory network.
节点主体为一个节点需要采集的内容的key值,节点主体与用户针对该主体的输入会以键值对的形式追加到会话流程上下文中,存储到存储介质中。上下文的结构见图5。The node body is the key value of the content that a node needs to collect. The input from the node body and the user to the body will be added to the conversation process context in the form of key-value pairs and stored in the storage medium. The structure of the context is shown in Figure 5.
触发器决定了是否会执行到该节点。当该触发器的条件被满足时,机器程序会将该节点的预置内容推送给用户,交由用户继续输入,完成该步用户交互。触发器由触发器主体和触发条件构成,触发器主体有三种类型,分别是意图类型(以@符号标识)、实体类型(以#符号标识)和数据类型(以_符号标识)。其中数据类型由用户预先定义并存储于内存介质中,预先分配内存中特定的命名空间memory,用户定义的数据x会以memory.x为key值存储于内存中,该memory.x键值对的生命周期等同于整个会话流程,应用范围为机器端到用户端。The trigger determines whether the node will be executed. When the condition of the trigger is met, the machine program will push the preset content of the node to the user, and the user will continue to input and complete the user interaction at this step. A trigger consists of a trigger body and a trigger condition. There are three types of trigger bodies, which are intent type (identified with @ symbol), entity type (identified with # symbol), and data type (identified with _ symbol). The data type is defined by the user in advance and stored in the memory medium, and a specific namespace memory in the memory is allocated in advance. The user-defined data x will be stored in memory with memory.x as the key value. The memory.x key-value pair is The life cycle is equivalent to the entire conversation process, and the application range is from the machine to the user.
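A small sketch of evaluating these trigger subjects is given below, covering the three types described above: intent ("@"), entity ("#"), and user-defined data ("_") stored under the memory.* namespace. The exact evaluation semantics (for example treating the expected value of an intent trigger as a confidence floor) are illustrative assumptions.

```python
# Minimal sketch (assumption): evaluate a node trigger whose subject is an intent ("@"),
# an entity ("#"), or user-defined data ("_") stored in the memory.* namespace.
def trigger_satisfied(trigger, intents, entities, memory):
    subject, expected = trigger            # e.g. ("@核保", 0.8) or ("_city", "上海")
    if subject.startswith("@"):
        return intents.get(subject[1:], 0.0) >= expected   # expected = confidence floor
    if subject.startswith("#"):
        return subject[1:] in entities
    if subject.startswith("_"):
        return memory.get("memory." + subject[1:]) == expected
    return False

memory = {"memory.city": "上海"}           # memory.x key-value pair, flow-scoped lifetime
print(trigger_satisfied(("@核保", 0.8), {"核保": 0.92}, {"年龄"}, memory))   # True
print(trigger_satisfied(("_city", "上海"), {}, set(), memory))               # True
```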
With reference to the foregoing embodiments, in a ninth aspect, the value used for intent inference is calculated as follows: word vectors are first trained by segmenting the question text and applying word embedding, and are then converted into a sentence vector; the stitching matrix of the sentence vector, the entity information, and the user portrait information is fed into an LSTM model for training and feature extraction. Specifically, the current input X_t enters a new memory block and is mapped through activation functions: the input gate is i_t = σ(w_i X_t + W_i h_{t−1} + b_i); the forget gate f_t = σ(w_f X_t + W_f h_{t−1} + b_f) controls the amount of information and the memory block is updated as q_t = tanh(w_q X_t + W_q h_{t−1} + b_q); the output gate o_t = σ(w_o X_t + W_o h_{t−1} + b_o) outputs the information or conversation card; the content to be kept in the new memory block is determined by C̃_t = tanh(w_c X_t + W_c h_{t−1} + b_c) and C_t = f_t * C_{t−1} + i_t * C̃_t, and the current hidden layer is updated and output as h_t = o_t * tanh(C_t). Finally, a Softmax layer is added after the linear fully connected layer of the LSTM model, mapping the LSTM output to the latent intent space to obtain its probability distribution.
图6所示为本发明一实施例提供的会话交互装置的结构示意图。如图6所示,该会话交互装置600包括获取模块610、第一判断模块620、第一输出模块630、第二判断模块640和第二输出模块650。其中,获取模块610用于获取用户语句;第一判断模块620用于判断该用户语句中是否包含常规问题;第一输出模块630用于当第一判断模块620的判断结果为是时,数据库中调取与该常规问题对应的常规答案并输出;第二判断模块640,用于当第一判断模块620的判断结果为否时,判断用户语句中是否包含意图;第二输出模块650,用于当第二判断模块640的判断结果为是时,在数据库中调取与该意图相应的会话流程并输出。FIG. 6 is a schematic structural diagram of a session interaction apparatus according to an embodiment of the present invention. As shown in FIG. 6, the conversation interaction device 600 includes an acquisition module 610, a first determination module 620, a first output module 630, a second determination module 640, and a second output module 650. Wherein, the obtaining module 610 is used to obtain a user sentence; the first judgment module 620 is used to judge whether the user sentence contains a conventional question; the first output module 630 is used when the judgment result of the first judgment module 620 is yes, in the database Calling a conventional answer corresponding to the conventional question and outputting it; a second judging module 640 for judging whether the user sentence includes an intention when the judging result of the first judging module 620 is no; a second output module 650 for When the judgment result of the second judgment module 640 is YES, a conversation flow corresponding to the intention is retrieved from the database and output.
在一个实施例中,会话交互装置600还包括推测模块660、第三判断模块670和第三输出模块680。其中,推测模块660用于当第二判断模块640的判断结果为否时,根据用户语句进 行意图推测;第三判断模块670用于判断意图推测中所得数值是否大于预设的阈值;第三输出模块680用于当第三判断模块670的判断结果为是时,在数据库中调取与意图相应的会话流程并输出。In one embodiment, the session interaction device 600 further includes a speculation module 660, a third determination module 670, and a third output module 680. Among them, the guessing module 660 is used to make an intention inference according to the user sentence when the judgment result of the second judgment module 640 is negative; the third judgment module 670 is used to judge whether the value obtained from the intention guessing is greater than a preset threshold; the third output The module 680 is configured to: when the determination result of the third determination module 670 is YES, retrieve and output a conversation flow corresponding to the intention in the database.
在一个实施例中,会话交互装置600还包括处理模块,用于对获取模块610获取到的用户语句进行文本处理。这种情况下,第一判断模块620具体用于根据处理模块的处理结果判断用户语句是否包含常规问题。这里的文本处理包括文本分词。In one embodiment, the session interaction device 600 further includes a processing module, configured to perform text processing on the user sentence acquired by the obtaining module 610. In this case, the first determining module 620 is specifically configured to determine whether the user sentence contains a conventional question according to the processing result of the processing module. The text processing here includes text segmentation.
在一个实施例中,用户语句包括实体信息,该实体信息包括以下之中的一种或多种:句向量信息,用于将词向量序列训练并编译;通用实体信息,用于表示通用的信息;行业实体信息,用于表示与行业相关的信息。In one embodiment, the user sentence includes entity information, and the entity information includes one or more of the following: sentence vector information for training and compiling a sequence of word vectors; general entity information for representing general information ; Industry entity information, used to represent industry-related information.
在一个实施例中,用户语句还包括用户画像信息,用于表示用户个人及社交关系的信息。该用户画像信息包括个人识别信息、个人属性信息和社交关系信息之中的一种或多种。In one embodiment, the user sentence further includes user portrait information, which is used to represent personal and social relationships of the user. The user portrait information includes one or more of personal identification information, personal attribute information, and social relationship information.
该用户画像信息的获取方式包括:对用户语句进行关联计算得到关联关系,获取用户语句中的句法依存关系和依存结构,并根据关联关系抽取个人识别信息、个人属性信息和社交关系信息进行三元组迭代学习,得到用户画像知识图谱。这里用于对用户语句进行关联计算的具体方式包括通过POS-CBOW方法,并通过改进的Word2vec进行关联计算。The method for obtaining user portrait information includes: performing association calculations on user sentences to obtain association relationships, obtaining syntactic dependencies and dependency structures in user sentences, and extracting personal identification information, personal attribute information, and social relationship information based on the association relationship to perform ternary The group learns iteratively to get the user portrait knowledge map. The specific methods used to perform association calculation on user statements include POS-CBOW method and association calculation through improved Word2vec.
在一个实施例中,第一判断模块620具体用于将句向量信息、通用实体信息和行业实体信息的拼接矩阵与数据库中的FAQ数据集进行匹配,优选地,将拼接矩阵中的通用实体信息和行业实体信息替换成最上层实体的编码,再与FAQ数据集进行匹配;第一输出模块630具体用于当FAQ数据集中存在常规问题时,输出与该常规问题对应的常规答案。In one embodiment, the first judgment module 620 is specifically configured to match the stitching matrix of sentence vector information, general entity information, and industry entity information with the FAQ data set in the database. Preferably, the general entity information in the stitching matrix is matched. The information of the industry entity is replaced with the encoding of the top-level entity and then matched with the FAQ data set; the first output module 630 is specifically configured to output a conventional answer corresponding to the conventional question when there is a conventional question in the FAQ data set.
在一个实施例中,第二判断模块640具体用于根据实体信息和用户画像信息的拼接矩阵,通过CNN模型进行文本分类获得意图。In one embodiment, the second judgment module 640 is specifically configured to perform text classification through a CNN model to obtain an intent according to a stitching matrix of entity information and user portrait information.
在一个实施例中,第二输出模块650具体用于判断意图的类型,该意图的类型包括投保意图、核保意图、理赔意图、续保意图和退保意图中的一种或多种;根据意图的类型调取所需要的信息并获取该信息,该信息包括性别、年龄、车牌号、地区、家庭人数之中的一种或多种;根据该信息输出相应的方案,该方案包括险种推荐方案。In one embodiment, the second output module 650 is specifically configured to determine the type of the intent, and the type of the intent includes one or more of an insurance intention, an underwriting intention, a claim intention, a renewal intention, and a surrender intention; according to The type of intent retrieves the required information and obtains this information, which includes one or more of gender, age, license plate number, region, and number of households; according to the information, the corresponding scheme is output, and the scheme includes insurance recommendation Program.
这里判断意图的类型的具体方式是进行置信度计算,当置信度大于某一意图类型的设定值时,判断属于该意图类型。获取信息的方式包括从实体信息和用户画像信息之中的一种或两种进行获取,也可以向用户进行询问该信息以获取。The specific way of judging the type of intent here is to calculate the confidence level. When the confidence level is greater than the set value of a certain intent type, it is judged that it belongs to the intent type. The method for obtaining information includes obtaining from one or two of entity information and user portrait information, and the user may also be asked to obtain the information.
图6所示的会话交互装置可以与上述任一实施例提供的会话交互方法相对应,上述关于会话交互方法的具体描述和限定均可以应用于该会话交互装置中,这里不再赘述。The session interaction device shown in FIG. 6 may correspond to the session interaction method provided by any of the foregoing embodiments. The specific descriptions and limitations of the session interaction method described above may be applied to the session interaction device, and details are not described herein again.
以上仅为本发明的较佳实施例而已,并非用于限定本发明的保护范围。凡在本发明的精神 和原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。The above are only preferred embodiments of the present invention, and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (20)

  1. 一种会话交互方法,包括:A session interaction method includes:
    获取用户语句;Get user statements;
    判断所述用户语句是否包含常规问题;Determining whether the user statement contains a conventional question;
    若是,在数据库中调取与所述常规问题相对应的常规答案并输出;If yes, a conventional answer corresponding to the conventional question is retrieved from the database and output;
    若否,判断所述用户语句中是否包含意图,若是,在所述数据库中调取与所述意图相应的会话流程并输出。If not, determine whether the user sentence contains an intention, and if so, retrieve and output a conversation flow corresponding to the intention in the database.
  2. 如权利要求1所述的会话交互方法,其中,还包括:The conversation interaction method according to claim 1, further comprising:
    当确定所述用户语句中不包含意图时,根据所述用户语句进行意图推测;When it is determined that the user sentence does not include an intention, the intention inference is performed according to the user sentence;
    判断所述意图推测中所得数值是否大于预设的阈值,若是,在所述数据库中调取与所述意图相应的会话流程并输出。It is determined whether the value obtained in the intent guess is greater than a preset threshold, and if so, a conversation process corresponding to the intent is retrieved from the database and output.
  3. 如权利要求1所述的会话交互方法,其中,在所述获取用户语句之后还包括:对所述用户语句进行文本处理;The conversation interaction method according to claim 1, further comprising: after obtaining the user sentence, performing text processing on the user sentence;
    所述判断所述用户语句是否包含常规问题包括:The judging whether the user sentence includes a conventional question includes:
    根据所述文本处理的结果判断所述用户语句是否包含所述常规问题。It is determined whether the user sentence includes the conventional question according to a result of the text processing.
  4. 如权利要求3所述的会话交互方法,其中,所述文本处理的方式包括文本分词。The conversation interaction method according to claim 3, wherein the text processing manner includes text segmentation.
  5. 如权利要求1所述的会话交互方法,其中,所述用户语句包括实体信息;所述实体信息包括以下之中的一种或多种:The conversation interaction method according to claim 1, wherein the user sentence includes entity information; the entity information includes one or more of the following:
    句向量信息,用于将词向量序列训练并编译;Sentence vector information, used to train and compile word vector sequences;
    通用实体信息,用于表示通用的信息;General entity information, used to represent general information;
    行业实体信息,用于表示与行业相关的信息。Industry entity information, used to represent industry-related information.
  6. 如权利要求5所述的会话交互方法,其中,所述用户语句还包括用户画像信息,用于表示用户个人及社交关系的信息。The conversation interaction method according to claim 5, wherein the user sentence further includes user portrait information, which is used to represent personal and social relationships of the user.
  7. 如权利要求6所述的会话交互方法,其中,所述用户画像信息包括个人识别信息、个人属性信息和社交关系信息之中的一种或多种;The conversation interaction method according to claim 6, wherein the user portrait information includes one or more of personal identification information, personal attribute information, and social relationship information;
    所述用户画像信息的获取方式具体包括:The obtaining manner of the user portrait information specifically includes:
    对所述用户语句进行关联计算得到关联关系,获取用户语句中的句法依存关系和依存结构,并根据所述关联关系抽取所述个人识别信息、个人属性信息和社交关系信息进行三元组迭代学习,得到用户画像知识图谱。Perform association calculations on the user sentences to obtain association relationships, obtain syntactic dependencies and dependency structures in the user sentences, and extract the personal identification information, personal attribute information, and social relationship information according to the association relationships for triple-iterative learning To get the user portrait knowledge map.
  8. 如权利要求7所述的会话交互方法,其中,对所述用户语句进行关联计算的具体方式 是:通过POS-CBOW方法,并通过改进的Word2vec进行关联计算。The conversation interaction method according to claim 7, wherein the specific way of performing correlation calculation on the user sentence is: POS-CBOW method and correlation calculation through improved Word2vec.
  9. 如权利要求6所述的会话交互方法,其中,所述判断所述用户语句是否包含常规问题;若是,在数据库中调取与所述常规问题相对应的常规答案并输出包括:The conversation interaction method according to claim 6, wherein said judging whether the user sentence contains a conventional question; if so, retrieving and outputting a conventional answer corresponding to the conventional question in a database includes:
    matching the stitching matrix of the sentence vector information, general entity information, and industry entity information against the FAQ data set in the database, and, if a corresponding conventional question exists in the FAQ data set in the database, outputting the conventional answer corresponding to the conventional question.
  10. 如权利要求9所述的会话交互方法,其中,所述将所述句向量信息、通用实体信息和行业实体信息的拼接矩阵与所述数据库中的FAQ数据集进行匹配包括:The conversation interaction method according to claim 9, wherein matching the stitching matrix of the sentence vector information, general entity information, and industry entity information with the FAQ data set in the database comprises:
    将所述拼接矩阵中的所述通用实体信息和行业实体信息替换成最上层实体的编码,再与所述数据库中的所述FAQ数据集进行匹配。Replace the general entity information and industry entity information in the stitching matrix with the code of the top-level entity, and then match the FAQ data set in the database.
  11. 如权利要求6所述的会话交互方法,其中,所述判断所述用户语句中是否包含意图包括:The conversation interaction method according to claim 6, wherein the determining whether the user statement includes an intent comprises:
    根据所述实体信息和所述用户画像信息的拼接矩阵,通过CNN模型进行文本分类获得意图。According to the stitching matrix of the entity information and the user portrait information, text classification is performed through a CNN model to obtain an intent.
  12. 如权利要求11所述的会话交互方法,其中,所述意图的类型包括投保意图、核保意图、理赔意图、续保意图和退保意图之中的一种或多种。The conversation interaction method according to claim 11, wherein the type of the intention includes one or more of an insurance intention, an underwriting intention, a claim intention, a renewal intention, and a surrender intention.
  13. 如权利要求12所述的会话交互方法,其中,所述在数据库中调取与所述意图相应的会话流程并输出包括:The conversation interaction method according to claim 12, wherein the retrieval and output of a conversation flow corresponding to the intent in a database comprises:
    判断所述意图的类型;Judging the type of said intent;
    根据所述意图的类型调取所需要的信息并获取所述信息;Retrieve required information and obtain the information according to the type of the intent;
    根据所述信息输出相应的方案。A corresponding scheme is output according to the information.
  14. The conversation interaction method according to claim 13, wherein the type of the intent is judged by performing a confidence calculation, and when the confidence is greater than the preset value for a certain intent type, the intent is judged to belong to that intent type.
  15. The conversation interaction method according to claim 13, wherein the information is obtained from one or both of the entity information and the user portrait information, and/or by asking the user for the information.
  16. 如权利要求13所述的会话交互方法,其特征在于,所述信息包括性别、年龄、车牌号、地区、家庭人数之中的一种或多种。The conversation interaction method according to claim 13, wherein the information comprises one or more of gender, age, license plate number, region, and number of households.
  17. 如权利要求16所述的会话交互方法,其特征在于,所述方案是险种推荐方案。The conversation interaction method according to claim 16, wherein the scheme is a insurance type recommendation scheme.
  18. The conversation interaction method according to claim 6, wherein calculating the value of the intent inference specifically comprises:
    performing text classification with an LSTM model on the concatenation matrix of the sentence vector information, the entity information, and the user portrait information to obtain the potential intent of the user's next sentence.
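A minimal sketch, assuming a standard LSTM classifier, of the intent inference in claim 18; the softmax output plays the role of the inferred value that is later compared with the threshold. Dimensions are illustrative assumptions.

```python
# Illustrative LSTM classifier over the concatenated
# (sentence vector + entity + user portrait) matrix.
import torch
import torch.nn as nn

class NextIntentLSTM(nn.Module):
    def __init__(self, feature_dim, hidden_dim, num_intents):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_intents)

    def forward(self, x):
        # x: (batch, seq_len, feature_dim) concatenation matrix
        _, (h_n, _) = self.lstm(x)                      # final hidden state
        return torch.softmax(self.fc(h_n[-1]), dim=1)   # potential intent of the next sentence
```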
  19. A conversation interaction apparatus, comprising:
    an acquisition module, configured to acquire a user sentence;
    a first judgment module, configured to judge whether the user sentence contains a conventional question;
    a first output module, configured to, when the judgment result of the first judgment module is yes, retrieve the conventional answer corresponding to the conventional question from the database and output it;
    a second judgment module, configured to, when the judgment result of the first judgment module is no, judge whether the user sentence contains an intent;
    a second output module, configured to, when the judgment result of the second judgment module is yes, retrieve the conversation flow corresponding to the intent from the database and output it.
  20. The conversation interaction apparatus according to claim 19, further comprising:
    an inference module, configured to, when the judgment result of the second judgment module is no, perform intent inference based on the user sentence;
    a third judgment module, configured to judge whether the value obtained from the intent inference is greater than a preset threshold;
    a third output module, configured to, when the judgment result of the third judgment module is yes, retrieve the conversation flow corresponding to the intent from the database and output it.
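For illustration only, a sketch of how the modules of claims 19 and 20 could be chained; the helper callables passed in (faq_lookup, intent_classifier, intent_inferrer, flow_lookup) are assumptions standing in for the modules recited above, not disclosed APIs.

```python
# Illustrative chaining of the judgment/output/inference modules of claims 19-20.
def handle_user_sentence(sentence, faq_lookup, intent_classifier, intent_inferrer,
                         flow_lookup, threshold=0.5):
    answer = faq_lookup(sentence)                 # first judgment module + first output module
    if answer is not None:
        return answer
    intent = intent_classifier(sentence)          # second judgment module
    if intent is not None:
        return flow_lookup(intent)                # second output module
    guessed_intent, score = intent_inferrer(sentence)   # inference module
    if score > threshold:                         # third judgment module
        return flow_lookup(guessed_intent)        # third output module
    return None
```

In use, the four callables would wrap the FAQ matcher, the CNN intent classifier, the LSTM intent inferrer, and the database lookup of conversation flows sketched earlier.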
PCT/CN2019/071301 2018-07-27 2019-01-11 Session interaction method and apparatus WO2020019686A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810841590.9 2018-07-27
CN201810841590.9A CN109241251B (en) 2018-07-27 2018-07-27 Conversation interaction method

Publications (1)

Publication Number Publication Date
WO2020019686A1 (en)

Family

ID=65073119

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/071301 WO2020019686A1 (en) 2018-07-27 2019-01-11 Session interaction method and apparatus

Country Status (2)

Country Link
CN (1) CN109241251B (en)
WO (1) WO2020019686A1 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020913A (en) * 2019-02-20 2019-07-16 中国人民财产保险股份有限公司 Products Show method, equipment and storage medium
CN110096570B (en) * 2019-04-09 2021-03-30 苏宁易购集团股份有限公司 Intention identification method and device applied to intelligent customer service robot
CN110134765B (en) * 2019-05-05 2021-06-29 杭州师范大学 Restaurant user comment analysis system and method based on emotion analysis
CN110287285B (en) * 2019-05-31 2023-06-16 平安科技(深圳)有限公司 Method and device for identifying problem intention, computer equipment and storage medium
CN110399472B (en) * 2019-06-17 2022-07-15 平安科技(深圳)有限公司 Interview question prompting method and device, computer equipment and storage medium
CN110377715A (en) * 2019-07-23 2019-10-25 天津汇智星源信息技术有限公司 Reasoning type accurate intelligent answering method based on legal knowledge map
CN110580284B (en) * 2019-07-31 2023-08-18 平安科技(深圳)有限公司 Entity disambiguation method, device, computer equipment and storage medium
CN110750628A (en) * 2019-09-09 2020-02-04 深圳壹账通智能科技有限公司 Session information interaction processing method and device, computer equipment and storage medium
CN110781280A (en) * 2019-10-21 2020-02-11 深圳众赢维融科技有限公司 Knowledge graph-based voice assisting method and device
CN110968674B (en) * 2019-12-04 2023-04-18 电子科技大学 Method for constructing question and comment pairs based on word vector representation
CN111143545A (en) * 2019-12-31 2020-05-12 北京明略软件系统有限公司 Insurance data acquisition method and device, electronic equipment and computer storage medium
CN113468297B (en) * 2020-03-30 2024-02-27 阿里巴巴集团控股有限公司 Dialogue data processing method and device, electronic equipment and storage equipment
CN111475631B (en) * 2020-04-05 2022-12-06 北京亿阳信通科技有限公司 Disease question-answering method and device based on knowledge graph and deep learning
CN111694939B (en) * 2020-04-28 2023-09-19 平安科技(深圳)有限公司 Method, device, equipment and storage medium for intelligent robot calling
CN111666400B (en) * 2020-07-10 2023-10-13 腾讯科技(深圳)有限公司 Message acquisition method, device, computer equipment and storage medium
CN112183098B (en) * 2020-09-30 2022-05-06 完美世界(北京)软件科技发展有限公司 Session processing method and device, storage medium and electronic device
CN111930854B (en) * 2020-10-10 2021-01-08 北京福佑多多信息技术有限公司 Intention prediction method and device
CN113240438A (en) * 2021-05-11 2021-08-10 京东数字科技控股股份有限公司 Intention recognition method, device, storage medium and program product
CN113918679A (en) * 2021-09-22 2022-01-11 三一汽车制造有限公司 Knowledge question and answer method and device and engineering machinery
CN117078270B (en) * 2023-10-17 2024-02-02 彩讯科技股份有限公司 Intelligent interaction method and device for network product marketing

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017175363A1 (en) * 2016-04-07 2017-10-12 株式会社アドバンスト・メディア Information processing system, reception server, information processing method and program
CN106649704B (en) * 2016-12-20 2020-04-07 竹间智能科技(上海)有限公司 Intelligent conversation control method and system
CN107025278A (en) * 2017-03-27 2017-08-08 竹间智能科技(上海)有限公司 Based on interactive user portrait extraction method and device
CN107562856A (en) * 2017-08-28 2018-01-09 深圳追科技有限公司 A kind of self-service customer service system and method
CN107729549B (en) * 2017-10-31 2021-05-11 深圳追一科技有限公司 Robot customer service method and system including element extraction
CN108038234B (en) * 2017-12-26 2021-06-15 众安信息技术服务有限公司 Automatic question template generating method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110095083A (en) * 2010-02-16 2011-08-24 윤재민 System and method for analyzing continuous sound, expressing emotion and producing communication about pet dog
CN106294341A (en) * 2015-05-12 2017-01-04 阿里巴巴集团控股有限公司 A kind of Intelligent Answer System and theme method of discrimination thereof and device
CN106407333A (en) * 2016-09-05 2017-02-15 北京百度网讯科技有限公司 Artificial intelligence-based spoken language query identification method and apparatus
CN107193853A (en) * 2016-12-08 2017-09-22 孙瑞峰 A kind of social scenario building method and system based on linguistic context
CN108090174A (en) * 2017-12-14 2018-05-29 北京邮电大学 A kind of robot answer method and device based on system function syntax

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324814A (en) * 2020-03-05 2020-06-23 中国建设银行股份有限公司 Control method, device and system of intelligent equipment
CN111339745A (en) * 2020-03-06 2020-06-26 京东方科技集团股份有限公司 Follow-up report generation method, device, electronic device and storage medium
CN111611362A (en) * 2020-04-07 2020-09-01 安徽慧医信息科技有限公司 Skill development method for realizing active interaction between system and user based on tree data
CN111611362B (en) * 2020-04-07 2023-03-31 安徽慧医信息科技有限公司 Skill development method for realizing active interaction between system and user based on tree data
CN111814484B (en) * 2020-07-03 2024-01-26 海信视像科技股份有限公司 Semantic recognition method, semantic recognition device, electronic equipment and readable storage medium
CN111814484A (en) * 2020-07-03 2020-10-23 海信视像科技股份有限公司 Semantic recognition method and device, electronic equipment and readable storage medium
CN111813896A (en) * 2020-07-13 2020-10-23 重庆紫光华山智安科技有限公司 Text triple relation identification method and device, training method and electronic equipment
CN111813896B (en) * 2020-07-13 2022-12-02 重庆紫光华山智安科技有限公司 Text triple relation identification method and device, training method and electronic equipment
CN112463959A (en) * 2020-10-29 2021-03-09 中国人寿保险股份有限公司 Service processing method based on uplink short message and related equipment
CN112669011A (en) * 2020-12-30 2021-04-16 招联消费金融有限公司 Intelligent dialogue method and device, computer equipment and storage medium
CN112669011B (en) * 2020-12-30 2024-03-22 招联消费金融股份有限公司 Intelligent dialogue method, intelligent dialogue device, computer equipment and storage medium
CN112860850B (en) * 2021-01-21 2022-08-30 平安科技(深圳)有限公司 Man-machine interaction method, device, equipment and storage medium
CN112860850A (en) * 2021-01-21 2021-05-28 平安科技(深圳)有限公司 Man-machine interaction method, device, equipment and storage medium
CN113127731B (en) * 2021-03-16 2024-01-30 北京第一因科技有限公司 Personalized test question recommendation method based on knowledge graph
CN113127731A (en) * 2021-03-16 2021-07-16 西安理工大学 Knowledge graph-based personalized test question recommendation method
CN113064986B (en) * 2021-04-30 2023-07-25 中国平安人寿保险股份有限公司 Model generation method, system, computer device and storage medium
CN113064986A (en) * 2021-04-30 2021-07-02 中国平安人寿保险股份有限公司 Model generation method, system, computer device and storage medium
CN113434633A (en) * 2021-06-28 2021-09-24 平安科技(深圳)有限公司 Social topic recommendation method, device, equipment and storage medium based on head portrait
CN113743124A (en) * 2021-08-25 2021-12-03 南京星云数字技术有限公司 Intelligent question-answer exception processing method and device and electronic equipment
CN113743124B (en) * 2021-08-25 2024-03-29 南京星云数字技术有限公司 Intelligent question-answering exception processing method and device and electronic equipment
CN116521822A (en) * 2023-03-15 2023-08-01 上海帜讯信息技术股份有限公司 User intention recognition method and device based on 5G message multi-round session mechanism
CN116521822B (en) * 2023-03-15 2024-02-13 上海帜讯信息技术股份有限公司 User intention recognition method and device based on 5G message multi-round session mechanism

Also Published As

Publication number Publication date
CN109241251A (en) 2019-01-18
CN109241251B (en) 2022-05-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19840464

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 06/05/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19840464

Country of ref document: EP

Kind code of ref document: A1