CN114064873A

CN114064873A - Method and device for building FAQ knowledge base in insurance field and electronic equipment

Info

Publication number: CN114064873A
Application number: CN202111354977.XA
Authority: CN
Inventors: Zou Yang (邹阳)
Original assignee: Huize Chengdu Network Technology Co ltd
Current assignee: Huize Chengdu Network Technology Co ltd
Priority date: 2021-11-16
Filing date: 2021-11-16
Publication date: 2022-02-18

Abstract

The application discloses a method, a device, and an electronic device for constructing an FAQ knowledge base in the insurance field. When the answers in the extracted conversation pairs are quality-ranked, the answer quality ranking result for similar conversation pairs is controlled by the questions or objections in those similar pairs. By building the FAQ knowledge base and controlling the answer quality ranking through the questions or objections in similar conversation pairs, the application supports the analysis of client questions and objections, can effectively shorten the time insurance consultants need to resolve client questions, improves the accuracy with which those questions are resolved, and correspondingly helps the company with its operations and with formulating better marketing strategies.

Description

Method and device for building FAQ knowledge base in insurance field and electronic equipment

Technical Field

The application belongs to the fields of insurance and artificial intelligence, and in particular relates to a method, a device, and an electronic device for constructing an FAQ (Frequently Asked Questions) knowledge base in the insurance field.

Background

In recent years, insurance has become widely recognized as a means of hedging risk, and the market offers a dazzling array of insurance products. Because insurance products are highly specialized, especially lines such as long-term insurance and annuity insurance, ordinary consumers usually learn about specific products through insurance consultants, whose role has become increasingly important. To effectively promote sales conversion, insurance consultants must accurately resolve clients' questions and doubts.

However, an insurance consultant faces different clients every day, and different clients have different questions or concerns. The highly specialized nature of insurance products, together with the slow professional development of insurance consultants, makes it challenging for consultants to resolve these questions and concerns accurately. An urgent technical problem in the field is therefore how to shorten the time insurance consultants need to resolve client questions and improve the accuracy of their answers, and thereby help the company with its operations and with formulating better marketing strategies.

Disclosure of Invention

In view of this, the application provides a method, a device, and an electronic device for constructing an FAQ knowledge base in the insurance field. A knowledge base is built for client questions and objections in the insurance field to support the analysis of those questions and objections, so as to shorten the time insurance consultants need to resolve client questions, improve accuracy, and thereby help the company with its operations and with formulating better marketing strategies.

The specific technical scheme is as follows:

An FAQ knowledge base construction method in the insurance field comprises the following steps:

acquiring conversation texts between clients and consultants in the insurance field;

extracting, from the conversation text, questions and the answers matched with them, and/or objections and the answers matched with them, to obtain conversation pairs comprising question-answer pairs and/or objection-answer pairs;

performing quality ranking on the answers in the extracted conversation pairs to obtain an answer quality ranking result, wherein, for similar conversation pairs among the conversation pairs, the answer quality ranking result is controlled by the questions or objections in the similar conversation pairs;

and filtering out, based on the answer quality ranking result, the conversation pairs whose answer quality does not meet a preset quality condition, and constructing an insurance-field FAQ knowledge base comprising the unfiltered conversation pairs.
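The four steps above can be read as a simple pipeline. The following Python sketch is illustrative only: the `ConversationPair` type, the function name `build_faq_kb`, and the three callables (`extract_pairs`, `rank_answers`, `meets_quality`) are hypothetical placeholders for the extraction models, the ranking model, and the unspecified "preset quality condition"; none of them come from the application.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Turn = Tuple[str, str]  # (role, text), e.g. ("customer", "What is cash value?")


@dataclass
class ConversationPair:
    kind: str    # "question" or "objection" (hypothetical field names)
    prompt: str  # the client's question or objection
    answer: str  # the matched consultant answer


def build_faq_kb(
    dialogues: Iterable[List[Turn]],
    extract_pairs: Callable[[List[Turn]], List[ConversationPair]],
    rank_answers: Callable[[List[ConversationPair]], List[ConversationPair]],
    meets_quality: Callable[[ConversationPair], bool],
) -> List[ConversationPair]:
    # Step 1: the acquired conversation texts arrive as `dialogues`.
    pairs: List[ConversationPair] = []
    for dialogue in dialogues:
        # Step 2: extract question-answer and/or objection-answer pairs.
        pairs.extend(extract_pairs(dialogue))
    # Step 3: quality-rank the answers (in the application, per group of similar pairs).
    ranked = rank_answers(pairs)
    # Step 4: drop pairs failing the preset quality condition; the rest form the KB.
    return [p for p in ranked if meets_quality(p)]
```

Keeping each stage as an injected callable lets the sentence pattern, objection, and ranking models be swapped or retrained without changing the pipeline itself.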

Optionally, extracting the questions and the answers matched with them from the conversation text comprises:

performing sentence pattern prediction on the client sentences in the conversation text with a pre-constructed sentence pattern recognition model, to obtain a sentence pattern prediction result indicating whether each client sentence is a question or a non-question;

selecting the client sentences whose sentence pattern prediction result is a question as questions;

taking the first continuous run of consultant utterances following each question in the conversation text as the answer to that question, to obtain a question-answer pair;

extracting the objections and the answers matched with them comprises:

predicting the content of the client sentences in the conversation text with a pre-constructed objection model, to obtain a content prediction result indicating whether each client sentence contains an objection;

extracting the client objection sentences whose content prediction result indicates an objection;

and taking the consultant utterances corresponding to each client objection sentence in the conversation text as the answer to that sentence, to obtain an objection-answer pair.

Optionally, performing quality ranking on the answers in the extracted conversation pairs to obtain an answer quality ranking result comprises:

clustering the extracted conversation pairs with a preset clustering algorithm to obtain multiple groups of similar conversation pairs;

determining the degree to which the different questions or objections in a group of similar conversation pairs influence the different answers;

performing feature interaction between each answer in the similar conversation pairs and its matched question or objection, based on those influence degrees, to obtain the interaction feature corresponding to each answer;

fusing the interaction feature of each answer with the answer features of the other answers in the same group of similar conversation pairs, to obtain the fused feature corresponding to each answer in the group;

and performing quality ranking on the answers in the similar conversation pairs based on their respective fused features.

Optionally, determining the influence degrees of the different questions or objections in the similar conversation pairs on the different answers, and performing feature interaction between the answers and their matched questions or objections based on those influence degrees to obtain the interaction features corresponding to the answers, comprises:

encoding the different questions or objections in the similar conversation pairs to obtain question vectors or objection vectors, and encoding the different answers in the similar conversation pairs to obtain answer vectors;

calculating, with the following calculation formulas, the influence degrees of the different questions or objections in the similar conversation pairs on the different answers, and the fused question vector corresponding to each answer:

$$g(QE) = \mathrm{softmax}(W \cdot QE)$$

where $W \in \mathbb{R}^{n \times n}$ is a weight mapping matrix of dimension $n \times n$, $\mathbb{R}$ denotes the real number space, and $n$ is the number of questions and objections in a similar conversation pair; softmax is the probability normalization function; $g(QE) \in \mathbb{R}^{n \times 1}$ is the weight distribution of the different questions or objections in the similar conversation pair over the answers, measuring how strongly each question or objection influences each answer; and $\mathrm{Att\_QE}_i$ denotes the fused question vector corresponding to the $i$-th answer;

and performing interaction processing between the answer vector of each answer in the similar conversation pair and its fused question vector, to obtain the interaction feature corresponding to each answer.
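This passage does not give the exact shape of QE or the formula for Att_QE_i, so the NumPy sketch below fills them in with stated assumptions: question vectors and answer vectors are stacked as (n, d) matrices with answer i paired with question i; the logits for answer i are the scores of the rows of W·QE against answer vector i; softmax is taken over the n questions, so each answer gets an n×1 weight distribution; Att_QE_i is the corresponding weighted sum of question vectors; and the "interaction processing" is a hypothetical concatenation of the answer vector, the fused question vector, and their element-wise product. Treat it as one plausible reading, not as the model of FIG. 3.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def question_gate(QE: np.ndarray, AE: np.ndarray, W: np.ndarray):
    """QE: (n, d) question/objection vectors; AE: (n, d) answer vectors
    (answer i paired with question i); W: (n, n) weight mapping matrix."""
    # Assumption: W @ QE mixes the n question vectors, and scoring each mixed
    # vector against answer i yields n logits for that answer.
    logits = (W @ QE) @ AE.T          # (n, n): [j, i] = score of question j for answer i
    G = softmax(logits, axis=0)       # column i = n x 1 weight distribution g_i over questions
    att_QE = G.T @ QE                 # (n, d): row i = fused question vector Att_QE_i
    # Hypothetical interaction: [answer; fused question; element-wise product].
    interaction = np.concatenate([AE, att_QE, AE * att_QE], axis=1)  # (n, 3d)
    return G, att_QE, interaction
```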

Optionally, fusing the interaction feature of each answer with the answer features of the other answers in the same group of similar conversation pairs, to obtain the fused feature corresponding to each answer, comprises:

averaging the interaction features of the different answers in the similar conversation pair to obtain a ranking fusion vector;

and concatenating the ranking fusion vector to the interaction feature of each answer in the similar conversation pair, to obtain the fused feature corresponding to each answer.
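Unlike the gating step, the fusion step is fully specified above: average the interaction features of one group into a ranking fusion vector and concatenate it to every answer's interaction feature. The sketch below implements exactly that with NumPy. The linear scorer `rank_answers` and its weight vector `w` are hypothetical stand-ins for the answer ranking module, whose form is not given here.

```python
import numpy as np


def fuse_features(interaction: np.ndarray) -> np.ndarray:
    """interaction: (m, k) interaction features of the m answers in one group."""
    ranking_fusion = interaction.mean(axis=0)                    # (k,) ranking fusion vector
    tiled = np.tile(ranking_fusion, (interaction.shape[0], 1))  # (m, k), one copy per answer
    return np.concatenate([interaction, tiled], axis=1)         # (m, 2k) fused features


def rank_answers(fused: np.ndarray, w: np.ndarray):
    # Hypothetical linear scorer; returns answer indices from best to worst.
    scores = fused @ w
    return np.argsort(-scores), scores
```

Because every answer in a group receives the same appended mean vector, each answer's score is computed with the group's overall context, which is what lets the ranking compare answers within similar conversation pairs.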

Optionally, the quality ranking of the answers is performed by a pre-trained multi-source answer quality ranking model;

wherein the multi-source answer quality ranking model is a model trained with pseudo answer quality labels as training labels, and a pseudo answer quality label is a quality label generated by evaluating the quality of the answers in each group of similar conversation pairs according to preset business rules.

Optionally, before the extracted conversation pairs are clustered with the preset clustering algorithm, the method further comprises:

filtering low-quality conversation pairs out of the extracted conversation pairs according to preset filtering rules, so that clustering is performed on the conversation pairs remaining after filtering.

Optionally, before quality ranking is performed on the answers in the similar conversation pairs, the method further comprises:

extracting preset insurance-domain statistical features corresponding to the similar conversation pairs, and performing quality ranking on the answers in the similar conversation pairs in combination with those statistical features.

An insurance-field FAQ knowledge base construction device comprises:

a text acquisition unit, configured to acquire conversation texts between clients and consultants in the insurance field;

a conversation pair extraction unit, configured to extract questions and the answers matched with them, and/or objections and the answers matched with them, from the conversation text, to obtain conversation pairs comprising question-answer pairs and/or objection-answer pairs;

a quality ranking unit, configured to perform quality ranking on the answers in the extracted conversation pairs to obtain an answer quality ranking result, wherein, for similar conversation pairs among the conversation pairs, the answer quality ranking result is controlled by the questions or objections in the similar conversation pairs;

and a knowledge base construction unit, configured to filter out, based on the answer quality ranking result, the conversation pairs whose answer quality does not meet a preset quality condition, and to construct an insurance-field FAQ knowledge base comprising the unfiltered conversation pairs.

An electronic device, comprising:

a memory for storing a set of computer instructions;

a processor, configured to implement the insurance-field FAQ knowledge base construction method according to any one of the above by executing the instruction set stored in the memory.

Compared with the prior art, the application has the following beneficial effects:

With the insurance-field FAQ knowledge base construction method, device, and electronic device provided by the application, conversation texts between insurance-field clients and consultants are obtained; questions and their matching answers, and objections and their matching answers, are extracted from the texts to obtain conversation pairs; the answers in the conversation pairs are quality-ranked, and conversation pairs are filtered according to a preset quality condition; finally, an insurance-field FAQ knowledge base is constructed from the unfiltered conversation pairs that meet the quality condition. When the answers in the extracted conversation pairs are quality-ranked, the ranking for similar conversation pairs is controlled by the questions or objections in those similar pairs.

The application therefore proposes and implements an FAQ knowledge base construction scheme that uses insurance-field question-answer pairs and/or objection-answer pairs as knowledge pairs, and supports the analysis of client questions and objections by building an insurance-field FAQ knowledge base. During construction, the quality ranking of the answers in similar conversation pairs is controlled by the questions or objections in those pairs, so that the contextual features carried by these questions or objections are incorporated into answer quality ranking. This improves ranking performance and makes it easier to build an FAQ knowledge base of high-quality conversation pairs, which in turn helps shorten the time insurance consultants need to resolve client questions and improves the accuracy of their answers, and correspondingly helps the company with its operations and with formulating better marketing strategies.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a flow chart of a method for building an insurance domain FAQ knowledge base provided by the present application;

FIG. 2 is a training framework of the sentence pattern recognition model provided by the present application;

FIG. 3 is a model structure diagram of a multi-source answer quality ranking model provided herein;

FIG. 4 is a flow diagram of a quality ranking process for answers provided herein;

FIG. 5 is a schematic structural diagram of the insurance-field FAQ knowledge base construction device provided by the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The applicant has found through research that some knowledge bases already exist in the insurance industry. They are commonly insurance product knowledge graphs that focus mainly on the product side: insurance consultants can use them to look up basic information about a specific product, such as the maximum sum insured or the issuing insurance company. However, such knowledge bases are built only around insurance products; they do not cover client questions or objections and cannot be used to analyze them. Moreover, these knowledge bases are built manually and are small in scale, which makes large-scale construction of insurance-field knowledge difficult.

The application therefore discloses a method, a device, and an electronic device for constructing an FAQ knowledge base in the insurance field to solve the above technical problems.

The processing flow of the disclosed method for constructing an FAQ knowledge base in the insurance field is shown in FIG. 1 and specifically comprises the following steps:

Step 101: obtain the conversation texts between clients and consultants in the insurance field.

First, the dialogue texts produced when insurance consultants communicate with clients are obtained as the data basis.

Specifically, one part of the obtained dialogue texts serves as training samples for the related models (such as the sentence pattern recognition model, the objection model, and the multi-source answer quality ranking model described later), and another part serves as prediction samples for those models, from which the required FAQ knowledge base is finally constructed.

Specific examples of the obtained dialogue texts are given in Table 1 below:

TABLE 1

Step 102: extract questions and their matching answers, and/or objections and their matching answers, from the conversation text, to obtain conversation pairs comprising question-answer pairs and/or objection-answer pairs.

In the embodiments of the application, a question-answer pair extractor and an objection pair extractor are constructed in advance to extract question-answer pairs and objection-answer pairs, respectively, from the conversation text. Client objections differ from client questions: an objection arises when, during the communication, the client distrusts the insurance company or the sales platform's products in a way that hinders closing the deal, and the insurance consultant must respond to it accordingly to move the deal forward.

The question-answer pair extractor contains a sentence pattern recognition model, a binary classifier that predicts the sentence pattern label of each client sentence in the conversation text; the predicted label is either question or non-question.

In conjunction with the training framework of the sentence pattern recognition model shown in FIG. 2, training the binary sentence pattern recognition model comprises:

11) Manually annotate the client utterances in the conversation text as question or non-question, and mark non-client content as null, as shown in the following table:

TABLE 2

Role	Content	Label
Customer	Hello	Non-question
Customer	I'd like to ask, what is cash value?	Question
Consultant	Hello! The so-called cash value of a policy is	(null)
Customer	So that's how it is.	Non-question

12) Perform word segmentation with a custom dictionary on the client-role content from step 11), and remove stop words;

Adding a custom dictionary allows domain-specific terms to be segmented correctly; the dictionary can be built from hot words of the business in that domain. A segmentation example is as follows:

TABLE 3

Original sentence	After word segmentation
I'd like to ask, what is cash value?	I | want to ask | what | is | cash value | ?

13) Train the binary sentence pattern recognition model with the samples labeled and segmented in steps 11) and 12).

Frameworks such as FastText or TextCNN can be used to train the binary sentence pattern recognition model.

Similarly, the objection pair extractor contains an objection model, specifically a binary objection classifier that predicts whether a client sentence in the conversation text contains an objection; the predicted label is either objection or non-objection.

Training the binary objection model comprises:

21) Manually annotate the client utterances in the conversation text with content labels, namely objection or non-objection;
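As a rough illustration of steps 11) to 13), the sketch below turns labeled rows like those in Table 2 into fastText's supervised-training format, in which each line starts with a `__label__` prefix followed by space-separated tokens. Skipping consultant rows, removing stop words at this point, and the label names are assumptions made for the example. With the `fasttext` package, such a file could then be passed to `fasttext.train_supervised(input=...)`; for Chinese text, `jieba.lcut` (after `jieba.load_userdict(...)` loads the domain hot-word dictionary) could serve as the tokenizer. Here `str.split` is used so the sketch runs without either dependency.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

# Hypothetical label names; fastText only requires the "__label__" prefix.
LABELS = {"question": "__label__question", "non-question": "__label__non_question"}


def to_fasttext_lines(
    rows: Iterable[Tuple[str, str, Optional[str]]],
    tokenize: Callable[[str], List[str]],
    stopwords: Set[str],
) -> List[str]:
    lines = []
    for role, content, label in rows:
        # Consultant rows carry a null label (Table 2) and are not training samples.
        if role != "customer" or label not in LABELS:
            continue
        # Segment, then drop empty tokens and stop words (step 12).
        tokens = [t for t in tokenize(content) if t.strip() and t not in stopwords]
        lines.append(f"{LABELS[label]} {' '.join(tokens)}")
    return lines
```

The same preparation applies to the objection model by swapping in objection/non-objection labels.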

specific examples are shown in the following table:

TABLE 4

Content	Label
I want to look at this product.	Non-objection
But I feel that small insurance companies are not very reliable	Objection
I trust offline insurance companies more	Objection

22) Train the binary objection model with the labeled samples.

The binary objection model can likewise be trained with frameworks such as FastText or TextCNN, without being limited to them.

With the pre-constructed question-answer pair extractor and objection pair extractor, the question-answer pair extractor can be applied to the obtained dialogue texts between insurance consultants and clients (for example, the prediction samples) to extract the client's question-answer pairs, and the objection pair extractor can be applied to extract the client's objection-answer pairs, giving the conversation pairs corresponding to the dialogue text. The embodiments of the application refer to question-answer pairs and objection-answer pairs collectively as conversation pairs.

Specifically, extracting question-answer pairs from the conversation text with the question-answer pair extractor can be implemented as follows:

31) perform sentence pattern prediction on the client sentences in the conversation text with the sentence pattern recognition model of the question-answer pair extractor, to obtain a prediction result indicating whether each client sentence is a question or a non-question;

32) select the client sentences predicted as questions;

33) take the first continuous run of consultant utterances following each question in the conversation text as the answer to that question, to obtain a question-answer pair.

In the example of Table 1, the consultant utterances at index 4 and index 5 can be merged and used as the answer to the client question at index 3, giving a question-answer pair; examples are as follows:

TABLE 5

Question	Answer
What does cash value mean?	(corresponding answer)
What is the difference between medical insurance and critical illness insurance?	(corresponding answer)
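Steps 31) to 33) can be sketched as follows. The dialogue is assumed to be a list of `(role, text)` tuples, `is_target` is a hypothetical callable standing in for the sentence pattern recognition model, and the answer is the first run of consecutive consultant turns after the question, merged into one string (as with index 4 and index 5 in Table 1). Skipping any client turns that come between the question and the consultant's reply is an assumption.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text)


def extract_pairs(
    dialogue: List[Turn],
    is_target: Callable[[str], bool],
    sep: str = " ",
) -> List[Tuple[str, str]]:
    pairs = []
    for i, (role, text) in enumerate(dialogue):
        # Keep only client sentences flagged by the (stand-in) classifier.
        if role != "customer" or not is_target(text):
            continue
        j = i + 1
        # Skip ahead to the first consultant turn after the question.
        while j < len(dialogue) and dialogue[j][0] != "consultant":
            j += 1
        # Merge the consecutive consultant turns into a single answer.
        parts = []
        while j < len(dialogue) and dialogue[j][0] == "consultant":
            parts.append(dialogue[j][1])
            j += 1
        if parts:
            pairs.append((text, sep.join(parts)))
    return pairs
```

Passing a predictor for the objection model as `is_target` gives a rough version of steps 41) to 43) as well; for Chinese text, `sep=""` avoids inserting spaces between merged turns.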

Extracting objection-answer pairs from the conversation text with the objection pair extractor can be implemented as follows:

41) predict the content of each client sentence in the conversation text with the pre-constructed objection model, to obtain a content prediction result indicating whether the client sentence contains an objection;

42) extract the client objection sentences whose content prediction result indicates an objection;

43) take the consultant utterances corresponding to each client objection sentence in the conversation text as the answer to that sentence, to obtain an objection-answer pair.

Specific examples of objection-answer pairs are given in Table 6 below:

TABLE 6

Step 103: perform quality ranking on the answers in the extracted conversation pairs to obtain an answer quality ranking result; for similar conversation pairs among them, the answer quality ranking result is controlled by the questions or objections in the similar conversation pairs.

The applicant has found that, because conversations between insurance consultants and clients tend to be colloquial, simply taking the consultant's utterances after a client question or objection as the answer yields many low-quality answers; the knowledge pairs of the resulting knowledge base would then be redundant and of low quality and would not be useful in practice. To address this, the embodiment further performs quality ranking on the answers in the extracted conversation pairs, so that high-quality conversation pairs can be selected as knowledge pairs when the knowledge base is constructed.

Specifically, a multi-source answer quality ranking model is proposed, and the answers in the extracted conversation pairs are quality-ranked with this model.

The multi-source answer quality ranking model can be constructed through the following processing procedures:

One, rule-based filtering of conversation pairs

The step of filtering session pairs based on rules is an optional step.

Because the FAQ knowledge pairs come from the conversation texts between insurance consultants and clients, they contain a lot of meaningless text. Preferably, the embodiment therefore first filters the extracted conversation pairs with preset rule logic, as follows:

51) Filter out conversation pairs whose question/objection and answer share no vocabulary;

First, segment the conversation pairs with an open-source word segmentation tool (such as jieba); if a question or objection and its answer share no word, the conversation pair is removed.

52) Filter out answers that are themselves still questions;

Perform sentence pattern/content prediction on the answer of each question or objection with the sentence pattern recognition model or the objection model described above; if the predicted label of the answer is question or objection, the conversation pair is removed.

53) Filter out obviously meaningless text in the answers, such as "then" or "take part in".
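A minimal sketch of rules 51) to 53) is shown below. The tokenizer and the `is_question_or_objection` predictor are passed in as callables (in practice, a jieba-based tokenizer and the two classifiers described above). The filler set `MEANINGLESS` is hypothetical, and rule 53) is interpreted here as dropping answers that consist only of filler, which is one possible reading of the rule.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical filler list; a real deployment would curate this per business.
MEANINGLESS = {"then", "take part in"}


def rule_filter(
    pairs: Iterable[Tuple[str, str]],
    tokenize: Callable[[str], List[str]],
    is_question_or_objection: Callable[[str], bool],
    meaningless: Set[str] = MEANINGLESS,
) -> List[Tuple[str, str]]:
    kept = []
    for prompt, answer in pairs:
        # 51) drop pairs whose question/objection and answer share no token
        if not set(tokenize(prompt)) & set(tokenize(answer)):
            continue
        # 52) drop pairs whose answer is itself predicted as a question/objection
        if is_question_or_objection(answer):
            continue
        # 53) drop answers consisting only of meaningless filler
        if answer.strip().lower() in meaningless:
            continue
        kept.append((prompt, answer))
    return kept
```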

Two, similar conversation pair aggregation

The conversation pairs remaining after rule filtering still contain many similar questions. To further reduce redundancy and improve later usage efficiency in the insurance vertical domain, the conversation pairs are clustered into multiple groups of similar conversation pairs with a preset clustering algorithm such as single-pass, as shown below:

TABLE 7
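The single-pass algorithm named above groups items in one pass over the data: each item joins the most similar existing cluster if the similarity reaches a threshold, and otherwise starts a new cluster. The sketch below assumes the conversation pairs have already been embedded as vectors (for example, embeddings of their questions), uses cosine similarity to a running-mean centroid, and uses 0.8 as an arbitrary default threshold; the application does not specify these details.

```python
from typing import List, Sequence

import numpy as np


def single_pass(vectors: Sequence[Sequence[float]], threshold: float = 0.8) -> List[List[int]]:
    """Return clusters as lists of item indices, in order of creation."""
    centroids: List[np.ndarray] = []
    members: List[List[int]] = []
    for idx, raw in enumerate(vectors):
        v = np.asarray(raw, dtype=float)
        v = v / (np.linalg.norm(v) or 1.0)  # unit-normalize (guard against zero vectors)
        if centroids:
            # Cosine similarity between the unit vector and each centroid.
            sims = [float(c @ v) / (np.linalg.norm(c) or 1.0) for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                members[best].append(idx)
                n = len(members[best])
                centroids[best] = centroids[best] + (v - centroids[best]) / n  # running mean
                continue
        # No sufficiently similar cluster: open a new one.
        centroids.append(v.copy())
        members.append([idx])
    return members
```

Single-pass clustering is order-dependent and needs no preset number of clusters, which suits an open-ended stream of client questions; the threshold is the main knob to tune.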

Three, pseudo answer quality labels

Current answer quality judgment depends strictly on the quality of manually annotated conversation pairs, and different annotators apply different standards to answer quality. Answers extracted automatically from communication texts are not strictly corrected by hand and vary in quality. To minimize manual intervention, the application proposes generating answer quality pseudo labels from a business perspective, specifically:

61) assign each conversation pair in each group of similar conversation pairs, extracted from the conversation texts and clustered, to either the deal list or the non-deal list;

62) since the conversation pairs extracted from communications in the deal list indirectly reflect how the consultant's answers helped close the deal, the embodiment defines the answer quality category of pairs in each group of similar conversation pairs in the deal list as "good" when the consultant's or salesperson's monthly deal conversion rate is > R1 and the consultant's/salesperson's level is > S1;

63) the answer quality category of pairs extracted from consultants' or salespeople's communications in month N that did not end in a deal is defined as "normal";

The generated pseudo labels are shown in the following table:

TABLE 8

It should be noted that the rules used here to generate answer quality pseudo labels from a business perspective are only one example of the application; in implementation, the pseudo label generation rules can be set flexibly as required.
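The business rules in 61) to 63) can be expressed as a small labeling function. In the sketch below, the `PairMeta` fields, the month representation, and returning `None` for pairs that match neither rule are assumptions; `r1`, `s1`, and `month_n` correspond to the unspecified thresholds R1, S1, and month N.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PairMeta:
    in_deal_list: bool         # pair extracted from a communication that ended in a deal
    monthly_conversion: float  # consultant's/salesperson's monthly deal conversion rate
    consultant_level: int      # consultant's/salesperson's level
    month: str                 # month of the communication, e.g. "2021-10" (assumed format)


def pseudo_label(meta: PairMeta, r1: float, s1: int, month_n: str) -> Optional[str]:
    # Rule 62): deal-list pairs from high-conversion, high-level consultants.
    if meta.in_deal_list and meta.monthly_conversion > r1 and meta.consultant_level > s1:
        return "good"
    # Rule 63): pairs from month-N communications that did not end in a deal.
    if not meta.in_deal_list and meta.month == month_n:
        return "normal"
    # Not covered by either rule: left unlabeled in this sketch.
    return None
```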

Four, multi-source answer quality ranking model training

Current quality evaluation methods train an answer quality model simply on manually annotated answer quality categories. As noted above, such methods rely heavily on manual quality labels, model the answer quality of different conversation pairs uniformly, and do not fully consider the quality order among the different answers within similar conversation pairs. The application therefore proposes an answer quality ranking model conditionally controlled by similar conversation pairs, namely the multi-source answer quality ranking model.

Regarding part Three above: although the pseudo labels generated for the conversation pairs in each group of similar conversation pairs cannot serve as exact labels, and may include "normal" labels in addition to "good" labels, once enough conversation pairs have been extracted from the deal list, certain quality labels (such as "good") become dominant, and the pseudo labels can then be used as training labels for the answer quality ranking model.

As shown in FIG. 3, the trained multi-source answer quality ranking model specifically comprises an answer encoding module (Encoder), a condition control module (Gate), an interaction module (Interaction), and an answer ranking module. The functions of these modules are explained below in the course of using the model to quality-rank the answers in the extracted conversation pairs.

Based on the multi-source answer quality ranking model of the embodiments, and referring to FIG. 4, step 103 (quality ranking of the answers in the extracted conversation pairs) can further be implemented as follows:

step 401, clustering the extracted session pairs based on a preset clustering algorithm to obtain a plurality of different groups of similar session pairs.

Specifically, but not limited to, based on a clustering algorithm such as singlepass, the session pairs used as the prediction samples are clustered into a plurality of different groups of similar session pairs, and optionally, before the clustering process, the session pairs used as the prediction samples may be filtered based on the rule filtering method provided above.

Step 402, determining the influence degree of different questions or objections in similar conversation pairs on different answers; and performing feature interaction on the answers in the similar session pairs and the matched questions or objections thereof based on the influence degrees of different questions or objections in the similar session pairs on different answers to obtain interaction features corresponding to the answers.

The applicant has found that the different question information within a group of similar session pairs can be mined as context features to help improve the accuracy of answer quality ranking. The applicant has also found that different answers within similar session pairs differ in emphasis. Accordingly, to evaluate answer quality more comprehensively and accurately, this embodiment fully models the question/objection texts within similar session pairs. It then uses those questions/objections to control the quality ranking of the different answers in the group.

First, the condition control module encodes the different questions in each group of similar session pairs as corresponding question vectors $QE_1, QE_2, \ldots, QE_n$:

$$QE_i = \mathrm{encoder}(Q_i) \tag{1}$$

In formula (1), the encoder may be, but is not limited to, a pretrained model such as BERT or XLNet. The resulting set of similar-question vectors ($QE_1, QE_2, \ldots, QE_n$) acts as the gate that controls the quality ranking of the different answers in the similar session pairs.

Next, a weight mapping matrix $W \in \mathbb{R}^{n \times n}$ is introduced to compute how strongly the different questions in the similar session pair influence the different answers. The fused question vector corresponding to each answer is then computed as follows:

$$g(QE) = \mathrm{softmax}(W \cdot QE) \tag{2}$$

In formulas (2) and (3), $W \in \mathbb{R}^{n \times n}$ is the weight mapping matrix with dimensions $n \times n$, where $\mathbb{R}$ denotes the real numbers and $n$ is the number of questions and objections in the group of similar session pairs. Softmax is the probability normalization function. $g(QE) \in \mathbb{R}^{n \times 1}$ is the weight distribution of the different questions or objections in the similar session pair over the answers, and measures how strongly each question or objection influences the different answers. $att\_QE_i$ is the fused question vector corresponding to the $i$-th answer.
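Formula (3) is not preserved in this text, and the dimensions in formula (2) are only loosely specified. The following is therefore one possible reading, not the patent's exact computation. It assumes each question vector $QE_j$ is first reduced to a scalar with a hypothetical projection vector `u`, so that $W \cdot QE$ is an $n \times 1$ vector as stated. It also assumes formula (3) has the gated form $att\_QE_i = g_i \cdot QE_i$.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gate_questions(QE, W, u):
    """Condition-control gate (one possible reading of formulas (2) and (3)).

    QE: n question vectors of length d; W: n x n weight mapping matrix;
    u: HYPOTHETICAL length-d projection that reduces each QE_j to a scalar.
    Returns g (n weights summing to 1) and att_QE, where att_QE[i] = g[i] * QE[i]
    is an ASSUMED form of formula (3).
    """
    n = len(QE)
    if len(W) != n or any(len(row) != n for row in W):
        raise ValueError("W must be n x n")
    q = [sum(a * b for a, b in zip(vec, u)) for vec in QE]        # n x 1
    logits = [sum(w * qj for w, qj in zip(row, q)) for row in W]  # W * QE, n x 1
    g = softmax(logits)                                           # formula (2)
    att_QE = [[g_i * x for x in vec] for g_i, vec in zip(g, QE)]  # assumed formula (3)
    return g, att_QE
```

In a trained model, `W` and `u` would be learned parameters. Here they are plain inputs so that the data flow from question vectors to gate weights is easy to follow.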

On this basis, the answer vector of each answer in the similar session pair is combined with its fused question vector in an interaction step. This produces the interaction feature of each answer.

Specifically, the interaction module performs feature interaction between the questions and the answers in the similar session pair. Each answer vector interacts with its corresponding fused question vector: the matched fused question vector is added on top of the answer encoding vector. This yields the interaction feature $F\_AQE_i = [f_{i1}, \ldots, f_{in}]$ for each answer in the similar session pair. The interaction is computed as follows:

In equation (4), $AE_i$ is the encoding vector of the $i$-th answer in the similar session pair, $att\_QE_i$ is the fused question vector corresponding to the $i$-th answer, and $\odot$ denotes element-wise multiplication of the corresponding vectors.
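Equation (4) is also not preserved here. The description says the fused question vector is "added on the basis of the answer coding vector" using element-wise multiplication. One plausible reading is a residual interaction, $F\_AQE_i = AE_i + AE_i \odot att\_QE_i$. The sketch below implements that assumed form and should not be taken as the patent's exact equation.

```python
def interact(AE_i, att_QE_i):
    """Assumed form of equation (4): F_AQE_i = AE_i + AE_i (element-wise *) att_QE_i.

    AE_i: encoding vector of the i-th answer; att_QE_i: its fused question vector.
    The residual term keeps the original answer encoding, and the element-wise
    product lets the question context rescale each answer dimension.
    """
    if len(AE_i) != len(att_QE_i):
        raise ValueError("answer and question vectors must have the same dimension")
    return [a + a * q for a, q in zip(AE_i, att_QE_i)]
```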

Step 403, fusing the interaction feature of each answer with the features of the other answers in its group of similar conversation pairs to obtain the fused feature of each answer in the similar conversation pair.

Step 404, performing quality ranking on the answers in the similar conversation pairs based on the fused features of the respective answers.

Existing ranking methods fall mainly into three families: point-wise, pair-wise and list-wise. Point-wise and pair-wise methods cannot fully integrate the information of all documents (here, answers) in a list, while list-wise methods involve complex computation.

Fusing the interaction feature of each answer with the features of the other answers in its similar conversation pair can be further implemented as follows. First, the interaction features of the different answers in the similar conversation pair are averaged to obtain a ranking fusion vector. This ranking fusion vector is then concatenated to the interaction feature of each answer, giving the fused feature of each answer in the similar conversation pair.

Specifically, the feature vectors $F\_AQE_1, \ldots, F\_AQE_n$ (the interaction features of the different answers in a similar session pair) are averaged to obtain the ranking fusion vector $FM\_AQE = [fm_1, \ldots, fm_n]$, which represents the overall characteristics of the group. This vector is concatenated to each of $F\_AQE_1, \ldots, F\_AQE_n$, so that each concatenated answer vector also carries the features of the other answers in its group: $[f_{i1}, \ldots, f_{in}, fm_1, \ldots, fm_n]$. This gives the fused feature of each answer in the similar session pair.

The following describes the concatenation process between feature vectors:

Assuming one vector is [1, 2, 3] and the other is [3, 4, 5], concatenating them gives [1, 2, 3, 3, 4, 5].
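The mean-then-concatenate fusion of step 403 can be sketched as follows. This assumes the interaction features of one group are equal-length lists of numbers. The name `fuse_group` is hypothetical.

```python
def fuse_group(features):
    """Fuse each answer's interaction feature with its group context.

    features: interaction features F_AQE_1..F_AQE_n of one group (equal lengths).
    Returns one fused feature per answer: F_AQE_i concatenated with FM_AQE,
    where FM_AQE is the element-wise mean over the whole group.
    """
    if not features:
        return []
    n = len(features)
    dim = len(features[0])
    fm_aqe = [sum(f[k] for f in features) / n for k in range(dim)]  # ranking fusion vector
    return [list(f) + fm_aqe for f in features]                     # concatenation
```

If it is called on the two vectors from the example above, the group mean is [2.0, 3.0, 4.0]. Each answer's fused feature is then its own vector followed by that mean.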

On this basis, the answer ranking module ranks the answers by quality according to the fused feature of each answer in the similar conversation pair. Because the context vector of each answer's group is fused in at the ranking stage, this approach can further improve answer quality ranking performance.
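The text does not specify the scoring function of the answer ranking module. As a placeholder, the sketch below scores each fused feature with a hypothetical linear scorer (`weights`, `bias`) and sorts the answers by descending score. The scoring itself is point-wise, and group context enters only through the fused features.

```python
def rank_answers(fused, weights, bias=0.0):
    """Score fused features with a HYPOTHETICAL linear scorer and rank answers.

    fused: one fused feature vector per answer in a group.
    Returns (order, scores): answer indices best-first, and the raw scores.
    """
    scores = [sum(w * x for w, x in zip(weights, f)) + bias for f in fused]
    order = sorted(range(len(fused)), key=lambda i: scores[i], reverse=True)
    return order, scores
```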

Step 104, filtering out, based on the answer quality ranking result, the conversation pairs whose answer quality does not meet a preset quality condition, and constructing an insurance field FAQ knowledge base comprising the remaining conversation pairs.

After the different answers in each group of similar conversation pairs have been ranked by quality, the conversation pairs that do not meet the preset quality condition can be filtered out. The preset quality condition may be, but is not limited to, either of the following: the quality prediction confidence reaches a set confidence threshold, or the quality rank falls within the top k (k is a positive integer whose value can be set as required).
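The two example quality conditions (a confidence threshold and a top-k rank) can be applied to a ranked group as shown below. The sketch assumes each group's ranking result is a best-first list of `(pair_id, confidence)` tuples. The function name and tuple layout are hypothetical, and either condition can be used alone or both together.

```python
def filter_pairs(ranked, k=None, min_conf=None):
    """Keep session pairs that satisfy the preset quality condition(s).

    ranked: best-first list of (pair_id, confidence) for one group.
    k: if given, only the top-k ranks are eligible.
    min_conf: if given, a pair's confidence must reach this threshold.
    """
    kept = []
    for rank, (pair_id, conf) in enumerate(ranked, start=1):
        if k is not None and rank > k:
            break  # everything after rank k is outside the top-k range
        if min_conf is not None and conf < min_conf:
            continue
        kept.append(pair_id)
    return kept
```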

Insurance industry regulations require online insurance marketing to be compliant, so what advisors say, including their answers, must pass compliance detection. Optionally, therefore, compliance detection may be performed on the conversation pairs, and any non-compliant pair that is detected is removed.

Finally, each conversation pair that meets the quality condition (or meets the quality condition and is compliant) is imported into the corresponding knowledge base as a knowledge pair, which completes the construction of the knowledge base. Different databases can serve as the knowledge carrier.

In addition, besides the qualifying question-answer and objection-answer session pairs, the knowledge base may optionally store quality assessment information (e.g., confidence values and/or quality ranks) for the different answers in similar session pairs. This gives insurance advisors richer references when resolving client queries.
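As one example of a knowledge carrier, the sketch below uses Python's standard `sqlite3` module to store qualifying pairs, together with their optional quality assessment information, in an SQLite table. The table name `faq`, its columns and the dictionary keys are assumptions for illustration.

```python
import sqlite3


def build_kb(pairs, path=":memory:"):
    """Store qualifying session pairs in a HYPOTHETICAL `faq` table.

    pairs: iterable of dicts with keys kind ('question' or 'objection'),
    question, answer, confidence and rank (the last two may be None).
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS faq (
               id INTEGER PRIMARY KEY,
               kind TEXT NOT NULL CHECK (kind IN ('question', 'objection')),
               question TEXT NOT NULL,
               answer TEXT NOT NULL,
               confidence REAL,
               quality_rank INTEGER)"""
    )
    # Named placeholders let each dict supply its own values.
    conn.executemany(
        "INSERT INTO faq (kind, question, answer, confidence, quality_rank) "
        "VALUES (:kind, :question, :answer, :confidence, :rank)",
        pairs,
    )
    conn.commit()
    return conn
```

Keeping the confidence and rank alongside each pair lets an advisor-facing tool show how strongly each stored answer was rated.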

In summary, the present application proposes and implements an FAQ knowledge base construction scheme that uses question-answer pairs and/or objection-answer pairs from the insurance field as knowledge pairs. Building an insurance field FAQ knowledge base supports the analysis of customer questions and objections. During construction, the quality ranking of the answers in similar session pairs is controlled by the questions or objections in those pairs, which incorporates the contextual features they carry into answer quality ranking. This further improves ranking performance and makes it easier to build an FAQ knowledge base of high-quality session pairs. The result lays a foundation for shortening the time insurance advisors need to resolve client questions and for improving the accuracy of their answers, and it correspondingly helps the company operate and formulate better marketing strategies.

Optionally, in an embodiment, before performing quality ranking processing on the answers in the similar session pair, the method for constructing an insurance field FAQ knowledge base may further include:

extracting the preset statistical features corresponding to the similar session pairs in the insurance field, and performing quality ranking on the answers in the similar session pairs in combination with the extracted statistical features.

To further improve the accuracy of answer quality judgment, this embodiment does not rely only on answer features (e.g., the fused feature of each answer). It also draws on the insurance sales business and adds domain-specific statistical features to the quality judgment.

For the insurance field, the introduced statistical features include but are not limited to:

a. insurance advisor level: can be defined according to the advisor levels of the specific business;

b. conversation gender: male or female;

c. conversation duration t: in general, the longer a conversation lasts, the higher the customer's acceptance; this embodiment uses a function of t as the duration feature.
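The three statistical features above might be assembled into a feature vector as shown below. This sketch assumes the advisor level is already an integer defined by the business, and it one-hot encodes gender. The text does not preserve the duration formula, so `log(1 + t)` is used here purely as an assumed stand-in.

```python
import math


def statistical_features(advisor_level, gender, duration_seconds):
    """Build [advisor_level, is_male, is_female, f(t)] for one conversation.

    advisor_level: business-defined integer level.
    gender: 'male' or 'female'.
    duration_seconds: conversation length t; f(t) = log(1 + t) is an ASSUMED
    transform that compresses very long calls while preserving order.
    """
    if gender not in ("male", "female"):
        raise ValueError("gender must be 'male' or 'female'")
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    gender_onehot = [1.0, 0.0] if gender == "male" else [0.0, 1.0]
    return [float(advisor_level)] + gender_onehot + [math.log1p(duration_seconds)]
```

This vector could then be concatenated to each answer's fused feature before scoring, although the text does not say exactly where the statistical features enter the model.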

It should be noted that, in principle, it is hard to justify assigning different credibility or weight to male and female speakers so that gender affects the answer quality judgment differently. In a specific field such as insurance, however, there may still be subtle gender-related differences (e.g., perceived affinity or subjective credibility) that have some influence on the answer quality judgment.

By introducing business-specific statistical features from the insurance sales field into the quality ranking of the different answers in similar conversation pairs, this embodiment mines the latent factors that affect answer quality evaluation from as many aspects and dimensions as possible. This can further improve the accuracy of the final answer quality ranking result.

Corresponding to the method for constructing the insurance field FAQ knowledge base, the embodiment of the present application further discloses an insurance field FAQ knowledge base constructing device, as shown in fig. 5, the device includes:

a text acquiring unit 501, configured to acquire a session text of a client and a consultant in the insurance domain;

a conversation pair extracting unit 502, configured to extract a question and an answer matching the question from the conversation text, and/or extract an objection and an answer matching the objection, to obtain a conversation pair including a question answer pair and/or an objection answer pair;

a quality ranking processing unit 503, configured to perform quality ranking processing on the answers in the extracted session pairs to obtain an answer quality ranking result, wherein, for similar conversation pairs among the conversation pairs, the answer quality ranking result is controlled through the questions or objections in the similar conversation pairs;

the knowledge base constructing unit 504 is configured to filter, based on the answer quality ranking result, a session pair in which answer quality in the session pair does not meet a preset quality condition, and construct an insurance field FAQ knowledge base including the session pair that is not filtered.

In one embodiment, the conversation pair extracting unit 502, when extracting the question and the answer matching the question from the conversation text, is specifically configured to:

taking the consultant's first continuous turn of speech in response to the question in the conversation text as the answer to the question, to obtain a question-answer pair.
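The question-answer extraction rule can be sketched as follows. This assumes the conversation text has already been split into `(speaker, text)` turns with hypothetical speaker labels `"client"` and `"advisor"`, and that the index of the client's question is known. The answer is the advisor's first run of consecutive turns after the question.

```python
def extract_qa(turns, question_index):
    """Pair a client question with the advisor's first continuous reply.

    turns: list of (speaker, text); speaker labels 'client'/'advisor' are HYPOTHETICAL.
    Returns (question, answer), where answer joins the first run of consecutive
    advisor turns after the question, or None if the advisor never replies.
    """
    question = turns[question_index][1]
    answer_parts = []
    for speaker, text in turns[question_index + 1:]:
        if speaker == "advisor":
            answer_parts.append(text)
        elif answer_parts:
            break  # the first continuous advisor run has ended
    if not answer_parts:
        return None
    return question, " ".join(answer_parts)
```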

The session pair extracting unit 502, when extracting an objection and an answer matching the objection from the session text, is specifically configured to:

In an embodiment, the quality ranking processing unit 503 is specifically configured to:

In an embodiment, the quality ranking processing unit 503 is configured to determine the influence degrees of different questions or objections in similar session pairs on different answers. Based on those influence degrees, it performs feature interaction between the answers in the similar session pairs and their matched questions or objections to obtain the interaction features corresponding to the answers. When doing so, the quality ranking processing unit 503 is specifically configured to:

$$g(QE) = \mathrm{softmax}(W \cdot QE)$$

wherein $W \in \mathbb{R}^{n \times n}$ is the weight mapping matrix with dimensions $n \times n$, $\mathbb{R}$ denotes the real numbers, $n$ is the number of questions and objections in a similar session pair, softmax is the probability normalization function, $g(QE) \in \mathbb{R}^{n \times 1}$ is the weight distribution of the different questions or objections in the similar session pair over the answers and measures how strongly they influence the different answers, and $att\_QE_i$ is the fused question vector corresponding to the $i$-th answer;

In an embodiment, when the interactive feature corresponding to the answer and the answer feature of the other answer in the similar session pair of the group to which the answer belongs are fused to obtain a fused feature corresponding to the answer in the similar session pair, the quality ranking processing unit 503 is specifically configured to:

In one embodiment, the device performs quality ranking processing on the answers in the extracted conversation pair through a pre-trained multi-source answer quality ranking model;

wherein, the multi-source answer quality ranking model is as follows: training a model obtained by taking the pseudo answer quality label as a model training label; the pseudo answer quality label is a quality label generated by evaluating the quality of the answers in each group of similar session pairs based on a preset business rule.

In an embodiment, before performing the clustering process on the extracted session pairs based on the preset clustering algorithm, the quality ranking processing unit 503 is further configured to:

In an embodiment, before performing the quality ranking process on each answer in the similar session pair, the quality ranking processing unit 503 is further configured to:

For the insurance field FAQ knowledge base construction device disclosed in the embodiment of the present application, since it corresponds to the insurance field FAQ knowledge base construction method disclosed in the above method embodiment, the description is relatively simple, and for the relevant similar points, please refer to the description of the above corresponding method embodiment, and details are not described here.

The embodiment of the present application further discloses an electronic device, which specifically includes:

a memory for storing a set of computer instructions;

the set of computer instructions may be embodied in the form of a computer program.

a processor, configured to implement the insurance domain FAQ knowledge base construction method disclosed in any of the above method embodiments by executing the set of computer instructions.

The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA), or another programmable logic device.

Besides, the electronic device may further include a communication interface, a communication bus, and the like. The memory, the processor and the communication interface communicate with each other via a communication bus.

The communication interface is used for communication between the electronic device and other devices. The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and the like.

In addition, the embodiment of the application also discloses a storage medium, wherein a computer instruction set is stored in the storage medium, and the stored computer instruction set can be used for realizing the FAQ knowledge base construction method in the insurance field disclosed by any one of the method embodiments.

It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.

For convenience of description, the above system or apparatus is described as being divided into various modules or units by function, respectively. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.

From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.

Finally, it is further noted that, herein, relational terms such as first, second, third, fourth, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims

1. An FAQ knowledge base construction method in the insurance field, characterized by comprising the following steps:

2. The method of claim 1, wherein extracting questions and answers matching the questions from the session text comprises:

3. The method according to claim 1, wherein the quality ranking processing on the answers in the extracted conversation pair to obtain an answer quality ranking result comprises:

fusing the interactive features corresponding to the answers with the answer features of other answers in the similar conversation pair of the group to which the answers belong to obtain fused features corresponding to the answers in the similar conversation pair;

4. The method according to claim 3, wherein the step of determining the influence degrees of different questions or objections in similar session pairs on different answers comprises the following, together with the step of performing feature interaction between the answers in the similar session pairs and their matched questions or objections based on those influence degrees to obtain the interaction features corresponding to the answers:

$$g(QE) = \mathrm{softmax}(W \cdot QE)$$

5. The method according to claim 4, wherein the fusing the interactive features corresponding to the answers with the answer features of other answers in a similar conversation pair of a group to which the answers belong to obtain the fused features corresponding to the answers in the similar conversation pair includes:

6. The method of claim 5, wherein answers are quality ranked by a pre-trained multi-source answer quality ranking model;

7. The method according to claim 3, before clustering the extracted conversation pairs based on the preset clustering algorithm, further comprising:

8. The method of claim 3, further comprising, prior to said quality ranking of answers in similar pairs of sessions:

9. An insurance domain FAQ knowledge base construction device is characterized by comprising:

10. An electronic device, comprising:

a memory for storing a set of computer instructions;

a processor for implementing the insurance domain FAQ knowledge base construction method according to any one of claims 1 to 8 by executing the set of instructions stored in the memory.