CN109189931B

CN109189931B - Target statement screening method and device

Info

Publication number: CN109189931B
Application number: CN201811034021.XA
Authority: CN
Inventors: 李几鞅
Original assignee: Tencent Technology Shenzhen Co Ltd; Tencent Cloud Computing Beijing Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd; Tencent Cloud Computing Beijing Co Ltd
Priority date: 2018-09-05
Filing date: 2018-09-05
Publication date: 2021-05-11
Anticipated expiration: 2038-09-05
Also published as: CN109189931A

Abstract

The embodiment of the application provides a method and a device for screening target sentences, which relate to the technical field of intelligent customer service. The method comprises the following steps: first, a target sentence set is extracted from a dialogue record of information consultation according to sentence feature information; then, the similarity between any two target sentences is determined by a similarity fitting model according to the multi-dimensional text feature information of the two target sentences; next, the target sentences in the target sentence set are clustered according to the similarity between any two target sentences; finally, the target sentences that meet the set conditions are screened out from the clustered target sentences. Because multi-dimensional text features express the features of a target sentence more comprehensively, determining the similarity between target sentences from their multi-dimensional text feature information with the similarity fitting model can effectively improve the precision of the similarity, and thereby improves the efficiency and accuracy of screening target sentences.

Description

Target statement screening method and device

Technical Field

The embodiment of the application relates to the technical field of intelligent customer service, in particular to a method and a device for screening target sentences.

Background

At present, the market scale of the customer service industry in China exceeds billions. In terms of user experience, online customer service is the customer service system most widely used by enterprises, and the utilization rate of intelligent customer service is increasing year by year. Construction of the knowledge base is the core problem of intelligent customer service: as long as the data in the knowledge base is sufficient and comprehensive, the answers that the intelligent customer service gives can satisfy users. In the prior art, the knowledge base is usually edited manually; when it is updated, sentences are manually screened from users' consultation records and added to the knowledge base, so both efficiency and accuracy are low.

Disclosure of Invention

The embodiment of the application provides a method and a device for screening target sentences, which are used for automatically extracting the target sentences meeting preset conditions from information consultation conversations and improving the efficiency and the accuracy of screening the target sentences.

In a first aspect, an embodiment of the present application provides a method for screening target statements, where the method includes:

extracting a target statement set from a dialogue record of information consultation according to statement feature information;

acquiring multi-dimensional text characteristic information of any two target sentences;

obtaining the similarity between any two target sentences according to the multi-dimensional text feature information of any two target sentences by using a similarity fitting model, wherein the similarity fitting model is obtained by adopting the multi-dimensional text feature information of the target sentences for training in advance;

clustering the target sentences in the target sentence set according to the similarity between any two target sentences;

and screening out target sentences which meet set conditions from the clustered target sentences.

Because the target statement set is extracted from the information consultation conversation record according to the statement characteristic information, the target statements are clustered according to the similarity between the target statements in the target statement set, and the target statements containing similar characteristics are clustered into one class, the target statements meeting the preset conditions can be screened from the clustered target statements according to the requirements, and compared with the manual statement screening from the information consultation conversation, the efficiency of screening the target statements is improved. Secondly, the similarity fitting model is obtained by adopting multi-dimensional text feature information of the target sentences in advance for training, so that the similarity fitting model fully learns the relation between the similarity between the target sentences and the multi-dimensional text features of the target sentences, and in addition, the multi-dimensional text features more comprehensively express the features of the target sentences, so that the similarity fitting model is utilized to obtain the similarity between any two target sentences according to the multi-dimensional text feature information of any two target sentences, the precision of determining the similarity between any two target sentences can be effectively improved, and the accuracy of screening the target sentences is improved.

Optionally, the multi-dimensional text feature information includes keyword feature information and word order feature information;

the obtaining of the similarity between any two target sentences according to the multi-dimensional text feature information of any two target sentences by using the similarity fitting model includes:

determining the text similarity between any two target sentences according to the keyword feature information of any two target sentences by using a similarity fitting model;

determining semantic similarity between any two target sentences according to the word order characteristic information of any two target sentences by using the similarity fitting model;

and determining the similarity between any two target sentences according to the text similarity and the semantic similarity between any two target sentences.

When the similarity between any two target sentences is determined, not only the keywords of the two target sentences but also their word order relationship is considered, so that the determined similarity is closer to the actual similarity between the target sentences, which improves the precision of determining the similarity between target sentences.

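Optionally, after the text similarity between any two target sentences is determined according to the keyword feature information, judging whether the text similarity between any two target sentences is greater than a preset threshold value or not;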

if so, determining the text similarity between any two target sentences as the similarity between any two target sentences;

otherwise, determining the semantic similarity between any two target sentences according to the word order characteristic information of any two target sentences by using the similarity fitting model, and determining the similarity between any two target sentences according to the text similarity and the semantic similarity between any two target sentences.

When the similarity between any two target sentences is determined, the similarity between any two target sentences is determined by taking the keyword feature information as the main feature, and when the text similarity between any two target sentences is determined to be larger than the preset threshold value according to the keyword feature information, the text similarity is directly determined as the similarity between any two target sentences, the semantic similarity between any two target sentences is not determined according to the word order feature information, and the efficiency of determining the similarity between any two target sentences is improved. When the text similarity between any two target sentences is determined to be not more than the preset threshold value according to the keyword feature information, the similarity between any two target sentences is determined by combining the text similarity and the semantic similarity between any two target sentences, and the precision of determining the similarity between any two target sentences is improved.
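The threshold-gated procedure described above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function name `combined_similarity`, the default threshold of 0.8, and the weighted average used to merge the two scores are all assumptions, since the text does not state how text similarity and semantic similarity are combined.

```python
from typing import Callable

Similarity = Callable[[str, str], float]


def combined_similarity(
    s1: str,
    s2: str,
    text_sim: Similarity,
    semantic_sim: Similarity,
    threshold: float = 0.8,    # assumed value; the text only says "preset threshold"
    text_weight: float = 0.5,  # assumed weighting; not specified in the text
) -> float:
    """Return the similarity of two target sentences.

    The keyword-based text similarity is computed first. If it exceeds the
    threshold it is returned directly and the (more expensive) semantic
    similarity is never evaluated.
    """
    t = text_sim(s1, s2)
    if t > threshold:
        return t
    # Otherwise fall back to combining both scores (assumed: weighted average).
    sem = semantic_sim(s1, s2)
    return text_weight * t + (1.0 - text_weight) * sem
```

Passing the two scorers in as callables keeps the sketch independent of any particular model; in the described method both would come from the trained similarity fitting model.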

Optionally, the extracting a target sentence set from a dialogue record of information consultation according to the sentence characteristic information includes:

and extracting target sentences containing the query words from the dialogue records of the information consultation to form a target sentence set.

When the questions raised by the user or the customer service need to be extracted from the information consultation dialogue record, the target sentences are extracted by matching query words to form the target sentence set, which improves the efficiency of extracting the target sentence set from the dialogue record.

Optionally, the screening out target sentences meeting the set condition from the clustered target sentences includes:

and determining the target category and the target sentences in the target category from all the categories according to the quantity of the clustered target sentences in all the categories and the similarity between the target sentences.

Optionally, the dialog record is a dialog record between the customer service and the user, and the method further includes:

and updating the target sentences meeting the set conditions into the intelligent customer service knowledge base.

The target sentences with high occurrence frequency and high similarity are determined according to the number of the clustered target sentences of each category and the similarity between the target sentences, the target sentences with high occurrence frequency and high similarity are added to the intelligent customer service knowledge base, the applicability of the knowledge base can be improved, meanwhile, the accuracy of answering user consultation questions by the intelligent customer service is improved, and therefore user experience is improved.
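One way to read "high occurrence frequency and high similarity" is as two filters applied to each category: a minimum number of member sentences and a minimum average pairwise similarity. The sketch below follows that reading; the function name `select_target_categories` and both cut-off values are hypothetical, and the text does not prescribe this particular rule.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, List


def select_target_categories(
    clusters: List[List[str]],
    sim: Callable[[str, str], float],
    min_size: int = 3,          # assumed cut-off for "high occurrence frequency"
    min_mean_sim: float = 0.7,  # assumed cut-off for "high similarity"
) -> List[List[str]]:
    """Keep categories that are both large and internally similar."""
    selected = []
    for members in clusters:
        if len(members) < min_size:
            continue
        pair_sims = [sim(a, b) for a, b in combinations(members, 2)]
        # A single-sentence category has no pairs and is skipped.
        if pair_sims and mean(pair_sims) >= min_mean_sim:
            selected.append(members)
    return selected
```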

In a second aspect, an embodiment of the present application provides an apparatus for screening target sentences, including:

the extraction module is used for extracting a target statement set from the dialogue record of the information consultation according to the statement feature information;

the processing module is used for acquiring multi-dimensional text characteristic information of any two target sentences; obtaining the similarity between any two target sentences according to the multi-dimensional text feature information of any two target sentences by using a similarity fitting model, wherein the similarity fitting model is obtained by adopting the multi-dimensional text feature information of the target sentences for training in advance;

the clustering module is used for clustering the target sentences in the target sentence set according to the similarity between any two target sentences;

and the screening module is used for screening out the target sentences which accord with the set conditions from the clustered target sentences.

the processing module is specifically configured to:

Optionally, the extracting module is specifically configured to:

Optionally, the screening module is specifically configured to:

Optionally, the dialog record is a dialog record between the customer service and the user, and the screening module is further configured to: update the target sentences meeting the set conditions into the intelligent customer service knowledge base.

In a third aspect, an embodiment of the present application provides an intelligent customer service system, which includes at least one processing unit and at least one storage unit, where the storage unit stores a computer program, and when the program is executed by the processing unit, the processing unit is caused to execute the steps of the method of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program executable by an intelligent customer service system, the program causing the intelligent customer service system to perform the steps of the method of the first aspect when the program runs on the intelligent customer service system.

In the embodiment of the application, the target statement set is extracted from the information consultation conversation record according to the statement feature information, then the target statements are clustered according to the similarity between the target statements in the target statement set, and the target statements containing similar features are clustered into one class, so that the target statements meeting preset conditions can be screened out from the clustered target statements according to requirements, and compared with the manual statement screening from the information consultation conversation, the efficiency of screening the target statements is improved. Secondly, the similarity fitting model is obtained by adopting multi-dimensional text feature information of the target sentences in advance for training, so that the similarity fitting model fully learns the relation between the similarity between the target sentences and the multi-dimensional text features of the target sentences, and in addition, the multi-dimensional text features more comprehensively express the features of the target sentences, so that the similarity fitting model is utilized to obtain the similarity between any two target sentences according to the multi-dimensional text feature information of any two target sentences, the precision of determining the similarity between any two target sentences can be effectively improved, and the accuracy of screening the target sentences is improved. 
In addition, the target category is determined from each category according to the number of the target sentences of each category after clustering and the similarity of the target sentences, and the target sentences in the target category are updated to the intelligent customer service knowledge base, so that the coverage of the intelligent customer service knowledge base on the questions asked by the user is improved, and the accuracy of the intelligent customer service for answering the questions asked by the user is improved.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without inventive effort.

Fig. 1 is an application scenario diagram provided in an embodiment of the present application;

Fig. 2 is a schematic diagram of an advisory window provided in an embodiment of the present application;

Fig. 3 is a schematic diagram of an advisory window provided in an embodiment of the present application;

Fig. 4 is a schematic structural diagram of an intelligent customer service system according to an embodiment of the present application;

Fig. 5 is a schematic flowchart of a method for screening target sentences according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a BiMPM model provided in an embodiment of the present application;

Fig. 7 is a flowchart illustrating a method for determining similarity between target sentences according to an embodiment of the present application;

Fig. 8 is a flowchart illustrating a method for determining similarity between target sentences according to an embodiment of the present application;

Fig. 9 is a flowchart illustrating a method for updating an intelligent customer service knowledge base according to an embodiment of the present application;

Fig. 10 is a schematic structural diagram of a target sentence screening apparatus according to an embodiment of the present application;

Fig. 11 is a schematic structural diagram of an intelligent customer service system according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more clearly apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

For convenience of understanding, terms referred to in the embodiments of the present application are explained below.

Intelligent customer service system: an industry-oriented application developed on the basis of large-scale knowledge processing. It draws on technologies such as large-scale knowledge processing, natural language understanding, knowledge management, automatic question answering and reasoning. An intelligent customer service system not only provides enterprises with fine-grained knowledge management technology, but also establishes a fast and effective natural-language-based means of communication between enterprises and their many users; it can also provide enterprises with the statistical analysis information required for fine-grained management.

Knowledge base: the term has two meanings. In one, it refers to the set of rules used in the design of an expert system together with the facts and data associated with those rules, all of which form the knowledge base. Such a knowledge base is tied to a specific expert system, so the question of sharing it does not arise. In the other, it refers to a knowledge base of a consulting nature, which is shared rather than exclusive to a single owner.

In a specific practice process, the inventor of the present application finds that, after a merchant applies for an intelligent customer service in the prior art, a question that a user frequently asks is added to an intelligent customer service knowledge base according to experience, and then the question is associated with a corresponding answer. When the user submits the question to the intelligent customer service, the intelligent customer service searches the associated answer from the intelligent customer service knowledge base according to the question submitted by the user and feeds the answer back to the user. However, when the user actually consults the intelligent customer service, the questions asked and the way of asking the questions are related to personal habits and actual needs, so that the situation that the intelligent customer service knowledge base does not cover some questions consulted by the user occurs, and the intelligent customer service cannot answer the questions asked by the user. In order to improve the coverage degree of the intelligent customer service knowledge base on the questions provided by the user, the merchants extract the questions which cannot be answered by the intelligent customer service from the information consultation conversation records and add the questions to the knowledge base, but the method needs to screen out target sentences from the information consultation conversation records by manpower, and the efficiency and the accuracy are low.

For this reason, the inventor of the present application considers that the dialogue records between a merchant's customer service and users over a certain period of time can be acquired, and a target sentence set can then be extracted from the dialogue records of the information consultation according to sentence feature information. For example, when the questions raised by users need to be screened out from the dialogue records, the dialogue records can be matched against preset query words, and the target sentences containing a query word are extracted to form the target sentence set.

To improve the applicability of the target sentences added to the intelligent customer service knowledge base, the target sentences screened from the dialogue records of the information consultation should be ones raised by most users. Therefore, the target sentences in the target sentence set can be clustered according to the similarity between any two target sentences, and the target categories and the target sentences in them are then determined from the categories according to the number of target sentences in each category after clustering. Specifically, the categories may be sorted in descending order of the number of target sentences they contain, and the top N categories are selected; the target sentences in the top N categories are then pushed to the merchant, and the merchant stores the pushed target sentences in the intelligent customer service knowledge base according to actual requirements. Because the questions that users frequently raise no longer need to be screened manually from the information consultation dialogue records, the efficiency of updating the intelligent customer service knowledge base is improved.
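The ranking step can be illustrated with a few lines of Python. The function name `top_n_categories` and the dictionary layout (category id mapped to its member sentences) are assumptions made for this sketch only.

```python
from typing import Dict, List


def top_n_categories(clusters: Dict[int, List[str]], n: int) -> List[List[str]]:
    """Sort categories by how many target sentences they hold (descending)
    and return the N largest, ready to be pushed to the merchant."""
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return ranked[:n]


# Hypothetical clustering result: category id -> target sentences.
example = {
    0: ["How much is commodity A?", "What does commodity A cost?", "Price of A?"],
    1: ["Why is my order late?"],
    2: ["What is the function of commodity A?", "What does commodity A do?"],
}
largest_two = top_n_categories(example, 2)
```

Python's `sorted` is stable, so categories of equal size keep their original relative order.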

Because the target sentences in the target sentence set are clustered according to the similarity between any two target sentences, the similarity between any two target sentences determines the clustering effect, thereby further influencing the accuracy of screening the target sentences from the information consultation conversation records. Therefore, after the multi-dimensional text feature information of each target sentence is obtained, the similarity between any two target sentences is obtained according to the multi-dimensional text feature information of any two target sentences by using a similarity fitting model, wherein the similarity fitting model is obtained by adopting the multi-dimensional text feature information of the target sentences for training in advance. The similarity fitting model is obtained by adopting multi-dimensional text feature information of the target sentences for training in advance, so that the similarity fitting model fully learns the relation between the similarity between the target sentences and the multi-dimensional text features of the target sentences, and in addition, the multi-dimensional text features more comprehensively express the features of the target sentences.

The method for screening target statements in the embodiment of the present application may be applied to an application scenario as shown in fig. 1, where the application scenario includes a user terminal 101, a customer service terminal 102, and an intelligent customer service system 103.

The user terminal 101 and the customer service terminal 102 are electronic devices with network communication capability, and the electronic devices may be smart phones, tablet computers, portable personal computers, and the like. The user terminal 101 is connected with the intelligent customer service system 103 through a wireless network, and the customer service terminal 102 is connected with the intelligent customer service system 103 through a wireless network. The intelligent customer service system 103 comprises a knowledge base and a target statement screening device, and the intelligent customer service system 103 is a server cluster or a cloud computing center formed by one or a plurality of servers.

The user clicks the merchant website on the user terminal 101 and enters the merchant web page. When the user clicks the customer service icon in the merchant web page, the user terminal 101 sends a consultation request to the intelligent customer service system 103, and after responding, the intelligent customer service system 103 pops up a consultation window on the user terminal 101, as shown in fig. 2. When the user inputs the question "How much is commodity A?", the user terminal 101 submits the question to the intelligent customer service system 103. The intelligent customer service system 103 searches the knowledge base corresponding to the merchant and, if the knowledge base contains the question input by the user or a similar question, acquires the corresponding answer from the knowledge base. If the intelligent customer service system 103 determines by querying the knowledge base that the answer to the question "How much is commodity A?" is "100 yuan", the answer is sent to the user terminal 101, and the consultation window on the merchant web page displays "100 yuan", as shown in fig. 3. If the intelligent customer service system 103 determines by querying the knowledge base that the question "How much is commodity A?" has no corresponding answer, the intelligent customer service system 103 sends the question to the customer service terminal 102. Optionally, a merchant may have a plurality of human customer service agents; before sending the question to a customer service terminal 102, the intelligent customer service system 103 determines a customer service terminal 102 of the merchant that is in an idle state and then sends the question to that terminal. 
The customer service terminal 102 pops up a consultation interface or displays reminder information, and the human customer service agent inputs the answer "100 yuan" in the consultation interface and submits it. The customer service terminal 102 sends the answer to the intelligent customer service system 103, and the intelligent customer service system 103 forwards the answer to the user terminal 101. The consultation window on the merchant web page of the user terminal 101 then displays "100 yuan", as shown in fig. 3.
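The question-handling flow above can be sketched as a small routing function. This is only an illustration under simplifying assumptions: the knowledge base is modelled as a plain dictionary with exact-match lookup (the described system also matches similar questions), the names `AgentTerminal` and `route_question` are hypothetical, and the "pending" outcome for the case where no terminal is idle is not covered by the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AgentTerminal:
    """Hypothetical stand-in for a customer service terminal 102."""
    name: str
    idle: bool


def route_question(
    question: str,
    knowledge_base: Dict[str, str],
    terminals: List[AgentTerminal],
) -> Tuple[str, Optional[str]]:
    """Answer from the knowledge base if possible, otherwise forward the
    question to the first idle human customer service terminal."""
    answer = knowledge_base.get(question)  # exact match only (assumption)
    if answer is not None:
        return ("answer", answer)
    for terminal in terminals:
        if terminal.idle:
            return ("forward", terminal.name)
    # No idle terminal: behaviour not described in the text (assumption).
    return ("pending", None)
```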

The intelligent customer service system 103 may provide intelligent customer service for a plurality of merchants, each merchant corresponding to one knowledge base in the intelligent customer service system 103. For example, the intelligent customer service system 103 provides intelligent customer service for merchant A, merchant B and merchant C, which correspond to knowledge base 1, knowledge base 2 and knowledge base 3 in the intelligent customer service system 103, respectively. When the intelligent customer service system 103 updates the knowledge base for any merchant, for example merchant A, it may obtain the dialogue records between human customer service and users after merchant A has used the intelligent customer service for a period of time, screen out the target sentences meeting the preset conditions from those records, and add them to knowledge base 1. The intelligent customer service system 103 may also obtain the dialogue records between human customer service and users from before merchant A used the intelligent customer service, screen out the target sentences meeting the preset conditions from those records, and add them to knowledge base 1. The intelligent customer service system 103 may also obtain the dialogue records between human customer service and users both from before merchant A used the intelligent customer service and from after merchant A has used it for a period of time, screen out the target sentences meeting the preset conditions from those records, and add them to knowledge base 1. 
After the intelligent customer service system 103 screens out the target sentences meeting the preset conditions, it sends them to the customer service terminal 102 of the customer service manager of merchant A, and the customer service terminal 102 displays a target sentence recommendation interface. The customer service manager decides whether to add the recommended target sentences to knowledge base 1. If the customer service manager selects and submits all recommended target sentences in the target sentence recommendation interface, the intelligent customer service system 103 adds all recommended target sentences to knowledge base 1. If a recommended target sentence is a question raised by a user, the customer service manager selects the target sentence in the recommendation interface, fills in the corresponding answer and submits it, and the intelligent customer service system 103 adds the selected target sentence and its answer to knowledge base 1.

Further, in the application scenario diagram shown in fig. 1, a schematic structural diagram of the intelligent customer service system 103 is shown in fig. 4, and the intelligent customer service system 103 includes: an extraction module 1031, a similarity fitting model 1032, a clustering module 1033, a screening module 1034, a knowledge base 1035, and a memory 1036.

The extraction module 1031 acquires from the memory 1036 the dialogue records of information consultation between human customer service and users after the merchant has used the intelligent customer service for a period of time, and then extracts the target sentence set from those records according to the sentence feature information. The extraction module 1031 inputs the target sentence set into the similarity fitting model 1032. The similarity fitting model 1032 acquires the multi-dimensional text feature information of each target sentence and determines the similarity between any two target sentences according to their multi-dimensional text feature information. The extraction module 1031 sends the target sentence set, for which the similarity between any two target sentences has been determined, to the clustering module 1033. The clustering module 1033 clusters the target sentences in the target sentence set according to the similarity between any two target sentences. The screening module 1034 screens out the target sentences meeting the set conditions from the clustered target sentences and sends them to the customer service terminal 102. When it receives a target sentence selection instruction sent by the customer service terminal 102, the screening module 1034 adds the target sentence selected by the customer service manager to the knowledge base 1035.

Based on the application scenario diagram shown in fig. 1 and the schematic structural diagram of the intelligent customer service system shown in fig. 4, an embodiment of the present application provides a flow of a target statement screening method, where the flow of the method may be executed by a target statement screening apparatus, as shown in fig. 5, the method includes the following steps:

step S501, extracting a target statement set from the dialogue records of information consultation according to the statement feature information.

The dialogue record of the information consultation may be a dialogue record between customer service and a user, between a consulting institution and an enterprise, or between a consultant and a client. The sentence feature information can be set according to actual requirements and may be a word, a sentence, a symbol, or the like. The sentence feature information is matched against the sentences in the dialogue record, and the target sentences containing the sentence feature information are extracted to form the target sentence set.

Step S502, multi-dimensional text feature information of any two target sentences is obtained.

The multidimensional text characteristic information at least comprises keyword characteristic information and word sequence characteristic information, wherein the keyword characteristic information comprises continuous hit word ratio, keyword repetition number, keyword similarity, Biterm similarity, Levenshtein distance and the like.

Step S503, obtaining the similarity between any two target sentences according to the multi-dimensional text feature information of any two target sentences by using the similarity fitting model.

The similarity fitting model is trained in advance using the multi-dimensional text feature information of target sentences.

Step S504, clustering the target sentences in the target sentence set according to the similarity between any two target sentences.

Optionally, when the target sentences in the target sentence set are clustered, clustering algorithms such as K-Means clustering, mean shift clustering, density-based clustering, and hierarchical clustering may be used.

Step S505, screening out target sentences meeting the set conditions from the clustered target sentences.
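The clustering of step S504 can be sketched as follows. This is a minimal illustration of one of the listed options (hierarchical clustering, here with average linkage), assuming the pairwise similarities are already available as a symmetric matrix with values in [0, 1]. The function name `agglomerative_clusters` and the stopping threshold are hypothetical and do not come from the embodiment.

```python
from typing import List, Optional, Tuple


def agglomerative_clusters(sim: List[List[float]], threshold: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering on a similarity matrix.

    Repeatedly merges the two clusters with the highest average pairwise
    similarity until no pair of clusters reaches `threshold` (assumed >= 0).
    Returns clusters as lists of sentence indices.
    """
    clusters = [[i] for i in range(len(sim))]

    def avg_sim(c1: List[int], c2: List[int]) -> float:
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best: float = -1.0
        best_pair: Optional[Tuple[int, int]] = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, best_pair = s, (x, y)
        if best_pair is None or best < threshold:
            break  # no remaining pair is similar enough to merge
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Four sentences: 0 and 1 are similar, 2 and 3 are similar.
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.15, 0.1],
    [0.1, 0.15, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
categories = agglomerative_clusters(similarity, threshold=0.5)
```

Working directly on a similarity matrix keeps the clustering independent of how the similarity was produced, which matches the flow in which the similarity fitting model runs before the clustering step.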

Specifically, in step S501, the target sentence set may be extracted in at least the following ways:

In one possible implementation, the sentence feature information may be a query word, and the sentences containing a query word are extracted from the dialogue record of information consultation to form the target sentence set. For example, the query words "do", "ne" (the question particle 呢), "what", "why" and "how much" are preset; a dialogue record between the customer service and the user is obtained, each sentence in the dialogue record is matched against the query words, and the sentences containing at least one query word are determined. Suppose sentence 1, "How much is commodity A?", contains the query word "how much", and sentence 2, "What is the function of commodity A?", contains the query word "what"; sentence 1 and sentence 2 then form the target sentence set.
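The query-word matching described above can be sketched as follows. This is only an illustration under simple assumptions: the query-word list `QUERY_WORDS`, the function name `extract_by_query_words`, and the use of case-insensitive substring matching are all hypothetical choices, not details given by the embodiment.

```python
from typing import Iterable, List

# Hypothetical preset query words; the embodiment only says they are preset.
QUERY_WORDS = ("what", "why", "how much")


def extract_by_query_words(sentences: Iterable[str],
                           query_words: Iterable[str] = QUERY_WORDS) -> List[str]:
    """Return the sentences that contain at least one query word."""
    words = [w.lower() for w in query_words]
    targets = []
    for sentence in sentences:
        text = sentence.lower()
        # Plain substring matching; a real system would likely tokenize first.
        if any(w in text for w in words):
            targets.append(sentence)
    return targets


dialogue = [
    "Hello, welcome to the store.",
    "How much is commodity A?",
    "What is the function of commodity A?",
    "Thanks, goodbye.",
]
target_set = extract_by_query_words(dialogue)
```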

In one possible implementation, the sentence feature information may be a keyword whose weight is greater than a threshold, and the sentences that contain a keyword whose weight is greater than the threshold are extracted from the dialogue record of information consultation to form the target sentence set. For example, a dialogue record between the customer service and a user is obtained, and TF-IDF (term frequency-inverse document frequency) is used to determine the keyword contained in each sentence of the dialogue record and the weight of that keyword. If sentence 3, "price of commodity A", contains the keyword "price" and the weight of the keyword is greater than the preset threshold, sentence 3 is added to the target sentence set.
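A minimal sketch of this keyword-weight extraction is given below. It assumes whitespace tokenization, treats each sentence as one document for the inverse document frequency, and keeps a sentence when its highest-weighted word exceeds the threshold. These choices, the function names, and the threshold value are assumptions made for illustration; the embodiment does not specify the exact TF-IDF variant.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_weights(tokenized: List[List[str]]) -> List[Dict[str, float]]:
    """Compute a TF-IDF weight for every word of every sentence."""
    n = len(tokenized)
    # Document frequency: number of sentences in which each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(n / df[word])
            for word, count in tf.items()
        })
    return weights


def extract_by_keyword_weight(sentences: List[str], threshold: float) -> List[str]:
    """Keep sentences whose top-weighted word (the keyword) exceeds threshold."""
    tokenized = [s.lower().split() for s in sentences]
    weights = tfidf_weights(tokenized)
    return [s for s, w in zip(sentences, weights)
            if w and max(w.values()) > threshold]


dialogue = ["the price of commodity a", "the delivery time", "the the"]
target_set = extract_by_keyword_weight(dialogue, threshold=0.2)
```

In this toy record the word "the" occurs in every sentence, so its weight is zero and the third sentence is not selected.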

In steps S502 and S503 above, the similarity fitting model includes a text similarity fitting model, a semantic similarity fitting model, and a fusion model. The text similarity fitting model may be an XGBoost model, and the semantic similarity fitting model may be a BiMPM (Bilateral Multi-Perspective Matching) model, a BiLSTM model, or the like. The fusion model may be a Logistic Regression (LR) model.

When the similarity fitting model is trained using the multi-dimensional text feature information of target sentences, the text similarity fitting model, the semantic similarity fitting model and the fusion model are trained jointly. For example, the text similarity fitting model is an XGBoost model, the semantic similarity fitting model is a BiMPM model, and the fusion model is an LR model. 300,000 sentence pairs are obtained in advance as training samples, including both semantically related and semantically unrelated pairs. After the text similarity of the sentence pairs in the training samples is labeled manually, the pairs are input into the XGBoost model for training. During training, the XGBoost model extracts the keyword feature information of each sentence pair, which specifically includes the continuous hit word ratio, the keyword repetition number, the keyword similarity, the Biterm similarity, the Levenshtein distance, and the like.
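Some of the keyword features listed above can be computed as in the following sketch. The Levenshtein distance is the standard edit distance. The other two definitions are assumptions made for illustration, since the embodiment does not define them: the keyword repetition number is taken as the count of distinct shared words, and the Biterm similarity is taken as the Jaccard overlap of unordered word pairs.

```python
from itertools import combinations
from typing import List, Set, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def keyword_repetition(tokens_a: List[str], tokens_b: List[str]) -> int:
    """Assumed definition: number of distinct words shared by both sentences."""
    return len(set(tokens_a) & set(tokens_b))


def biterms(tokens: List[str]) -> Set[Tuple[str, ...]]:
    """All unordered word pairs (biterms) of a sentence."""
    return {tuple(sorted(pair)) for pair in combinations(tokens, 2)}


def biterm_similarity(tokens_a: List[str], tokens_b: List[str]) -> float:
    """Assumed definition: Jaccard overlap of the two biterm sets."""
    ba, bb = biterms(tokens_a), biterms(tokens_b)
    if not ba and not bb:
        return 0.0
    return len(ba & bb) / len(ba | bb)


s1 = ["price", "of", "a"]
s2 = ["price", "a"]
features = [
    levenshtein(" ".join(s1), " ".join(s2)),
    keyword_repetition(s1, s2),
    biterm_similarity(s1, s2),
]
```

A feature vector of this kind, one per sentence pair, is the sort of input a gradient-boosted tree model such as XGBoost could be trained on.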

After the semantic similarity of the sentence pairs in the training samples is labeled manually, the pairs are input into the BiMPM model for training. The structure of the BiMPM model is shown in FIG. 6 and comprises a word representation layer (Word Representation Layer), a word order representation layer (Context Representation Layer), a matching layer (Matching Layer), an aggregation layer (Aggregation Layer) and a prediction layer (Prediction Layer). For any sentence pair P, Q, each sentence in the pair is segmented into words. Specifically, at the word representation layer, sentence P is represented as the words p1, p2, p3, ..., pM, and sentence Q is represented as the words q1, q2, q3, ..., qN. After word segmentation, the word representation layer inputs sentences P and Q into the word order representation layer. The word order representation layer extracts the word order relationship of any two adjacent words in a sentence and represents it with a word order representation vector. The word order relationship of two adjacent words may run from the beginning of the sentence to the end, shown by the arrows pointing to the right in the word order representation layer of FIG. 6, or from the end of the sentence to the beginning, shown by the arrows pointing to the left.

The matching layer matches the word order representation vector of sentence P against the word order representation vectors of any two adjacent words in sentence Q and outputs a matching vector; likewise, it matches the word order representation vectors of any two adjacent words in sentence P against the word order representation vector of sentence Q and outputs a matching vector. The matching layer inputs the matching vectors into the aggregation layer, which aggregates them according to the word order relationship. An arrow pointing to the right in the aggregation layer of FIG. 6 indicates that the matching vectors are aggregated in order from the beginning of the sentence to the end to obtain an aggregated vector, and an arrow pointing to the left indicates that they are aggregated in order from the end of the sentence to the beginning. The aggregated vectors are input into the prediction layer, which predicts the semantic similarity Pr(y | P, Q) between sentence P and sentence Q from the 4 aggregated vectors.

During training, the text similarity output by the XGBoost model and the semantic similarity output by the BiMPM model are input into the LR model, and the LR model outputs the similarity of the sentence pair. When the objective function of the model formed by the XGBoost model, the BiMPM model and the LR model meets the preset condition, training ends and the final similarity fitting model is obtained.

In step S503, when the similarity between any two target sentences is obtained according to the multidimensional text feature information of any two target sentences by using the similarity fitting model, at least the following embodiments exist:

In one possible embodiment, as shown in fig. 7, the following steps are included:

Step S701, determining the text similarity between any two target sentences according to the keyword feature information of any two target sentences by using a similarity fitting model.

Optionally, the trained XGBoost model is used to determine the text similarity between any two target sentences according to the keyword feature information of any two target sentences in the target sentence set.

Step S702, determining semantic similarity between any two target sentences according to the word order characteristic information of any two target sentences by using a similarity fitting model.

Optionally, the trained BiMPM model is used to determine semantic similarity between any two target sentences according to the word order feature information of any two target sentences in the target sentence set.

Step S703, determining the similarity between any two target sentences according to the text similarity and semantic similarity between any two target sentences.

Optionally, the similarity between any two target sentences is determined according to the text similarity and the semantic similarity between any two target sentences by using the trained LR model.
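The fusion in step S703 can be illustrated with the logistic function that an LR model applies to a weighted sum of its inputs. The sketch below is not the trained model: the function name `fuse_similarity` and the weight and bias values are hypothetical placeholders that would normally be learned during training.

```python
import math


def fuse_similarity(text_sim: float, semantic_sim: float,
                    w_text: float = 2.0, w_sem: float = 3.0,
                    bias: float = -2.5) -> float:
    """Logistic-regression style fusion of two similarity scores.

    The weights and bias are placeholder values, not learned parameters.
    """
    z = w_text * text_sim + w_sem * semantic_sim + bias
    return 1.0 / (1.0 + math.exp(-z))


# Both inputs at 0.5 give z = 0, so the fused similarity is 0.5.
similarity = fuse_similarity(0.5, 0.5)
```

Because the logistic function is monotonic, raising either input similarity raises the fused similarity as long as the corresponding weight is positive.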

In one possible embodiment, as shown in fig. 8, the following steps are included:

Step S801, determining text similarity between any two target sentences according to the keyword feature information of any two target sentences by using a similarity fitting model.

Step S802, determining whether the text similarity between any two target sentences is greater than a preset threshold, if so, performing step S803, otherwise, performing step S804.

Step S803, the text similarity between any two target sentences is determined as the similarity between any two target sentences.

Step S804, a similarity fitting model is used for determining semantic similarity between any two target sentences according to word order feature information of any two target sentences.

Step S805, determining the similarity between any two target sentences according to the text similarity and the semantic similarity between any two target sentences.
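The control flow of steps S801 to S805 can be sketched as follows. The three models are passed in as plain functions so that the example runs on its own; `jaccard` and `average` are simple stand-ins chosen for illustration, not the XGBoost, BiMPM, or LR models, and the threshold value of 0.8 is hypothetical.

```python
from typing import Callable

SimilarityModel = Callable[[str, str], float]
FusionModel = Callable[[float, float], float]


def jaccard(a: str, b: str) -> float:
    """Stand-in text similarity: word-set overlap of the two sentences."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def average(text_sim: float, semantic_sim: float) -> float:
    """Stand-in fusion: plain mean of the two scores."""
    return (text_sim + semantic_sim) / 2


def pair_similarity(a: str, b: str,
                    text_model: SimilarityModel,
                    semantic_model: SimilarityModel,
                    fusion_model: FusionModel,
                    threshold: float = 0.8) -> float:
    text_sim = text_model(a, b)                  # step S801
    if text_sim > threshold:                     # step S802
        return text_sim                          # step S803
    semantic_sim = semantic_model(a, b)          # step S804
    return fusion_model(text_sim, semantic_sim)  # step S805


# Identical sentences exceed the threshold, so the semantic model is skipped.
score = pair_similarity("price of a", "price of a",
                        jaccard, lambda a, b: 0.6, average)
```

One plausible motivation for this ordering is cost: the semantic model is evaluated only for pairs whose text similarity alone is not conclusive.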

In step S505, at least the following methods are included to filter the target sentences that meet the set conditions:

In one possible implementation, the target categories and the target sentences in them are determined from the categories according to the number of target sentences in each category after clustering. For example, suppose 20 categories are obtained after the target sentences are clustered; the 20 categories are sorted in descending order of the number of target sentences, and the top 10 categories, together with the target sentences in them, are determined as the target categories.

In one possible implementation, the target categories and the target sentences in them are determined from the categories according to the similarity between the target sentences in each category after clustering. For example, suppose 20 categories are obtained after the target sentence set is clustered; the average similarity between the target sentences within each category is calculated, the 20 categories are sorted in descending order of average similarity, and the top 10 categories, together with the target sentences in them, are determined as the target categories.

In one possible implementation, the target categories and the target sentences in them are determined from the categories according to both the number of target sentences in each category after clustering and the similarity between the target sentences. For example, suppose 20 categories are obtained after the target sentences are clustered; the 20 categories are sorted in descending order of the number of target sentences, and the top 15 categories are determined. The average similarity between the target sentences within each of the 15 categories is then calculated, the 15 categories are sorted in descending order of average similarity, and the top 10 categories, together with the target sentences in them, are determined as the target categories.
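The third screening option, which filters first by category size and then by average similarity, can be sketched as follows. Categories are lists of sentence indices and `sim` is a symmetric similarity matrix; the function names and the convention that a single-sentence category has an average similarity of 0 are assumptions made for illustration.

```python
from itertools import combinations
from typing import List


def average_similarity(members: List[int], sim: List[List[float]]) -> float:
    """Mean pairwise similarity between the sentences of one category."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0  # assumption: a single-sentence category scores 0
    return sum(sim[i][j] for i, j in pairs) / len(pairs)


def select_target_categories(clusters: List[List[int]],
                             sim: List[List[float]],
                             keep_by_size: int,
                             keep_by_similarity: int) -> List[List[int]]:
    """Keep the largest categories, then the most coherent among them."""
    by_size = sorted(clusters, key=len, reverse=True)[:keep_by_size]
    by_sim = sorted(by_size,
                    key=lambda c: average_similarity(c, sim),
                    reverse=True)
    return by_sim[:keep_by_similarity]


# Toy data: three categories over six sentences.
sim = [[0.1] * 6 for _ in range(6)]
for i, j, value in [(0, 1, 0.6), (0, 2, 0.5), (1, 2, 0.7), (3, 4, 0.9)]:
    sim[i][j] = sim[j][i] = value
clusters = [[0, 1, 2], [3, 4], [5]]
targets = select_target_categories(clusters, sim,
                                   keep_by_size=2, keep_by_similarity=1)
```

With the values from the example in the text, `keep_by_size` would be 15 and `keep_by_similarity` would be 10.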

When the dialogue record is a dialogue record between the customer service and the user, the target sentences meeting the set conditions are updated to the intelligent customer service knowledge base after step S505. For example, the target sentences are sentences containing a query word in the dialogue record between the customer service and the user. 10 target categories are determined from the categories according to the number of target sentences in each category after clustering and the similarity between the target sentences. 3 target sentences are then randomly selected from each target category and pushed to the customer service terminal. The customer service manager selects, from the pushed target sentences, those to be added to the intelligent customer service knowledge base. After selecting a target sentence at the customer service terminal, the customer service manager fills in the answer corresponding to the target sentence and then submits the selected target sentence and the corresponding answer to the intelligent customer service knowledge base.
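The random selection of candidate sentences for the customer service manager can be sketched as follows. The function name `sample_for_review` and the optional `seed` parameter are hypothetical; a category with fewer sentences than requested is assumed to contribute all of its sentences.

```python
import random
from typing import List, Optional


def sample_for_review(categories: List[List[str]],
                      per_category: int = 3,
                      seed: Optional[int] = None) -> List[List[str]]:
    """Randomly pick up to `per_category` sentences from each target category."""
    rng = random.Random(seed)
    return [rng.sample(sentences, min(per_category, len(sentences)))
            for sentences in categories]


target_categories = [
    ["How much is commodity A?", "What is the price of A?",
     "How much does A cost?", "Is A expensive?"],
    ["What is the function of commodity A?"],
]
pushed = sample_for_review(target_categories, per_category=3, seed=0)
```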

For better explaining the embodiment of the present application, a method for updating an intelligent customer service knowledge base provided by the embodiment of the present application is described below with reference to a specific implementation scenario, where a process of the method may be executed by an intelligent customer service system, as shown in fig. 9, and the method includes the following steps:

Step S901, obtaining a dialogue record between the customer service and the user.

The conversation record between the customer service and the user is the conversation record between the manual customer service and the user collected by the intelligent customer service system.

Step S902, extracting target sentences containing the query words from the dialogue records to form a target sentence set.

If the dialogue record between the customer service and the user contains no sentence with a query word, the longest sentence that contains a keyword whose weight is greater than the threshold is selected from the dialogue record as the target sentence.

Step S903, inputting the target sentences in the target sentence set into the XGBoost model, and outputting the text similarity between any two target sentences.

The XGBoost model is trained in advance on the keyword feature information of target sentences whose similarity has been labeled manually.

Step S904, determining whether the text similarity between any two target sentences is not greater than a preset threshold, if so, performing step S905, otherwise, performing step S912.

Step S905, inputting the target sentences in the target sentence set into the BiMPM model, and outputting the semantic similarity between any two target sentences.

The BiMPM model is obtained by training in advance according to the word order characteristic information of the target sentences marked with the similarity manually.

Step S906, the text similarity and the semantic similarity of the target sentences in the target sentence set are input into an LR model.

Step S907, determining the similarity between any two target sentences according to the text similarity and the semantic similarity between any two target sentences by using the trained LR model.

Step S908, performing hierarchical clustering on the target sentences in the target sentence set according to the similarity between any two target sentences.

Step S909, calculating the number of target sentences in each category and the average similarity of the target sentences.

Step S910, determining a target category from each category according to the number of the target sentences and the average similarity of the target sentences.

Step S911, selecting a target sentence from the target category and updating the target sentence to the intelligent customer service knowledge base.

Step S912, determining the text similarity between any two target sentences as the similarity between any two target sentences.

Because the target sentence set is extracted from the dialogue record of information consultation according to the sentence feature information, and the target sentences are clustered according to the similarity between them so that target sentences with similar features fall into one category, the target sentences meeting the preset conditions can be screened from the clustered target sentences as required. Compared with manually screening sentences from the information consultation dialogue, this improves the efficiency of screening target sentences. Secondly, because the similarity fitting model is trained in advance using the multi-dimensional text feature information of target sentences, it learns the relationship between the similarity of target sentences and their multi-dimensional text features; moreover, the multi-dimensional text features express the features of the target sentences more comprehensively. Obtaining the similarity between any two target sentences from their multi-dimensional text feature information with the similarity fitting model therefore effectively improves the precision of the similarity, and in turn the accuracy of screening target sentences. In addition, the target category is determined from the categories according to the number of target sentences in each category after clustering and the similarity of the target sentences, and the target sentences in the target category are updated to the intelligent customer service knowledge base, so that the coverage of the intelligent customer service knowledge base over the questions asked by users is improved, as is the accuracy with which the intelligent customer service answers those questions.

Based on the same technical concept, the present application provides an apparatus for screening target sentences, as shown in fig. 10, the apparatus 1000 includes: an extraction module 1001, a processing module 1002, a clustering module 1003, and a screening module 1004.

An extracting module 1001, configured to extract a target statement set from a dialogue record of information consultation according to statement feature information;

the processing module 1002 is configured to obtain multi-dimensional text feature information of any two target sentences; obtaining the similarity between any two target sentences according to the multi-dimensional text feature information of any two target sentences by using a similarity fitting model, wherein the similarity fitting model is obtained by adopting the multi-dimensional text feature information of the target sentences for training in advance;

a clustering module 1003, configured to cluster the target sentences in the target sentence set according to a similarity between any two target sentences;

and the screening module 1004 is configured to screen out target sentences meeting the set conditions from the clustered target sentences.

The processing module 1002 is specifically configured to: determine the text similarity between any two target sentences according to the keyword feature information of any two target sentences by using a similarity fitting model; determine the semantic similarity between any two target sentences according to the word order feature information of any two target sentences by using the similarity fitting model; and determine the similarity between any two target sentences according to the text similarity and the semantic similarity between any two target sentences.

The processing module 1002 is specifically configured to: determine the text similarity between any two target sentences according to the keyword feature information of any two target sentences by using a similarity fitting model; judge whether the text similarity between any two target sentences is greater than a preset threshold; if so, determine the text similarity between any two target sentences as the similarity between any two target sentences; otherwise, determine the semantic similarity between any two target sentences according to the word order feature information of any two target sentences by using the similarity fitting model, and determine the similarity between any two target sentences according to the text similarity and the semantic similarity between any two target sentences.

Optionally, the extracting module 1001 is specifically configured to: extract target sentences containing the query words from the dialogue record of information consultation to form a target sentence set.

Optionally, the screening module 1004 is specifically configured to: determine the target category and the target sentences in the target category from the categories according to the number of target sentences in each category after clustering and the similarity between the target sentences.

Optionally, the dialogue record is a dialogue record between the customer service and the user, and the screening module 1004 is further configured to: update the target sentences meeting the set conditions into the intelligent customer service knowledge base.

Based on the same technical concept, the embodiment of the present application provides an intelligent customer service system, as shown in fig. 11, including at least one processor 1101, and a memory 1102 connected to the at least one processor, where a specific connection medium between the processor 1101 and the memory 1102 is not limited in the embodiment of the present application, and the processor 1101 and the memory 1102 are connected through a bus in fig. 11 as an example. The bus may be divided into an address bus, a data bus, a control bus, etc.

In the embodiment of the present application, the memory 1102 stores instructions executable by the at least one processor 1101, and the at least one processor 1101 may execute the steps included in the foregoing method for filtering target statements by executing the instructions stored in the memory 1102.

The processor 1101 is the control center of the intelligent customer service system. It may connect the various parts of the intelligent customer service system through various interfaces and lines, and screens the target sentences by running or executing the instructions stored in the memory 1102 and calling the data stored in the memory 1102. Optionally, the processor 1101 may include one or more processing units, and the processor 1101 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the processor 1101. In some embodiments, the processor 1101 and the memory 1102 may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.

The processor 1101 may be a general purpose processor such as a Central Processing Unit (CPU), a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, configured to implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present Application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.

The memory 1102, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 1102 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 1102 may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 1102 in the embodiments of the present application may also be circuitry or any other device capable of performing a storage function, for storing program instructions and/or data.

The intelligent customer service system further includes an input unit 1103, a display unit 1104, a radio frequency unit 1105, a power supply 1106, an external interface 1107, and the like.

The input unit 1103 may include a touch screen 11031 and other input devices 11032. The touch screen 11031 may collect touch operations of a user (e.g., operations performed by the user on or near the touch screen 11031 with any suitable object such as a finger, a joint, or a stylus); that is, the touch screen 11031 may be used to detect the touch pressure, the touch input position and the touch input area, and to drive the corresponding connection devices according to a preset program. The touch screen 11031 may detect a touch operation performed on it by a user, convert the touch operation into a touch signal and transmit the touch signal to the processor 1101 (which may also be understood as transmitting the touch information of the touch operation to the processor 1101), and may receive and execute commands transmitted from the processor 1101. The touch information may include at least one of pressure magnitude information and pressure duration information. The touch screen 11031 may provide an input interface and an output interface between the intelligent customer service system and the user. In addition, the touch screen 11031 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 1103 may include other input devices 11032 in addition to the touch screen 11031. For example, the other input devices 11032 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.

The display unit 1104 may be used to display the screened target sentence. Further, the touch screen 11031 may cover the display unit 1104, and when the touch screen 11031 detects a touch operation thereon or nearby, the touch screen 11031 may transmit the pressure information of the touch operation to the processor 1101 to be determined. In the embodiment of the present application, the touch screen 11031 and the display unit 1104 may be integrated into one component to implement the input, output and display functions of the smart customer service system. For convenience of description, the embodiment of the present application is schematically illustrated by taking the touch screen 11031 as an example of the functional set of the touch screen 11031 and the display unit 1104, but in some embodiments, the touch screen 11031 and the display unit 1104 may also be taken as two separate components.

When the display unit 1104 and the touch panel are superimposed on each other in the form of layers to form the touch screen 11031, the display unit 1104 can function as an input device and an output device, and when functioning as an output device, can be used to display an image. The Display unit 1104 may include at least one of a Liquid Crystal Display (LCD), a Thin Film Transistor Liquid Crystal Display (TFT-LCD), an Organic Light Emitting Diode (OLED) Display, an Active Matrix Organic Light Emitting Diode (AMOLED) Display, an In-Plane Switching (IPS) Display, a flexible Display, a 3D Display, and the like. Some of these displays may be configured to be transparent to allow a user to view from the outside, which may be referred to as transparent displays, and the intelligent customer service system may include two or more display units, depending on the particular desired implementation.

The radio frequency unit 1105 may be used for receiving and transmitting information, or for receiving and transmitting signals during a call. Typically, the radio frequency circuitry includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. Further, the radio frequency unit 1105 may also communicate with network devices and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), etc.

The intelligent customer service system may also include a power supply 1106 (e.g., a battery) for receiving external power to power the various components within the intelligent customer service system. Preferably, the power supply 1106 may be logically connected to the processor 1101 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.

The intelligent customer service system may further include an external interface 1107, where the external interface 1107 may include a standard Micro USB interface, or may include a multi-pin connector, and may be used to connect the intelligent customer service system to communicate with other devices, or may be used to connect a charger to charge the intelligent customer service system.

Based on the same inventive concept, an embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores computer instructions, and when the computer instructions are run on an intelligent customer service system, the intelligent customer service system executes the steps of the method for screening target statements as described above.

It should be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
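By way of non-limiting illustration only, the overall flow summarized in the abstract (extracting target sentences by query words, determining pairwise similarity, clustering, and screening by a set condition) might be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the implementation of the embodiments: the function names, the query word list, the thresholds, and the "cluster size" set condition are all hypothetical, and a simple Jaccard word overlap stands in for the similarity fitting model over multi-dimensional text feature information.

```python
from itertools import combinations

# Hypothetical query words; the text only requires that the sentence
# feature information include at least a query word.
QUERY_WORDS = {"how", "what", "why", "when", "where"}


def extract_target_sentences(dialogue, query_words=QUERY_WORDS):
    """Keep user sentences containing a query word (assumption: user side only)."""
    return [
        text
        for speaker, text in dialogue
        if speaker == "user" and set(text.lower().split()) & query_words
    ]


def similarity(a, b):
    """Stand-in for the similarity fitting model: Jaccard overlap of word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def cluster(sentences, threshold=0.5):
    """Group sentences whose pairwise similarity reaches the threshold (union-find)."""
    parent = list(range(len(sentences)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(sentences)), 2):
        if similarity(sentences[i], sentences[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, sentence in enumerate(sentences):
        groups.setdefault(find(i), []).append(sentence)
    return list(groups.values())


def screen(clusters, min_size=2):
    """Assumed set condition: keep one representative of each cluster of at least min_size."""
    return [members[0] for members in clusters if len(members) >= min_size]
```

In this sketch, a dialogue record is assumed to be a list of `(speaker, text)` pairs; calling `screen(cluster(extract_target_sentences(dialogue)))` returns one representative sentence per sufficiently large cluster, which could then be presented for an answer to be filled in before updating the knowledge base.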

Claims

1. A method for screening target sentences, comprising:

extracting a target sentence set from a dialogue record of information consultation according to sentence feature information, wherein the sentence feature information at least comprises a query word, and the dialogue record is a dialogue record between customer service and a user;

and screening out target sentences meeting set conditions from the clustered target sentences, and updating the target sentences meeting the set conditions and answers corresponding to the target sentences meeting the set conditions into an intelligent customer service knowledge base, wherein the answers corresponding to the target sentences meeting the set conditions are filled in a target sentence recommendation interface by customer service management personnel.

2. The method of claim 1, wherein the multi-dimensional text feature information includes keyword feature information and word order feature information;

3. The method of claim 1, wherein the multi-dimensional text feature information includes keyword feature information and word order feature information;

4. The method of claim 1, wherein the extracting a target sentence set from a dialogue record of information consultation according to the sentence feature information comprises:

5. The method of claim 1, wherein the screening out target sentences meeting set conditions from the clustered target sentences comprises:

6. An apparatus for screening target sentences, comprising:

an extraction module, configured to extract a target sentence set from a dialogue record of information consultation according to sentence feature information, wherein the sentence feature information at least comprises a query word, and the dialogue record is a dialogue record between customer service and a user;

and a screening module, configured to screen out target sentences meeting set conditions from the clustered target sentences, and to update the target sentences meeting the set conditions and answers corresponding to the target sentences meeting the set conditions into an intelligent customer service knowledge base, wherein the answers corresponding to the target sentences meeting the set conditions are filled in a target sentence recommendation interface by customer service management personnel.

7. The apparatus of claim 6, wherein the multi-dimensional text feature information comprises keyword feature information and word order feature information;

the processing module is specifically configured to:

8. The apparatus of claim 6, wherein the multi-dimensional text feature information comprises keyword feature information and word order feature information;

the processing module is specifically configured to:

9. The apparatus of claim 6, wherein the extraction module is specifically configured to:

10. The apparatus of claim 6, wherein the screening module is specifically configured to:

11. An intelligent customer service system, comprising at least one processing unit and at least one memory unit, wherein the memory unit stores a computer program which, when executed by the processing unit, causes the processing unit to carry out the steps of the method according to any one of claims 1 to 5.

12. A computer-readable storage medium, characterized in that it stores a computer program executable by an intelligent customer service system, which program, when run on the intelligent customer service system, causes the intelligent customer service system to perform the steps of the method according to any one of claims 1 to 5.