CA3166079C

CA3166079C - A processing method, device and electronic device for a question-and-answer statement

Info

Publication number: CA3166079C
Application number: CA3166079A
Authority: CA
Inventors: Tie XIE; Mengying Yang
Original assignee: 10353744 Canada Ltd
Current assignee: 10353744 Canada Ltd
Priority date: 2021-06-29
Filing date: 2022-06-29
Publication date: 2024-04-02
Anticipated expiration: 2042-06-29
Also published as: CA3166079A1; CN113342955A

Abstract

The present application discloses a questioning/answering (hereinafter referred to as "Q&A") statement processing method, and corresponding device and electronic equipment, of which the method comprises splitting the session record into corresponding Q&A groups according to a preset Q&A splitting rule, wherein the Q&A groups include at least one questioning statement and at least one answering statement; determining a processing rule to which the Q&A groups correspond; splitting the Q&A groups into corresponding statement pairs according to the processing rule; and updating a knowledge base of a Q&A system according to the statement pairs. The present application achieves update of the knowledge base of the Q&A system according to historical Q&A records, and solves prior-art problems that questioning statements and answering statements included in human session data cannot be analyzed and mined, whereby update of the knowledge base is slow, and success rate of response is adversely affected.

Description

A PROCESSING METHOD, DEVICE AND ELECTRONIC DEVICE FOR A QUESTION-AND-ANSWER STATEMENT
BACKGROUND OF THE INVENTION
Technical Field

[0001] The present invention relates to the field of natural language processing technology, and more particularly to a Q&A statement processing method, and corresponding device and electronic equipment.
Description of Related Art

[0002] In the traditional service industry, human customer service is a labor-intensive post: the work is highly intensive and highly repetitive throughout the working period. Accordingly, in order to reduce manpower cost and enhance efficiency, more and more enterprises have introduced automatic Q&A systems that respond automatically with corresponding answering statements to questions raised by users, alleviating the pressure on human customer service to a certain degree and enhancing the accuracy, standardization, and stability of enterprise services.

[0003] However, to guarantee that the automatic Q&A system can respond to users accurately, a colossal knowledge base must be maintained for it. The knowledge base contains great quantities of standard questions and corresponding answers, and the Q&A process of the intelligent customer service of the Q&A system mainly matches users' questions with standard questions in the knowledge base; if matching succeeds, the answers corresponding to the standard questions are returned. Accordingly, how comprehensive the knowledge base is decides the response effect of the intelligent customer service system. However, users' questions are never immutable, and users frequently raise new questions not covered by the knowledge base for diverse reasons, so it is necessary to maintain and update the knowledge base. In addition, the traditional role played by humans does not disappear in such Q&A systems as intelligent customer service, as humans usually make supplementary corrections on questions that the intelligent customer service cannot answer or answers wrongly.

Date Reçue/Date Received 2022-06-29

[0004] Therefore, there is an urgent need to propose a Q&A statement processing method, and corresponding device and electronic equipment capable of analyzing and mining human session data to generate Q&A pairs highly effectively, so as to solve the above technical problems pending in the state of the art.
SUMMARY OF THE INVENTION

[0005] In order to address deficiencies prevailing in prior-art technology, a main objective of the present invention is to provide a Q&A statement processing method, and corresponding device and electronic equipment, so as to solve the technical problems in the state of the art.

[0006] To achieve the above objective, according to one aspect, the present invention provides a Q&A statement processing method that comprises:

[0007] obtaining a session record to be processed, wherein the session record includes at least two statements, and the statements include questioning statements sent by questioners and answering statements sent by answerers;

[0008] splitting the session record into corresponding Q&A groups according to a preset Q&A splitting rule, wherein the Q&A groups include at least one questioning statement and at least one answering statement;

[0009] determining a processing rule to which the Q&A groups correspond according to the number of the questioning statement(s) and the number of the answering statement(s) included in the Q&A groups;

[0010] splitting the Q&A groups into corresponding statement pairs according to the processing rule to which the Q&A groups correspond; and

[0011] updating a knowledge base of a Q&A system according to the statement pairs.

[0012] In some embodiments, each statement has a corresponding generation time, and the step of splitting the session record into corresponding Q&A groups according to a preset Q&A splitting rule includes:

[0013] sequentially traversing the session record according to the generation time of each statement;

[0014] judging, when the statement traversed is a questioning statement, whether the traversed questioning statement and the antecedent questioning statement of the traversed questioning statement belong to the same Q&A group according to a sentence pattern of the antecedent answering statement of the traversed questioning statement and/or according to an interval time to the antecedent questioning statement of the traversed questioning statement; and

[0015] determining, when the statement traversed is an answering statement, that the traversed answering statement belongs to the Q&A group to which the antecedent questioning statement of the traversed answering statement corresponds.

[0016] In some embodiments, the step of splitting the Q&A groups into corresponding statement pairs according to the processing rule to which the Q&A groups correspond includes:

[0017] splitting, when the number of the questioning statements included in the Q&A group does not exceed a first preset threshold, the questioning statements each into at least two text segments according to preset signs included in the questioning statements;

[0018] predicting whether two adjacent text segments belong to the same question by employing a preset binary classifier;

[0019] generating corresponding questioning statements respectively according to text segments predicted to belong to the same question; and

[0020] generating corresponding statement pairs according to all the questioning statements as generated and the answering statements included in the Q&A group.

[0021] In some embodiments, before the step of predicting whether two adjacent text segments belong to the same question by employing a preset binary classifier, the method further comprises:

[0022] traversing the text segments, and merging the traversed text segments with corresponding posterior text segments when the number of characters of the traversed text segments is smaller than a second preset threshold; and/or

[0023] merging the traversed text segments with corresponding posterior text segments by employing a preset classifier algorithm when the traversed text segments and the corresponding posterior text segments belong to the same intent class or when the traversed text segments belong to a preset merging intent class.

[0024] In some embodiments, the step of splitting the Q&A groups into corresponding statement pairs according to the processing rule to which the Q&A groups correspond includes:

[0025] combining, when the numbers of the questioning statements and the answering statements included in the Q&A group both exceed the first preset threshold, the questioning statements and the answering statements included in the Q&A group, and generating the corresponding statement pairs.

[0026] In some embodiments, the step of splitting the Q&A groups into corresponding statement pairs according to the processing rule to which the Q&A groups correspond includes:

[0027] employing, when the number of the answering statements included in the Q&A group does not exceed the first preset threshold and the number of the questioning statements as included exceeds the first preset threshold, the preset binary classifier to predict whether the questioning statements as included and the antecedent questioning statements of the questioning statements as included belong to the same question;

[0028] merging, when there are the questioning statements that belong to the same question, the questioning statements that belong to the same question and generating the corresponding statement pairs according to all the merged questioning statements and the answering statements; and

[0029] generating the corresponding statement pairs according to all the questioning statements and the answering statements included in the Q&A group, when there are no questioning statements that belong to the same question.

[0030] In some embodiments, the step of updating a knowledge base of a Q&A system according to the statement pairs includes:

[0031] clustering the statement pairs by employing a preset clustering algorithm, generating statement pair groups, and determining the number of the questioning statements included in each statement pair group;

[0032] determining matching degrees between the questioning statements and the answering statements included in the statement pair groups according to a preset similarity algorithm;

[0033] determining a weight to which each statement pair group corresponds according to the corresponding matching degrees and the numbers of questioning statements included in the statement pair groups; and

[0034] sequentially updating the knowledge base of the Q&A system according to the weight to which each statement pair group corresponds.
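The weighting in paragraphs [0031] to [0034] can be illustrated with a minimal sketch. The source does not state how the matching degrees and the question counts are combined, so the formula below (mean matching degree multiplied by the number of questioning statements) and the descending update order are assumptions made only for illustration; `PairGroup`, `group_weight` and `update_order` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairGroup:
    name: str
    matching_degrees: List[float]  # one matching degree per statement pair
    num_questions: int             # questioning statements in the group


def group_weight(group: PairGroup) -> float:
    # Assumed formula (not from the source): mean matching degree
    # scaled by the number of questioning statements in the group.
    mean_match = sum(group.matching_degrees) / len(group.matching_degrees)
    return mean_match * group.num_questions


def update_order(groups: List[PairGroup]) -> List[PairGroup]:
    # Assumed policy: groups with the largest weight are handled first.
    return sorted(groups, key=group_weight, reverse=True)
```

Under this assumption, a frequently asked question with well-matched answers is proposed to the knowledge base before a rare one.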

[0035] In some embodiments, before the step of splitting the session record into corresponding Q&A groups according to a preset Q&A splitting rule, the method further comprises:

[0036] rectifying any wrong word included in the session record according to a preset rectifying rule; and

[0037] performing a normalizing process on the rectified session record.

[0038] In some embodiments, before the step of splitting the session record into corresponding Q&A groups according to a preset Q&A splitting rule, the method further comprises:

[0039] recognizing the intent class to which each questioning statement included in the session record corresponds by employing the preset classifier algorithm and eliminating any questioning statement to which a preset irrelevant intent class corresponds as included in the session record.

[0040] According to the second aspect, the present application provides a Q&A statement processing device that comprises:

[0041] an obtaining module, for obtaining a session record to be processed, wherein the session record includes at least two statements, and the statements include questioning statements sent by questioners and answering statements sent by answerers;

[0042] a splitting module, for splitting the session record into corresponding Q&A groups according to a preset Q&A splitting rule, wherein the Q&A groups include at least one questioning statement and at least one answering statement;

[0043] a judging module, for determining a processing rule to which the Q&A groups correspond according to the number of the questioning statement(s) and the number of the answering statement(s) included in the Q&A groups; wherein

[0044] the splitting module is further employed for splitting the Q&A groups into corresponding statement pairs according to the processing rule to which the Q&A groups correspond; and

[0045] an updating module, for updating a knowledge base of a Q&A system according to the statement pairs.

[0046] According to the third aspect, the present application provides an electronic equipment that comprises:

[0047] one or more processor(s); and

[0048] a memory, associated with the one or more processor(s) and used for storing a program instruction. When read and executed by the one or more processor(s), the program instruction performs the following operations:

[0049] obtaining a session record to be processed, wherein the session record includes at least two statements, and the statements include questioning statements sent by questioners and answering statements sent by answerers;

[0050] splitting the session record into corresponding Q&A groups according to a preset Q&A splitting rule, wherein the Q&A groups include at least one questioning statement and at least one answering statement;

[0051] determining a processing rule to which the Q&A groups correspond according to the number of the questioning statement(s) and the number of the answering statement(s) included in the Q&A groups;

[0052] splitting the Q&A groups into corresponding statement pairs according to the processing rule to which the Q&A groups correspond; and

[0053] updating a knowledge base of a Q&A system according to the statement pairs.

[0054] The present invention achieves the following advantageous effects.

[0055] The present application provides a Q&A statement processing method, comprising splitting the session record into corresponding Q&A groups according to a preset Q&A splitting rule, wherein the Q&A groups include at least one questioning statement and at least one answering statement; determining a processing rule to which the Q&A groups correspond according to the number of the questioning statement(s) and the number of the answering statement(s) included in the Q&A groups; splitting the Q&A groups into corresponding statement pairs according to the processing rule to which the Q&A groups correspond; and updating a knowledge base of a Q&A system according to the statement pairs. By splitting a session into finer grains, the present application achieves update of the knowledge base of the Q&A system according to historical Q&A records, and solves prior-art problems that questioning statements and answering statements included in human session data cannot be analyzed and mined, whereby update of the knowledge base is slow, and success rate of response is adversely affected.
BRIEF DESCRIPTION OF THE DRAWINGS

[0056] In order to more clearly describe the technical solutions in the embodiments of the present invention, drawings required for the illustration of the embodiments will be briefly introduced below. Apparently, the drawings described below are merely directed to some embodiments of the present invention, and it is possible for persons ordinarily skilled in the art to acquire other drawings without spending creative effort in the process based on these drawings.

[0057] Fig. 1 is a flowchart illustrating a session process provided by an embodiment of the present application;

[0058] Fig. 2 is a flowchart illustrating Q&A group splitting provided by an embodiment of the present application;

[0059] Fig. 3 is a flowchart illustrating merging of text segments provided by an embodiment of the present application;

[0060] Fig. 4 is a flowchart illustrating merging of questioning statements provided by an embodiment of the present application;

[0061] Fig. 5 is a flowchart illustrating the method provided by an embodiment of the present application;

[0062] Fig. 6 is a view illustrating the structure of the device provided by an embodiment of the present application; and

[0063] Fig. 7 is a view illustrating the structure of the electronic equipment provided by an embodiment of the present application.
DETAILED DESCRIPTION OF THE INVENTION

[0064] In order to make more lucid and clear the objectives, technical solutions and advantages of the present invention, the technical solutions in the embodiments of the present invention will be clearly and comprehensively described below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the embodiments as described are merely partial embodiments, rather than the entire embodiments, of the present invention. All other embodiments obtainable by persons ordinarily skilled in the art based on the embodiments in the present invention without spending any creative effort shall all be covered by the protection scope of the present invention.

[0065] As recited in the Description of Related Art, the comprehensive degree of the knowledge base is the deciding factor affecting the response effect of the intelligent customer service system.

[0066] To realize analysis and mining of human session data, the present application provides a Q&A statement processing method capable of enhancing generation efficiency of Q&A pairs, and ensuring update efficiency of the knowledge base of such a Q&A system as intelligent customer service.

[0067] Embodiment 1

[0068] Specifically, as shown in Fig. 1, the process of analyzing and mining dialogue statements between customer service and a user according to the Q&A statement processing method provided by an embodiment of the present application includes the following.

[0069] S100 - obtaining a session record to be processed, and preprocessing the obtained session record.

[0070] Specifically, the process of preprocessing the session record includes the following.

[0071] S110 - rectifying any wrong word included in the session record according to a preset rectifying rule.

[0072] The session record can include speech statements and text statements. When the session record is directed to text statements, the principal wrong words are homonyms. When the session record is directed to speech statements, the speech statements must first be converted into text statements through a speech recognition technique; since the main source of wrong words is then imprecise speech recognition, the corresponding wrong words are not only homonyms but also words that are similar or identical in pronunciation. Accordingly, the embodiments of the present application combine a language model with word frequency features, provide corresponding rectifying rules for speech statements and text statements respectively, and rectify wrong words according to the corresponding rectifying rules.

[0073] S120 - performing a purification operation on all characters included in the session record.

[0074] Specifically, the purification operation includes removing irrelevant characters such as preset useless punctuations and preset stop words, thereafter recognizing irrelevant information contained in each text statement such as commodity names and placenames, etc., and normalizing the irrelevant information to corresponding preset characters according to the type to which the irrelevant information corresponds.
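The purification operation can be sketched as follows. This is only an illustration under stated assumptions: the punctuation list, the stop words, the entity dictionary and the placeholder format are hypothetical presets that do not come from the source, and lowercasing is added purely to simplify matching in an English example.

```python
import re

# Hypothetical presets; the source does not specify the actual lists.
USELESS_PUNCTUATION = "~*#^"
STOP_WORDS = {"um", "uh", "well"}
# Hypothetical entity dictionary: surface form -> type of irrelevant information.
ENTITY_TYPES = {"willful pay": "COMMODITY", "nanjing": "PLACE"}


def purify(statement: str) -> str:
    """Remove preset characters and stop words, then normalize entities."""
    text = statement.lower()
    # Drop the preset useless punctuation characters.
    text = text.translate(str.maketrans("", "", USELESS_PUNCTUATION))
    # Replace each known entity with a placeholder for its type.
    for surface, entity_type in ENTITY_TYPES.items():
        text = re.sub(r"\b" + re.escape(surface) + r"\b", f"[{entity_type}]", text)
    # Drop stop words, ignoring punctuation attached to a token.
    tokens = [t for t in text.split() if t.strip(",.?!") not in STOP_WORDS]
    return " ".join(tokens)
```

For example, `purify("Um, is Willful Pay available in Nanjing? ~~")` yields `"is [COMMODITY] available in [PLACE]?"`, so two questions that differ only in the commodity or place name become textually identical.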

[0075] S130 - recognizing a dialogue intent to which the questioning statement sent by each user corresponds by employing a preset classifier algorithm.

[0076] S200 - splitting the preprocessed session record into Q&A groups according to a preset Q&A splitting rule.

[0077] It is possible to define the session record of a user with customer service within one day as a segment of dialogue. The session record can be firstly split into one or more dialogue(s), and the dialogue(s) is/are then split into Q&A groups.
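A minimal sketch of this first split, assuming each statement carries a generation time and that "one day" means one calendar day; the `Statement` type and its field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Statement:
    speaker: str     # "user" or "agent" (hypothetical labels)
    text: str
    time: datetime   # generation time of the statement


def split_into_dialogues(record: List[Statement]) -> List[List[Statement]]:
    """Group one user's session record into one dialogue per calendar day."""
    by_day: Dict = defaultdict(list)
    for stmt in sorted(record, key=lambda s: s.time):
        by_day[stmt.time.date()].append(stmt)
    # Return the dialogues in chronological order.
    return [by_day[day] for day in sorted(by_day)]
```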

[0078] As can be known from historical data analysis of users, a user usually consults about the same type of questions within a preset period of time; once the customer service has replied thereto, the user usually consults about different questions in the next dialogue with the customer service.

[0079] Based on the above features, it is possible to split a segment of dialogue into one or more Q&A group(s) according to time unit and splitting strategy. As shown in Fig. 2, such a splitting process includes the following.

[0080] S210 - traversing the dialogue according to a temporal sequence of generation times.

[0081] S220 - screening questioning statements to be processed of a user when the questioning statements to be processed are traversed, and eliminating any questioning statement to be processed of the user that does not conform to a preset condition.

[0082] It is possible to define that each Q&A group starts with a questioning statement (marked as Q) of the user and ends with an answering statement (marked as A) of the customer service.

[0083] Specifically, it is possible to eliminate any questioning statement that has an eliminable intent and is irrelevant to business, namely one whose number of characters is smaller than a preset number threshold or whose intent is judged by the preset classifier algorithm to be a chitchat intent.

[0084] S221 - determining whether a questioning statement to be processed is to be merged with the antecedent questioning statement of the questioning statement to be processed according to a preset merging rule.

[0085] Specifically, when the interval time between the antecedent questioning statement of the questioning statement to be processed and the questioning statement to be processed does not exceed a corresponding preset time threshold and/or when the sentence pattern of the antecedent answering statement of the questioning statement to be processed is a preset sentence pattern, the questioning statement to be processed can be merged with its antecedent questioning statement.

[0086] The antecedent questioning statement is a questioning statement that is temporally antecedent to the statement to be processed and with the shortest interval time to the statement to be processed. The antecedent answering statement is an answering statement that is temporally antecedent to the statement to be processed and with the shortest interval time to the statement to be processed.

[0087] When the interval time between the antecedent questioning statement of the questioning statement to be processed and the questioning statement to be processed exceeds a corresponding preset time threshold, it can be judged that there is no relevancy between the antecedent questioning statement and the questioning statement to be processed, so a new Q&A group can be generated according to the questioning statement to be processed. When the corresponding preset time threshold is not exceeded, it can be judged that there is relevancy between the antecedent questioning statement and the questioning statement to be processed, and the questioning statement to be processed can be merged in the Q&A group to which the antecedent questioning statement corresponds.

[0088] Preset sentence patterns include statements that guide the user to respond further to responses made by the customer service. They usually include follow-up questions asked by the customer service in reply to indefinite expressions of users, or sentence patterns asking the user for essential information, such as "please provide your mobile phone number". When an antecedent answering statement is of a preset sentence pattern, no matter how long the interval time is between the statement to be processed and the antecedent questioning statement or the antecedent answering statement, the statement sent by the user after the antecedent answering statement can be considered a reply to, and relevant to, this antecedent answering statement rather than a new, independent question. Therefore, the questioning statement to be processed can be merged with the antecedent questioning statement.

[0089] When the antecedent answering statement of the questioning statement to be processed is of a preset sentence pattern, the questioning statement to be processed can be merged with the antecedent questioning statement, and the questioning statement to be processed is merged in the Q&A group to which the antecedent questioning statement corresponds.

[0090] S230 - eliminating any answering statement whose number of characters is smaller than the preset number threshold when the answering statements of the customer service are traversed, and determining any answering statement remaining after the elimination as an answering statement to be processed.

[0091] S231 - merging the answering statement to be processed with an antecedent answering statement when the antecedent statement of the answering statement to be processed is an answering statement of the customer service, and storing the merged answering statement to the Q&A group to which the antecedent questioning statement corresponds.

[0092] The antecedent statement is a statement that is temporally antecedent to the statement to be processed and with the shortest interval time to the statement to be processed.

[0093] S232 - merging the answering statement to be processed in the Q&A group to which the questioning statement corresponds when the antecedent statement of the answering statement to be processed is a questioning statement of the user.
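Steps S210 to S232 can be sketched as a single pass over the dialogue. The code below is a simplified illustration under assumptions: the time threshold, the character threshold and the guiding sentence pattern are hypothetical presets, the intent-based screening of S220 is omitted because it depends on the classifier, and a new question is kept in the current group when the antecedent answering statement is a guiding pattern or when the interval to the antecedent questioning statement stays within the threshold, as described in [0087] to [0089].

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class Statement:
    speaker: str     # "Q" for the user, "A" for the customer service
    text: str
    time: datetime   # generation time


@dataclass
class QAGroup:
    questions: List[str] = field(default_factory=list)
    answers: List[str] = field(default_factory=list)


# Hypothetical presets, not values from the source.
TIME_THRESHOLD = timedelta(minutes=5)
MIN_CHARS = 2
GUIDING_PATTERNS = ("please provide",)


def is_guiding(answer_text: str) -> bool:
    return any(p in answer_text.lower() for p in GUIDING_PATTERNS)


def split_into_groups(dialogue: List[Statement]) -> List[QAGroup]:
    groups: List[QAGroup] = []
    last_q = last_a = prev = None  # antecedent question, answer, statement
    for stmt in sorted(dialogue, key=lambda s: s.time):
        if len(stmt.text) < MIN_CHARS:
            continue  # S220/S230: drop statements that are too short
        if stmt.speaker == "Q":
            same_group = last_q is not None and (
                (last_a is not None and is_guiding(last_a.text))
                or stmt.time - last_q.time <= TIME_THRESHOLD
            )
            if not same_group:
                groups.append(QAGroup())  # start a new Q&A group
            groups[-1].questions.append(stmt.text)
            last_q = stmt
        else:
            if not groups:
                continue  # an answer with no antecedent question
            if prev is not None and prev.speaker == "A" and groups[-1].answers:
                groups[-1].answers[-1] += " " + stmt.text  # S231
            else:
                groups[-1].answers.append(stmt.text)       # S232
            last_a = stmt
        prev = stmt
    # A Q&A group needs at least one question and one answer.
    return [g for g in groups if g.questions and g.answers]
```

A user reply that arrives long after "Please provide your mobile phone number" therefore stays in the group of the original question, while an unrelated question asked an hour later opens a new group.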

[0094] Through the above process of splitting the dialogue into Q&A groups, the processing result includes three types of Q&A groups, including:

[0095] the circumstance in which one question corresponds to a segment of reply, and this circumstance is marked as QA;

[0096] the circumstance in which plural questions correspond to a segment of reply, that is to say, the user asks plural questions and the customer service replies with a segment of words, and this circumstance is marked as QQA; and

[0097] the circumstance in which plural questions correspond to plural replies, that is to say, several rounds of communication are carried out between the user and the customer service in a short time, and this circumstance is marked as QAQA.

[0098] The corresponding type can be determined according to the numbers of answering statements and questioning statements included in each Q&A group, and can be processed according to the corresponding processing rule. The above process includes the following.
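As an illustration, the type can be derived from the two counts alone. The sketch assumes the first preset threshold equals 1, which matches the "one question" and "plural questions" wording above but is not stated as a value in the source; `group_type` is a hypothetical name.

```python
def group_type(num_questions: int, num_answers: int, threshold: int = 1) -> str:
    """Map the statement counts of a Q&A group to QA, QQA or QAQA."""
    if num_questions < 1 or num_answers < 1:
        raise ValueError("a Q&A group needs at least one question and one answer")
    if num_questions <= threshold:
        return "QA"    # questions are split into text segments (S310)
    if num_answers <= threshold:
        return "QQA"   # plural questions, one segment of reply
    return "QAQA"      # plural questions and plural replies
```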

[0099] S310 - splitting the questioning statements included in the Q&A group each into text segments when the Q&A group is of the QA type.

[0100] QA is a standard input form of the subsequent algorithm, but original questions of the QA type are rather complicated, as a questioning statement of the user might contain two or more questions, for example, "How would it be if Willful Pay is overdue? And how much will be the overdue interest rate?". However, one standard question is only meant to express one question in the knowledge base, and splitting is therefore needed.

[0101] An auxiliary algorithm is required during splitting to judge whether two segments of words are directed to one question or to two questions. The auxiliary algorithm can be a binary classifier whose inputs are two statements and whose task is to judge whether the two statements describe one and the same question or different questions. Any model can be employed to implement the binary classification. Preferably, since the BERT model learns during pretraining to predict whether two input statements come from the same context or concern unrelated topics, it is naturally suited to the above task; BERT can therefore be employed as the classifier and fine-tuned on this task.

[0102] S311 - processing the text segments from front to back, and merging any text segment whose number of characters is smaller than the preset number threshold into the posterior text segment of this text segment; and/or

[0103] merging any text segment pertaining to the same intent class as the corresponding posterior text segment or pertaining to a preset merging intent class into the posterior text segment of this text segment.

[0104] The posterior text segment indicates a text segment following and immediately adjacent to the text segment being processed.

[0105] Specifically, the above classifier algorithm can be employed to merge into the posterior text segment any text segment judged as pertaining to the same intent class as the posterior text segment or pertaining to such a preset merging intent class as a chitchat class.
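Steps S310 and S311 can be sketched as follows, showing only the length-based merging rule; the intent-based rule is left out because it depends on the classifier. The set of preset signs and the character threshold are hypothetical, and a short final segment is kept as it is because it has no posterior text segment.

```python
import re
from typing import List

# Hypothetical preset signs: a segment ends at one of these characters.
SEGMENT_PATTERN = re.compile(r"[^,.;?!]+[,.;?!]*")
MIN_CHARS = 8  # hypothetical character threshold


def split_segments(question: str) -> List[str]:
    """S310: split a questioning statement at the preset signs."""
    parts = (m.strip() for m in SEGMENT_PATTERN.findall(question))
    return [p for p in parts if p]


def merge_short_segments(segments: List[str], min_chars: int = MIN_CHARS) -> List[str]:
    """S311: merge each too-short segment into its posterior segment."""
    merged: List[str] = []
    carry = ""
    for i, seg in enumerate(segments):
        current = f"{carry} {seg}".strip()
        is_last = i == len(segments) - 1
        if len(current) < min_chars and not is_last:
            carry = current  # prepend to the next segment
        else:
            merged.append(current)
            carry = ""
    return merged
```

With these presets, "Hello, how would it be if Willful Pay is overdue? And how much will be the overdue interest rate?" is first split into three segments, and the short greeting is then absorbed into the segment that follows it.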

[0106] S320 - sequentially obtaining a preset number of adjacent text segments through a sliding window, and predicting whether the obtained text segments belong to the same and single question by means of the binary classifier algorithm.

[0107] As shown in Fig. 3, text segments predicted to belong to the same question can be merged into one questioning statement, text segments predicted to belong to different questions can be split into two different questioning statements, and the sequentially posterior text segment will continue to participate in the subsequent predicting process.

[0108] Through the above splitting process, Q&A groups of the QA type whose text segments do not all belong to one and the same question can be converted to Q&A groups of the QQA type, while Q&A groups of the QA type whose text segments all belong to one and the same question are converted into a QA statement pair that includes only one questioning statement and one answering statement.
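The sliding-window regrouping can be sketched with a window of two adjacent text segments. The binary classifier is passed in as a function; `toy_same_question` is only a hypothetical rule-based placeholder standing in for a fine-tuned model such as BERT, and the window size of two is an assumption.

```python
from typing import Callable, List


def regroup_segments(
    segments: List[str],
    same_question: Callable[[str, str], bool],
) -> List[str]:
    """Merge adjacent segments predicted to belong to the same question."""
    if not segments:
        return []
    questions = [segments[0]]
    for prev, cur in zip(segments, segments[1:]):
        if same_question(prev, cur):
            questions[-1] = f"{questions[-1]} {cur}"  # same question: merge
        else:
            questions.append(cur)                     # new questioning statement
    return questions


def toy_same_question(prev: str, cur: str) -> bool:
    # Placeholder for the binary classifier: treats a segment that opens
    # with a continuation marker as part of the previous question.
    return cur.lower().startswith(("i mean", "that is"))
```

If the result contains a single questioning statement, the group yields one QA statement pair; if it contains several, the group is handled as a QQA group.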

[0109] S320 - traversing the questioning statements included in a Q&A group and judging whether each questioning statement and its antecedent questioning statement belong to the same question when the Q&A group is of the QQA type.

[0110] When the user sends a questioning statement to the customer service, a meaningless sentence break might occur, whereby the same question is split into two questioning statements, such as "May I venture to ask" and "how to pay back". In some other embodiments, there is also the circumstance in which the user actually raises two questions but the customer service replies to both with a single response statement.

[0111] To recognize to which circumstance a Q&A group of the QQA type specifically pertains, it is possible to judge through the aforementioned binary classification algorithm. Specifically, the questioning statement can be split into text segments, and any text segment whose number of characters included is smaller than the preset number threshold or pertaining to the preset merging intent is directly merged with the antecedent questioning statement, or the questioning statement and the antecedent questioning statement are input together in the binary classification algorithm to judge whether they belong to the same question.

[0112] As shown in Fig. 4, when it is judged that any text segment and its antecedent questioning statement belong to the same question, they can be merged into one questioning statement; when it is recognized that they do not belong to the same question, the questioning statement can be split into new questioning statements.

[0113] After all questioning statements have been traversed, if the statements that remain are only one answering statement and one questioning statement, they can be determined as a QA statement pair. If more than one questioning statement and one answering statement remain, the questioning statements and the answering statement can be combined in pairs to generate corresponding QA statement pairs. For instance, when the remaining statements include questioning statement Q1, questioning statement Q2, and answering statement A1, the statement pairs generated include statement pair Q1A1 and statement pair Q2A1.
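The QQA handling can be sketched as merging followed by pairing. As before, the classifier is passed in as a function, and `qqa_pairs` is a hypothetical name; the text-segment shortcut of [0111] is not shown.

```python
from typing import Callable, List, Tuple


def qqa_pairs(
    questions: List[str],
    answer: str,
    same_question: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Merge questions that continue their antecedent, then pair each with the answer."""
    merged: List[str] = []
    prev = None  # antecedent questioning statement
    for q in questions:
        if prev is not None and same_question(prev, q):
            merged[-1] = f"{merged[-1]} {q}"
        else:
            merged.append(q)
        prev = q
    return [(q, answer) for q in merged]
```

When the classifier joins "May I venture to ask" with "how to pay back" and keeps a third question separate, the function returns two pairs that share answer A1, corresponding to Q1A1 and Q2A1 above.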

[0114] S330 - combining all questioning statements and answering statements in pairs to generate corresponding QA statement pairs when the Q&A group is of the QAQA type.

[0115] A large amount of interaction between the user and the customer service within a short time can usually be split into plural groups of QA statement pairs. There are, however, some special cases, such as the guiding statement by the customer service mentioned previously, in which it is impossible to judge whether the questioning statements and the answering statements of the user and the customer service correspond to one another on a one-to-one basis.

[0116] To fully mine the problems of this type, answering statements and questioning statements can be directly combined in pairs with respect to Q&A groups of the QAQA type. For instance, if there are three answering statements and three questioning statements, 9 combination modes will be generated out of the different Qs and different As.
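The pairwise combination for a QAQA-type group amounts to a Cartesian product of the questioning statements and the answering statements. A minimal Python sketch, in which the function name `pair_qaqa_group` is hypothetical:

```python
from itertools import product
from typing import List, Tuple


def pair_qaqa_group(questions: List[str], answers: List[str]) -> List[Tuple[str, str]]:
    """Combine every questioning statement with every answering statement."""
    return list(product(questions, answers))


pairs = pair_qaqa_group(["Q1", "Q2", "Q3"], ["A1", "A2", "A3"])
```

Three questions and three answers give 3 x 3 = 9 candidate pairs; pairs that do not actually match are expected to be removed later by the matching-degree filtration.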

[0117] S400 - clustering the statements by employing a preset clustering algorithm, generating statement pair groups, and determining the number of questioning statements included in each statement pair group.

[0118] Clustering as disclosed in the embodiments of the present application means grouping similar questions together to constitute a cluster. Since users might raise repetitive questions, the objective of this operation is to place similar questions in a single cluster, so that it suffices to screen out therefrom only one or several representative questioning-answering pair(s) during subsequent manual selection or machine screening.

[0119] It is possible to calculate text distance metrics amongst the statement pairs via a text matching algorithm, and to determine whether the statement pairs belong to the same statement pair group according to the text distance metrics.

[0120] The text matching algorithm is an algorithm that calculates the similarity degree of two texts. Since the clustered objects are mostly questions outside the knowledge base, a supervised text matching algorithm trained on the basis of the original marking data in the knowledge base is of little effect. An unsupervised text matching algorithm, word mover's distance (WMD), can hence be employed. After the text distance metrics have been determined, any clustering algorithm can be applied to determine whether statement pairs belong to the same statement pair group. Hierarchical clustering is preferentially selected, considering its advantage that the number of clusters need not be predetermined.
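The clustering step can be sketched in Python as bottom-up hierarchical clustering that stops at a distance threshold, so the number of clusters need not be fixed in advance. Several points are assumptions made only for illustration: a Jaccard distance over word sets is swapped in for WMD (which would require word embeddings), average linkage is used, and the threshold value and all function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, List


def jaccard_distance(a: str, b: str) -> float:
    """Stand-in text distance over lowercase word sets (this is not WMD)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def agglomerative_clusters(
    items: List[str],
    distance: Callable[[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Repeatedly merge the two closest clusters until none is within the threshold."""
    clusters = [[i] for i in range(len(items))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average linkage: mean pairwise distance between members of two clusters.
        total = sum(distance(items[i], items[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_pair, best_dist = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best_dist > threshold:
            break
        x, y = best_pair  # x < y, so deleting index y leaves index x valid
        clusters[x].extend(clusters[y])
        del clusters[y]
    return [[items[i] for i in c] for c in clusters]


questions = [
    "how to pay back the loan",
    "how to pay back my loan",
    "where is my refund",
]
groups = agglomerative_clusters(questions, jaccard_distance, threshold=0.5)
cluster_sizes = [len(g) for g in groups]
```

The two repayment questions end up in one cluster and the refund question in another; `cluster_sizes` then gives the number of questioning statements per group.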

[0121] S410 - determining matching degrees between the questioning statements and the answering statements included in the statement pair groups according to a preset similarity algorithm.

[0122] Among all the QA pairs, there might appear invalid QA pairs caused by imprecise splitting, and also pairs in which the answers are not pertinent to the questions asked due to negligence of the customer service. Such invalid QA statement pairs should be filtered out.

[0123] Filtration of the QA statement pairs is mainly decided by the matching degrees of questions and answers. QA statement pairs whose matching degrees between questioning statements and answering statements satisfy a preset condition are retained, while QA statement pairs whose matching degrees do not satisfy the condition are filtered out.

[0124] The above matching process is also a text matching process. Since matching of questions and answers possesses certain generality, a set of supervised algorithms can be trained on the basis of existing knowledge base data to perform similarity calculation.
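The filtration of QA statement pairs can be sketched as follows. The preset condition is assumed here to be a minimum matching degree, and `toy_matching_degree` is a hypothetical word-overlap score that only stands in for the supervised similarity algorithm trained on knowledge base data.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]


def filter_pairs(
    pairs: List[Pair],
    matching_degree: Callable[[str, str], float],
    min_degree: float,
) -> List[Pair]:
    """Keep only QA pairs whose matching degree reaches the preset minimum."""
    return [(q, a) for q, a in pairs if matching_degree(q, a) >= min_degree]


def toy_matching_degree(question: str, answer: str) -> float:
    # Hypothetical score: fraction of question words that also occur in the answer.
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    return len(q_words & a_words) / len(q_words) if q_words else 0.0


candidates = [
    ("how to pay back the loan", "you can pay back the loan in the app"),
    ("how to pay back the loan", "have a nice day"),
]
kept = filter_pairs(candidates, toy_matching_degree, min_degree=0.5)
```

Only the first candidate survives in this example, because the second answer shares no words with the question.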

[0125] S420 - determining a weight to which each statement pair group corresponds according to the corresponding matching degrees and the numbers of questioning statements included in the statement pair groups.

[0126] Insofar as a knowledge base is concerned, not all questions have the same degree of importance, so frequently asked questions should have higher priority for maintenance in the knowledge base. At the same time, the more accurate the answers to the collected questions are, the more valuable they will be for maintenance in the knowledge base. After the statement pairs have been sorted, more important questions therein can be preferentially maintained while some less valuable questions are neglected, whereby maintenance efficiency can be enhanced to a greater extent.

[0127] Frequencies by which questions are asked can be measured by the number of questions under each cluster obtained in the above clustering process, and accuracy of answers can be measured by matching degrees of questions and answers in the filtering process.

[0128] The corresponding sorting weight can be derived by normalizing these two values and thereafter weighting and accumulating them. During subsequent maintenance of the knowledge base, the corresponding statement pairs can be obtained sequentially according to the sorting weights, further screened and processed manually or by machine, and maintained in the knowledge base.
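The derivation of the sorting weight can be sketched in Python. The text does not specify the normalization or the weighting coefficients, so min-max normalization and the single coefficient `alpha` below are assumptions, as are the function names.

```python
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; identical values all map to 1.0 (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sorting_weights(
    cluster_sizes: Sequence[int],
    matching_degrees: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Weighted sum of normalized question frequency and normalized answer accuracy."""
    freq = min_max(cluster_sizes)
    acc = min_max(matching_degrees)
    return [alpha * f + (1.0 - alpha) * a for f, a in zip(freq, acc)]


weights = sorting_weights(cluster_sizes=[1, 3, 5], matching_degrees=[0.2, 0.9, 0.5])
# Indices of statement pair groups, highest weight first.
ranking = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
```

With these inputs the second group ranks first: it is asked with moderate frequency but has the best-matching answer, which slightly outweighs the most frequent group under an equal weighting.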

[0129] The Q&A statement processing method provided by the embodiment of the present application realizes automated processing of statement pairs, alleviates workload of business personnel, greatly reduces operation and maintenance costs, greatly lowers the threshold to answer configuration since the answers are not entirely dependent upon human conception, and reduces training cost of operation and maintenance personnel.

[0130] Embodiment 2

[0131] Corresponding to the foregoing embodiment, as shown in Fig. 5, the present application provides a Q&A statement processing method that comprises the following steps.

[0132] 510 - obtaining a session record to be processed, wherein the session record includes at least two statements, and the statements include questioning statements sent by questioners and answering statements sent by answerers.

[0133] 520 - splitting the session record into corresponding Q&A groups according to a preset Q&A splitting rule, wherein the Q&A groups include at least one questioning statement and at least one answering statement.

[0134] Preferably, each statement has a corresponding generation time, and the step of splitting the session record into corresponding Q&A splitting rule includes:

[0135] 521 - sequentially traversing the session record according to the generation time of each statement;

[0136] 522 - judging, when the statement traversed is a questioning statement, whether the traversed questioning statement and the antecedent questioning statement of the traversed questioning statement belong to the same Q&A group according to a sentence pattern of the antecedent answering statement of the traversed questioning statement and/or according to an interval time to the antecedent questioning statement of the traversed questioning statement; and

[0137] 523 - determining, when the statement traversed is an answering statement, that the traversed answering statement belongs to the Q&A group to which the antecedent questioning statement of the traversed answering statement corresponds.
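Steps 521 to 523 can be sketched in Python as a single pass over the time-ordered session record. Several details are assumptions: the `Statement` type, the `is_guiding` predicate standing in for the sentence-pattern check, the `max_gap` interval, reading "and/or" as a logical or, treating the immediately preceding statement as the antecedent answering statement, and discarding groups that end up without any answer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Statement:
    role: str    # "Q" for a questioner, "A" for an answerer
    text: str
    time: float  # generation time, in seconds


def split_into_groups(
    record: List[Statement],
    is_guiding: Callable[[str], bool],
    max_gap: float,
) -> List[List[Statement]]:
    groups: List[List[Statement]] = []
    last_question: Optional[Statement] = None
    previous: Optional[Statement] = None
    for s in sorted(record, key=lambda st: st.time):  # step 521
        if s.role == "Q":                             # step 522
            guided = (
                previous is not None
                and previous.role == "A"
                and is_guiding(previous.text)
            )
            close = (
                last_question is not None
                and s.time - last_question.time <= max_gap
            )
            if groups and (guided or close):
                groups[-1].append(s)   # same Q&A group as the antecedent question
            else:
                groups.append([s])     # start a new Q&A group
            last_question = s
        elif groups:                                  # step 523
            groups[-1].append(s)
        previous = s
    # Keep only groups holding at least one answering statement.
    return [g for g in groups if any(st.role == "A" for st in g)]


record = [
    Statement("Q", "how to pay back", 0.0),
    Statement("A", "Which loan do you mean?", 5.0),
    Statement("Q", "the car loan", 100.0),
    Statement("A", "Use the app.", 110.0),
    Statement("Q", "what is the interest rate", 500.0),
    Statement("A", "It is listed in the contract.", 510.0),
]
groups = split_into_groups(record, lambda text: text.endswith("?"), max_gap=30.0)
```

Here the third statement joins the first group because it follows a guiding question from the customer service, whereas the fifth statement starts a new group.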

[0138] Preferably, before the step of splitting the session record into corresponding Q&A groups according to a preset Q&A splitting rule, the method further comprises:

[0139] 524 - rectifying any wrong word included in the session record according to a preset rectifying rule; and

[0140] 525 - performing a normalizing process on the rectified session record.
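Steps 524 and 525 can be sketched in Python under stated assumptions: the preset rectifying rule is taken to be a lookup table of known wrong words, and the normalizing process is taken to be Unicode NFKC normalization, lowercasing and whitespace collapsing. Neither choice is specified by the text, and the names `rectify`, `normalize` and `CORRECTIONS` are hypothetical.

```python
import unicodedata
from typing import Dict

# Hypothetical rectifying rule: known wrong words mapped to their corrections.
CORRECTIONS: Dict[str, str] = {"bakc": "back", "laon": "loan"}


def rectify(text: str, corrections: Dict[str, str]) -> str:
    """Replace every known wrong word with its correction."""
    for wrong, right in corrections.items():
        text = text.replace(wrong, right)
    return text


def normalize(text: str) -> str:
    """Fold full-width characters, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.lower().split())


cleaned = normalize(rectify("ＨＯＷ  to pay bakc the laon", CORRECTIONS))
```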

[0141] Preferably, before the step of splitting the session record into corresponding Q&A groups according to a preset Q&A splitting rule, the method further comprises:

[0142] 526 - recognizing the intent class to which each questioning statement included in the session record corresponds by employing a preset classifier algorithm and eliminating any questioning statement to which a preset irrelevant intent class corresponds as included in the session record.

[0143] 530 - determining a processing rule to which the Q&A groups correspond according to the number of the questioning statement(s) and the number of the answering statement(s) included in the Q&A groups.

[0144] 540 - splitting the Q&A groups into corresponding statement pairs according to the processing rule to which the Q&A groups correspond.

[0145] Preferably, the step of splitting the Q&A groups into corresponding statement pairs according to the processing rule to which the Q&A groups correspond includes:

[0146] 541 - splitting, when the number of the questioning statements included in the Q&A group does not exceed a first preset threshold, the questioning statements each into at least two text segments according to preset signs included in the questioning statements;

[0147] 542 - predicting whether two adjacent text segments belong to the same question by employing a preset binary classifier;

[0148] 543 - generating corresponding questioning statements respectively according to text segments predicted to belong to the same question; and

[0149] 544 - generating corresponding statement pairs according to all the questioning statements as generated and the answering statements included in the Q&A group.
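Steps 541 to 544 can be sketched in Python. The set of preset signs, the function names, and the `toy_same_question` rule are all assumptions; the rule is a hypothetical stand-in for the preset binary classifier and simply treats a segment without an interrogative word as unfinished.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical preset signs: common ASCII and full-width punctuation marks.
PRESET_SIGNS = r"[,;?!，；？！。]"
INTERROGATIVES = {"how", "what", "why", "when", "where"}


def split_segments(question: str) -> List[str]:
    """Step 541: split a questioning statement at the preset signs."""
    return [seg.strip() for seg in re.split(PRESET_SIGNS, question) if seg.strip()]


def toy_same_question(antecedent: str, current: str) -> bool:
    # Stand-in for the binary classifier of step 542.
    return not (set(antecedent.lower().split()) & INTERROGATIVES)


def regroup(segments: List[str], same_question: Callable[[str, str], bool]) -> List[str]:
    """Steps 542 and 543: join adjacent segments predicted to be one question."""
    questions: List[str] = []
    for seg in segments:
        if questions and same_question(questions[-1], seg):
            questions[-1] = f"{questions[-1]}, {seg}"
        else:
            questions.append(seg)
    return questions


def to_pairs(question: str, answers: List[str]) -> List[Tuple[str, str]]:
    """Step 544: pair every generated question with the answers of the group."""
    generated = regroup(split_segments(question), toy_same_question)
    return [(q, a) for q in generated for a in answers]


one = to_pairs("May I venture to ask, how to pay back?", ["A1"])
two = to_pairs("how to pay back, what is the interest rate?", ["A1"])
```

The first statement stays a single question, while the second is separated into two questions that are each paired with A1.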

[0150] Preferably, before the step of predicting whether two adjacent text segments belong to the same question by employing a preset binary classifier, the method further comprises:

[0151] 545 - traversing the text segments, and merging the traversed text segments with corresponding posterior text segments when the number of characters of the traversed text segments is smaller than a second preset threshold; and/or

[0152] 546 - employing a preset classifier algorithm to merge the traversed text segments with corresponding posterior text segments when the traversed text segments and the corresponding posterior text segments belong to the same intent class or when the traversed text segments belong to a preset merging intent class.
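The length-based merging of step 545 can be sketched in Python; the intent-based merging of step 546 is left out because it depends on a trained classifier. The function name, the threshold value, and the handling of a trailing short segment (which has no posterior segment and is attached to the preceding one here) are assumptions.

```python
from typing import List


def merge_short_segments(segments: List[str], min_chars: int) -> List[str]:
    """Merge each segment shorter than min_chars into its posterior segment."""
    merged: List[str] = []
    carry = ""
    for seg in segments:
        if carry:
            seg = f"{carry} {seg}"  # prepend the short segment carried forward
            carry = ""
        if len(seg) < min_chars:
            carry = seg             # still too short: carry it to the next segment
        else:
            merged.append(seg)
    if carry:
        # No posterior segment remains; attach to the preceding one if present.
        if merged:
            merged[-1] = f"{merged[-1]} {carry}"
        else:
            merged.append(carry)
    return merged


result = merge_short_segments(["Hi", "I want to ask", "how to pay back"], min_chars=5)
```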

[0153] Preferably, the step of splitting the Q&A groups into corresponding statement pairs according to the processing rule to which the Q&A groups correspond includes:

[0154] 547 - combining, when the numbers of the questioning statements and the answering statements included in the Q&A group both exceed the first preset threshold, the questioning statements and the answering statements included in the Q&A group, and generating the corresponding statement pairs.

[0155] Preferably, the step of splitting the Q&A groups into corresponding statement pairs according to the processing rule to which the Q&A groups correspond includes:

[0156] 548 - employing, when the number of the answering statements included in the Q&A group does not exceed the first preset threshold and the number of the questioning statements as included exceeds the first preset threshold, the preset binary classifier to predict whether the questioning statements as included and antecedent questioning statements of the questioning statements as included belong to the same question;

[0157] 549 - merging, when there are the questioning statements that belong to the same question, the questioning statements that belong to the same question and generating the corresponding statement pairs according to all the merged questioning statements and the answering statements; and

[0158] generating the corresponding statement pairs according to all the questioning statements and answering statements included in the Q&A group, when there are no questioning statements that belong to the same question.
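The conditions attached to steps 541, 547 and 548 together determine which processing rule applies to a Q&A group (step 530). A minimal Python sketch of that dispatch follows; the rule labels are hypothetical, and a first preset threshold of 1 is an assumption.

```python
def processing_rule(n_questions: int, n_answers: int, threshold: int = 1) -> str:
    """Choose a processing rule from the statement counts of a Q&A group."""
    if n_questions <= threshold:
        # Step 541: split the question into text segments.
        return "split_segments"
    if n_answers <= threshold:
        # Step 548: several questions, few answers; test for mergeable questions.
        return "merge_questions"
    # Step 547: both counts exceed the threshold; combine all in pairs.
    return "combine_all"


rules = {counts: processing_rule(*counts) for counts in [(1, 1), (2, 1), (2, 2), (1, 3)]}
```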

[0159] 550 - updating a knowledge base of a Q&A system according to the statement pairs.

[0160] Preferably, the step of updating a knowledge base of a Q&A system according to the statement pairs includes:

[0161] 551 - clustering the statement pairs by employing a preset clustering algorithm, generating statement pair groups, and determining the number of the questioning statements included in each statement pair group;

[0162] 552 - determining matching degrees between the questioning statements and the answering statements included in the statement pair groups according to a preset similarity algorithm;

[0163] 553 - determining a weight to which each statement pair group corresponds according to the corresponding matching degrees and the numbers of questioning statements included in the statement pair groups; and

[0164] 554 - sequentially updating the knowledge base of the Q&A system according to the weight to which each statement pair group corresponds.

[0165] Embodiment 3

[0166] Corresponding to Embodiment 1 and Embodiment 2, as shown in Fig. 6, the present application provides a Q&A statement processing device that comprises:

[0167] an obtaining module 610, for obtaining a session record to be processed, wherein the session record includes at least two statements, and the statements include questioning statements sent by questioners and answering statements sent by answerers;

[0168] a splitting module 620, for splitting the session record into corresponding Q&A groups according to a preset Q&A splitting rule, wherein the Q&A groups include at least one questioning statement and at least one answering statement;

[0169] a judging module 630, for determining a processing rule to which the Q&A groups correspond according to the number of the questioning statement(s) and the number of the answering statement(s) included in the Q&A groups; wherein

[0170] the splitting module 620 is further employed for splitting the Q&A groups into corresponding statement pairs according to the processing rule to which the Q&A groups correspond; and

[0171] an updating module 640, for updating a knowledge base of a Q&A system according to the statement pairs.

[0172] Preferably, each statement has a corresponding generation time, and the splitting module 620 can be further employed for sequentially traversing the session record according to the generation time of each statement; judging, when the statement traversed is a questioning statement, whether the traversed questioning statement and the antecedent questioning statement of the traversed questioning statement belong to the same Q&A group according to a sentence pattern of an antecedent answering statement of the traversed questioning statement and/or according to an interval time to the antecedent questioning statement of the traversed questioning statement; and determining, when the statement traversed is an answering statement, that the traversed answering statement belongs to the Q&A group to which an antecedent questioning statement of the traversed answering statement corresponds.

[0173] Preferably, the splitting module 620 can be further employed for splitting, when the number of the questioning statements included in the Q&A group does not exceed a first preset threshold, the questioning statements each into at least two text segments according to preset signs included in the questioning statements;

[0174] predicting whether two adjacent text segments belong to the same question by employing a preset binary classifier;

[0175] generating corresponding questioning statements respectively according to text segments predicted to belong to the same question; and

[0176] generating corresponding statement pairs according to all the questioning statements as generated and the answering statements included in the Q&A group.

[0177] Preferably, the splitting module 620 can be further employed for employing, when the number of the answering statements included in the Q&A group does not exceed the first preset threshold and the number of the questioning statements as included exceeds the first preset threshold, the preset binary classifier to predict whether the questioning statements as included and antecedent questioning statements of the questioning statements as included belong to the same question;

[0178] merging, when there are the questioning statements that belong to the same question, the questioning statements that belong to the same question and generating the corresponding statement pairs according to all the merged questioning statements and the answering statements; and

[0179] generating the corresponding statement pairs according to all the questioning statements and answering statements included in the Q&A group, when there are no questioning statements that belong to the same question.

[0180] Preferably, the splitting module 620 can be further employed for traversing the text segments, and merging the traversed text segments with corresponding posterior text segments when the number of characters of the traversed text segments is smaller than a second preset threshold; and/or merging the traversed text segments with corresponding posterior text segments by employing a preset classifier algorithm when the traversed text segments and the corresponding posterior text segments belong to the same intent class or when the traversed text segments belong to a preset merging intent class.

[0181] Preferably, the splitting module 620 can be further employed for combining, when the numbers of the questioning statements and the answering statements included in the Q&A group both exceed the first preset threshold, the questioning statements and the answering statements included in the Q&A group, and generating the corresponding statement pairs.

[0182] Preferably, the updating module 640 can be further employed for clustering the statement pairs by employing a preset clustering algorithm, generating statement pair groups, and determining the number of the questioning statements included in each statement pair group; determining matching degrees between the questioning statements and the answering statements included in the statement pair groups according to a preset similarity algorithm; determining a weight to which each statement pair group corresponds according to the corresponding matching degrees and the numbers of questioning statements included in the statement pair groups; and sequentially updating the knowledge base of the Q&A system according to the weight to which each statement pair group corresponds.

[0183] Preferably, the splitting module 620 can be further employed for rectifying any wrong word included in the session record according to a preset rectifying rule; and performing a normalizing process on the rectified session record.

[0184] Preferably, the splitting module 620 can be further employed for recognizing the intent class to which each questioning statement included in the session record corresponds by employing the preset classifier algorithm and eliminating any questioning statement to which a preset irrelevant intent class corresponds as included in the session record.

[0185] Embodiment 4

[0186] Corresponding to all the foregoing embodiments, an embodiment of the present application provides an electronic equipment that comprises:

[0187] one or more processor(s); and a memory, associated with the one or more processor(s) and used for storing a program instruction. The program instruction executes the following operations when it is read and executed by the one or more processor(s):

[0188] before the step of splitting the session record into corresponding Q&A groups according to a preset Q&A splitting rule, the method further comprises:

[0189] recognizing the intent class to which each questioning statement included in the session record corresponds by employing the preset classifier algorithm and eliminating any questioning statement to which a preset irrelevant intent class corresponds as included in the session record.

[0190] Fig. 6 exemplarily illustrates the framework of the electronic equipment that can specifically include a processor 1510, a video display adapter 1511, a magnetic disk driver 1512, an input/output interface 1513, a network interface 1514, and a memory 1520. The processor 1510, the video display adapter 1511, the magnetic disk driver 1512, the input/output interface 1513, the network interface 1514, and the memory 1520 can be communicably connected with one another via a communication bus 1530.

[0191] The processor 1510 can be embodied as a general CPU (Central Processing Unit), a microprocessor, an ASIC (Application Specific Integrated Circuit), or one or more integrated circuit(s) for executing relevant program(s) to realize the technical solutions provided by the present application.

[0192] The memory 1520 can be embodied in such a form as an ROM (Read Only Memory), an RAM (Random Access Memory), a static storage device, or a dynamic storage device. The memory 1520 can store an operating system 1521 for controlling the running of an electronic equipment 1500, and a basic input/output system (BIOS) 1522 for controlling lower-level operations of the electronic equipment 1500. In addition, the memory 1520 can also store a web browser 1523, a data storage administration system 1524, and an icon font processing system 1525, etc. The icon font processing system 1525 can be an application program that specifically realizes the aforementioned various step operations in the embodiments of the present application. To sum it up, when the technical solutions provided by the present application are to be realized via software or firmware, the relevant program codes are stored in the memory 1520, and invoked and executed by the processor 1510.

[0193] The input/output interface 1513 is employed to connect with an input/output module to realize input and output of information. The input/output module can be equipped in the device as a component part (not shown in the drawings), and can also be externally connected with the device to provide corresponding functions. The input means can include a keyboard, a mouse, a touch screen, a microphone, and various sensors etc., and the output means can include a display screen, a loudspeaker, a vibrator, an indicator light etc.

[0194] The network interface 1514 is employed to connect to a communication module (not shown in the drawings) to realize intercommunication between the current device and other devices. The communication module can realize communication in a wired mode (via USB, network cable, for example) or in a wireless mode (via mobile network, WIFI, Bluetooth, etc.).

[0195] The bus 1530 includes a passageway transmitting information between various component parts of the device (such as the processor 1510, the video display adapter 1511, the magnetic disk driver 1512, the input/output interface 1513, the network interface 1514, and the memory 1520).

[0196] Additionally, the electronic equipment 1500 may further obtain information of specific collection conditions from a virtual resource object collection condition information database 1541 for judgment on conditions, and so on.

[0197] As should be noted, although merely the processor 1510, the video display adapter 1511, the magnetic disk driver 1512, the input/output interface 1513, the network interface 1514, the memory 1520, and the bus 1530 are illustrated for the aforementioned equipment, the equipment may further include other component parts prerequisite for realizing normal running during specific implementation. In addition, as can be understood by persons skilled in the art, the aforementioned equipment may as well only include component parts necessary for realizing the solutions of the present application, without including the entire component parts as illustrated.

[0198] As can be known through the description of the aforementioned embodiments, persons skilled in the art can clearly understand that the present application can be realized through software plus a general hardware platform. Based on such understanding, the technical solutions of the present application, or the contributions made thereby over the state of the art, can be essentially embodied in the form of a software product, and such a computer software product can be stored in a storage medium, such as an ROM/RAM, a magnetic disk, an optical disk etc., and includes plural instructions enabling a computer equipment (such as a personal computer, a cloud server, or a network device etc.) to execute the methods described in various embodiments or some sections of the embodiments of the present application.

[0199] The various embodiments are progressively described in the Description; identical or similar sections among the various embodiments can be inferred from one another, and each embodiment stresses what is different from other embodiments. Particularly, with respect to the system or system embodiment, since it is essentially similar to the method embodiment, its description is relatively simple, and the relevant sections thereof can be inferred from the corresponding sections of the method embodiment. The system or system embodiment as described above is merely exemplary in nature: units therein described as separate parts can be or may not be physically separate, and parts displayed as units can be or may not be physical units; that is to say, they can be located in a single site, or distributed over a plurality of network units. Partial modules or the entire modules can be selected according to practical requirements to realize the objectives of the embodied solutions. This is understandable and implementable by persons ordinarily skilled in the art without creative effort.

[0200] What the above describes is merely directed to preferred embodiments of the present invention, and is not meant to restrict the present invention. Any modification, equivalent substitution, and improvement made within the spirit and scope of the present invention shall all be covered by the protection scope of the present invention.

Claims

1. A computer device comprising:
an obtaining module, configured to obtain a session record, wherein the session record includes at least two statements, wherein the statements include questioning statements sent by questioners and answering statements sent by answerers;
a splitting module, configured to:
split the session record into corresponding groups according to a preset splitting rule, wherein groups include at least one questioning statement and at least one answering statement;
split the groups into corresponding statement pairs according to a processing rule to which the groups correspond;
use, when a number of the answering statements included in the group does not exceed a first preset threshold and a number of the questioning statements as included exceeds the first preset threshold, a preset binary classifier to predict whether the questioning statements as included and antecedent questioning statements of the questioning statements as included belong to a same question;
a judging module, configured to determine the processing rule to which the groups correspond according to the number of the questioning statements and the number of the answering statements included in the groups, wherein the processing rule is based on the number of questioning statements and the number of answering statements as compared to the first preset threshold; and
an updating module, configured to update a knowledge base of a system according to statement pairs.

Date Recue/Date Received 2024-02-06

2. The computer device of claim 1, wherein each statement has a corresponding generation time.

3. The computer device of claim 2, wherein the splitting module further comprises:
sequentially traversing the session record according to generation time of each statement;
judging, when the statement traversed is the questioning statement, whether a traversed questioning statement and an antecedent questioning statement of the traversed questioning statement belong to a same group according to a sentence pattern of an antecedent answering statement of the traversed questioning statement and/or according to an interval time to the antecedent questioning statement of the traversed questioning statement; and
determining, when the statement traversed is the answering statement, that a traversed answering statement belongs to the group to which the antecedent questioning statement of the traversed answering statement corresponds.

4. The computer device of claim 3, wherein the splitting module further comprises:
splitting, when the number of the questioning statements included in the group does not exceed the first preset threshold, the questioning statements each into at least two text segments according to preset signs included in the questioning statements;
predicting whether two adjacent text segments belong to a same question by employing a preset binary classifier;
generating corresponding questioning statements respectively according to text segments predicted to belong to the same question; and
generating the corresponding statement pairs according to all the questioning statements as generated and the answering statements included in the group.

5. The computer device of claim 4, further comprises:
traversing the text segments, and merging traversed text segments with corresponding posterior text segments when number of characters of the traversed text segments is smaller than a second preset threshold.

6. The computer device of claim 5, further comprises:
merging the traversed text segments with corresponding posterior text segments by employing a preset classifier algorithm when the traversed text segments and the corresponding posterior text segments belong to a same intent class or when the traversed text segments belong to a preset merging intent class.

7. The computer device of claim 6, wherein the splitting module further comprises:
merging, when there are the questioning statements that belong to the same question, the questioning statements that belong to the same question and generating the corresponding statement pairs according to all merged questioning statements and the answering statements; and
generating the corresponding statement pairs according to all the questioning statements and the answering statements included in the group, when there are no questioning statements that belong to the same question.

8. The computer device of claim 7, wherein the splitting module further comprises:
combining, when the numbers of the questioning statements and the answering statements included in the group both exceed the first preset threshold, the questioning statements and the answering statements included in the group; and
generating the corresponding statement pairs.

9. The computer device of claim 8, wherein the updating module further comprises:
clustering the statement pairs by using a preset clustering algorithm;
generating statement pair groups;
determining the number of the questioning statements included in each statement pair group;
determining matching degrees between the questioning statements and the answering statements included in the statement pair groups according to a preset similarity algorithm;
determining a weight to which each statement pair group corresponds according to corresponding matching degrees and number of questioning statements included in the statement pair groups; and
sequentially updating the knowledge base of the system according to the weight to which each statement pair group corresponds.

10. The computer device of claim 9, the splitting module further comprises:
rectifying any wrong word included in the session record according to a preset rectifying rule; and performing a normalizing process on rectified session record.

11. The computer device of claim 10, the splitting module further comprises:
recognizing an intent class to which each questioning statement included in the session record corresponds by employing the preset classifier algorithm and eliminating any questioning statement to which a preset irrelevant intent class corresponds as included in the session record.

12. The computer device of claim 11, further comprises a process of analyzing and mining dialogue statements between customer service and a user comprising:
obtaining a session record to be processed and preprocessing obtained session record; and rectifying any wrong word included in the session record according to a preset rectifying rule.

13. The computer device of claim 12, further comprises:
performing a purification operation on all characters included in the session record; and
recognizing a dialogue intent to which the questioning statement sent by each user corresponds by using the preset classifier algorithm.

14. The computer device of claim 13, further comprises:
traversing a dialogue according to a temporal sequence of generation times;
screening questioning statements to be processed of the user when the questioning statements to be processed are traversed, and eliminating any questioning statement to be processed of the user that does not conform to a preset condition;
determining a questioning statement to be processed is to be merged with the antecedent questioning statement of the questioning statement to be processed according to a preset merging rule;
eliminating any answering statement whose number of characters is smaller than a preset number threshold when the answering statements of the customer service are traversed, and determining any answering statement remaining after the elimination as an answering statement to be processed;
merging the answering statement to be processed with an antecedent answering statement when the antecedent statement of the answering statement to be processed is an answering statement of the customer service, and storing the merged answering statement to the group to which the antecedent questioning statement corresponds; and merging the answering statement to be processed in the group to which the questioning statement corresponds when the antecedent statement of the answering statement to be processed is a questioning statement of the user.
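
The answer-handling steps of claim 14 can be illustrated with a minimal sketch. The `Group` class, the `add_answer` helper, the role labels and the default `min_chars` value are hypothetical names chosen for illustration; the claim itself only requires a preset number threshold and a check on who sent the antecedent statement.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical container for one group of a dialogue."""
    questions: list = field(default_factory=list)
    answers: list = field(default_factory=list)


def add_answer(group, answer, prev_role, min_chars=2):
    """Attach a customer-service answer to the group of the antecedent question.

    prev_role is "service" or "user" and names the sender of the antecedent
    statement (an assumed encoding, not taken from the claims).
    """
    if len(answer) < min_chars:
        return  # shorter than the preset number threshold: eliminated
    if prev_role == "service" and group.answers:
        # Antecedent statement is also an answer: merge the two answers.
        group.answers[-1] = group.answers[-1] + " " + answer
    else:
        # Antecedent statement is a user question: store as a new answer.
        group.answers.append(answer)
```

Called once per traversed answering statement, this leaves each group with consecutive replies collapsed into a single answering statement.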

15. The computer device of claim 14, further comprises splitting the questioning statements included in the group each into text segments when the group is of QA type;
processing the text segments from front to back, and merging any text segment whose number of characters is smaller than the preset number threshold in a posterior text segment of this text segment; and/or merging any text segment pertaining to the same intent class as the corresponding posterior text segment or pertaining to a preset merging intent class in the posterior text segment of this text segment;
sequentially obtaining a preset number of adjacent text segments through a sliding window, and predicting whether obtained text segments belong to a same and single question by means of a binary classifier algorithm;
traversing the questioning statements included in the group and judging whether each questioning statement and its antecedent questioning statement belong to the same question when the group is of QQA type; and combining all questioning statements and answering statements in pairs to generate corresponding QA statement pairs when the group is of QAQA type.
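
The front-to-back merging of short text segments recited in claim 15 might look like the following sketch. The function name, the `min_chars` default and the space separator are assumptions. The claims also do not say what happens when the last segment is short and has no posterior segment; appending it to the preceding result is an assumed fallback.

```python
def merge_short_segments(segments, min_chars=4, sep=" "):
    """Merge each too-short text segment into its posterior segment."""
    merged = []
    carry = ""  # short segment waiting to be merged into the next one
    for seg in segments:
        if carry:
            seg = carry + sep + seg
            carry = ""
        if len(seg) < min_chars:
            carry = seg  # still too short: push into the posterior segment
        else:
            merged.append(seg)
    if carry:
        # Assumed fallback: no posterior segment exists for the last one.
        if merged:
            merged[-1] = merged[-1] + sep + carry
        else:
            merged.append(carry)
    return merged
```

The intent-class merging condition of the same claim would be a second predicate applied at the same point as the length check.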

16. The computer device of claim 15, further comprises:
clustering the statements by employing a preset clustering algorithm, generating statement pair groups, and determining the number of questioning statements included in each statement pair group;

determining matching degrees between the questioning statements and the answering statements included in the statement pair groups according to a preset similarity algorithm; and determining a weight to which each statement pair group corresponds according to the corresponding matching degrees and the numbers of questioning statements included in the statement pair groups.

17. The computer device of claim 16, wherein the session record is directed to text statements, principal wrong words are homonyms.

18. The computer device of claim 17, wherein the session record is directed to speech statements, the session record is firstly required to convert the speech statements into the text statements through speech recognition technique.

19. The computer device of claim 18, wherein language model and word frequency features are combined.

20. The computer device of claim 19, wherein corresponding rectifying rules are provided for the speech statements and the text statements respectively.

21. The computer device of claim 20, wherein wrong words are rectified according to the corresponding rectifying rules.

22. The computer device of claim 21, wherein the purification operation includes removing irrelevant characters including preset useless punctuations and preset stop words, recognizing irrelevant information contained in each text statement including commodity names and placenames and normalizing the irrelevant information to corresponding preset characters according to which the irrelevant information corresponds.
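
As a rough illustration of the purification operation in claim 22, the sketch below strips punctuation and stop words and replaces commodity names and placenames with preset placeholder characters. The stop-word set, the name lists and the placeholder strings are all hypothetical, and the fixed lookup table is only a stand-in for whatever entity recognition an actual implementation would use.

```python
import re

# Hypothetical presets; the claims only say these are "preset".
STOP_WORDS = {"um", "uh", "please"}
PLACEHOLDERS = {
    "[PRODUCT]": ["acme phone"],
    "[PLACE]": ["toronto"],
}


def purify(text):
    """Normalize entity names, then drop useless punctuation and stop words."""
    text = text.lower()
    for placeholder, names in PLACEHOLDERS.items():
        for name in names:
            text = text.replace(name, placeholder)
    text = re.sub(r"[~!?,.;:]+", " ", text)  # preset useless punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)
```

Normalizing names to a shared placeholder lets questions that differ only in the product or place mentioned fall into the same cluster later on.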

23. The computer device of claim 22, the session record of the user with customer service within one day is a segment of the dialogue.

24. The computer device of claim 23, wherein the session record is split into one or more dialogues, and the dialogues are split into groups.

25. The computer device of claim 24, wherein the user consults same type of questions within a preset period of time, wherein the customer service has replied, the user consults different questions next time in the dialogue with the customer service.

26. The computer device of claim 25, wherein to eliminate any questioning statement with eliminable intent and irrelevant to business whose number of characters is smaller than a preset number threshold or intent is judged by the preset classifier algorithm as chitchat intent.

27. The computer device of claim 26, wherein the interval time between the antecedent questioning statement of the questioning statement to be processed and the questioning statement to be processed exceeds a corresponding preset time threshold and/or when the sentence pattern of the antecedent answering statement of the questioning statement to be processed is a preset sentence pattern, the questioning statement to be processed is merged with its antecedent questioning statement.

28. The computer device of claim 27, wherein the antecedent questioning statement is a questioning statement that is temporally antecedent to the statement to be processed and with a shortest interval time to the statement to be processed.

29. The computer device of claim 28, wherein the antecedent answering statement is an answering statement that is temporally antecedent to the statement to be processed and with the shortest interval time to the statement to be processed.

30. The computer device of claim 29, wherein the interval time between the antecedent questioning statement of the questioning statement to be processed and the questioning statement to be processed exceeds the corresponding preset time threshold, judge there is no relevancy between the antecedent questioning statement and the questioning statement to be processed, so a new group is generated according to the questioning statement to be processed.

31. The computer device of claim 30, wherein the corresponding preset time threshold is not exceeded, judge there is relevancy between the antecedent questioning statement and the questioning statement to be processed, and the questioning statement to be processed is merged in the group to which the antecedent questioning statement corresponds.
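
The time-threshold test of claims 30 and 31 can be sketched as follows. The function name, the `(timestamp, text)` tuple layout and the 300-second default are assumptions; the sentence-pattern condition of claims 32 to 34 is omitted here for brevity.

```python
def group_questions(questions, max_gap_seconds=300.0):
    """Group time-ordered questions by the gap to the antecedent question.

    questions: list of (timestamp_in_seconds, text), sorted by timestamp.
    """
    groups = []
    prev_time = None
    for ts, text in questions:
        if prev_time is not None and ts - prev_time <= max_gap_seconds:
            groups[-1].append(text)  # threshold not exceeded: same group
        else:
            groups.append([text])    # first question, or gap too long: new group
        prev_time = ts
    return groups
```

For example, questions sent at 0 s, 60 s and 1000 s with a 300 s threshold yield two groups: the first two questions together and the third on its own.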

32. The computer device of claim 31, wherein the preset sentence patterns include statements that guide the user to further respond to responses made by the customer service, includes asks in reply by the customer service to indefinite expressions of users, or the sentence pattern asking for essential information from the user.

33. The computer device of claim 32, wherein the antecedent answering statement is of the preset sentence pattern, the statement sent by the user after the antecedent answering statement is made in reply to this antecedent answering statement and is relevant to the antecedent answering statement, and the questioning statement to be processed is merged with the antecedent questioning statement.

34. The computer device of claim 33, wherein the antecedent answering statement of the questioning statement to be processed is of the preset sentence pattern, the questioning statement to be processed is merged with the antecedent questioning statement, and the questioning statement to be processed is merged in the group to which the antecedent questioning statement corresponds.

35. The computer device of claim 34, wherein the antecedent statement is a statement that is temporally antecedent to the statement to be processed and with the shortest interval time to the statement to be processed.

36. The computer device of claim 35, wherein splitting the dialogue into groups, wherein result includes three types of groups comprising:
one question corresponds to a segment of reply, is marked as QA;

plural questions correspond to a segment of reply, wherein the user asks plural questions and the customer service replies with a segment of words, is marked as QQA;
and plural questions correspond to plural replies, wherein several rounds of communication are carried out between the user and the customer service in a short time, is marked as QAQA.

37. The computer device of claim 36, wherein corresponding type is determined according to number of answering statements and questioning statements included in each group, and is processed according to corresponding processing rule.
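
One way to read the type determination of claims 36 and 37 is sketched below. The function name and the default threshold of 1 are assumptions; the mapping from counts to types follows the conditions recited elsewhere in the claims (questions not exceeding the first preset threshold, questions exceeding it while answers do not, and both exceeding it).

```python
def group_type(num_questions, num_answers, threshold=1):
    """Return the group type used to pick a processing rule."""
    if num_questions <= threshold:
        return "QA"    # one question, one segment of reply
    if num_answers <= threshold:
        return "QQA"   # plural questions, one segment of reply
    return "QAQA"      # plural questions, plural replies
```

Because consecutive answers are merged during grouping, a single question followed by several replies is treated here as QA.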

38. The computer device of claim 37, wherein QA is a standard input form of an algorithm, wherein one standard question is only meant to express one question in the knowledge base.

39. The computer device of claim 38, wherein an auxiliary algorithm is required during splitting to judge whether two segments of words are directed to one question or to two questions.

40. The computer device of claim 39, wherein the auxiliary algorithm is a binary classifier, wherein inputs to the binary classifier are two statements.

41. The computer device of claim 40, wherein any model realizes binary questions.

42. The computer device of claim 41, wherein model bert predicts during the process of pretraining whether input two statements are directed to context of the same and single statement or topics irrelevant to each other, serves as the classifier, and fine-tuning training is performed.

43. The computer device of claim 42, wherein the posterior text segment indicates a text segment following and immediately adjacent to the text segment being processed.

44. The computer device of claim 43, wherein the classifier algorithm merges any text segment judged as pertaining to the same intent class as the posterior text segment or pertaining to the preset merging intent class as a chitchat class in the posterior text segment.

45. The computer device of claim 44, wherein text segments predicted to belong to the same question are merged into one questioning statement.

46. The computer device of claim 45, wherein the text segments predicted to belong to different questions are split into two different questioning statements.

47. The computer device of claim 46, wherein the groups of QA type not belonging to the same and single question are converted to groups of QQA type, wherein the groups of QA type all of whose text segments belong to the same and single question are split into a QA statement pair that includes only one questioning statement and one answering statement.

48. The computer device of claim 47, wherein to recognize to which circumstance a group of QQA type specifically pertains, judge through a binary classification algorithm.

49. The computer device of claim 48, wherein the questioning statement is split into text segments, and any text segment whose number of characters included is smaller than the preset number threshold or pertaining to the preset merging intent is directly merged with the antecedent questioning statement, or the questioning statement and the antecedent questioning statement are input together in the binary classification algorithm to judge they belong to the same question.

50. The computer device of claim 49, wherein it is judged any text segment and corresponding antecedent questioning statement belong to the same question, the text statement and the corresponding antecedent questioning statement are merged into one questioning statement.

51. The computer device of claim 50, wherein it is recognized any text segment and the corresponding antecedent questioning statement do not belong to the same question, the questioning statement is split into new questioning statements.

52. The computer device of claim 51, wherein, when the statements that remain are only one answering statement and one questioning statement, they are determined as the QA statement pair.

53. The computer device of claim 52, wherein, when the statements that remain are more than one questioning statement and one answering statement, the questioning statements and the answering statements are combined in pairs to generate corresponding QA statement pairs.

54. The computer device of claim 53, wherein interaction between the user and the customer service in the short time is split into plural groups of QA statement pairs.

55. The computer device of claim 54, wherein the answering statements and the questioning statements are directly combined in pairs with respect to groups of QAQA type.
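
The pairwise combination for QAQA groups in claim 55 admits more than one reading. The sketch below assumes that "combined in pairs" means every questioning statement is paired with every answering statement, leaving it to the later matching-degree filter to discard poor pairs; the function name is hypothetical.

```python
from itertools import product


def combine_in_pairs(questions, answers):
    """Pair every question with every answer (assumed interpretation)."""
    return list(product(questions, answers))
```

Two questions and two answers therefore produce four candidate QA statement pairs.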

56. The computer device of claim 55, wherein clustering is to incorporate similar questions together to constitute a cluster.

57. The computer device of claim 56, wherein text distance metrics amongst the statement pairs are calculated via a text matching algorithm and the statement pairs are determined to belong to a same statement pair group according to the text distance metrics.

58. The computer device of claim 57, wherein the text matching algorithm is an algorithm that calculates similarity degree of two texts.

59. The computer device of claim 58, wherein an unsupervised text matching algorithm, word mover's distance (WMD), is used.

60. The computer device of claim 59, wherein any clustering algorithm is applied to determine statement pairs belong to the same statement pair group.
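
The distance-based grouping of claims 57 to 60 can be sketched with a simple greedy procedure. Note the substitution: token-level Jaccard distance is used here only as a dependency-free stand-in for word mover's distance, and the greedy first-fit clustering, the function names and the `max_distance` default are all assumptions, since the claims allow any clustering algorithm.

```python
def jaccard_distance(a, b):
    """Token-set distance in [0, 1]; a stand-in for WMD, not WMD itself."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def cluster_pairs(pairs, max_distance=0.5):
    """Greedily assign each (question, answer) pair to the first close cluster."""
    clusters = []
    for question, answer in pairs:
        for cluster in clusters:
            representative = cluster[0][0]  # first question in the cluster
            if jaccard_distance(question, representative) <= max_distance:
                cluster.append((question, answer))
                break
        else:
            clusters.append([(question, answer)])  # no close cluster found
    return clusters
```

The size of each resulting cluster is the question count that later feeds the sorting weight.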

61. The computer device of claim 60, wherein in all QA pairs, there are invalid QA pairs caused by imprecise splitting, and circumstance in which answers are not pertinent to questions asked due to negligence of the customer service, wherein invalid QA statement pairs are removed.

62. The computer device of claim 61, wherein filtration of the QA statement pairs is decided by matching degrees of questions and answers, wherein the QA statement pairs whose matching degrees between questioning statements and answering statements satisfy a preset condition remain, wherein the QA statement pairs whose matching degrees between questioning statements and answering statements do not satisfy are eliminated and filtered out.

63. The computer device of claim 62, wherein matching process is a text matching process, wherein a set of supervised algorithms are trained based on existing knowledge base data to perform similarity calculation.

64. The computer device of claim 63, wherein frequently asked questions have higher priorities to be maintained in the knowledge base.

65. The computer device of claim 64, wherein more important questions are preferentially maintained wherein some less valuable questions are neglected.

66. The computer device of claim 65, wherein frequencies by which questions are asked are measured by the number of questions under each cluster obtained.

67. The computer device of claim 66, wherein accuracy of answers is measured by matching degrees of questions and answers in a filtering process.

68. The computer device of claim 67, wherein the corresponding sorting weight is derived by normalizing two values and weighting and accumulating the two values.
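
The sorting weight of claim 68 combines question frequency (cluster size) with answer accuracy (matching degree). A minimal sketch follows, assuming min-max normalization and a single mixing coefficient `alpha`; neither the normalization scheme nor the coefficient is specified in the claims, and the function name is hypothetical.

```python
def sorting_weights(cluster_sizes, match_degrees, alpha=0.5):
    """Normalize both values per cluster, then take their weighted sum."""

    def min_max(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            return [1.0 for _ in values]  # all equal: treat as equally ranked
        return [(v - lo) / (hi - lo) for v in values]

    freq = min_max(cluster_sizes)   # how often the question is asked
    acc = min_max(match_degrees)    # how well the answers match
    return [alpha * f + (1 - alpha) * a for f, a in zip(freq, acc)]
```

Sorting clusters by this weight in descending order gives the order in which statement pairs would be offered for maintenance in the knowledge base.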

69. The computer device of claim 68, wherein corresponding statement pairs are sequentially obtained according to sorting weights during subsequent maintenance of the knowledge base, and screened and processed manually or by machine, and maintained in the knowledge base.

70. A system comprising:
an obtaining module, configured to obtain a session record, wherein the session record includes at least two statements, wherein the statements include questioning statements sent by questioners and answering statements sent by answerers;
a splitting module, configured to:
split the session record into corresponding groups according to a preset splitting rule, wherein groups include at least one questioning statement and at least one answering statement;
split the groups into corresponding statement pairs according to a processing rule to which the groups correspond;
use, when a number of the answering statements included in the group does not exceed a first preset threshold and a number of the questioning statements as included exceeds the first preset threshold, a preset binary classifier to predict whether the questioning statements as included and an antecedent questioning statements of the questioning statements as included belong to a same question;
a judging module, configured to determine the processing rule to which the groups correspond according to the number of the questioning statements and the number of the answering statements included in the groups, wherein the processing rule is based on the number of questioning statements and the number of answering statements as compared to the first preset threshold; and an updating module, configured to update a knowledge base of a system according to statement pairs.

71. The system of claim 70, wherein each statement has a corresponding generation time.

72. The system of claim 71, wherein the splitting module further comprises:

sequentially traversing the session record according to generation time of each statement;
judging, when the statement traversed is the questioning statement, whether a traversed questioning statement and an antecedent questioning statement of the traversed questioning statement belong to same group according to a sentence pattern of antecedent answering statement of the traversed questioning statement and/or according to an interval time to the antecedent questioning statement of the traversed questioning statement; and determining, when the statement traversed is the answering statement, that a traversed answering statement belongs to the group to which the antecedent questioning statement of the traversed answering statement corresponds.

73. The system of claim 72, wherein the splitting module further comprises:
splitting, when the number of the questioning statements included in the group does not exceed the first preset threshold, the questioning statements each into at least two text segments according to preset signs included in the questioning statements;
predicting whether two adjacent text segments belong to a same question by employing a preset binary classifier;
generating corresponding questioning statements respectively according to text segments predicted to belong to the same question; and generating the corresponding statement pairs according to all the questioning statements as generated and the answering statements included in the group.
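
The splitting and regrouping steps of claim 73 can be sketched as below. The set of preset signs, the function names and the `same_question` callback are assumptions; the callback stands in for the preset binary classifier, which in practice would be a trained model rather than a hand-written rule.

```python
import re


def split_into_segments(question, signs=",.;?!"):
    """Split a questioning statement at the preset signs."""
    pattern = "[" + re.escape(signs) + "]+"
    return [s.strip() for s in re.split(pattern, question) if s.strip()]


def regroup(segments, same_question):
    """Join adjacent segments that the classifier assigns to one question."""
    if not segments:
        return []
    questions = [segments[0]]
    for prev, cur in zip(segments, segments[1:]):
        if same_question(prev, cur):
            questions[-1] = questions[-1] + " " + cur  # same question: merge
        else:
            questions.append(cur)                      # new question starts
    return questions
```

Each regrouped questioning statement is then paired with the answering statements of the group to form the statement pairs.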

74. The system of claim 73, further comprises:
traversing the text segments, and merging traversed text segments with corresponding posterior text segments when number of characters of the traversed text segments is smaller than a second preset threshold.

75. The system of claim 74, further comprises:
merging the traversed text segments with corresponding posterior text segments by employing a preset classifier algorithm when the traversed text segments and the corresponding posterior text segments belong to a same intent class or when the traversed text segments belong to a preset merging intent class.

76. The system of claim 75, wherein the splitting module further comprises:
merging, when there are the questioning statements that belong to the same question, the questioning statements that belong to the same question and generating the corresponding statement pairs according to all merged questioning statements and the answering statements; and generating the corresponding statement pairs according to all the questioning statements and the answering statements included in the group, when there are no questioning statements that belong to the same question.

77. The system of claim 76, wherein the splitting module further comprises:
combining, when the numbers of the questioning statements and the answering statements included in the group both exceed the first preset threshold, the questioning statements and the answering statements included in the group; and generating the corresponding statement pairs.

78. The system of claim 77, wherein the updating module further comprises:
clustering the statement pairs by using a preset clustering algorithm;
generating statement pair groups;
determining the number of the questioning statements included in each statement pair group;

determining matching degrees between the questioning statements and the answering statements included in the statement pair groups according to a preset similarity algorithm;
determining a weight to which each statement pair group corresponds according to corresponding matching degrees and number of questioning statements included in the statement pair groups; and sequentially updating the knowledge base of the system according to the weight to which each statement pair group corresponds.

79. The system of claim 78, the splitting module further comprises:
rectifying any wrong word included in the session record according to a preset rectifying rule; and performing a normalizing process on rectified session record.

80. The system of claim 79, the splitting module further comprises:
recognizing an intent class to which each questioning statement included in the session record corresponds by employing the preset classifier algorithm and eliminating any questioning statement to which a preset irrelevant intent class corresponds as included in the session record.

81. The system of claim 80, further comprises a process of analyzing and mining dialogue statements between customer service and a user comprising:
obtaining a session record to be processed and preprocessing obtained session record;
and rectifying any wrong word included in the session record according to a preset rectifying rule.

82. The system of claim 81, further comprises:
performing a purification operation on all characters included in the session record; and recognizing a dialogue intent to which the questioning statement sent by each user corresponds by using the preset classifier algorithm.

83. The system of claim 82, further comprises:
traversing a dialogue according to a temporal sequence of generation times;
screening questioning statements to be processed of the user when the questioning statements to be processed are traversed, and eliminating any questioning statement to be processed of the user that does not conform to a preset condition;
determining a questioning statement to be processed is to be merged with the antecedent questioning statement of the questioning statement to be processed according to a preset merging rule;
eliminating any answering statement whose number of characters is smaller than a preset number threshold when the answering statements of the customer service are traversed, and determining any answering statement remaining after the elimination as an answering statement to be processed;
merging the answering statement to be processed with an antecedent answering statement when the antecedent statement of the answering statement to be processed is an answering statement of the customer service, and storing the merged answering statement to the group to which the antecedent questioning statement corresponds; and merging the answering statement to be processed in the group to which the questioning statement corresponds when the antecedent statement of the answering statement to be processed is a questioning statement of the user.

84. The system of claim 83, further comprises splitting the questioning statements included in the group each into text segments when the group is of QA type;

processing the text segments from front to back, and merging any text segment whose number of characters is smaller than the preset number threshold in a posterior text segment of this text segment; and/or merging any text segment pertaining to the same intent class as the corresponding posterior text segment or pertaining to a preset merging intent class in the posterior text segment of this text segment;
sequentially obtaining a preset number of adjacent text segments through a sliding window, and predicting whether obtained text segments belong to a same and single question by means of a binary classifier algorithm;
traversing the questioning statements included in the group and judging whether each questioning statement and its antecedent questioning statement belong to the same question when the group is of a QQA type; and combining all questioning statements and answering statements in pairs to generate corresponding QA statement pairs when the group is of a QAQA type.

85. The system of claim 84, further comprises:
clustering the statements by employing a preset clustering algorithm, generating statement pair groups, and determining the number of questioning statements included in each statement pair group;
determining matching degrees between the questioning statements and the answering statements included in the statement pair groups according to a preset similarity algorithm; and determining a weight to which each statement pair group corresponds according to the corresponding matching degrees and the numbers of questioning statements included in the statement pair groups.

86. The system of claim 85, wherein the session record is directed to text statements, principal wrong words are homonyms.

87. The system of claim 86, wherein the session record is directed to speech statements, the session record is firstly required to convert the speech statements into the text statements through speech recognition technique.

88. The system of claim 87, wherein language model and word frequency features are combined.

89. The system of claim 88, wherein corresponding rectifying rules are provided for the speech statements and the text statements respectively.

90. The system of claim 89, wherein wrong words are rectified according to the corresponding rectifying rules.

91. The system of claim 90, wherein the purification operation includes removing irrelevant characters including preset useless punctuations and preset stop words, recognizing irrelevant information contained in each text statement including commodity names and placenames and normalizing the irrelevant information to corresponding preset characters according to which the irrelevant information corresponds.

92. The system of claim 91, the session record of the user with customer service within one day is a segment of the dialogue.

93. The system of claim 92, wherein the session record is split into one or more dialogues, and the dialogues are split into groups.

94. The system of claim 93, wherein the user consults same type of questions within a preset period of time, wherein the customer service has replied, the user consults different questions next time in the dialogue with the customer service.

95. The system of claim 94, wherein to eliminate any questioning statement with eliminable intent and irrelevant to business whose number of characters is smaller than a preset number threshold or intent is judged by the preset classifier algorithm as chitchat intent.

96. The system of claim 95, wherein the interval time between the antecedent questioning statement of the questioning statement to be processed and the questioning statement to be processed exceeds a corresponding preset time threshold and/or when the sentence pattern of the antecedent answering statement of the questioning statement to be processed is a preset sentence pattern, the questioning statement to be processed is merged with its antecedent questioning statement.

97. The system of claim 96, wherein the antecedent questioning statement is a questioning statement that is temporally antecedent to the statement to be processed and with a shortest interval time to the statement to be processed.

98. The system of claim 97, wherein the antecedent answering statement is an answering statement that is temporally antecedent to the statement to be processed and with the shortest interval time to the statement to be processed.

99. The system of claim 98, wherein the interval time between the antecedent questioning statement of the questioning statement to be processed and the questioning statement to be processed exceeds the corresponding preset time threshold, judge there is no relevancy between the antecedent questioning statement and the questioning statement to be processed, so a new group is generated according to the questioning statement to be processed.

100. The system of claim 99, wherein the corresponding preset time threshold is not exceeded, judge there is relevancy between the antecedent questioning statement and the questioning statement to be processed, and the questioning statement to be processed is merged in the group to which the antecedent questioning statement corresponds.

101. The system of claim 100, wherein the preset sentence patterns include statements that guide the user to further respond to responses made by the customer service, includes asks in reply by the customer service to indefinite expressions of users, or the sentence pattern asking for essential information from the user.

102. The system of claim 101, wherein the antecedent answering statement is of the preset sentence pattern, the statement sent by the user after the antecedent answering statement is made in reply to this antecedent answering statement and is relevant to the antecedent answering statement, and the questioning statement to be processed is merged with the antecedent questioning statement.

103. The system of claim 102, wherein the antecedent answering statement of the questioning statement to be processed is of the preset sentence pattern, the questioning statement to be processed is merged with the antecedent questioning statement, and the questioning statement to be processed is merged in the group to which the antecedent questioning statement corresponds.

104. The system of claim 103, wherein the antecedent statement is a statement that is temporally antecedent to the statement to be processed and with the shortest interval time to the statement to be processed.

105. The system of claim 104, wherein the dialogue is split into groups, and the result includes three types of groups comprising:
one question corresponding to a segment of reply, marked as QA;
plural questions corresponding to a segment of reply, wherein the user asks plural questions and the customer service replies with a segment of words, marked as QQA;
and plural questions corresponding to plural replies, wherein several rounds of communication are carried out between the user and the customer service in a short time, marked as QAQA.

106. The system of claim 105, wherein corresponding type is determined according to number of answering statements and questioning statements included in each group, and is processed according to corresponding processing rule.

107. The system of claim 106, wherein QA is a standard input form of an algorithm, wherein one standard question is only meant to express one question in the knowledge base.

108. The system of claim 107, wherein an auxiliary algorithm is required during splitting to judge whether two segments of words are directed to one question or to two questions.

109. The system of claim 108, wherein the auxiliary algorithm is a binary classifier, wherein inputs to the binary classifier are two statements.

110. The system of claim 109, wherein any model realizes binary questions.

111. The system of claim 110, wherein the BERT model, which predicts during pretraining whether two input statements are directed to the context of one and the same statement or to topics irrelevant to each other, serves as the classifier, and fine-tuning training is performed.

112. The system of claim 111, wherein the posterior text segment indicates a text segment following and immediately adjacent to the text segment being processed.

113. The system of claim 112, wherein the classifier algorithm merges any text segment judged as pertaining to the same intent class as the posterior text segment or pertaining to the preset merging intent class as a chitchat class in the posterior text segment.

114. The system of claim 113, wherein text segments predicted to belong to the same question are merged into one questioning statement.

115. The system of claim 114, wherein the text segments predicted to belong to different questions are split into two different questioning statements.

116. The system of claim 115, wherein the groups of the QA type not belonging to the same and single question are converted to groups of the QQA type, wherein the groups of the QA type all of whose text segments belong to the same and single question are split into a QA statement pair that includes only one questioning statement and one answering statement.

117. The system of claim 116, wherein the circumstance to which a group of the QQA type specifically pertains is judged through a binary classification algorithm.

118. The system of claim 117, wherein the questioning statement is split into text segments, and any text segment whose number of characters is smaller than the preset number threshold or which pertains to the preset merging intent is directly merged with the antecedent questioning statement, or the questioning statement and the antecedent questioning statement are input together into the binary classification algorithm to judge whether they belong to the same question.

119. The system of claim 118, wherein, when it is judged that any text segment and the corresponding antecedent questioning statement belong to the same question, the text statement and the corresponding antecedent questioning statement are merged into one questioning statement.

120. The system of claim 119, wherein, when it is recognized that any text segment and the corresponding antecedent questioning statement do not belong to the same question, the questioning statement is split into new questioning statements.

121. The system of claim 120, wherein, when the statements that remain are only one answering statement and one questioning statement, they are determined as the QA statement pair.

122. The system of claim 121, wherein, when the statements that remain are more than one questioning statement and one answering statement, the questioning statements and the answering statements are combined in pairs to generate corresponding QA statement pairs.

123. The system of claim 122, wherein interaction between the user and the customer service in the short time is split into plural groups of QA statement pairs.

124. The system of claim 123, wherein the answering statements and the questioning statements are directly combined in pairs with respect to groups of the QAQA type.

125. The system of claim 124, wherein clustering is to incorporate similar questions together to constitute a cluster.

126. The system of claim 125, wherein text distance metrics amongst the statement pairs are calculated via a text matching algorithm, and the statement pairs that belong to the same statement pair group are determined according to the text distance metrics.

127. The system of claim 126, wherein the text matching algorithm is an algorithm that calculates the similarity degree of two texts.

128. The system of claim 127, wherein an unsupervised text matching algorithm, word mover's distance (WMD), is used.

129. The system of claim 128, wherein any clustering algorithm is applied to determine statement pairs belong to the same statement pair group.

130. The system of claim 129, wherein in all QA pairs, there are invalid QA pairs caused by imprecise splitting, and circumstance in which answers are not pertinent to questions asked due to negligence of the customer service, wherein invalid QA statement pairs are removed.

131. The system of claim 130, wherein filtration of the QA statement pairs is decided by matching degrees of questions and answers, wherein the QA statement pairs whose matching degrees between questioning statements and answering statements satisfy a preset condition remain, wherein the QA statement pairs whose matching degrees between questioning statements and answering statements do not satisfy are eliminated and filtered out.

132. The system of claim 131, wherein matching process is a text matching process, wherein a set of supervised algorithms are trained based on existing knowledge base data to perform similarity calculation.

133. The system of claim 132, wherein frequently asked questions have higher priorities to be maintained in the knowledge base.

134. The system of claim 133, wherein more important questions are preferentially maintained wherein some less valuable questions are neglected.

135. The system of claim 134, wherein frequencies by which questions are asked are measured by the number of questions under each cluster obtained.

136. The system of claim 135, wherein accuracy of answers is measured by matching degrees of questions and answers in a filtering process.

137. The system of claim 136, wherein the corresponding sorting weight is derived by normalizing two values and weighting and accumulating the two values.
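
The sorting weight described in claims 135 to 137 can be sketched as follows. This is only an illustrative sketch: the claims do not name a normalization scheme or a weighting coefficient, so min-max normalization, the function names `min_max` and `sorting_weights`, and the parameter `alpha` are all assumptions.

```python
def min_max(values):
    """Min-max normalize a list of numbers into [0, 1] (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so every item gets the same normalized score.
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sorting_weights(question_counts, matching_degrees, alpha=0.5):
    """Combine cluster size and Q/A matching degree into one sorting weight.

    question_counts: number of questions in each cluster (ask frequency).
    matching_degrees: matching degree of each cluster (answer accuracy).
    alpha: hypothetical coefficient weighting frequency against accuracy.
    """
    freq = min_max(question_counts)
    acc = min_max(matching_degrees)
    return [alpha * f + (1.0 - alpha) * a for f, a in zip(freq, acc)]


if __name__ == "__main__":
    weights = sorting_weights([10, 30, 20], [1.0, 0.5, 0.75], alpha=0.75)
    # Clusters would then be maintained in descending order of weight.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    print(weights, order)
```

A larger `alpha` favors frequently asked questions; a smaller one favors clusters whose answers match their questions well.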

138. The system of claim 137, wherein corresponding statement pairs are sequentially obtained according to sorting weights during subsequent maintenance of the knowledge base, and screened and processed manually or by machine, and maintained in the knowledge base.

139. A method comprising:
obtaining a session record, wherein the session record includes at least two statements, wherein the statements include questioning statements sent by questioners and answering statements sent by answerers;
splitting the session record into corresponding groups according to a preset splitting rule, wherein groups include at least one questioning statement and at least one answering statement;
determining a processing rule to which the groups correspond according to number of the questioning statements and number of the answering statements included in the groups, wherein the processing rule is based on the number of questioning statements and the number of answering statements as compared to a first preset threshold;
splitting the groups into corresponding statement pairs according to the processing rule to which the groups correspond;
using, when the number of the answering statements included in the group does not exceed the first preset threshold and the number of the questioning statements as included exceeds the first preset threshold, a preset binary classifier to predict whether the questioning statements as included and an antecedent questioning statement of the questioning statements as included belong to a same question; and
updating a knowledge base of a system according to statement pairs.

140. The method of claim 139, wherein each statement has a corresponding generation time.

141. The method of claim 140, wherein splitting the session record into corresponding groups according to the preset splitting rule comprises:
sequentially traversing the session record according to generation time of each statement;
judging, when the statement traversed is the questioning statement, whether a traversed questioning statement and an antecedent questioning statement of the traversed questioning statement belong to same group according to a sentence pattern of antecedent answering statement of the traversed questioning statement and/or according to an interval time to the antecedent questioning statement of the traversed questioning statement; and determining, when the statement traversed is the answering statement, a traversed answering statement belongs to the group to which the antecedent questioning statement of the traversed answering statement corresponds.
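
The traversal in claim 141 can be sketched as below. It is a minimal sketch under stated assumptions: the `Statement` and `Group` classes, the 300-second `TIME_THRESHOLD`, and the `is_preset_pattern` test (an answer ending in a question mark) are hypothetical stand-ins, and the merge rule follows the reading in which a question joins the antecedent question's group when the time threshold is not exceeded or when the antecedent answer is of a preset sentence pattern.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    role: str    # "Q" = questioning statement, "A" = answering statement
    text: str
    time: float  # generation time, in seconds


@dataclass
class Group:
    questions: list = field(default_factory=list)
    answers: list = field(default_factory=list)


TIME_THRESHOLD = 300.0  # hypothetical preset time threshold (seconds)


def is_preset_pattern(answer_text):
    # Hypothetical stand-in for the preset sentence pattern check:
    # an answer that asks the user something back.
    return answer_text.rstrip().endswith("?")


def split_into_groups(record):
    groups = []
    last_q = None  # antecedent questioning statement
    last_a = None  # antecedent answering statement
    for s in sorted(record, key=lambda st: st.time):
        if s.role == "Q":
            same_group = last_q is not None and (
                s.time - last_q.time <= TIME_THRESHOLD
                or (last_a is not None and is_preset_pattern(last_a.text))
            )
            if not same_group:
                groups.append(Group())
            groups[-1].questions.append(s)
            last_q = s
        else:
            if not groups:
                continue  # answer with no antecedent question: skipped here
            # An answer joins the group of its antecedent question.
            groups[-1].answers.append(s)
            last_a = s
    return groups
```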

142. The method of claim 141, wherein splitting the groups into the corresponding statement pairs according to the processing rule to which the groups correspond comprises:
splitting, when the number of the questioning statements included in the group does not exceed the first preset threshold, the questioning statements each into at least two text segments according to preset signs included in the questioning statements;
predicting whether two adjacent text segments belong to a same question by employing a preset binary classifier;
generating corresponding questioning statements respectively according to text segments predicted to belong to the same question; and
generating the corresponding statement pairs according to all the questioning statements as generated and the answering statements included in the group.
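
The regrouping of adjacent text segments in claim 142 can be sketched as follows. The claims call for a preset binary classifier; the `same_question` function here is only a hypothetical word-overlap stand-in so that the sketch runs without a trained model, and `regroup_segments` is an assumed name.

```python
def same_question(seg_a, seg_b):
    """Hypothetical stand-in for the preset binary classifier.

    Returns True when the two segments share at least one word. A real
    system would call a trained model here instead.
    """
    words_a = set(seg_a.lower().split())
    words_b = set(seg_b.lower().split())
    return len(words_a & words_b) >= 1


def regroup_segments(segments, classifier=same_question):
    """Merge adjacent segments predicted to belong to the same question."""
    if not segments:
        return []
    questions = [segments[0]]
    # Slide a window of two adjacent segments over the list.
    for prev, cur in zip(segments, segments[1:]):
        if classifier(prev, cur):
            questions[-1] = questions[-1] + " " + cur
        else:
            questions.append(cur)
    return questions


def make_pairs(questions, answers):
    """Pair every generated question with every answer in the group."""
    return [(q, a) for q in questions for a in answers]
```

Passing the classifier in as a parameter keeps the merging logic independent of whichever model is eventually used.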

143. The method of claim 142, further comprises:
traversing the text segments, and merging traversed text segments with corresponding posterior text segments when number of characters of the traversed text segments is smaller than a second preset threshold.
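
The short-segment merge in claim 143 can be sketched as below. The set of preset signs, the value of the second preset threshold (`MIN_CHARS`), and both function names are assumptions chosen for illustration.

```python
import re

MIN_CHARS = 4  # hypothetical second preset threshold


def split_segments(question, signs=",.;?!"):
    """Split a questioning statement on preset signs (assumed punctuation)."""
    parts = re.split("[" + re.escape(signs) + "]", question)
    return [p.strip() for p in parts if p.strip()]


def merge_short_segments(segments, min_chars=MIN_CHARS):
    """Traverse front to back, merging each short segment into its posterior one."""
    merged = []
    carry = ""  # short text waiting to be prepended to the next segment
    for seg in segments:
        current = (carry + " " + seg) if carry else seg
        if len(current) < min_chars:
            carry = current
        else:
            merged.append(current)
            carry = ""
    if carry:
        # A trailing short segment has no posterior segment; keep it as is.
        merged.append(carry)
    return merged
```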

144. The method of claim 143, further comprises:
merging the traversed text segments with corresponding posterior text segments by employing a preset classifier algorithm when the traversed text segments and the corresponding posterior text segments belong to a same intent class or when the traversed text segments belong to a preset merging intent class.

145. The method of claim 144, wherein splitting the groups into the corresponding statement pairs according to the processing rule to which the groups correspond comprises:
combining, when the numbers of the questioning statements and the answering statements included in the group both exceed the first preset threshold, the questioning statements and the answering statements included in the group; and generating the corresponding statement pairs.

146. The method of claim 145, wherein splitting the groups into the corresponding statement pairs according to the processing rule to which the groups correspond comprises:
merging, when there are questioning statements that belong to the same question, the questioning statements that belong to the same question, and generating the corresponding statement pairs according to all merged questioning statements and the answering statements; and
generating the corresponding statement pairs according to all the questioning statements and the answering statements included in the group, when there are no questioning statements that belong to the same question.

147. The method of claim 146, wherein updating the knowledge base of the system according to the statement pairs comprises:
clustering the statement pairs by using a preset clustering algorithm;
generating statement pair groups;
determining the number of the questioning statements included in each statement pair group;
determining matching degrees between the questioning statements and the answering statements included in the statement pair groups according to a preset similarity algorithm;
determining a weight to which each statement pair group corresponds according to corresponding matching degrees and number of questioning statements included in the statement pair groups; and
sequentially updating the knowledge base of the system according to the weight to which each statement pair group corresponds.

148. The method of claim 147, further comprises:
rectifying any wrong word included in the session record according to a preset rectifying rule; and performing a normalizing process on rectified session record.

149. The method of claim 148, further comprises:
recognizing the intent class to which each questioning statement included in the session record corresponds by using the preset classifier algorithm and eliminating any questioning statement to which a preset irrelevant intent class corresponds as included in the session record.

150. The method of claim 149, further comprises a process of analyzing and mining dialogue statements between customer service and a user comprising:
obtaining a session record to be processed and preprocessing obtained session record;
and rectifying any wrong word included in the session record according to a preset rectifying rule.

151. The method of claim 150, further comprises:
performing a purification operation on all characters included in the session record; and recognizing a dialogue intent to which the questioning statement sent by each user corresponds by using the preset classifier algorithm.

152. The method of claim 151, further comprises:
traversing a dialogue according to a temporal sequence of generation times;
screening questioning statements to be processed of the user when the questioning statements to be processed are traversed, and eliminating any questioning statement to be processed of the user that does not conform to a preset condition;
determining a questioning statement to be processed is to be merged with the antecedent questioning statement of the questioning statement to be processed according to a preset merging rule;
eliminating any answering statement whose number of characters is smaller than a preset number threshold when the answering statements of the customer service are traversed, and determining any answering statement remaining after the elimination as an answering statement to be processed;
merging the answering statement to be processed with an antecedent answering statement when the antecedent statement of the answering statement to be processed is an answering statement of the customer service, and storing the merged answering statement to the group to which the antecedent questioning statement corresponds; and merging the answering statement to be processed in the group to which the questioning statement corresponds when the antecedent statement of the answering statement to be processed is a questioning statement of the user.

153. The method of claim 152, further comprises splitting the questioning statements included in the group each into text segments when the group is of a QA type;
processing the text segments from front to back, and merging any text segment whose number of characters is smaller than the preset number threshold in a posterior text segment of this text segment; and/or merging any text segment pertaining to the same intent class as the corresponding posterior text segment or pertaining to a preset merging intent class in the posterior text segment of this text segment;
sequentially obtaining a preset number of adjacent text segments through a sliding window, and predicting whether the obtained text segments belong to a same and single question by means of a binary classifier algorithm;
traversing the questioning statements included in the group and judging whether each questioning statement and its antecedent questioning statement belong to the same question when the group is of a QQA type; and
combining all questioning statements and answering statements in pairs to generate corresponding QA statement pairs when the group is of a QAQA type.

154. The method of claim 153 further comprises:
clustering the statements by employing a preset clustering algorithm, generating statement pair groups, and determining the number of questioning statements included in each statement pair group;
determining matching degrees between the questioning statements and the answering statements included in the statement pair groups according to a preset similarity algorithm; and
determining a weight to which each statement pair group corresponds according to the corresponding matching degrees and the numbers of questioning statements included in the statement pair groups.

155. The method of claim 154, wherein, when the session record is directed to text statements, the principal wrong words are homonyms.

156. The method of claim 155, wherein, when the session record is directed to speech statements, the speech statements are first required to be converted into the text statements through a speech recognition technique.

157. The method of claim 156, wherein language model and word frequency features are combined.

158. The method of claim 157, wherein corresponding rectifying rules are provided for the speech statements and the text statements respectively.

159. The method of claim 158, wherein wrong words are rectified according to the corresponding rectifying rules.

160. The method of claim 159, wherein the purification operation includes removing irrelevant characters including preset useless punctuations and preset stop words, recognizing irrelevant information contained in each text statement including commodity names and placenames and normalizing the irrelevant information to corresponding preset characters according to which the irrelevant information corresponds.
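
The purification operation of claim 160 can be sketched as below. Everything concrete here is an assumption for illustration: the stop-word set, the useless punctuation, the small entity lexicon, and the placeholder strings `[COMMODITY]` and `[PLACE]` do not come from the source, and a real system would likely use a named-entity recognizer rather than a fixed lexicon.

```python
import re

STOP_WORDS = {"um", "please", "the"}  # hypothetical preset stop words
USELESS_PUNCT = "~*#"                 # hypothetical preset useless punctuation
PLACEHOLDERS = {"commodity": "[COMMODITY]", "place": "[PLACE]"}
# Hypothetical lexicon mapping known names to their information type.
ENTITIES = {"iphone 13": "commodity", "toronto": "place"}


def purify(text):
    """Normalize entity names to placeholders, then drop noise characters."""
    text = text.lower()
    # Normalize recognized names to the preset character for their type.
    for name, kind in ENTITIES.items():
        text = text.replace(name, PLACEHOLDERS[kind])
    # Replace useless punctuation with spaces.
    text = re.sub("[" + re.escape(USELESS_PUNCT) + "]", " ", text)
    # Remove stop words and collapse whitespace.
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)
```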

161. The method of claim 160, wherein the session record of the user with customer service within one day is a segment of the dialogue.

162. The method of claim 161, wherein the session record is split into one or more dialogues, and the dialogues are split into groups.

163. The method of claim 162, wherein the user consults same type of questions within a preset period of time, wherein the customer service has replied, the user consults different questions next time in the dialogue with the customer service.

164. The method of claim 163, wherein any questioning statement that has an eliminable intent and is irrelevant to business is eliminated, the eliminable questioning statement being one whose number of characters is smaller than a preset number threshold or whose intent is judged by the preset classifier algorithm as a chitchat intent.

165. The method of claim 164, wherein the interval time between the antecedent questioning statement of the questioning statement to be processed and the questioning statement to be processed exceeds a corresponding preset time threshold and/or when the sentence pattern of the antecedent answering statement of the questioning statement to be processed is a preset sentence pattern, the questioning statement to be processed is merged with its antecedent questioning statement.

166. The method of claim 165, wherein the antecedent questioning statement is a questioning statement that is temporally antecedent to the statement to be processed and with a shortest interval time to the statement to be processed.

167. The method of claim 166, wherein the antecedent answering statement is an answering statement that is temporally antecedent to the statement to be processed and with the shortest interval time to the statement to be processed.

168. The method of claim 167, wherein, when the interval time between the antecedent questioning statement of the questioning statement to be processed and the questioning statement to be processed exceeds the corresponding preset time threshold, it is judged that there is no relevancy between the antecedent questioning statement and the questioning statement to be processed, and a new group is generated according to the questioning statement to be processed.

169. The method of claim 168, wherein, when the corresponding preset time threshold is not exceeded, it is judged that there is relevancy between the antecedent questioning statement and the questioning statement to be processed, and the questioning statement to be processed is merged in the group to which the antecedent questioning statement corresponds.

170. The method of claim 169, wherein the preset sentence patterns include statements that guide the user to further respond to responses made by the customer service, including questions asked in reply by the customer service to indefinite expressions of users, or sentence patterns asking for essential information from the user.

171. The method of claim 170, wherein, when the antecedent answering statement is of the preset sentence pattern, the statement sent by the user after the antecedent answering statement is made in reply to this antecedent answering statement and is relevant to the antecedent answering statement, and the questioning statement to be processed is merged with the antecedent questioning statement.

172. The method of claim 171, wherein, when the antecedent answering statement of the questioning statement to be processed is of the preset sentence pattern, the questioning statement to be processed is merged with the antecedent questioning statement, and the questioning statement to be processed is merged in the group to which the antecedent questioning statement corresponds.

173. The method of claim 172, wherein the antecedent statement is a statement that is temporally antecedent to the statement to be processed and with the shortest interval time to the statement to be processed.

174. The method of claim 173, wherein the dialogue is split into groups, and the result includes three types of groups comprising:
one question corresponding to a segment of reply, marked as QA;
plural questions corresponding to a segment of reply, wherein the user asks plural questions and the customer service replies with a segment of words, marked as QQA;
and plural questions corresponding to plural replies, wherein several rounds of communication are carried out between the user and the customer service in a short time, marked as QAQA.

175. The method of claim 174, wherein corresponding type is determined according to number of answering statements and questioning statements included in each group, and is processed according to corresponding processing rule.
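
The type decision in claims 174 and 175 can be sketched as follows, reading it together with the first preset threshold of claim 139. The threshold value of 1 and the function name `group_type` are assumptions. The case of a single question with several answers is not described in the claims, so the sketch raises an error rather than guessing.

```python
FIRST_THRESHOLD = 1  # hypothetical value of the first preset threshold


def group_type(num_questions, num_answers, threshold=FIRST_THRESHOLD):
    """Return "QA", "QQA" or "QAQA" from the statement counts of a group."""
    many_q = num_questions > threshold
    many_a = num_answers > threshold
    if not many_q and not many_a:
        return "QA"    # one question, one segment of reply
    if many_q and not many_a:
        return "QQA"   # plural questions, one segment of reply
    if many_q and many_a:
        return "QAQA"  # plural questions, plural replies
    # One question with plural answers is not covered by the claims.
    raise ValueError("group shape not covered: one question, plural answers")
```

Each returned label would then select the processing rule applied to the group.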

176. The method of claim 175, wherein QA is a standard input form of an algorithm, wherein one standard question is only meant to express one question in the knowledge base.

177. The method of claim 176, wherein an auxiliary algorithm is required during splitting to judge whether two segments of words are directed to one question or to two questions.

178. The method of claim 177, wherein the auxiliary algorithm is a binary classifier, wherein inputs to the binary classifier are two statements.

179. The method of claim 178, wherein any model realizes binary questions.

180. The method of claim 179, wherein the BERT model, which predicts during pretraining whether two input statements are directed to the context of one and the same statement or to topics irrelevant to each other, serves as the classifier, and fine-tuning training is performed.

181. The method of claim 180, wherein the posterior text segment indicates a text segment following and immediately adjacent to the text segment being processed.

182. The method of claim 181, wherein the classifier algorithm merges any text segment judged as pertaining to the same intent class as the posterior text segment or pertaining to the preset merging intent class as a chitchat class in the posterior text segment.

183. The method of claim 182, wherein text segments predicted to belong to the same question are merged into one questioning statement.

184. The method of claim 183, wherein the text segments predicted to belong to different questions are split into two different questioning statements.

185. The method of claim 184, wherein the groups of the QA type not belonging to the same and single question are converted to groups of the QQA type, wherein the groups of the QA type all of whose text segments belong to the same and single question are split into a QA statement pair that includes only one questioning statement and one answering statement.

186. The method of claim 185, wherein the circumstance to which a group of the QQA type specifically pertains is judged through a binary classification algorithm.

187. The method of claim 186, wherein the questioning statement is split into text segments, and any text segment whose number of characters is smaller than the preset number threshold or which pertains to the preset merging intent is directly merged with the antecedent questioning statement, or the questioning statement and the antecedent questioning statement are input together into the binary classification algorithm to judge whether they belong to the same question.

188. The method of claim 187, wherein, when it is judged that any text segment and the corresponding antecedent questioning statement belong to the same question, the text statement and the corresponding antecedent questioning statement are merged into one questioning statement.

189. The method of claim 188, wherein, when it is recognized that any text segment and the corresponding antecedent questioning statement do not belong to the same question, the questioning statement is split into new questioning statements.

190. The method of claim 189, wherein, when the statements that remain are only one answering statement and one questioning statement, they are determined as the QA statement pair.

191. The method of claim 190, wherein, when the statements that remain are more than one questioning statement and one answering statement, the questioning statements and the answering statements are combined in pairs to generate corresponding QA statement pairs.

192. The method of claim 191, wherein interaction between the user and the customer service in the short time is split into plural groups of QA statement pairs.

193. The method of claim 192, wherein the answering statements and the questioning statements are directly combined in pairs with respect to groups of the QAQA type.

194. The method of claim 193, wherein clustering is to incorporate similar questions together to constitute a cluster.

195. The method of claim 194, wherein text distance metrics amongst the statement pairs are calculated via a text matching algorithm, and the statement pairs that belong to the same statement pair group are determined according to the text distance metrics.

196. The method of claim 195, wherein the text matching algorithm is an algorithm that calculates the similarity degree of two texts.

197. The method of claim 196, wherein an unsupervised text matching algorithm, word mover's distance (WMD), is used.

198. The method of claim 197, wherein any clustering algorithm is applied to determine statement pairs belong to the same statement pair group.

199. The method of claim 198, wherein in all QA pairs, there are invalid QA pairs caused by imprecise splitting, and circumstance in which answers are not pertinent to questions asked due to negligence of the customer service, wherein invalid QA statement pairs are removed.

200. The method of claim 199, wherein filtration of the QA statement pairs is decided by matching degrees of questions and answers, wherein the QA statement pairs whose matching degrees between questioning statements and answering statements satisfy a preset condition remain, wherein the QA statement pairs whose matching degrees between questioning statements and answering statements do not satisfy are eliminated and filtered out.

201. The method of claim 200, wherein matching process is a text matching process, wherein a set of supervised algorithms are trained based on existing knowledge base data to perform similarity calculation.

202. The method of claim 201, wherein frequently asked questions have higher priorities to be maintained in the knowledge base.

203. The method of claim 202, wherein more important questions are preferentially maintained wherein some less valuable questions are neglected.

204. The method of claim 203, wherein frequencies by which questions are asked are measured by the number of questions under each cluster obtained.

205. The method of claim 204, wherein accuracy of answers is measured by matching degrees of questions and answers in a filtering process.

206. The method of claim 205, wherein the corresponding sorting weight is derived by normalizing two values and weighting and accumulating the two values.

207. The method of claim 206, wherein corresponding statement pairs are sequentially obtained according to sorting weights during subsequent maintenance of the knowledge base, and screened and processed manually or by machine, and maintained in the knowledge base.

208. An electronic equipment comprising:
one or more processors;
a memory, associated with the one or more processors and used for storing a program instruction wherein the program instruction is executed by the one or more processors configured to:
obtain a session record, wherein the session record includes at least two statements, wherein the statements include questioning statements sent by questioners and answering statements sent by answerers;
split the session record into corresponding groups according to a preset splitting rule, wherein the groups include at least one questioning statement and at least one answering statement;
determine a processing rule to which the groups correspond according to a number of the questioning statements and a number of the answering statements included in the groups, wherein the processing rule is based on the number of questioning statements and the number of answering statements as compared to a first preset threshold;
split the groups into corresponding statement pairs according to the processing rule to which the groups correspond;
use, when the number of the answering statements included in the group does not exceed the first preset threshold and the number of the questioning statements as included exceeds the first preset threshold, a preset binary classifier to predict whether the questioning statements as included and an antecedent questioning statement of the questioning statements as included belong to a same question; and
update a knowledge base of a system according to the statement pairs.

209. The equipment of claim 208, wherein each statement has a corresponding generation time.

210. The equipment of claim 209, wherein splitting the session record into corresponding groups according to the preset splitting rule comprises:
sequentially traversing the session record according to generation time of each statement;
judging, when the statement traversed is the questioning statement, whether a traversed questioning statement and an antecedent questioning statement of the traversed questioning statement belong to the same group according to a sentence pattern of an antecedent answering statement of the traversed questioning statement and/or according to an interval time to the antecedent questioning statement of the traversed questioning statement; and determining, when the statement traversed is the answering statement, that a traversed answering statement belongs to the group to which the antecedent questioning statement of the traversed answering statement corresponds.
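
The grouping recited in claim 210 can be illustrated with a minimal sketch. It is not the claimed implementation: the `Statement` class, the `max_gap` value of 120 seconds, and the use of a trailing question mark as the "preset sentence pattern" are all assumptions chosen for illustration, and only the immediately preceding statement is inspected for that pattern.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Statement:
    role: str    # "Q" = questioning statement, "A" = answering statement
    text: str
    time: float  # generation time, in seconds


def split_into_groups(
    record: List[Statement],
    max_gap: float = 120.0,                       # assumed preset time threshold
    follow_up_endings: Tuple[str, ...] = ("?",),  # assumed preset sentence pattern
) -> List[List[Statement]]:
    groups: List[List[Statement]] = []
    last_question_time: Optional[float] = None
    previous: Optional[Statement] = None
    # Traverse the session record in order of generation time.
    for st in sorted(record, key=lambda s: s.time):
        if st.role == "Q":
            close_in_time = (
                last_question_time is not None
                and st.time - last_question_time <= max_gap
            )
            invited_by_answer = (
                previous is not None
                and previous.role == "A"
                and previous.text.rstrip().endswith(follow_up_endings)
            )
            if groups and (close_in_time or invited_by_answer):
                groups[-1].append(st)   # same group as the antecedent question
            else:
                groups.append([st])     # start a new group
            last_question_time = st.time
        elif groups:
            # An answer joins the group of its antecedent question.
            groups[-1].append(st)
        previous = st
    return groups
```

In this sketch a question that arrives long after the previous one still stays in the same group if the customer service had just asked the user something, which mirrors the "and/or" wording of the claim.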

211. The equipment of claim 210, wherein splitting the groups into the corresponding statement pairs according to the processing rule to which the groups correspond comprises:
splitting, when the number of the questioning statements included in the group does not exceed the first preset threshold, the questioning statements each into at least two text segments according to preset signs included in the questioning statements;
predicting whether two adjacent text segments belong to a same question by employing a preset binary classifier;
generating corresponding questioning statements respectively according to text segments predicted to belong to the same question; and generating the corresponding statement pairs according to all the questioning statements as generated and the answering statements included in the group.

212. The equipment of claim 211, further comprises:
traversing the text segments, and merging traversed text segments with corresponding posterior text segments when number of characters of the traversed text segments is smaller than a second preset threshold.
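
As a hedged illustration of claims 211 and 212, the sketch below splits a questioning statement at punctuation marks and folds any short segment into the segment that follows it. The set of "preset signs", the `min_chars` value of 4, and both function names are assumptions, not values taken from the claims.

```python
import re
from typing import List


def split_segments(question: str, signs: str = ",.;?!，。；？！") -> List[str]:
    # Split at the assumed preset signs and drop empty pieces.
    parts = re.split("[" + re.escape(signs) + "]", question)
    return [p.strip() for p in parts if p.strip()]


def merge_short_segments(segments: List[str], min_chars: int = 4) -> List[str]:
    merged: List[str] = []
    pending: List[str] = []  # short segments waiting for their posterior segment
    for seg in segments:
        pending.append(seg)
        if len(seg) >= min_chars:
            merged.append(" ".join(pending))
            pending = []
    if pending:
        # Trailing short segments have no posterior segment; keep them as they are.
        merged.append(" ".join(pending))
    return merged
```

For example, `merge_short_segments(split_segments("Hi, where is my order? I paid yesterday."))` folds the greeting "Hi" into the question that follows it.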

213. The equipment of claim 212, further comprises:
merging the traversed text segments with corresponding posterior text segments by employing a preset classifier algorithm when the traversed text segments and the corresponding posterior text segments belong to a same intent class or when the traversed text segments belong to a preset merging intent class.

214. The equipment of claim 213, wherein splitting the groups into the corresponding statement pairs according to the processing rule to which the groups correspond comprises:
combining, when the numbers of the questioning statements and the answering statements included in the group both exceed the first preset threshold, the questioning statements and the answering statements included in the group; and generating the corresponding statement pairs.

215. The equipment of claim 214, wherein splitting the groups into the corresponding statement pairs according to the processing rule to which the groups correspond comprises:
merging, when there are questioning statements that belong to the same question, the questioning statements that belong to the same question and generating the corresponding statement pairs according to all merged questioning statements and the answering statements; and generating the corresponding statement pairs according to all the questioning statements and the answering statements included in the group, when there are no questioning statements that belong to the same question.

216. The equipment of claim 215, wherein updating the knowledge base of the system according to the statement pairs comprises:
clustering the statement pairs by using a preset clustering algorithm;
generating statement pair groups;
determining the number of the questioning statements included in each statement pair group;
determining matching degrees between the questioning statements and the answering statements included in the statement pair groups according to a preset similarity algorithm;
determining a weight to which each statement pair group corresponds according to corresponding matching degrees and number of questioning statements included in the statement pair groups; and sequentially updating the knowledge base of the system according to the weight to which each statement pair group corresponds.

217. The equipment of claim 216, further comprises:
rectifying any wrong word included in the session record according to a preset rectifying rule; and performing a normalizing process on rectified session record.

218. The equipment of claim 217, further comprises:
recognizing the intent class to which each questioning statement included in the session record corresponds by using the preset classifier algorithm and eliminating any questioning statement to which a preset irrelevant intent class corresponds as included in the session record.

219. The equipment of claim 218, further comprises a process of analyzing and mining dialogue statements between customer service and a user comprising:
obtaining a session record to be processed and preprocessing obtained session record;
and rectifying any wrong word included in the session record according to a preset rectifying rule.

220. The equipment of claim 219, further comprises:
performing a purification operation on all characters included in the session record; and recognizing a dialogue intent to which the questioning statement sent by each user corresponds by using the preset classifier algorithm.

221. The equipment of claim 220, further comprises:
traversing a dialogue according to a temporal sequence of generation times;
screening questioning statements to be processed of the user when the questioning statements to be processed are traversed, and eliminating any questioning statement to be processed of the user that does not conform to a preset condition;
determining whether a questioning statement to be processed is to be merged with the antecedent questioning statement of the questioning statement to be processed according to a preset merging rule;
eliminating any answering statement whose number of characters is smaller than a preset number threshold when the answering statements of the customer service are traversed, and determining any answering statement remaining after the elimination as an answering statement to be processed;
merging the answering statement to be processed with an antecedent answering statement when the antecedent statement of the answering statement to be processed is an answering statement of the customer service, and storing the merged answering statement to the group to which the antecedent questioning statement corresponds; and merging the answering statement to be processed in the group to which the questioning statement corresponds when the antecedent statement of the answering statement to be processed is a questioning statement of the user.

222. The equipment of claim 221, further comprises splitting the questioning statements included in the group each into text segments when the group is of a QA type;
processing the text segments from front to back, and merging any text segment whose number of characters is smaller than the preset number threshold in a posterior text segment of this text segment; and/or merging any text segment pertaining to the same intent class as the corresponding posterior text segment or pertaining to a preset merging intent class in the posterior text segment of this text segment;
sequentially obtaining a preset number of adjacent text segments through a sliding window, and predicting whether the obtained text segments belong to a same and single question by means of a binary classifier algorithm;
traversing the questioning statements included in the group and judging whether each questioning statement and its antecedent questioning statement belong to the same question when the group is of a QQA type; and combining all questioning statements and answering statements in pairs to generate corresponding QA statement pairs when the group is of a QAQA type.
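
The sliding-window step of claim 222 can be sketched as follows, assuming a window of two adjacent text segments. The binary classifier is passed in as a callable; `toy_same_question` is a hypothetical stand-in based on shared words and is not the classifier of the claims (claim 249 names a fine-tuned BERT model for that role).

```python
from typing import Callable, List


def regroup_segments(
    segments: List[str],
    same_question: Callable[[str, str], bool],
) -> List[str]:
    """Slide a window of two adjacent segments and rebuild questioning statements."""
    if not segments:
        return []
    questions: List[List[str]] = [[segments[0]]]
    for prev, cur in zip(segments, segments[1:]):
        if same_question(prev, cur):
            questions[-1].append(cur)  # same question: merge into one statement
        else:
            questions.append([cur])    # different question: start a new statement
    return [" ".join(q) for q in questions]


def toy_same_question(a: str, b: str) -> bool:
    # Hypothetical stand-in: two segments sharing any word count as one question.
    return bool(set(a.lower().split()) & set(b.lower().split()))
```

Segments predicted to belong together are merged into one questioning statement, and a negative prediction starts a new one, in line with claims 252 and 253.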

223. The equipment of claim 222 further comprises:
clustering the statements by employing a preset clustering algorithm, generating statement pair groups, and determining the number of questioning statements included in each statement pair group;
determining matching degrees between the questioning statements and the answering statements included in the statement pair groups according to a preset similarity algorithm; and determining a weight to which each statement pair group corresponds according to the corresponding matching degrees and the numbers of questioning statements included in the statement pair groups.

224. The equipment of claim 223, wherein, when the session record is directed to text statements, principal wrong words are homonyms.

225. The equipment of claim 224, wherein, when the session record is directed to speech statements, the speech statements are first converted into the text statements through a speech recognition technique.

226. The equipment of claim 225, wherein language model and word frequency features are combined.

227. The equipment of claim 226, wherein corresponding rectifying rules are provided for the speech statements and the text statements respectively.

228. The equipment of claim 227, wherein wrong words are rectified according to the corresponding rectifying rules.

229. The equipment of claim 228, wherein the purification operation includes removing irrelevant characters including preset useless punctuations and preset stop words, recognizing irrelevant information contained in each text statement including commodity names and placenames and normalizing the irrelevant information to corresponding preset characters according to which the irrelevant information corresponds.
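
A minimal sketch of the purification operation in claim 229 is given below. The function name `purify`, the placeholder tokens, the stop-word set, and the punctuation set are all hypothetical. A real system would recognize commodity names and placenames with a recognizer rather than with the fixed lookup table assumed here.

```python
from typing import Dict, List, Set


def purify(
    text: str,
    stop_words: Set[str],
    entity_map: Dict[str, List[str]],  # assumed: placeholder -> known surface forms
    useless_punct: str = "~*#",        # assumed preset useless punctuations
) -> str:
    # Normalize recognized commodity names and placenames to preset characters.
    for placeholder, names in entity_map.items():
        for name in names:
            text = text.replace(name, placeholder)
    # Remove the preset useless punctuation characters.
    text = text.translate(str.maketrans("", "", useless_punct))
    # Remove the preset stop words.
    words = [w for w in text.split() if w.lower() not in stop_words]
    return " ".join(words)
```

Normalizing entities before clustering means that questions differing only in the product or city mentioned can end up in the same cluster.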

230. The equipment of claim 229, wherein the session record of the user with customer service within one day is a segment of the dialogue.

231. The equipment of claim 230, wherein the session record is split into one or more dialogues, and the dialogues are split into groups.

232. The equipment of claim 231, wherein the user consults same type of questions within a preset period of time, wherein the customer service has replied, the user consults different questions next time in the dialogue with the customer service.

233. The equipment of claim 232, wherein any questioning statement that has an eliminable intent and is irrelevant to business is eliminated, being a questioning statement whose number of characters is smaller than a preset number threshold or whose intent is judged by the preset classifier algorithm as chitchat intent.

234. The equipment of claim 233, wherein the interval time between the antecedent questioning statement of the questioning statement to be processed and the questioning statement to be processed exceeds a corresponding preset time threshold and/or when the sentence pattern of the antecedent answering statement of the questioning statement to be processed is a preset sentence pattern, the questioning statement to be processed is merged with its antecedent questioning statement.

235. The equipment of claim 234, wherein the antecedent questioning statement is a questioning statement that is temporally antecedent to the statement to be processed and with a shortest interval time to the statement to be processed.

236. The equipment of claim 235, wherein the antecedent answering statement is an answering statement that is temporally antecedent to the statement to be processed and with the shortest interval time to the statement to be processed.

237. The equipment of claim 236, wherein, when the interval time between the antecedent questioning statement of the questioning statement to be processed and the questioning statement to be processed exceeds the corresponding preset time threshold, it is judged that there is no relevancy between the antecedent questioning statement and the questioning statement to be processed, and a new group is generated according to the questioning statement to be processed.

238. The equipment of claim 237, wherein, when the corresponding preset time threshold is not exceeded, it is judged that there is relevancy between the antecedent questioning statement and the questioning statement to be processed, and the questioning statement to be processed is merged in the group to which the antecedent questioning statement corresponds.

239. The equipment of claim 238, wherein the preset sentence patterns include statements that guide the user to further respond to responses made by the customer service, including questions asked in reply by the customer service to indefinite expressions of users, or the sentence pattern asking for essential information from the user.

240. The equipment of claim 239, wherein the antecedent answering statement is of the preset sentence pattern, the statement sent by the user after the antecedent answering statement is made in reply to this antecedent answering statement and is relevant to the antecedent answering statement, and the questioning statement to be processed is merged with the antecedent questioning statement.

241. The equipment of claim 240, wherein the antecedent answering statement of the questioning statement to be processed is of the preset sentence pattern, the questioning statement to be processed is merged with the antecedent questioning statement, and the questioning statement to be processed is merged in the group to which the antecedent questioning statement corresponds.

242. The equipment of claim 241, wherein the antecedent statement is a statement that is temporally antecedent to the statement to be processed and with the shortest interval time to the statement to be processed.

243. The equipment of claim 242, wherein splitting the dialogue into groups yields a result that includes three types of groups comprising:
one question corresponding to a segment of reply, marked as QA;
plural questions corresponding to a segment of reply, wherein the user asks plural questions and the customer service replies with a segment of words, marked as QQA;
and plural questions corresponding to plural replies, wherein several rounds of communication are carried out between the user and the customer service in a short time, marked as QAQA.

244. The equipment of claim 243, wherein the corresponding type is determined according to the number of answering statements and questioning statements included in each group, and each group is processed according to the corresponding processing rule.
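
The type decision of claims 243 and 244 can be sketched as a comparison of the two counts against the first preset threshold. The threshold value of 1 is an assumption that matches the "one question, one segment of reply" description, and `group_type` is a hypothetical name.

```python
def group_type(num_questions: int, num_answers: int, threshold: int = 1) -> str:
    """Return the group type used to select a processing rule."""
    if num_questions <= threshold:
        return "QA"    # questions do not exceed the threshold
    if num_answers <= threshold:
        return "QQA"   # plural questions, one segment of reply
    return "QAQA"      # plural questions and plural replies
```

The order of the checks follows the conditions recited in the independent claim: the QQA rule applies only when the questions exceed the threshold and the answers do not.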

245. The equipment of claim 244, wherein QA is a standard input form of an algorithm, wherein one standard question is only meant to express one question in the knowledge base.

246. The equipment of claim 245, wherein an auxiliary algorithm is required during splitting to judge whether two segments of words are directed to one question or to two questions.

247. The equipment of claim 246, wherein the auxiliary algorithm is a binary classifier, wherein inputs to the binary classifier are two statements.

248. The equipment of claim 247, wherein the binary classifier is any model that realizes binary classification.

249. The equipment of claim 248, wherein a BERT model, which predicts during the process of pretraining whether two input statements are directed to context of the same and single statement or to topics irrelevant to each other, serves as the classifier, and fine-tuning training is performed.

250. The equipment of claim 249, wherein the posterior text segment indicates a text segment following and immediately adjacent to the text segment being processed.

251. The equipment of claim 250, wherein the classifier algorithm merges any text segment judged as pertaining to the same intent class as the posterior text segment or pertaining to the preset merging intent class as a chitchat class in the posterior text segment.

252. The equipment of claim 251, wherein text segments predicted to belong to the same question are merged into one questioning statement.

253. The equipment of claim 252, wherein the text segments predicted to belong to different questions are split into two different questioning statements.

254. The equipment of claim 253, wherein the groups of the QA type not belonging to the same and single question are converted to groups of the QQA type, wherein the groups of the QA type whose text segments all belong to the same and single question are split into a QA statement pair that only includes one questioning statement and one answering statement.

255. The equipment of claim 254, wherein a binary classification algorithm is used to judge to which circumstance a group of the QQA type specifically pertains.

256. The equipment of claim 255, wherein the questioning statement is split into text segments, and any text segment whose number of characters included is smaller than the preset number threshold or pertaining to the preset merging intent is directly merged with the antecedent questioning statement, or the questioning statement and the antecedent questioning statement are input together in the binary classification algorithm to judge whether they belong to the same question.

257. The equipment of claim 256, wherein, when it is judged that any text segment and corresponding antecedent questioning statement belong to the same question, the text statement and the corresponding antecedent questioning statement are merged into one questioning statement.

258. The equipment of claim 257, wherein, when it is recognized that any text segment and the corresponding antecedent questioning statement do not belong to the same question, the questioning statement is split into new questioning statements.

259. The equipment of claim 258, wherein, when the statements that remain are only one answering statement and one questioning statement, they are determined as the QA statement pair.

260. The equipment of claim 259, wherein, when the statements that remain are more than one questioning statement and one answering statement, the questioning statements and the answering statements are combined in pairs to generate corresponding QA statement pairs.
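
One possible reading of "combined in pairs" is that every remaining questioning statement is paired with every remaining answering statement. The sketch below assumes that reading, which the claims do not spell out, and `combine_in_pairs` is a hypothetical name.

```python
from itertools import product
from typing import List, Tuple


def combine_in_pairs(questions: List[str], answers: List[str]) -> List[Tuple[str, str]]:
    # Assumed reading: the Cartesian product of questions and answers.
    return list(product(questions, answers))
```

Under this reading some generated pairs will not match, which fits the later claims that filter out QA statement pairs whose matching degree does not satisfy a preset condition.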

261. The equipment of claim 260, wherein interaction between the user and the customer service in the short time is split into plural groups of QA statement pairs.

262. The equipment of claim 261, wherein the answering statements and the questioning statements are directly combined in pairs with respect to groups of the QAQA type.

263. The equipment of claim 262, wherein clustering is to incorporate similar questions together to constitute a cluster.

264. The equipment of claim 263, wherein text distance metrics amongst the statement pairs are calculated via a text matching algorithm, and whether the statement pairs belong to the same statement pair group is determined according to the text distance metrics.

265. The equipment of claim 264, wherein the text matching algorithm is an algorithm that calculates the similarity degree of two texts.

266. The equipment of claim 265, wherein an unsupervised text matching algorithm, word mover's distance (WMD), is used.

267. The equipment of claim 266, wherein any clustering algorithm is applied to determine which statement pairs belong to the same statement pair group.

268. The equipment of claim 267, wherein in all QA pairs, there are invalid QA pairs caused by imprecise splitting, and by circumstances in which answers are not pertinent to questions asked due to negligence of the customer service, wherein invalid QA statement pairs are removed.

269. The equipment of claim 268, wherein filtration of the QA statement pairs is decided by matching degrees of questions and answers, wherein the QA statement pairs whose matching degrees between questioning statements and answering statements satisfy a preset condition remain, wherein the QA statement pairs whose matching degrees between questioning statements and answering statements do not satisfy the preset condition are eliminated and filtered out.

270. The equipment of claim 269, wherein matching process is a text matching process, wherein a set of supervised algorithms are trained based on existing knowledge base data to perform similarity calculation.

271. The equipment of claim 270, wherein frequently asked questions have higher priorities to be maintained in the knowledge base.

272. The equipment of claim 271, wherein more important questions are preferentially maintained wherein some less valuable questions are neglected.

273. The equipment of claim 272, wherein frequencies by which questions are asked are measured by the number of questions under each cluster obtained.

274. The equipment of claim 273, wherein accuracy of answers is measured by matching degrees of questions and answers in a filtering process.

275. The equipment of claim 274, wherein the corresponding sorting weight is derived by normalizing two values and weighting and accumulating the two values.
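
The sorting weight of claim 275 can be sketched as a weighted sum of two normalized values: the number of questions in a cluster (frequency) and the question-answer matching degree (accuracy). Min-max normalization and the mixing factor `alpha` of 0.5 are assumptions, since the claims name neither a normalization method nor the weights.

```python
from typing import List, Tuple


def min_max(values: List[float]) -> List[float]:
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # all equal: treat every value as the maximum
    return [(v - lo) / (hi - lo) for v in values]


def sorting_weights(
    clusters: List[Tuple[int, float]],  # (question count, matching degree) per cluster
    alpha: float = 0.5,                 # assumed weight of the frequency term
) -> List[float]:
    if not clusters:
        return []
    counts = min_max([float(c) for c, _ in clusters])
    degrees = min_max([d for _, d in clusters])
    # Weight and accumulate the two normalized values.
    return [alpha * c + (1.0 - alpha) * d for c, d in zip(counts, degrees)]
```

Sorting clusters by this weight in descending order gives the order in which statement pairs would be offered for maintenance in the knowledge base.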

276. The equipment of claim 275, wherein corresponding statement pairs are sequentially obtained according to sorting weights during subsequent maintenance of the knowledge base, and screened and processed manually or by machine, and maintained in the knowledge base.

277. A computer readable physical memory having stored thereon, computer-executable instructions, when executed by a computer, the computer is configured to:
obtain a session record, wherein the session record includes at least two statements, wherein the statements include questioning statements sent by questioners and answering statements sent by answerers;
split the session record into corresponding groups according to a preset splitting rule, wherein the groups include at least one questioning statement and at least one answering statement;
determine a processing rule to which the groups correspond according to a number of the questioning statements and a number of the answering statements included in the groups, wherein the processing rule is based on the number of questioning statements and the number of answering statements as compared to a first preset threshold;
split the groups into corresponding statement pairs according to the processing rule to which the groups correspond;
use, when the number of the answering statements included in the group does not exceed the first preset threshold and the number of the questioning statements as included exceeds the first preset threshold, a preset binary classifier to predict whether the questioning statements as included and an antecedent questioning statement of the questioning statements as included belong to a same question; and update a knowledge base of a system according to the statement pairs.

278. The memory of claim 277, wherein each statement has a corresponding generation time.

279. The memory of claim 278, wherein splitting the session record into corresponding groups according to the preset splitting rule comprises:
sequentially traversing the session record according to generation time of each statement;
judging, when the statement traversed is the questioning statement, whether a traversed questioning statement and an antecedent questioning statement of the traversed questioning statement belong to the same group according to a sentence pattern of an antecedent answering statement of the traversed questioning statement and/or according to an interval time to the antecedent questioning statement of the traversed questioning statement; and determining, when the statement traversed is the answering statement, that a traversed answering statement belongs to the group to which the antecedent questioning statement of the traversed answering statement corresponds.

280. The memory of claim 279, wherein splitting the groups into the corresponding statement pairs according to the processing rule to which the groups correspond comprises:
splitting, when the number of the questioning statements included in the group does not exceed the first preset threshold, the questioning statements each into at least two text segments according to preset signs included in the questioning statements;
predicting whether two adjacent text segments belong to a same question by employing a preset binary classifier;
generating corresponding questioning statements respectively according to text segments predicted to belong to the same question; and generating the corresponding statement pairs according to all the questioning statements as generated and the answering statements included in the group.

281. The memory of claim 280, further comprises:
traversing the text segments, and merging traversed text segments with corresponding posterior text segments when number of characters of the traversed text segments is smaller than a second preset threshold.

282. The memory of claim 281, further comprises:
merging the traversed text segments with corresponding posterior text segments by employing a preset classifier algorithm when the traversed text segments and the corresponding posterior text segments belong to a same intent class or when the traversed text segments belong to a preset merging intent class.

283. The memory of claim 282, wherein splitting the groups into the corresponding statement pairs according to the processing rule to which the groups correspond comprises:
combining, when the numbers of the questioning statements and the answering statements included in the group both exceed the first preset threshold, the questioning statements and the answering statements included in the group; and generating the corresponding statement pairs.

284. The memory of claim 283, wherein splitting the groups into the corresponding statement pairs according to the processing rule to which the groups correspond comprises:
merging, when there are questioning statements that belong to the same question, the questioning statements that belong to the same question and generating the corresponding statement pairs according to all merged questioning statements and the answering statements; and generating the corresponding statement pairs according to all the questioning statements and the answering statements included in the group, when there are no questioning statements that belong to the same question.

285. The memory of claim 284, wherein updating the knowledge base of the system according to the statement pairs comprises:
clustering the statement pairs by using a preset clustering algorithm;
generating statement pair groups;
determining the number of the questioning statements included in each statement pair group;
determining matching degrees between the questioning statements and the answering statements included in the statement pair groups according to a preset similarity algorithm;
determining a weight to which each statement pair group corresponds according to corresponding matching degrees and number of questioning statements included in the statement pair groups; and sequentially updating the knowledge base of the system according to the weight to which each statement pair group corresponds.

286. The memory of claim 285, further comprises:
rectifying any wrong word included in the session record according to a preset rectifying rule; and performing a normalizing process on rectified session record.

287. The memory of claim 286, further comprises:
recognizing the intent class to which each questioning statement included in the session record corresponds by using the preset classifier algorithm and eliminating any questioning statement to which a preset irrelevant intent class corresponds as included in the session record.

288. The memory of claim 287, further comprises a process of analyzing and mining dialogue statements between customer service and a user comprising:
obtaining a session record to be processed and preprocessing obtained session record;
and rectifying any wrong word included in the session record according to a preset rectifying rule.

289. The memory of claim 288, further comprises:
performing a purification operation on all characters included in the session record; and recognizing a dialogue intent to which the questioning statement sent by each user corresponds by using the preset classifier algorithm.

290. The memory of claim 289, further comprises:
traversing a dialogue according to a temporal sequence of generation times;
screening questioning statements to be processed of the user when the questioning statements to be processed are traversed, and eliminating any questioning statement to be processed of the user that does not conform to a preset condition;
determining whether a questioning statement to be processed is to be merged with the antecedent questioning statement of the questioning statement to be processed according to a preset merging rule;
eliminating any answering statement whose number of characters is smaller than a preset number threshold when the answering statements of the customer service are traversed, and determining any answering statement remaining after the elimination as an answering statement to be processed;
merging the answering statement to be processed with an antecedent answering statement when the antecedent statement of the answering statement to be processed is an answering statement of the customer service, and storing the merged answering statement to the group to which the antecedent questioning statement corresponds; and merging the answering statement to be processed in the group to which the questioning statement corresponds when the antecedent statement of the answering statement to be processed is a questioning statement of the user.

291. The memory of claim 290, further comprises splitting the questioning statements included in the group each into text segments when the group is of a QA type;
processing the text segments from front to back, and merging any text segment whose number of characters is smaller than the preset number threshold in a posterior text segment of this text segment; and/or merging any text segment pertaining to the same intent class as the corresponding posterior text segment or pertaining to a preset merging intent class in the posterior text segment of this text segment;
sequentially obtaining a preset number of adjacent text segments through a sliding window, and predicting whether the obtained text segments belong to a same and single question by means of a binary classifier algorithm;
traversing the questioning statements included in the group and judging whether each questioning statement and its antecedent questioning statement belong to the same question when the group is of a QQA type; and combining all questioning statements and answering statements in pairs to generate corresponding QA statement pairs when the group is of a QAQA type.

292. The memory of claim 291 further comprises:
clustering the statements by employing a preset clustering algorithm, generating statement pair groups, and determining the number of questioning statements included in each statement pair group;
determining matching degrees between the questioning statements and the answering statements included in the statement pair groups according to a preset similarity algorithm; and determining a weight to which each statement pair group corresponds according to the corresponding matching degrees and the numbers of questioning statements included in the statement pair groups.

293. The memory of claim 292, wherein, when the session record is directed to text statements, principal wrong words are homonyms.

294. The memory of claim 293, wherein, when the session record is directed to speech statements, the speech statements are first converted into the text statements through a speech recognition technique.

295. The memory of claim 294, wherein language model and word frequency features are combined.

296. The memory of claim 295, wherein corresponding rectifying rules are provided for the speech statements and the text statements respectively.

297. The memory of claim 296, wherein wrong words are rectified according to the corresponding rectifying rules.

298. The memory of claim 297, wherein the purification operation includes removing irrelevant characters, including preset useless punctuations and preset stop words, recognizing irrelevant information contained in each text statement, including commodity names and placenames, and normalizing the irrelevant information to the preset characters to which the irrelevant information corresponds.
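
A minimal sketch of such a purification operation is given below, assuming a small hand-written dictionary in place of a real entity recognizer. The constants `USELESS_PUNCTUATION`, `STOP_WORDS` and `ENTITY_PLACEHOLDERS`, the placeholder strings, and the function name `purify` are all hypothetical and serve only to illustrate the three steps of removal, recognition and normalization.

```python
# Hypothetical presets; a real system would load these from configuration.
USELESS_PUNCTUATION = "~*#^"
STOP_WORDS = {"um", "uh", "please"}
ENTITY_PLACEHOLDERS = {
    "iphone 13": "[COMMODITY]",  # commodity name -> preset character
    "toronto": "[PLACE]",        # placename -> preset character
}


def purify(statement):
    """Normalize entities, strip useless punctuation, and drop stop words."""
    text = statement.lower()
    # Normalize recognized irrelevant information to its preset placeholder.
    for surface_form, placeholder in ENTITY_PLACEHOLDERS.items():
        text = text.replace(surface_form, placeholder)
    # Remove the preset useless punctuation characters.
    text = text.translate(str.maketrans("", "", USELESS_PUNCTUATION))
    # Remove the preset stop words.
    tokens = [token for token in text.split() if token not in STOP_WORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(purify("Please tell me is the iPhone 13 shipped to Toronto ~~##"))
```

Normalizing entities before clustering keeps questions that differ only in the commodity or place mentioned from being treated as different questions.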

299. The memory of claim 298, wherein the session record of the user with customer service within one day is a segment of dialogue.

300. The memory of claim 299, wherein the session record is split into one or more dialogues, and the dialogues are split into groups.

301. The memory of claim 300, wherein the user consults the same type of questions within a preset period of time and, after the customer service has replied, the user consults different questions next time in the dialogue with the customer service.

302. The memory of claim 301, wherein any questioning statement that has an eliminable intent and is irrelevant to business is eliminated, namely any questioning statement whose number of characters is smaller than a preset number threshold or whose intent is judged by the preset classifier algorithm as a chitchat intent.

303. The memory of claim 302, wherein, when the interval time between the antecedent questioning statement of the questioning statement to be processed and the questioning statement to be processed does not exceed a corresponding preset time threshold, and/or when the sentence pattern of the antecedent answering statement of the questioning statement to be processed is a preset sentence pattern, the questioning statement to be processed is merged with its antecedent questioning statement.

304. The memory of claim 303, wherein the antecedent questioning statement is a questioning statement that is temporally antecedent to the statement to be processed and with a shortest interval time to the statement to be processed.

305. The memory of claim 304, wherein the antecedent answering statement is an answering statement that is temporally antecedent to the statement to be processed and with the shortest interval time to the statement to be processed.

306. The memory of claim 305, wherein, when the interval time between the antecedent questioning statement of the questioning statement to be processed and the questioning statement to be processed exceeds the corresponding preset time threshold, it is judged that there is no relevancy between the antecedent questioning statement and the questioning statement to be processed, and a new group is generated according to the questioning statement to be processed.

307. The memory of claim 306, wherein, when the corresponding preset time threshold is not exceeded, it is judged that there is relevancy between the antecedent questioning statement and the questioning statement to be processed, and the questioning statement to be processed is merged into the group to which the antecedent questioning statement corresponds.
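
The time-threshold rule of claims 306 and 307 can be sketched as follows. This is a simplified illustration only: the `Statement` data class, the role codes `"Q"` and `"A"`, timestamps in seconds, and the function name `split_into_groups` are assumptions, and the preset sentence pattern rule is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    role: str         # "Q" for the user, "A" for the customer service
    text: str
    timestamp: float  # seconds since the start of the dialogue (assumed unit)


def split_into_groups(statements, max_gap_seconds):
    """Start a new group when a question arrives too long after the previous question."""
    groups = []
    last_question_time = None
    for statement in statements:
        if statement.role == "Q":
            too_late = (
                last_question_time is None
                or statement.timestamp - last_question_time > max_gap_seconds
            )
            if too_late:
                # Threshold exceeded: no relevancy, so a new group is generated.
                groups.append([])
            last_question_time = statement.timestamp
        if groups:
            # Assumption: answers arriving before any question are ignored.
            groups[-1].append(statement)
    return groups


if __name__ == "__main__":
    dialogue = [
        Statement("Q", "where is my order", 0),
        Statement("Q", "it was placed last week", 10),
        Statement("A", "it ships tomorrow", 20),
        Statement("Q", "how do refunds work", 500),
        Statement("A", "refunds take five days", 510),
    ]
    for group in split_into_groups(dialogue, max_gap_seconds=60):
        print("".join(s.role for s in group))
```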

308. The memory of claim 307, wherein the preset sentence patterns include statements that guide the user to further respond to responses made by the customer service, including questions asked in reply by the customer service to indefinite expressions of users, or sentence patterns asking for essential information from the user.

309. The memory of claim 308, wherein, when the antecedent answering statement is of the preset sentence pattern, the statement sent by the user after the antecedent answering statement is made in reply to this antecedent answering statement and is relevant to the antecedent answering statement, and the questioning statement to be processed is merged with the antecedent questioning statement.

310. The memory of claim 309, wherein, when the antecedent answering statement of the questioning statement to be processed is of the preset sentence pattern, the questioning statement to be processed is merged with the antecedent questioning statement, and the questioning statement to be processed is merged into the group to which the antecedent questioning statement corresponds.

311. The memory of claim 310, wherein the antecedent statement is a statement that is temporally antecedent to the statement to be processed and with the shortest interval time to the statement to be processed.

312. The memory of claim 311, wherein splitting the dialogue into groups yields three types of groups, comprising:
one question corresponding to one segment of reply, marked as QA;
plural questions corresponding to one segment of reply, wherein the user asks plural questions and the customer service replies with one segment of words, marked as QQA; and
plural questions corresponding to plural replies, wherein several rounds of communication are carried out between the user and the customer service in a short time, marked as QAQA.

313. The memory of claim 312, wherein the corresponding type is determined according to the numbers of answering statements and questioning statements included in each group, and each group is processed according to the corresponding processing rule.
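
Determining the group type from the statement counts can be sketched as below. The function name `group_type`, the role codes `"Q"` and `"A"`, and the choice to return `None` for combinations not enumerated among the three types (for example one question with plural replies) are assumptions made for illustration.

```python
def group_type(roles):
    """Classify a group as "QA", "QQA" or "QAQA" from its statement roles."""
    questions = roles.count("Q")
    answers = roles.count("A")
    if questions == 0 or answers == 0:
        return None  # not a complete question-and-answer group
    if answers == 1:
        # One reply: one question gives QA, plural questions give QQA.
        return "QA" if questions == 1 else "QQA"
    if questions > 1:
        # Plural questions with plural replies.
        return "QAQA"
    # Assumption: one question with plural replies is outside the three types.
    return None


if __name__ == "__main__":
    for roles in (["Q", "A"], ["Q", "Q", "A"], ["Q", "A", "Q", "A"]):
        print(roles, "->", group_type(roles))
```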

314. The memory of claim 313, wherein QA is a standard input form of an algorithm, wherein one standard question is only meant to express one question in the knowledge base.

315. The memory of claim 314, wherein an auxiliary algorithm is required during splitting to judge whether two segments of words are directed to one question or to two questions.

316. The memory of claim 315, wherein the auxiliary algorithm is a binary classifier, wherein inputs to the binary classifier are two statements.

317. The memory of claim 316, wherein the binary classifier is realized by any model capable of solving binary classification problems.

318. The memory of claim 317, wherein the BERT model, which during the process of pretraining predicts whether two input statements are directed to the context of one and the same statement or to topics irrelevant to each other, serves as the classifier, and fine-tuning training is performed.

319. The memory of claim 318, wherein the posterior text segment indicates a text segment following and immediately adjacent to the text segment being processed.

320. The memory of claim 319, wherein the classifier algorithm merges any text segment judged as pertaining to the same intent class as the posterior text segment, or pertaining to the preset merging intent class such as a chitchat class, into the posterior text segment.

321. The memory of claim 320, wherein text segments predicted to belong to the same question are merged into one questioning statement.

322. The memory of claim 321, wherein the text segments predicted to belong to different questions are split into two different questioning statements.
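
The merge-or-split decision of claims 321 and 322 can be sketched with a window of two adjacent text segments. The classifier is passed in as a callable so that any binary classifier can be substituted; `toy_same_question` is a hypothetical word-overlap stand-in used only to make the example runnable and is not the fine-tuned model described in the claims.

```python
def merge_by_classifier(segments, same_question):
    """Merge adjacent segments that the classifier predicts belong to one question."""
    if not segments:
        return []
    questions = [segments[0]]
    for segment in segments[1:]:
        if same_question(questions[-1], segment):
            # Same question: merge into one questioning statement.
            questions[-1] = f"{questions[-1]} {segment}"
        else:
            # Different question: start a new questioning statement.
            questions.append(segment)
    return questions


def toy_same_question(first, second):
    """Hypothetical stand-in classifier: true when the texts share a longer word."""
    words_first = {w for w in first.lower().split() if len(w) > 3}
    words_second = {w for w in second.lower().split() if len(w) > 3}
    return bool(words_first & words_second)


if __name__ == "__main__":
    segments = [
        "my order has not arrived",
        "the order was placed last week",
        "how do refunds work",
    ]
    print(merge_by_classifier(segments, toy_same_question))
```

Keeping the classifier behind a plain callable means the sliding-window logic does not change when the stand-in is replaced by a trained model.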

323. The memory of claim 322, wherein the groups of the QA type whose text segments do not all belong to one and the same question are converted to groups of the QQA type, and the groups of the QA type all of whose text segments belong to one and the same question are split into a QA statement pair that includes only one questioning statement and one answering statement.

324. The memory of claim 323, wherein the circumstance to which a group of the QQA type specifically pertains is recognized through a binary classification algorithm.

325. The memory of claim 324, wherein the questioning statement is split into text segments, and any text segment whose number of characters is smaller than the preset number threshold or which pertains to the preset merging intent is directly merged with the antecedent questioning statement, or the questioning statement and the antecedent questioning statement are input together into the binary classification algorithm to judge whether they belong to the same question.

326. The memory of claim 325, wherein, when it is judged that any text segment and the corresponding antecedent questioning statement belong to the same question, the text segment and the corresponding antecedent questioning statement are merged into one questioning statement.

327. The memory of claim 326, wherein, when it is recognized that any text segment and the corresponding antecedent questioning statement do not belong to the same question, the questioning statement is split into new questioning statements.

328. The memory of claim 327, wherein, when the remaining statements are only one answering statement and one questioning statement, they are determined as the QA statement pair.

329. The memory of claim 328, wherein, when the remaining statements are more than one questioning statement and one answering statement, the questioning statements and the answering statements are combined in pairs to generate corresponding QA statement pairs.
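
Combining questioning statements and answering statements in pairs can be sketched as below. Reading "combined in pairs" as every question paired with every answer (a Cartesian product) is an interpretation made for this illustration, and the function name `combine_in_pairs` is hypothetical.

```python
from itertools import product


def combine_in_pairs(questions, answers):
    """Pair every questioning statement with every answering statement."""
    return list(product(questions, answers))


if __name__ == "__main__":
    pairs = combine_in_pairs(
        ["where is my order", "how do refunds work"],
        ["your order ships tomorrow"],
    )
    for question, answer in pairs:
        print(question, "|", answer)
```

Under this reading, a group with two questions and two answers yields four candidate pairs, so some candidates will pair a question with an answer that does not address it.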

330. The memory of claim 329, wherein interaction between the user and the customer service in the short time is split into plural groups of QA statement pairs.

331. The memory of claim 330, wherein the answering statements and the questioning statements are directly combined in pairs with respect to groups of the QAQA type.

332. The memory of claim 331, wherein clustering is to incorporate similar questions together to constitute a cluster.

333. The memory of claim 332, wherein text distance metrics amongst the statement pairs are calculated via a text matching algorithm, and the statement pairs that belong to the same statement pair group are determined according to the text distance metrics.

334. The memory of claim 333, wherein the text matching algorithm is an algorithm that calculates the degree of similarity of two texts.

335. The memory of claim 334, wherein an unsupervised text matching algorithm, word mover's distance (WMD), is used.

336. The memory of claim 335, wherein any clustering algorithm is applied to determine the statement pairs that belong to the same statement pair group.
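
A minimal sketch of distance-based clustering is given below. To stay self-contained it swaps in a Jaccard distance over word sets instead of the word mover's distance named in claim 335, and it uses a simple greedy single-pass clustering; both choices, together with the names `jaccard_distance` and `cluster_questions`, are assumptions made for illustration.

```python
def jaccard_distance(first, second):
    """Stand-in text distance: 1 minus the word-set overlap ratio."""
    words_first = set(first.lower().split())
    words_second = set(second.lower().split())
    union = words_first | words_second
    if not union:
        return 0.0
    return 1.0 - len(words_first & words_second) / len(union)


def cluster_questions(questions, max_distance):
    """Greedy clustering: join the first cluster whose representative is close enough."""
    clusters = []  # the first element of each cluster acts as its representative
    for question in questions:
        for cluster in clusters:
            if jaccard_distance(question, cluster[0]) <= max_distance:
                cluster.append(question)
                break
        else:
            # No existing cluster is close enough: open a new one.
            clusters.append([question])
    return clusters


if __name__ == "__main__":
    questions = [
        "how do i return an item",
        "how do i return this item",
        "where is my order",
    ]
    for cluster in cluster_questions(questions, max_distance=0.5):
        print(len(cluster), cluster)
```

The size of each resulting cluster gives the number of questioning statements per statement pair group, which is the frequency signal used later for weighting.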

337. The memory of claim 336, wherein, among all QA pairs, there are invalid QA pairs caused by imprecise splitting and by circumstances in which answers are not pertinent to the questions asked due to negligence of the customer service, and the invalid QA statement pairs are removed.

338. The memory of claim 337, wherein filtration of the QA statement pairs is decided by matching degrees of questions and answers, wherein the QA statement pairs whose matching degrees between questioning statements and answering statements satisfy a preset condition are retained, and the QA statement pairs whose matching degrees between questioning statements and answering statements do not satisfy the preset condition are eliminated and filtered out.

339. The memory of claim 338, wherein the matching process is a text matching process, wherein a set of supervised algorithms is trained based on existing knowledge base data to perform similarity calculation.

340. The memory of claim 339, wherein frequently asked questions have higher priorities to be maintained in the knowledge base.

341. The memory of claim 340, wherein more important questions are preferentially maintained and some less valuable questions are neglected.

342. The memory of claim 341, wherein frequencies by which questions are asked are measured by the number of questions under each cluster obtained.

343. The memory of claim 342, wherein accuracy of answers is measured by matching degrees of questions and answers in the filtering process.

344. The memory of claim 343, wherein the corresponding sorting weight is derived by normalizing the two values and then weighting and accumulating the two values.
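
The sorting weight can be sketched as a weighted sum of two normalized values, the cluster size (question frequency) and the matching degree (answer accuracy). Min-max normalization, the default coefficient of 0.5, and the names `min_max` and `sorting_weights` are assumptions chosen for illustration; the claims do not fix a particular normalization or coefficient.

```python
def min_max(values):
    """Scale values to the range [0, 1]; identical values all map to 1.0."""
    low, high = min(values), max(values)
    if high == low:
        return [1.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def sorting_weights(cluster_sizes, matching_degrees, size_weight=0.5):
    """Normalize both values, then weight and accumulate them per group."""
    sizes = min_max(cluster_sizes)
    matches = min_max(matching_degrees)
    return [
        size_weight * size + (1.0 - size_weight) * match
        for size, match in zip(sizes, matches)
    ]


if __name__ == "__main__":
    weights = sorting_weights([10, 2, 6], [0.9, 0.5, 0.7])
    # Indices of the statement pair groups, highest weight first.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    print(weights, order)
```

Sorting the statement pair groups by this weight in descending order gives the sequence in which they would be taken up for maintenance.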

345. The memory of claim 344, wherein corresponding statement pairs are sequentially obtained according to sorting weights during subsequent maintenance of the knowledge base, and screened and processed manually or by machine, and maintained in the knowledge base.
