CN112434140B

CN112434140B - Reply information processing method and system

Info

Publication number: CN112434140B
Application number: CN202011248098.4A
Authority: CN
Inventors: 邹凯涛; 金苗; 康海洋; 姚博; 刘宗孺; 李志为
Original assignee: Hangzhou Bolian Intelligent Technology Co ltd
Current assignee: Hangzhou Bolian Intelligent Technology Co ltd
Priority date: 2020-11-10
Filing date: 2020-11-10
Publication date: 2024-02-09
Anticipated expiration: 2040-11-10
Also published as: CN112434140A

Abstract

The invention relates to the technical field of the Internet, and in particular to a reply information processing method and system. The method comprises the following steps: when a question to be processed is acquired from a current user, screening similar users based on user attribute data of the current user to obtain a similar user set; preprocessing the question to be processed to extract a corresponding effective vocabulary set, and screening a historical question set based on the effective vocabulary set to obtain a candidate question set; and determining the historical question in the candidate question set that has the highest correlation with the question to be processed, and configuring its reply information as candidate reply information. The method finds similar questions and provides candidate answers by using the characteristics of the questions themselves, which improves the accuracy of automatic answers and, in turn, the answering efficiency.

Description

Reply information processing method and system

Technical Field

The invention relates to the technical field of Internet, in particular to a reply information processing method and system.

Background

With the increasing convenience of online shopping, a growing number of people purchase and consume goods online. As a result, when users run into problems or have doubts, they also tend to ask questions and seek help online. Against this background, automatic response systems have been developed to help customer service personnel respond quickly to user feedback and questions.

Existing automatic answer systems for user feedback and questions mainly rely on checking the user status (online, busy, away) and on simple keyword matching against the user's question to return preset answers automatically.

Although answering by simple keyword matching is fast, it cannot match users' questions effectively, and it frequently produces answers that do not address the question asked.

It should be noted that the information disclosed in the above background section is only intended to enhance understanding of the background of the present invention, and may therefore include information that does not constitute prior art already known to those of ordinary skill in the art.

Disclosure of Invention

It is an object of the present invention to provide a reply information processing method and system that further overcome, at least in part, one or more of the problems due to the limitations and disadvantages of the related art.

Other features and advantages of the invention will be apparent from the following detailed description, or may be learned by the practice of the invention.

According to a first aspect of the present invention, there is provided a reply information processing method including:

when acquiring a problem to be processed of a current user, screening similar users based on user attribute data of the current user to acquire a similar user set;

preprocessing the to-be-processed problem to extract a corresponding effective vocabulary set, and screening a historical problem set based on the effective vocabulary set to obtain an alternative problem set;

and determining the historical problem with the highest correlation degree with the to-be-processed problem in the alternative problem set, and configuring corresponding reply information as alternative reply information.

According to a second aspect of the present invention, there is provided a reply information processing system comprising:

a similar user set generation module, configured to, when a problem to be processed is acquired from a current user, screen similar users based on user attribute data of the current user to obtain a similar user set;

the candidate problem set generating module is used for preprocessing the problems to be processed to extract corresponding effective vocabulary sets, and screening the historical problem sets based on the effective vocabulary sets to obtain candidate problem sets;

and the alternative reply information generation module is used for determining the historical problem with the highest correlation degree with the to-be-processed problem in the alternative problem set and configuring corresponding reply information as alternative reply information.

According to the method provided by the technical scheme of the embodiment of the invention, the candidate question set is screened by utilizing the effective vocabulary contained in the question to be processed, so that the history question most relevant to the question to be processed of the current user can be selected from the candidate question set, and the answer information of the history question is used as the candidate answer information. Therefore, the method and the device can find similar questions and provide alternative answers by utilizing the characteristics of the questions, effectively improve the accuracy of automatic answers, and further improve the answer efficiency.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.

Fig. 1 is a schematic diagram of a reply information processing method in an exemplary embodiment of the present invention;

Fig. 2 is a flowchart of a reply information processing method in an exemplary embodiment of the present invention;

Fig. 3 is a schematic block diagram of a reply information processing apparatus in an exemplary embodiment of the present invention.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.

Next, a reply information processing method in the present exemplary embodiment will be described in more detail with reference to the drawings and examples.

Referring to fig. 1, a reply information processing method provided in the present exemplary embodiment includes:

in step S1, when obtaining a problem to be processed of a current user, similar users are screened based on user attribute data of the current user to obtain a set of similar users.

In this example embodiment, the method described above may be applied to an application scenario of intelligent question-answering, and may recommend corresponding answer information to customer service personnel according to a question of a user. Taking a shopping platform as an example, after a user sends a problem to customer service personnel of a merchant through a terminal, the problem of the current user is taken as a problem to be processed at a background customer service end where the customer service personnel are located. After the problem to be processed is received, attribute information of the current user and other known users is provided, and a similar user set of the current user is screened through the attribute information.

Specifically, the screening the similar user set based on the user attribute data of the current user may include the following steps:

step S11, obtaining attribute information of the current user and obtaining attribute information of each target user in a target user set;

step S12, screening the set of similar users corresponding to the current user based on a result of similarity calculation between the attribute information of the current user and the attribute information of the target user.

For example, the user attributes may include user terminal device type, user age, channel type entered by the user, historical operating information, and so forth; in addition, user preferences, location information, etc. may also be included. In addition, for the known historical users, a historical user set may be generated in advance, where the historical user set may include attribute information of each known user, historical questions corresponding to each known user, and reply information of each historical question.

For the current user, similarity values can be calculated between the current user and each historical user in the historical user set respectively, and part of the historical users are screened by utilizing a preset similarity threshold value to generate a similar user set.

Specifically, user-based collaborative filtering may be used to screen the similar user set for the current user: the more attribute items two users have in common, the higher the similarity between them. The similarity between users may be calculated using the Jaccard formula or the cosine similarity formula. The Jaccard formula is shown in formula 1:

$$w_{uv} = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|} \tag{1}$$

The cosine similarity formula is shown in formula 2:

$$w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}} \tag{2}$$

where $w_{uv}$ represents the similarity between users u and v, N(u) represents the set of attributes of user u, and N(v) represents the set of attributes of user v.
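As an illustration of steps S11 and S12, the following minimal sketch computes both similarity measures over attribute sets and keeps the historical users whose similarity reaches a preset threshold. It assumes the standard set-based forms of the Jaccard and cosine formulas; the attribute encoding (strings such as `device:android`), the function names, and the threshold value are hypothetical and not taken from the patent.

```python
from math import sqrt

def jaccard(a: set, b: set) -> float:
    """Set-based Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(a: set, b: set) -> float:
    """Set-based cosine similarity: |A ∩ B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def similar_users(current: set, history: dict, threshold: float, sim=jaccard) -> list:
    """Return ids of historical users whose similarity to the current user meets the threshold."""
    return [uid for uid, attrs in history.items() if sim(current, attrs) >= threshold]

# Hypothetical attribute sets for the current user and two historical users.
current = {"device:android", "age:25-34", "channel:app"}
history = {
    "u1": {"device:android", "age:25-34", "channel:web"},
    "u2": {"device:ios", "age:45-54", "channel:web"},
}
selected = similar_users(current, history, threshold=0.4)
```

Either measure can be passed through the `sim` parameter; the choice of threshold is left to the deployment.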

In step S2, the to-be-processed problem is preprocessed to extract a corresponding valid vocabulary set, and the history problem set is screened based on the valid vocabulary set to obtain an alternative problem set.

In this example embodiment, when a question to be processed is obtained, it may first be preprocessed. For example, word segmentation and stop-word removal may be performed on the question, and the words that remain are kept as effective words, from which the effective vocabulary set of the question is generated. Similarly, each historical question in the historical question set may be preprocessed in advance to obtain its own effective vocabulary set, so that the effective words can be used to screen candidate questions.
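The preprocessing described above can be sketched as follows. This is only an illustration under stated assumptions: a regular-expression tokenizer stands in for word segmentation (Chinese text would need a dedicated segmenter), and the stop-word list `STOP_WORDS` and the function name `effective_vocabulary` are hypothetical.

```python
import re

# Hypothetical stop-word list; a real deployment would use a curated one.
STOP_WORDS = {"the", "a", "an", "is", "my", "to", "of", "how", "do", "i"}

def effective_vocabulary(question: str) -> set:
    """Tokenize a question and drop stop words, returning the effective vocabulary set."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return {t for t in tokens if t not in STOP_WORDS}

words = effective_vocabulary("How do I reset the router password?")
```

The same function would be applied in advance to every historical question, so that only set operations remain at query time.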

Specifically, the step S2 may further include:

step S21, preprocessing each history problem in the history problem set to determine an effective vocabulary set corresponding to each history problem;

step S22, calculating the similarity between the to-be-processed problem and each historical problem in the historical problem set based on the effective vocabulary set;

step S23, selecting a preset number of historical questions as the candidate question set according to the similarity calculation result.

Specifically, the above-mentioned historical question set may include both answered and unanswered questions, that is, questions with reply information and questions without reply information. Each word in the effective vocabulary set may be used as a question attribute of the corresponding question.

In addition, a plurality of vocabulary categories may be configured in advance for the effective vocabulary, with a different weight coefficient for each category. The weight coefficients of the vocabulary categories may also be configured in relation to the questions. After the effective vocabulary set is obtained, the category of each effective word can be identified and the corresponding weight coefficient determined. Specifically, a question-based collaborative filtering algorithm may be used to find answered or unanswered questions that are highly similar to the question to be processed, with the similarity calculated by the following formula 3.

where w_qp represents the similarity between questions q and p, W(q) represents the set of effective words of question q, W(p) represents the set of effective words of question p, and Q_x represents the weight of word x.

With formula 3, the similarity between questions can be calculated from the words they contain. Either a similarity threshold or a limit on the number of questions may be preset, so that a certain number of questions are screened out as the candidate question set. Related questions are thereby screened from the dimension of the questions themselves.
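The screening in steps S22 and S23 can be sketched as below. Formula 3 is not reproduced in this text, so the scoring function is an assumption: it sums the weights Q_x of the words shared by W(q) and W(p) and normalizes by the square root of the product of the two set sizes. The weights, question ids, and function names are hypothetical.

```python
from math import sqrt

def weighted_similarity(wq: set, wp: set, weight: dict, default: float = 1.0) -> float:
    """Assumed form: sum of weights of shared words / sqrt(|W(q)| * |W(p)|)."""
    if not wq or not wp:
        return 0.0
    shared = sum(weight.get(x, default) for x in wq & wp)
    return shared / sqrt(len(wq) * len(wp))

def top_candidates(wq: set, history: dict, weight: dict, n: int) -> list:
    """Return up to n historical question ids, most similar first, skipping zero scores."""
    scored = [(weighted_similarity(wq, wp, weight), qid) for qid, wp in history.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [qid for score, qid in scored[:n] if score > 0]

# Hypothetical per-word weights derived from vocabulary categories.
weight = {"router": 2.0, "password": 1.5}
wq = {"reset", "router", "password"}
history = {
    "h1": {"router", "password", "forgot"},
    "h2": {"reset", "order"},
    "h3": {"refund", "order"},
}
candidates = top_candidates(wq, history, weight, n=2)
```

A threshold on the score could replace the fixed count `n`, matching the two screening options described in the text.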

In addition, when a collaborative filtering algorithm is used, a clustering algorithm such as K-means or DBSCAN may be used to classify user questions. When new data is added, the group it belongs to can then be found quickly, so that similar items are searched for within the same class.

Based on the step S2 described above, in the present exemplary embodiment, when generating the candidate question set, the method may further include: acquiring a similar user historical question set corresponding to the similar user set, and sorting the historical questions in that set according to a preset rule, so as to screen the historical questions according to the sorting result and add them to the candidate question set of the current user.

In this example embodiment, for each known user in the similar user set, a corresponding similar user historical problem set may be generated according to the historical problem and the reply information corresponding to each known user, and each historical problem in the set may be ranked. For example, the order may be based on the frequency of occurrence of the questions. Specifically, it may include:

step S241, counting the occurrence frequency of each history problem in the similar user history problem set, and sorting according to the occurrence frequency;

step S242, selecting a preset number of the history questions as the first candidate question set according to the sorting result.
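Steps S241 and S242 amount to a frequency count followed by a top-N selection, which can be sketched as follows. The sketch assumes that equivalent questions have already been merged into one canonical string; the sample data and the function name `first_candidate_set` are hypothetical.

```python
from collections import Counter

def first_candidate_set(similar_user_questions: dict, n: int) -> list:
    """Count how often each historical question occurs among similar users; keep the n most frequent."""
    counts = Counter(
        q for questions in similar_user_questions.values() for q in questions
    )
    return [q for q, _ in counts.most_common(n)]

# Hypothetical historical questions of three similar users.
similar_user_questions = {
    "u1": ["reset password", "track order"],
    "u2": ["reset password", "refund"],
    "u3": ["reset password", "track order"],
}
first_set = first_candidate_set(similar_user_questions, n=2)
```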

Specifically, similarity may be calculated among the historical questions in the similar user historical question set, and the questions classified and merged according to the result. Alternatively, each historical question may be classified according to a preset question type. The occurrence frequency of each historical question is then counted and the questions are sorted by that frequency. A preset number of the most frequent historical questions are selected as a first candidate question set, which screens related questions from the dimension of the user. The candidate question set screened by using the effective words serves as a second candidate question set, and the first and second candidate question sets are combined to generate the final candidate question set.

In step S3, the historical problem with the highest correlation degree with the to-be-processed problem in the candidate problem set is determined, and the corresponding reply information is configured as candidate reply information.

In this example embodiment, after the first candidate question set and the second candidate question set are obtained, the two sets may be combined and duplicate items deleted to obtain the final candidate question set. Alternatively, the first candidate question set alone may be used as the final candidate question set, or the second candidate question set alone may be used as the final candidate question set.

For the resulting candidate question set, the historical questions that have reply information may first be screened out into one question set. The reply information of the questions in this set can be used as candidate reply information and displayed on the terminal device for the user to select. In addition, the unanswered historical questions may be placed in another question set.
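Combining the two candidate sets and splitting the result by whether a reply exists can be sketched as follows. This is a minimal illustration; the data layout (a dictionary mapping each question to its reply, or `None` when unanswered) and the function names are assumptions.

```python
def merge_candidates(first: list, second: list) -> list:
    """Union of both candidate sets with duplicates removed, keeping first-seen order."""
    return list(dict.fromkeys(first + second))

def partition_by_reply(candidates: list, replies: dict) -> tuple:
    """Split candidates into (question, reply) pairs and a list of unanswered questions."""
    answered = [(q, replies[q]) for q in candidates if replies.get(q)]
    unanswered = [q for q in candidates if not replies.get(q)]
    return answered, unanswered

# Hypothetical first (user-dimension) and second (question-dimension) candidate sets.
first = ["reset password", "track order"]
second = ["track order", "change address"]
replies = {"reset password": "Use the 'Forgot password' link.", "track order": None}

final = merge_candidates(first, second)
answered, unanswered = partition_by_reply(final, replies)
```

The answered pairs would be shown to customer service personnel as candidate replies, while the unanswered list feeds the batch-reply step.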

Based on the foregoing, in this exemplary embodiment, the foregoing method may further include: and responding to the selection operation of the background user on the alternative reply information, and configuring the alternative reply information as the reply information of the to-be-processed problem.

For example, when the user selects any item of reply information from the candidate reply information, the reply information may be sent to the client as reply information of the question to be processed.

Alternatively, in other exemplary embodiments, the user may modify the selected alternative reply information at the terminal, and then send the modified reply information to the client as the reply information of the to-be-processed question.

Based on the foregoing, in this exemplary embodiment, the foregoing method may further include: calculating the similarity between the current user to-be-processed questions and other to-be-processed questions, and generating a list of answerable to-be-processed questions according to a similarity calculation result; and responding to the selection operation of the to-be-processed question list, and configuring the reply information of the to-be-processed questions of the current user as the reply information corresponding to the answerable to-be-processed question list.

For example, for the to-be-processed questions generated by other clients in the same period, the similarity between each of these questions and the to-be-processed question of the current user can be calculated, and the questions screened according to the result to generate a list of answerable to-be-processed questions. The other to-be-processed questions may be historical questions that were obtained through screening and have not been answered, or new unanswered questions generated within a preset time period. After replying to the current user's to-be-processed question, the user may also choose to answer the questions in the list of answerable to-be-processed questions with the same reply at the same time, thereby realizing batch processing. After answering is finished, all of the questions answered are marked as answered and stored together with their replies.
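The batch-reply flow can be sketched as follows. The text does not say which similarity measure is applied to pending questions, so the set-based Jaccard measure here is an assumption, as are the threshold, the ids, and the function names.

```python
def jaccard(a: set, b: set) -> float:
    """Set-based Jaccard similarity, used here as an assumed measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def answerable_pending(current_words: set, pending: dict, threshold: float) -> list:
    """Return ids of other unanswered questions similar enough to the current one."""
    return [
        qid for qid, words in pending.items()
        if jaccard(current_words, words) >= threshold
    ]

def batch_reply(reply: str, question_ids: list, answered_store: dict) -> None:
    """Mark every selected question as answered by storing it with the shared reply."""
    for qid in question_ids:
        answered_store[qid] = reply

# Hypothetical effective vocabulary sets of the current and other pending questions.
current_words = {"reset", "router", "password"}
pending = {
    "p1": {"router", "password", "reset", "help"},
    "p2": {"refund", "order"},
}
similar = answerable_pending(current_words, pending, threshold=0.5)
answered_store = {}
batch_reply("Hold the reset button for ten seconds.", ["current", *similar], answered_store)
```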

Referring to Fig. 2, a user submits a question or feedback to customer service personnel at a client, and a corresponding to-be-processed question is generated for the current user. At this point, the user's information can be collected as user attribute data, and the to-be-processed question can be preprocessed by the method described above to obtain the corresponding effective vocabulary set. A user-based collaborative filtering algorithm can then be used to calculate the similarity between the user information of the current user and the attributes of all other users, so that candidate questions are screened from the user dimension. In addition, a question-based collaborative filtering algorithm, supported by a professional word stock, is used to calculate the similarity between the effective vocabulary set of the to-be-processed question and the other historical questions, so that candidate questions are screened from the question dimension. The two strategies together yield the similar questions, which may include similar answered questions and similar unanswered questions. Based on this classification, the relevant questions and reply information can be displayed in the interactive interface of the customer service terminal device. Customer service personnel can manually select an answer from the similar answered questions and send it directly to the current user as the answer to the question; alternatively, they can modify the selected answer and then send it to the current user. When an answer is modified, the modified content may also be written to the answered-question database. In addition, customer service personnel can manually choose at the terminal whether to answer the similar unanswered questions as well, which effectively improves the efficiency of answering similar questions.

The method of the invention takes both the user information and the current question information into account when selecting similar answered questions. Customer service personnel can pick the answer to an identical or similar question from the candidate answers and reply quickly with only minor modification. At the same time, a list of similar unanswered questions is also selected according to the question just answered, and once selected these can all be answered in one pass, which greatly improves the efficiency with which customer service replies to questions. In addition, the answered questions are written back to the answered-question database, so that the matching results are continually optimized.

Referring to fig. 3, a reply information processing system 30 provided in the present exemplary embodiment includes:

the similar user set generating module 301 is configured to, when acquiring a problem to be processed of a current user, screen similar users based on user attribute data of the current user to acquire a similar user set.

And the candidate question set generating module 302 is configured to pre-process the to-be-processed question to extract a corresponding valid vocabulary set, and screen the history question set based on the valid vocabulary set to obtain a candidate question set.

And the alternative reply information generation module 303 is configured to determine the historical problem with the highest correlation degree with the to-be-processed problem in the alternative problem set, and configure corresponding reply information as alternative reply information.

The specific details of the reply information processing system have been described in detail in the corresponding reply information processing method, and thus are not described here again.

It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

In particular, according to embodiments of the present invention, the processes described below with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts.

It should be noted that, the computer readable medium shown in the embodiments of the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims

1. A reply information processing method, characterized by comprising:

when acquiring a problem to be processed of a current user, calculating the similarity between the current user and each target user in a target user set based on user attribute data of the current user and attribute information of each target user in the target user set through a similarity calculation formula to acquire a similar user set, wherein the user attributes comprise user terminal equipment types, user ages, channel types entered by the user, historical operation information, user preferences and position information, and the similarity calculation formula comprises a Jaccard similarity formula and a cosine similarity formula;

acquiring a similar user history problem set corresponding to the similar user set; preprocessing the problem to be processed to extract a corresponding effective vocabulary set, and passing through a formula

calculating the similarity between the to-be-processed problem and the historical problem, and screening out an alternative problem set from the historical problem set, wherein w_qp represents the similarity between questions q and p, W(q) represents the set of valid words of question q, W(p) represents the set of valid words of question p, and Q_x represents the weight of word x;

determining the historical problem with the highest correlation degree with the problem to be processed in the alternative problem set, and configuring corresponding reply information as alternative reply information;

wherein, after the similar user set is obtained, the method further comprises: acquiring a similar user history problem set corresponding to the similar user set, and sorting the history problems in the similar user history problem set according to a preset rule so as to screen the history problems according to a sorting result and add the history problems to an alternative problem set of the current user;

sorting the historical questions in the similar user historical question set according to a preset rule, so as to screen the historical questions according to a sorting result and add the historical questions to a first alternative question set of the current user, wherein the sorting comprises the following steps: counting the occurrence frequency of each historical problem in the similar user historical problem set, and sorting according to the occurrence frequency; selecting a preset number of historical questions as a first alternative question set of the alternative question sets according to the sorting result;

screening the historical problem set based on the effective vocabulary set to obtain an alternative problem set, including: preprocessing each history problem in the history problem set to determine an effective vocabulary set corresponding to each history problem; calculating the similarity between the to-be-processed problem and each historical problem in the historical problem set based on the effective vocabulary set; selecting a preset number of historical questions as a second alternative question set of the alternative question set according to the similarity calculation result;

after the first alternative problem set and the second alternative problem set are obtained, combining the two sets, deleting repeated items, and obtaining a final alternative problem set; or, the first candidate problem set is used as a final candidate problem set; alternatively, the second candidate problem set is taken as a final candidate problem set.

2. The method according to claim 1, wherein the method further comprises:

responding to the selection operation of the background user on the alternative reply information, wherein the alternative reply information is configured as the reply information of the to-be-processed problem or is configured as the reply information of the to-be-processed problem after being modified.

3. The method of claim 1, wherein the set of historical questions includes a answered historical question and an unanswered historical question.

4. The method according to claim 1, wherein the method further comprises:

and recognizing the vocabulary types of each effective vocabulary in the effective vocabulary set, and configuring the weight of each effective vocabulary according to the vocabulary type recognition result.

5. The method according to claim 2, wherein the method further comprises:

calculating the similarity between the current user to-be-processed questions and other to-be-processed questions, and generating a list of answerable to-be-processed questions according to a similarity calculation result;

and responding to the selection operation of the to-be-processed question list, and configuring the reply information of the to-be-processed questions of the current user as the reply information corresponding to the answerable to-be-processed question list.

6. A reply information processing system, comprising:

a similar user set generation module, configured to, when a problem to be processed is acquired from a current user, calculate the similarity between the current user and each target user in a target user set through a similarity calculation formula, based on user attribute data of the current user and attribute information of each target user in the target user set, so as to obtain a similar user set, wherein the user attributes comprise user terminal equipment type, user age, the type of channel through which the user entered, historical operation information, user preferences and position information, and the similarity calculation formula comprises a Jaccard similarity formula and a cosine similarity formula;

the alternative question set generating module is used for acquiring a similar user history question set corresponding to the similar user set, preprocessing the questions to be processed to extract a corresponding effective vocabulary set, and obtaining a corresponding effective vocabulary set according to a formula

the alternative reply information generation module is used for determining the historical problem with the highest correlation degree with the to-be-processed problem in the alternative problem set and configuring corresponding reply information as alternative reply information;

after the similar user set is obtained, a similar user history problem set corresponding to the similar user set is obtained, and history problems in the similar user history problem set are ordered according to a preset rule, so that the history problems are screened according to an ordering result and added to an alternative problem set of the current user;