CN115422214A

CN115422214A - Corpus updating method, apparatus, computer device, storage medium and product

Info

Publication number: CN115422214A
Application number: CN202211051620.9A
Authority: CN
Inventors: 章宗杰; 余振; 吴政楠; 殷富成
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2022-08-31
Filing date: 2022-08-31
Publication date: 2022-12-02

Abstract

The application relates to a corpus updating method, a corpus updating apparatus, a computer device, a storage medium and a computer program product. The method comprises: acquiring an initial corpus of a specific business field, and historical behavior data and a historical question-answer portrait corresponding to a user identifier in the specific business field, wherein the historical behavior data, the historical question-answer portrait and the initial corpus each comprise question-answer pairs; determining, from the initial corpus and the historical behavior data and historical question-answer portrait corresponding to the user identifier, a target question-answer pair satisfying a preset condition, wherein the preset condition comprises that the accuracy of the question-answer pair is higher than a preset accuracy threshold; and updating the initial corpus of the specific business field according to the target question-answer pair satisfying the preset condition. The method updates the initial corpus automatically, which removes the need in the traditional approach for experts in the specific business field to spend a large amount of time constructing a dictionary of that field, thereby reducing the cost and improving the efficiency of corpus updating.

Description

Corpus updating method, apparatus, computer device, storage medium and product

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a corpus updating method, apparatus, computer device, storage medium, and product.

Background

The intelligent question-answering system is a high-level form of the information retrieval system, a user can input a question in the intelligent question-answering system, the intelligent question-answering system accesses a corpus after receiving the question, and answers to the question are obtained from the corpus and output to the user. When a user performs business consultation in a specific business field, if the intelligent question-answering system can accurately provide consultation services for the user, the consultation efficiency can be obviously improved, and the labor cost of an enterprise is reduced.

When a user accesses the intelligent question-answering system, the intelligent question-answering system mainly matches answers of questions from a corpus so as to provide consultation services for the user. Thus, the corpus is crucial to an intelligent question-answering system.

However, a specific business field is highly specialized, and thus the corpus of that field is also highly specialized. A corpus constructed from a general dictionary cannot meet the question-answer requirements of the specific business field. Therefore, in the conventional technology, experts in the specific business field need to spend a lot of time constructing a dictionary of that field, and the corpus of the specific business field is then updated based on this dictionary. This approach of having experts construct a dictionary in order to update the corpus suffers from high cost and low efficiency.

Disclosure of Invention

In view of the above, it is desirable to provide a corpus updating method, apparatus, computer device, storage medium and product, which can reduce cost and improve efficiency.

In a first aspect, the present application provides a corpus updating method. The method comprises the following steps:

acquiring an initial corpus of a specific service field, historical behavior data and a historical question-answer portrait corresponding to a user identifier in the specific service field; the historical behavior data, the historical question and answer portrait and the initial corpus comprise question and answer pairs;

determining a target question-answer pair meeting preset conditions from the initial corpus, historical behavior data corresponding to the user identification and a historical question-answer portrait; the preset condition comprises that the accuracy of the question-answer pair is higher than a preset accuracy threshold value;

and updating the initial corpus of the specific service field according to the target question-answer pair meeting the preset condition.

In one embodiment, the determining, from the initial corpus and the historical behavior data and historical question-answer portrait corresponding to the user identifier, of a target question-answer pair that satisfies a preset condition includes:

and determining a target question-answer pair meeting preset conditions from the initial corpus, historical behavior data corresponding to the user identification and a historical question-answer portrait through a collaborative filtering algorithm.

In one embodiment, the determining, by using a collaborative filtering algorithm, a target question-answer pair that meets a preset condition from the initial corpus, historical behavior data corresponding to the user identifier, and a historical question-answer portrait includes:

generating a first matrix according to historical behavior data and a historical question-answer portrait corresponding to the user identification; the first matrix is used for representing the corresponding relation between historical behavior data corresponding to the user identification and historical question and answer pairs in the historical question and answer portrait;

generating a second matrix according to the historical question-answer portrait corresponding to the user identifier and the initial corpus; the second matrix is used for representing the corresponding relation between the historical question-answer pairs in the historical question-answer portrait and the initial question-answer pairs in the initial corpus;

and determining a target question-answer pair meeting a preset condition from the first matrix and the second matrix through a collaborative filtering algorithm.

In one embodiment, the historical behavior data includes a historical question-answer pair corresponding to the user identifier and an accuracy rate of the historical question-answer pair; the historical question-answer portrait comprises a historical question-answer pair corresponding to the user identification and tags of historical questions in the historical question-answer pair;

generating a first matrix according to the historical behavior data and the historical question-answer portrait corresponding to the user identifier, wherein the generating of the first matrix comprises:

acquiring corresponding relations among historical question-answer pairs corresponding to the user identifications, tags of historical questions in the historical question-answer pairs and accuracy of the historical question-answer pairs from the historical behavior data and the historical question-answer portrait;

and generating a first matrix according to the corresponding relation among the historical question-answer pairs corresponding to the user identification, the labels of the historical questions in the historical question-answer pairs and the accuracy of the historical question-answer pairs.

In one embodiment, the generating of a second matrix according to the historical question-answer portrait corresponding to the user identifier and the initial corpus includes:

acquiring an initial question-answer pair from the initial corpus, and acquiring a corresponding relation between the initial question-answer pair and a historical question-answer pair corresponding to the user identifier in the historical question-answer portrait;

and generating a second matrix according to the corresponding relation between the initial question-answer pair and the historical question-answer pair corresponding to the user identifier in the historical question-answer portrait.

In one embodiment, determining, by using a collaborative filtering algorithm, a target question-answer pair satisfying a preset condition from the first matrix and the second matrix includes:

sorting the historical question-answer pairs and the initial question-answer pairs in the first matrix and the second matrix by adopting a collaborative filtering algorithm according to the accuracy of the historical question-answer pairs to generate a sorting result;

and determining a target question-answer pair with accuracy higher than the preset accuracy threshold from the sequencing result.

In one embodiment, the obtaining historical behavior data corresponding to the user identifier in the specific service field includes:

acquiring a historical question-answer pair corresponding to the user identification from an intelligent question-answer system in the specific service field;

obtaining the scoring data of the user identification on the historical question-answer pair, and generating the accuracy of the historical question-answer pair according to the scoring data of the historical question-answer pair;

and generating historical behavior data corresponding to the user identification according to the historical question-answer pair corresponding to the user identification and the accuracy of the historical question-answer pair.

In one embodiment, the obtaining scoring data of the user identifier on the historical question and answer pairs includes:

if the scoring data of the user identification on the historical question-answer pairs is not obtained, obtaining the question times of the user identification on the question-answer pairs from the intelligent question-answer system;

and generating the scoring data of the historical question-answer pair according to the number of times of questions of the question-answer pair.
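The two embodiments above (accuracy derived from scoring data, with question counts as a fallback when no score exists) can be sketched as follows. This is only an illustration under stated assumptions: the function name `accuracy_of_pair`, the 5-point scale, and in particular the mapping from question count to score are hypothetical, since the text here does not specify how the number of questions is converted into scoring data.

```python
def accuracy_of_pair(scores, question_count, max_score=5.0):
    """Return an accuracy in [0, 1] for one historical question-answer pair.

    scores:         list of user scores for the pair (may be empty)
    question_count: how many times the user asked the question
    max_score:      assumed upper bound of the scoring scale (hypothetical)
    """
    if scores:
        # Scoring data is available: use the mean score.
        raw = sum(scores) / len(scores)
    else:
        # No scoring data: derive a score from the question count.
        # ASSUMPTION: fewer repeated questions suggests a more satisfactory
        # answer; the application does not state this mapping.
        raw = max_score / max(question_count, 1)
    return raw / max_score


rated = accuracy_of_pair([4, 5], question_count=1)
unrated = accuracy_of_pair([], question_count=2)
```

Any monotonic mapping could be substituted in the fallback branch; the point of the sketch is only that a pair without explicit scores still receives an accuracy value.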

In one embodiment, the obtaining of the historical question-answer portrait corresponding to the user identifier in the specific business field includes:

acquiring a historical question-answer pair corresponding to the user identification from an intelligent question-answer system in the specific service field; historical answers in the historical question-answer pairs are determined based on matching of labels of historical questions and the initial question-answer pairs in the initial corpus;

and generating a historical question-answer portrait corresponding to the user identifier according to the historical question-answer pair corresponding to the user identifier and the tags of the historical questions in the historical question-answer pair.

In a second aspect, the present application further provides a corpus updating apparatus. The device comprises:

the data acquisition module is used for acquiring an initial corpus of a specific business field, historical behavior data and a historical question-answer portrait corresponding to a user identifier in the specific business field; the historical behavior data, the historical question and answer portrait and the initial corpus comprise question and answer pairs;

the target question-answer pair determining module is used for determining a target question-answer pair meeting preset conditions from the initial corpus, historical behavior data corresponding to the user identification and a historical question-answer portrait; the preset condition comprises that the accuracy of the question-answer pair is higher than a preset accuracy threshold value;

and the initial corpus updating module is used for updating the initial corpus of the specific service field according to the target question-answer pair meeting the preset condition.

In a third aspect, the application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the method in any of the embodiments of the first aspect described above when the processor executes the computer program.

In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method in any of the embodiments of the first aspect described above.

In a fifth aspect, the present application further provides a computer program product. The computer program product comprising a computer program that, when executed by a processor, performs the steps of the method in any of the embodiments of the first aspect described above.

The corpus updating method, apparatus, computer device, storage medium and computer program product described above acquire an initial corpus of a specific business field, and historical behavior data and a historical question-answer portrait corresponding to a user identifier in the specific business field, wherein the historical behavior data, the historical question-answer portrait and the initial corpus comprise question-answer pairs; determine, from the initial corpus and the historical behavior data and historical question-answer portrait corresponding to the user identifier, a target question-answer pair satisfying a preset condition, wherein the preset condition comprises that the accuracy of the question-answer pair is higher than a preset accuracy threshold; and update the initial corpus of the specific business field according to the target question-answer pair satisfying the preset condition. Because the target question-answer pair is determined from the acquired initial corpus, historical behavior data and historical question-answer portrait, and the initial corpus is then updated according to that target question-answer pair, the whole process updates the initial corpus automatically. Experts in the specific business field no longer need to construct a dictionary of that field in order to update the corpus, which avoids the large amount of time that the traditional method requires of them. Therefore, the cost of corpus updating is reduced, and the efficiency of corpus updating is improved.

Drawings

FIG. 1 is a diagram of an application environment of a corpus updating method in one embodiment;

FIG. 2 is a flow diagram illustrating a corpus update method according to one embodiment;

FIG. 3 is a schematic flow chart of the target question-answer pair determination step in one embodiment;

FIG. 4 is a schematic flow chart diagram illustrating the first matrix generation step in one embodiment;

FIG. 5 is a flowchart illustrating the second matrix generation step in one embodiment;

FIG. 6 is a flowchart of the target question-answer pair generation step in one embodiment;

FIG. 7 is a flowchart illustrating the historical behavior data generation step in one embodiment;

FIG. 8 is a schematic flowchart of the scoring data obtaining step in one embodiment;

FIG. 9 is a schematic flow chart illustrating the historical question-answer portrait generation step in one embodiment;

FIG. 10 is a flow diagram illustrating a corpus update method in accordance with an exemplary embodiment;

FIG. 11 is a schematic flow chart of intelligent question answering in one embodiment;

FIG. 12 is a flowchart illustrating a method for automatically updating a corpus in an exemplary embodiment;

FIG. 13 is a block diagram illustrating an exemplary corpus update apparatus;

FIG. 14 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The intelligent question-answering system is an advanced form of an information retrieval system, a user can input a question in the intelligent question-answering system, the intelligent question-answering system accesses a corpus after receiving the question, obtains an answer of the question from the corpus and outputs the answer to the user. When a user performs business consultation in a specific business field, if the intelligent question-answering system can accurately provide consultation service for the user, the consultation efficiency can be obviously improved, and the labor cost of an enterprise is reduced.

Current intelligent question-answering systems mainly include: 1) the task-type question-answering system, which can output question-answer information to the user through multiple rounds of questions and answers; 2) the search-type question-answering system, which, based on a specific rule set and a self-learning context model, continuously trains the context model with a large amount of user consultation data to generate a trained model; when a user has a question-answer requirement, the question is input into the trained model, question-answer information corresponding to the question is searched for based on the question, and the question-answer information is output to the user; 3) a question-answering system that, on the assumption that it can communicate with the user without hindrance, can accurately output question-answer information for the user's question. However, owing to current technical limitations and the lack of real corpora, this third kind of question-answering system basically achieves only a crude and stiff question-answer effect, and thus remains an idealized model.

When a user accesses the intelligent question-answering system, the current intelligent question-answering system mainly matches answers of questions from a corpus, so that consultation services are provided for the user. Thus, the corpus is crucial for an intelligent question-answering system.

However, a specific business field is highly specialized, and thus the corpus of that field is also highly specialized. A corpus constructed from a general dictionary cannot meet the question-answer requirements of the specific business field. Therefore, in the conventional technology, experts in the specific business field need to spend a lot of time constructing a dictionary of that field, and the corpus of the specific business field is then updated based on this dictionary. This approach of having experts construct a dictionary in order to update the corpus suffers from high cost and low efficiency.

The corpus updating method provided by the embodiments of the application can be applied to the application environment shown in FIG. 1. The terminal 102 communicates with the server 104 through a network, and the terminal 102 may transmit data such as the historical behavior data and the historical question-answer portrait corresponding to the user identifier to the server 104 through the network. A data storage system may store the data that the server 104 needs to process. The data storage system may be integrated on the server 104, or may be located on the cloud or on another network server. The server 104 acquires an initial corpus of a specific business field, and historical behavior data and a historical question-answer portrait corresponding to a user identifier in the specific business field, wherein the historical behavior data, the historical question-answer portrait and the initial corpus comprise question-answer pairs; determines, from the initial corpus and the historical behavior data and historical question-answer portrait corresponding to the user identifier, a target question-answer pair satisfying a preset condition, wherein the preset condition comprises that the accuracy of the question-answer pair is higher than a preset accuracy threshold; and updates the initial corpus of the specific business field according to the target question-answer pair satisfying the preset condition. The terminal 102 may be, but is not limited to, any of various personal computers, notebook computers, smart phones, tablet computers, internet of things devices and portable wearable devices. The internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle-mounted devices, and the like. The portable wearable devices may be smart watches, smart bracelets, head-mounted devices, and the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.

In one embodiment, as shown in fig. 2, a corpus updating method is provided, which is described by taking the method as an example applied to the server 104 in fig. 1, and includes the following steps:

Step 220, acquiring an initial corpus of a specific business field, and historical behavior data and a historical question-answer portrait corresponding to a user identifier in the specific business field; the historical behavior data, the historical question-answer portrait and the initial corpus include question-answer pairs.

Specifically, the server 104 may obtain an initial corpus of a specific business field, and historical behavior data and a historical question-answer portrait corresponding to a user identifier in the specific business field. Optionally, these three inputs may be obtained from a historical data set, or may be accumulated by recording the data of each question and answer of the user in real time; this application does not limit the source. The specific business field refers to a business field that is highly specialized and in which users employ many professional terms during intelligent question answering, such as the financial business field or the medical business field. The historical behavior data, the historical question-answer portrait and the initial corpus all include question-answer pairs (QAP), where a question-answer pair includes a question and the answer corresponding to that question.
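As a rough illustration of the data acquired in step 220, the three inputs could be modeled as below. This is only a sketch under assumptions: the class names (`QAPair`, `BehaviorRecord`, `PortraitEntry`), the field names and the sample banking questions are hypothetical and do not come from the application, which only states that each input contains question-answer pairs (and, in other embodiments, that behavior data carries an accuracy and the portrait carries question tags).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QAPair:
    """A question-answer pair (QAP): a question and its corresponding answer."""
    question: str
    answer: str


@dataclass
class BehaviorRecord:
    """One item of historical behavior data: a historical QAP and its accuracy."""
    user_id: str
    pair: QAPair
    accuracy: float  # assumed to be normalized to [0, 1]


@dataclass
class PortraitEntry:
    """One item of a historical question-answer portrait: a QAP and question tags."""
    user_id: str
    pair: QAPair
    tags: list = field(default_factory=list)


# Hypothetical sample data for a financial business field.
initial_corpus = [QAPair("How do I reset my card PIN?", "Use the mobile app or visit a branch.")]
history = [BehaviorRecord("u1", QAPair("How to change card password?", "Use the mobile app."), 0.9)]
portrait = [PortraitEntry("u1", history[0].pair, tags=["card", "password"])]
```

Keeping `QAPair` immutable (`frozen=True`) makes it hashable, so the same pair can serve as a key that links a behavior record to its portrait entry.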

Step 240, determining a target question-answer pair meeting preset conditions from the initial corpus, historical behavior data corresponding to the user identification and a historical question-answer portrait; the preset condition comprises that the accuracy of the question-answer pair is higher than a preset accuracy threshold value.

Specifically, the server 104 may determine a target question-answer pair satisfying a preset condition from the initial corpus and the historical behavior data and historical question-answer portrait corresponding to the user identifier. The preset condition includes that the accuracy of the question-answer pair is higher than a preset accuracy threshold; that is, the target question-answer pairs satisfying the preset condition are those question-answer pairs in the initial corpus, the historical behavior data corresponding to the user identifier and the historical question-answer portrait whose accuracy is higher than the preset accuracy threshold. Optionally, the target question-answer pair satisfying the preset condition may be determined by a machine learning method, such as a random forest algorithm, a deep forest algorithm or a collaborative filtering algorithm; the present application does not limit the choice of method.
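The preset condition itself reduces to a strict threshold comparison, which can be sketched as follows. The function name `select_target_pairs`, the tuple representation of a pair and the threshold value 0.8 are assumptions chosen for illustration; how the accuracy values are produced is left to the algorithm chosen in step 240.

```python
def select_target_pairs(accuracy_by_pair, threshold):
    """Return the pairs whose accuracy is strictly higher than the threshold."""
    return [pair for pair, acc in accuracy_by_pair.items() if acc > threshold]


# Hypothetical candidates: (question, answer) -> accuracy.
candidates = {
    ("How do I reset my card PIN?", "Use the mobile app."): 0.92,
    ("What is the transfer limit?", "It depends on the account type."): 0.55,
}
targets = select_target_pairs(candidates, threshold=0.8)
```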

And step 260, updating the initial corpus of the specific business field according to the target question-answer pair meeting the preset conditions.

Specifically, the selected target question-answer pair satisfying the preset condition is used to replace the initial question-answer pair in the initial corpus; that is, the selected target question-answer pair serves as a new question-answer pair in the corpus, yielding a new corpus, so that the initial corpus of the specific business field is updated.
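A minimal sketch of this update step is given below. It assumes, purely for illustration, that the corpus is stored as a mapping from question text to answer text, so that a target pair overwrites the initial pair with the same question and is otherwise added as a new entry; the application does not prescribe a storage layout, and `update_corpus` is a hypothetical name.

```python
def update_corpus(corpus, target_pairs):
    """Return a new corpus in which target pairs replace or extend initial pairs."""
    updated = dict(corpus)  # copy, so the initial corpus is left untouched
    for question, answer in target_pairs:
        updated[question] = answer  # overwrite an existing entry or add a new one
    return updated


initial = {"How do I reset my card PIN?": "Visit a branch."}
targets = [
    ("How do I reset my card PIN?", "Use the mobile app or visit a branch."),
    ("What is the daily transfer limit?", "It depends on the account type."),
]
new_corpus = update_corpus(initial, targets)
```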

In the corpus updating method, an initial corpus of a specific business field, and historical behavior data and a historical question-answer portrait corresponding to a user identifier in the specific business field, are acquired, wherein the historical behavior data, the historical question-answer portrait and the initial corpus comprise question-answer pairs; a target question-answer pair satisfying a preset condition is determined from the initial corpus and the historical behavior data and historical question-answer portrait corresponding to the user identifier, wherein the preset condition comprises that the accuracy of the question-answer pair is higher than a preset accuracy threshold; and the initial corpus of the specific business field is updated according to the target question-answer pair satisfying the preset condition. Because the target question-answer pair is determined from the acquired initial corpus, historical behavior data and historical question-answer portrait, and the initial corpus is then updated according to that target question-answer pair, the initial corpus is updated automatically. Experts in the specific business field no longer need to construct a dictionary of that field in order to update the corpus, which avoids the large amount of time that the traditional method requires of them. Therefore, the cost of corpus updating is reduced, and the efficiency of corpus updating is improved.

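The above embodiment determines a target question-answer pair satisfying a preset condition from the initial corpus and the historical behavior data and historical question-answer portrait corresponding to the user identifier; a specific method for doing so is described below. In one embodiment, determining a target question-answer pair satisfying a preset condition from the initial corpus and the historical behavior data and historical question-answer portrait corresponding to the user identifier includes: determining, through a collaborative filtering algorithm, a target question-answer pair satisfying the preset condition from the initial corpus and the historical behavior data and historical question-answer portrait corresponding to the user identifier.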

Specifically, through a collaborative filtering algorithm, the priorities of the question-answer pairs in the initial corpus, the historical behavior data corresponding to the user identifier and the historical question-answer portrait can be calculated, where the priority of a question-answer pair can represent its accuracy. According to the calculated priorities, the target question-answer pairs satisfying the preset condition are screened out of the initial corpus, the historical behavior data corresponding to the user identifier and the historical question-answer portrait; that is, the question-answer pairs whose accuracy is higher than the preset accuracy threshold are selected. The basic idea of the collaborative filtering algorithm is to recommend items to a user according to the user's previous preferences and the choices of other users with similar interests. In general, items can be recommended based only on user behavior data (such as ratings, purchases and downloads), without depending on any additional information about the items (such as their own characteristics) or about the users (such as age or gender). Of course, the target question-answer pair satisfying the preset condition may also be determined from the initial corpus and the historical behavior data and historical question-answer portrait corresponding to the user identifier through another filtering algorithm or another machine learning algorithm, which is not limited in the present application.
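The basic idea of collaborative filtering described above can be illustrated with a small user-based variant: the score a user would give an unseen question-answer pair is estimated as a similarity-weighted average of the scores given by other users. This is one common textbook form, shown only as a sketch; the application does not state which collaborative filtering variant it uses, and the names `cosine`, `predict` and the sample ratings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue  # skip the user themselves and users who never rated the item
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else 0.0


# Hypothetical behavior data: user -> {question-answer pair id: score in [0, 1]}.
ratings = {
    "u1": {"qa1": 1.0, "qa2": 0.8},
    "u2": {"qa1": 0.9, "qa2": 0.7, "qa3": 0.95},
    "u3": {"qa1": 0.1, "qa3": 0.2},
}
score = predict(ratings, "u1", "qa3")
```

Here `u1` behaves much more like `u2` than like `u3`, so the predicted score for `qa3` lands closer to the score `u2` gave it. Only behavior data is used; no attributes of the users or of the pairs enter the computation.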

In the embodiment, the target question-answer pair meeting the preset condition is determined from the initial corpus, the historical behavior data corresponding to the user identifier and the historical question-answer portrait through the collaborative filtering algorithm, and then the initial corpus in the specific business field can be automatically updated according to the target question-answer pair meeting the preset condition, so that a dictionary in the specific business field is not required to be built by an expert in the specific business field to update the corpus, the problem that the expert in the specific business field needs to spend a large amount of time to build the dictionary in the specific business field in the traditional method is avoided, the cost for updating the corpus is reduced, and the efficiency for updating the corpus is improved.

The above embodiment determines a target question-answer pair satisfying a preset condition from the initial corpus and the historical behavior data and historical question-answer portrait corresponding to the user identifier by using the collaborative filtering algorithm; a specific method for doing so is described below. In one embodiment, as shown in FIG. 3, determining, through a collaborative filtering algorithm, a target question-answer pair satisfying a preset condition from the initial corpus and the historical behavior data and historical question-answer portrait corresponding to the user identifier includes:

Step 320, generating a first matrix according to the historical behavior data and the historical question-answer portrait corresponding to the user identifier; the first matrix is used for representing the corresponding relation between the historical behavior data corresponding to the user identifier and the historical question-answer pairs in the historical question-answer portrait.

Specifically, according to the obtained historical behavior data and historical question-answer portrait corresponding to the user identifier, the corresponding relation between the historical behavior data corresponding to the user identifier and the historical question-answer pairs in the historical question-answer portrait corresponding to the user identifier is found, and the first matrix is generated from this corresponding relation. The first matrix is used for representing the corresponding relation between the historical behavior data corresponding to the user identifier and the historical question-answer pairs in the historical question-answer portrait.
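One possible layout for the first matrix is sketched below. It assumes, following the embodiment in which the historical behavior data carries an accuracy per historical question-answer pair and the portrait carries tags per historical question, that rows are historical pairs, columns are tags, and each cell holds the accuracy of the pair when it carries that tag. The layout, the function name `build_first_matrix` and the sample data are assumptions; the application only requires that the matrix represent the correspondence.

```python
def build_first_matrix(behavior, portrait):
    """Build a pair-by-tag matrix of accuracies.

    behavior: {pair_id: accuracy}   (from the historical behavior data)
    portrait: {pair_id: [tags]}     (from the historical question-answer portrait)
    Returns (pair_ids, tags, matrix), where matrix[i][j] is the accuracy of
    pair i if it carries tag j, and 0.0 otherwise.
    """
    pair_ids = sorted(set(behavior) & set(portrait))  # pairs present in both inputs
    tags = sorted({t for p in pair_ids for t in portrait[p]})
    matrix = [
        [behavior[p] if t in portrait[p] else 0.0 for t in tags]
        for p in pair_ids
    ]
    return pair_ids, tags, matrix


behavior = {"h1": 0.9, "h2": 0.4}
portrait = {"h1": ["card", "password"], "h2": ["transfer"]}
pair_ids, tags, first_matrix = build_first_matrix(behavior, portrait)
```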

Step 340, generating a second matrix according to the historical question-answer portrait corresponding to the user identifier and the initial corpus; the second matrix represents the correspondence between the historical question-answer pairs in the historical question-answer portrait and the initial question-answer pairs in the initial corpus.

Specifically, from the acquired historical question-answer portrait corresponding to the user identifier and the initial corpus, the correspondence between the historical question-answer pairs in the portrait and the initial question-answer pairs in the initial corpus is determined, and the second matrix is generated from that correspondence. The second matrix represents the correspondence between the historical question-answer pairs in the historical question-answer portrait and the initial question-answer pairs in the initial corpus.

Step 360, determining target question-answer pairs meeting the preset condition from the first matrix and the second matrix through a collaborative filtering algorithm.

Specifically, the collaborative filtering algorithm takes the first matrix, which represents the correspondence between the historical behavior data corresponding to the user identifier and the historical question-answer pairs in the historical question-answer portrait, and the second matrix, which represents the correspondence between the historical question-answer pairs in the portrait and the initial question-answer pairs in the initial corpus, and calculates the priorities of all initial question-answer pairs and historical question-answer pairs in the initial corpus, the historical behavior data and the historical question-answer portrait. Because the priority of a question-answer pair is characterized by its accuracy, the target question-answer pairs meeting the preset condition, namely those whose accuracy is higher than the preset accuracy threshold, can be screened from the initial corpus, the historical behavior data and the historical question-answer portrait according to the calculated priorities. Of course, the target question-answer pairs meeting the preset condition may also be determined from the first matrix and the second matrix by other filtering algorithms or other machine learning algorithms, which is not limited in the present application.

In this embodiment, a first matrix is generated according to the historical behavior data and the historical question-answer portrait corresponding to the user identifier, the first matrix representing the correspondence between that historical behavior data and the historical question-answer pairs in the portrait; a second matrix is generated according to the historical question-answer portrait corresponding to the user identifier and the initial corpus, the second matrix representing the correspondence between the historical question-answer pairs in the portrait and the initial question-answer pairs in the initial corpus; and target question-answer pairs meeting the preset condition are determined from the first matrix and the second matrix through a collaborative filtering algorithm. Because the two matrices together relate all of the question-answer pairs to one another, the collaborative filtering algorithm can calculate the accuracy of every question-answer pair, and the target question-answer pairs meeting the preset condition can be accurately screened out according to those accuracies.

The above embodiment describes that the first matrix is generated according to the historical behavior data and the historical question-answer portrait corresponding to the user identifier, and a specific method thereof is described below. In one embodiment, as shown in fig. 4, the historical behavior data includes historical question-answer pairs corresponding to the user identifications and accuracy rates of the historical question-answer pairs; the historical question-answer portrait comprises a historical question-answer pair corresponding to the user identification and tags of historical questions in the historical question-answer pair;

the generating of the first matrix according to the historical behavior data and the historical question-answer portrait corresponding to the user identifier includes:

Step 420, acquiring, from the historical behavior data and the historical question-answer portrait, the correspondence among the historical question-answer pairs corresponding to the user identifier, the tags of the historical questions in the historical question-answer pairs, and the accuracy of the historical question-answer pairs.

Specifically, the historical question-answer pairs corresponding to the user identifier and the accuracy of those pairs are obtained from the acquired historical behavior data; the accuracy can be derived from the scoring data for the historical question-answer pairs received by the server. The historical question-answer pairs corresponding to the user identifier and the tags of the historical questions in those pairs are obtained from the acquired historical question-answer portrait. The tags of a historical question are the tag information that the server 104 extracts by treating the question text input by the user as a single document, performing word segmentation on it, and analyzing the result. Because the same historical question-answer pairs appear in both the historical behavior data and the historical question-answer portrait, the correspondence among the historical question-answer pairs corresponding to the user identifier, the tags of their historical questions, and their accuracy can be obtained.

Step 440, generating a first matrix according to the corresponding relationship among the historical question-answer pair corresponding to the user identifier, the tags of the historical questions in the historical question-answer pair, and the accuracy of the historical question-answer pair.

Specifically, a historical question-answer pair corresponding to the user identifier consists of a historical question and the historical answer to that question. From the correspondence among the historical question-answer pairs, the tags of the historical questions, and the accuracy of the historical question-answer pairs, the server 104 can therefore obtain the correspondence among each historical question, its historical answer, its tags, and the accuracy of the pair, and generate the first matrix accordingly.
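As a minimal sketch of steps 420 and 440, the first matrix can be pictured as a table whose rows join each historical question-answer pair with its tags and its accuracy. The record layout and the names `BehaviorRecord`, `PortraitRecord` and `build_first_matrix` are illustrative assumptions; the application does not fix a concrete data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorRecord:
    """One entry of the historical behavior data (hypothetical layout)."""
    user_id: str
    question: str
    answer: str
    accuracy: float


@dataclass(frozen=True)
class PortraitRecord:
    """One entry of the historical question-answer portrait (hypothetical layout)."""
    user_id: str
    question: str
    answer: str
    tags: tuple


def build_first_matrix(behavior, portrait):
    """Join behavior and portrait entries on (user, question, answer)."""
    tags_by_pair = {(p.user_id, p.question, p.answer): p.tags for p in portrait}
    rows = []
    for b in behavior:
        key = (b.user_id, b.question, b.answer)
        if key in tags_by_pair:  # keep only pairs present in both sources
            rows.append({
                "user_id": b.user_id,
                "question": b.question,
                "answer": b.answer,
                "tags": tags_by_pair[key],
                "accuracy": b.accuracy,
            })
    return rows
```

Each row then carries the question, the answer, the tags and the accuracy side by side, which is the correspondence the first matrix is meant to capture.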

In this embodiment, the correspondence among the historical question-answer pairs corresponding to the user identifier, the tags of the historical questions in those pairs, and the accuracy of those pairs is acquired from the historical behavior data and the historical question-answer portrait, and the first matrix is generated according to that correspondence. Because the first matrix is built directly from this correspondence, it can be constructed accurately and holds, in complete form, the historical question-answer pairs corresponding to the user identifier, the tags of their historical questions, their accuracy, and the relations among them. This lays a foundation for determining, through the collaborative filtering algorithm, the target question-answer pairs meeting the preset condition from the first matrix and the second matrix.

The above embodiment describes generating the second matrix according to the historical question-answer portrait corresponding to the user identifier and the initial corpus, and a specific method is described below. In one embodiment, as shown in fig. 5, generating a second matrix according to the historical question-answer portrait corresponding to the user identifier and the initial corpus includes:

Step 520, obtaining an initial question-answer pair from the initial corpus, and obtaining the correspondence between the initial question-answer pair and the historical question-answer pair corresponding to the user identifier in the historical question-answer portrait.

Step 540, generating a second matrix according to the correspondence between the initial question-answer pair and the historical question-answer pair corresponding to the user identifier in the historical question-answer portrait.

Specifically, according to the obtained historical question-answer portrait corresponding to the user identifier and the initial corpus, an initial question-answer pair is obtained from the initial corpus, and a corresponding relationship between the historical question-answer pair in the historical question-answer portrait corresponding to the user identifier and the initial question-answer pair in the initial corpus is found, so that the second matrix can be generated according to the corresponding relationship between the initial question-answer pair in the initial corpus and the historical question-answer pair corresponding to the user identifier in the historical question-answer portrait. The second matrix is used for representing the corresponding relation between the historical question-answer pairs in the historical question-answer portrait and the initial question-answer pairs in the initial corpus.
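One way to picture steps 520 and 540 is a 0/1 matrix with one row per historical question-answer pair and one column per initial question-answer pair. The sketch below assumes that a historical pair corresponds to an initial pair when both carry the same answer text, on the grounds that historical answers are taken from matched initial pairs; the application does not prescribe this rule, and `build_second_matrix` is a hypothetical helper.

```python
def build_second_matrix(historical_pairs, initial_pairs):
    """Return a 0/1 matrix: rows are historical pairs, columns are initial pairs.

    Both arguments are lists of (question, answer) tuples. A cell is 1 when
    the historical answer equals the initial answer (an assumed matching rule).
    """
    return [
        [1 if h_answer == i_answer else 0 for (_, i_answer) in initial_pairs]
        for (_, h_answer) in historical_pairs
    ]
```

A row with a single 1 means the historical pair was answered from exactly one corpus entry; a column containing several 1s means several user questions were served by the same corpus entry.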

In this embodiment, an initial question-answer pair is obtained from the initial corpus, the correspondence between the initial question-answer pair and the historical question-answer pair corresponding to the user identifier in the historical question-answer portrait is obtained, and the second matrix is generated according to that correspondence. Because the second matrix is built directly from the correspondence between the historical question-answer pairs in the historical question-answer portrait and the initial question-answer pairs in the initial corpus, it can be constructed accurately and holds, in complete form, both sets of question-answer pairs and the relations between them. This lays a foundation for determining, through the collaborative filtering algorithm, the target question-answer pairs meeting the preset condition from the first matrix and the second matrix.

The above embodiment describes determining a target question-answer pair satisfying a preset condition from a first matrix and a second matrix through a collaborative filtering algorithm, and a specific method thereof is described below. In one embodiment, as shown in fig. 6, determining a target question-answer pair satisfying a preset condition from a first matrix and a second matrix through a collaborative filtering algorithm includes:

Step 620, sorting the historical question-answer pairs and the initial question-answer pairs in the first matrix and the second matrix by a collaborative filtering algorithm according to the accuracy of the historical question-answer pairs, and generating a sorting result.

Specifically, the first matrix contains the historical question-answer pairs corresponding to the user identifier, the tags of the historical questions in those pairs, the accuracy of those pairs, and the correspondence among the three. The second matrix contains the historical question-answer pairs corresponding to the user identifier in the historical question-answer portrait, the initial question-answer pairs in the initial corpus, and the correspondence between the two. A collaborative filtering algorithm can therefore use the accuracy of the historical question-answer pairs to calculate the priorities of all initial question-answer pairs and historical question-answer pairs in the initial corpus, the historical behavior data and the historical question-answer portrait. The priority of a question-answer pair is characterized by its accuracy, that is, a pair with higher priority is a pair with higher accuracy. The historical question-answer pairs and the initial question-answer pairs in the first matrix and the second matrix are then sorted by the calculated priorities to generate the sorting result.
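The application does not spell out how the accuracy of historical pairs is carried over to initial pairs. Purely as an illustration, the sketch below assumes that an initial pair is scored by the mean accuracy of the historical pairs linked to it in the second matrix; the averaging rule and the name `propagate_accuracy` are assumptions, not the claimed algorithm.

```python
def propagate_accuracy(historical_accuracy, second_matrix):
    """Score each initial pair by the mean accuracy of its linked historical pairs.

    historical_accuracy: one accuracy value per historical pair (row order).
    second_matrix: 0/1 rows (historical pairs) by columns (initial pairs).
    Returns one score per initial pair, or None when no historical pair links to it.
    """
    n_initial = len(second_matrix[0]) if second_matrix else 0
    scores = []
    for col in range(n_initial):
        linked = [
            acc
            for acc, row in zip(historical_accuracy, second_matrix)
            if row[col]
        ]
        scores.append(sum(linked) / len(linked) if linked else None)
    return scores
```

Initial pairs that no user has triggered receive `None` here, which leaves the decision of how to rank unseen corpus entries to the caller.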

Step 640, determining, from the sorting result, target question-answer pairs whose accuracy is higher than the preset accuracy threshold.

Specifically, the accuracy ranking of the question-answer pairs, together with the question-answer pair corresponding to each accuracy, is obtained from the sorting result. Target question-answer pairs whose accuracy is higher than the preset accuracy threshold are then screened out of the ranking; the target question-answer pairs may come from the initial corpus, from the historical behavior data corresponding to the user identifier, or from the historical question-answer portrait. The preset accuracy threshold is set according to the actual situation: it may be a specific accuracy value chosen directly, or a rank may be chosen and the accuracy at that rank used as the preset accuracy threshold, which is not limited in the present application.
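The two ways of fixing the threshold described above can be sketched as follows. The sketch assumes the accuracies have already been computed and only illustrates the sorting and screening; `select_target_pairs` and its parameters are hypothetical names.

```python
def select_target_pairs(scored_pairs, threshold=None, rank=None):
    """Sort (pair, accuracy) items and keep those strictly above the threshold.

    Either pass `threshold` directly, or pass a 1-based `rank`, in which case
    the accuracy found at that rank is used as the threshold.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    if threshold is None:
        if rank is None:
            raise ValueError("either threshold or rank must be given")
        threshold = ranked[rank - 1][1]
    return [pair for pair, accuracy in ranked if accuracy > threshold]
```

Because the condition is "higher than" the threshold, the pair sitting exactly at the chosen rank is itself excluded in the rank-based mode.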

In this embodiment, the historical question-answer pairs and the initial question-answer pairs in the first matrix and the second matrix are sorted by a collaborative filtering algorithm according to the accuracy of the historical question-answer pairs to generate a sorting result, and target question-answer pairs whose accuracy is higher than the preset accuracy threshold are determined from the sorting result. By sorting the first matrix and the second matrix with a tag-extended collaborative filtering algorithm based on the accuracy of the historical question-answer pairs, the degree of association between the historical question-answer pairs and the initial question-answer pairs is continuously analyzed, so that the target question-answer pairs whose accuracy is higher than the preset accuracy threshold are determined. The initial corpus can then be automatically updated according to the target question-answer pairs, which reduces the cost and improves the efficiency of corpus updating.

The above embodiment describes obtaining historical behavior data corresponding to a user identifier in a specific service field, and a specific method thereof is described below. In one embodiment, as shown in fig. 7, acquiring historical behavior data corresponding to a user identifier in a specific business domain includes:

Step 720, acquiring historical question-answer pairs corresponding to the user identifier from the intelligent question-answer system in the specific service field.

Specifically, in the intelligent question-answering system in the specific service field, after the server 104 receives the historical questions of the intelligent question-answering system in the specific service field input by the user, the question text input by the user is used as a single document, word segmentation processing is performed on the question text, question text analysis is performed on the question, and the label of the question text is extracted. And matching the label of the question text with the initial question-answer pair in the initial corpus, and taking the answer of the initial question-answer pair in the matched initial corpus as the answer of the question input by the user. The server 104 generates a historical question-answer pair corresponding to the user identifier according to the question input by the user and the answer of the question input by the user.

Step 740, obtaining the scoring data of the user identifier for the historical question-answer pairs, and generating the accuracy of the historical question-answer pairs according to that scoring data.

Specifically, after obtaining the historical question-answer pair corresponding to the user identifier, the server 104 displays the answer to the user, so that the user can judge whether it is the desired answer and either score the question-answer process or ask again. The server 104 receives the user's scoring data and the number of questions asked by the user identifier for the question-answer pair, and from these obtains the scoring data of the user identifier for the historical question-answer pair. The accuracy of the historical question-answer pair is then obtained from that scoring data: the higher the score given by the user identifier to the historical question-answer pair, the higher the accuracy of the pair; the lower the score, the lower the accuracy.

Step 760, generating historical behavior data corresponding to the user identifier according to the historical question-answer pair corresponding to the user identifier and the accuracy of the historical question-answer pair.

Specifically, a historical question-answer pair corresponding to the user identifier is obtained according to the question input by the user and the answer of the question input by the user. And obtaining the accuracy of the historical question-answer pair according to the grading data of the historical question-answer pair. Therefore, the historical behavior data corresponding to the user identification can be generated according to the historical question-answer pair corresponding to the user identification and the accuracy of the historical question-answer pair. The historical behavior data corresponding to the user identification comprises historical question-answer pairs corresponding to the user identification and the accuracy of the historical question-answer pairs.
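Steps 720 to 760 can be sketched as assembling one record per historical question-answer pair. The application only states that a higher score means a higher accuracy; the linear mapping `score / max_score` used below, the 0 to 10 score range, and the name `build_behavior_data` are assumptions made for illustration.

```python
def build_behavior_data(user_id, pairs_with_scores, max_score=10.0):
    """Turn (question, answer, score) triples into historical behavior records.

    Accuracy is taken as score / max_score, clamped to [0, 1]. This is an
    assumed monotone mapping, not one given in the source text.
    """
    records = []
    for question, answer, score in pairs_with_scores:
        accuracy = min(max(score / max_score, 0.0), 1.0)
        records.append({
            "user_id": user_id,
            "question": question,
            "answer": answer,
            "accuracy": accuracy,
        })
    return records
```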

In the embodiment, a historical question-answer pair corresponding to a user identifier is obtained from an intelligent question-answer system in a specific service field; acquiring the grading data of the user identification on the historical question-answer pairs, and generating the accuracy of the historical question-answer pairs according to the grading data of the historical question-answer pairs; and generating historical behavior data corresponding to the user identification according to the historical question-answer pair corresponding to the user identification and the accuracy of the historical question-answer pair. According to the embodiment of the application, the label of the question text is matched with the initial question-answer pair in the initial corpus, and the answer of the initial question-answer pair in the matched initial corpus is used as the answer of the question input by the user, so that the historical question-answer pair corresponding to the user identification can be accurately obtained. And then, the accuracy of the historical question-answer pair can be accurately obtained according to the score data of the historical question-answer pair calculated by each question-answer, so that the historical behavior data corresponding to the user identifier in the specific service field can be accurately obtained according to the historical question-answer pair corresponding to the user identifier and the accuracy of the historical question-answer pair, and a foundation can be laid for updating an initial corpus of the specific service field.

The above embodiment describes obtaining the scoring data of the user identifier for the historical question-answer pairs, and a specific method is described below. In one embodiment, as shown in fig. 8, obtaining the scoring data of the user identifier for the historical question-answer pairs includes:

Step 820, if the scoring data of the user identifier for the historical question-answer pair is not obtained, obtaining, from the intelligent question-answer system, the number of questions asked by the user identifier for the question-answer pair.

Step 840, generating the scoring data of the historical question-answer pair according to the number of questions asked for the question-answer pair.

Specifically, after obtaining the historical question-answer pair corresponding to the user identifier, the server 104 displays the answer to the user, so that the user can judge whether it is the desired answer. If it is, the user completes the question-and-answer session and exits the question-answer interface, and the server 104 receives the exit information. If it is not, the user continues to input questions. Because user tracking points (event instrumentation) are preset on the server 104, the server 104 can count the number of questions the user asks for the question-answer pair in the current question-answer flow. In addition, a user scoring module is preset on the server 104 to receive the user's scoring data for the question-answer process. If the user scores, the server 104 receives the user's scoring data; if the user does not score, the server 104 uses a logarithmic function to convert the number of questions counted from the tracking-point data into a user score within a preset score range. The conversion from the number of questions asked by the user identifier for the question-answer pair to the user score is given by formula (1):

$$R_u = 10 \cdot \frac{1}{1 + \log(\mathrm{Question\_U\_Asked})} \qquad (1)$$

where $R_u$ denotes the user score converted from the number of questions asked by the user identifier for the question-answer pair, and $\mathrm{Question\_U\_Asked}$ denotes that number of questions.

And then, acquiring the scoring data of the user identifier for the historical question-answer pairs according to the scoring data of the user and the user score converted by the user identifier for the number of times of questions asked for the question-answer pairs.
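The fallback from an explicit score to a score derived from the question count can be sketched as below. The sketch reads formula (1) as 10 divided by (1 + log of the question count) and assumes the natural logarithm, since the base is not stated; `score_from_question_count` and `resolve_score` are hypothetical names.

```python
import math


def score_from_question_count(question_count, max_score=10.0):
    """Convert a question count (>= 1) into a score in (0, max_score].

    Natural logarithm is an assumption; the source does not state the base.
    """
    if question_count < 1:
        raise ValueError("question_count must be at least 1")
    return max_score / (1.0 + math.log(question_count))


def resolve_score(explicit_score, question_count):
    """Prefer the user's own score; otherwise derive one from the question count."""
    if explicit_score is not None:
        return explicit_score
    return score_from_question_count(question_count)
```

Under this reading a user who asks once and leaves yields the maximum score of 10, and each additional repeat of the question lowers the score, which matches the intent that repeated asking signals a poor answer.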

In the embodiment, if the scoring data of the user identifier for the historical question-answer pairs is not obtained, the number of times of questions asked by the user identifier for the question-answer pairs is obtained from the intelligent question-answer system; and generating grading data of historical question-answer pairs according to the number of times of questions of the question-answer pairs. According to the embodiment of the application, the number of times of asking the user identification for the question-answer pair is converted into the user score, and the scoring data of the user identification for the historical question-answer pair can be accurately obtained, so that the accuracy of the historical question-answer pair can be accurately generated according to the scoring data of the historical question-answer pair, and a foundation is made for generating the historical behavior data corresponding to the user identification later.

The above embodiment describes obtaining a historical question-answer portrait corresponding to a user identifier in a specific service field, and a specific method is described below. In one embodiment, as shown in fig. 9, obtaining a historical question-answer portrait corresponding to a user identifier in a specific service field includes:

Step 920, obtaining a historical question-answer pair corresponding to the user identifier from an intelligent question-answer system in a specific service field; the historical answers in the historical question-answer pairs are determined by matching the tags of the historical questions against the initial question-answer pairs in the initial corpus.

Step 940, generating a historical question-answer portrait corresponding to the user identifier according to the historical question-answer pair corresponding to the user identifier and the tags of the historical questions in the historical question-answer pair.

Specifically, in the intelligent question-answering system in the specific service field, after the server 104 receives the historical questions of the intelligent question-answering system in the specific service field input by the user, the question text input by the user is used as a single Document, and word segmentation is performed on the question text to obtain an N-dimensional Vector Space Model (VSM) based on Term Frequency-Inverse Document Frequency (TF-IDF) weight, so that the question text is analyzed according to the N-dimensional Vector Space Model, and tag information in the question text is extracted to obtain tags of the historical questions.

The N-dimensional vector space model is an algebraic model applied to information filtering, information extraction, indexing and relevance evaluation. In the N-dimensional vector space model, each question entered by the user can be represented as a text $d_j$, a plurality of questions is represented as a question text set $D$, the tag information in the question text set $D$ is represented as tags $t_k$, and the plurality of tags is represented as a dictionary $T$. The question text set $D$ is expressed by formula (2), and the dictionary $T$ by formula (3):

$$D = \{d_1, d_2, \ldots, d_N\} \qquad (2)$$

$$T = \{t_1, t_2, \ldots, t_N\} \qquad (3)$$

According to the question text set $D$ and the dictionary $T$, the weight $\omega_{kj}$ of each question text on each tag can be obtained, where $\omega_{kj}$ is the weight of tag $t_k$ in document $d_j$. The representation of $d_j$ in terms of these weights is shown in formula (4):

$$d_j = \{\omega_{1j}, \omega_{2j}, \ldots, \omega_{Nj}\} \qquad (4)$$
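The weight vectors described above can be sketched with a small TF-IDF computation. The sketch assumes whitespace tokenization as a stand-in for word segmentation, term frequency normalized by document length, and an inverse document frequency of log(N / df); the application names TF-IDF but does not fix these variants, and `tfidf_vectors` is a hypothetical helper.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (dictionary, vectors) where vectors[j][k] is the weight of tag k in document j."""
    tokenized = [doc.lower().split() for doc in documents]
    dictionary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many question texts each tag occurs.
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty texts
        vectors.append([
            (counts[tag] / total) * math.log(n_docs / doc_freq[tag])
            for tag in dictionary
        ])
    return dictionary, vectors
```

With this weighting, a tag that occurs in every question text gets weight zero, so only tags that distinguish one question from another contribute to the vector.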

Then, the tags of the question are matched against the initial question-answer pairs in the initial corpus. If a question-answer pair corresponding to the question input by the user is found in the initial corpus, the answer in that pair is taken as the answer to the user's question. If no such pair is found, the question input by the user is compared with the questions in the initial corpus, the similarity between them is calculated, and the answer to the corpus question with the highest similarity is taken as the answer to the user's question. The similarity $\mathrm{sim}(d_i, d_j)$ between question texts is calculated according to formula (5).

The server 104 generates a historical question-answer pair corresponding to the user identifier from the question input by the user and the answer to that question. A historical question-answer portrait corresponding to the user identifier is then generated from the generated historical question-answer pairs and the tags of the historical questions that were extracted when the answers were obtained. The historical question-answer portrait comprises the historical question-answer pairs corresponding to the user identifier and the tags of the historical questions in those pairs.
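The similarity formula itself is not reproduced in this text. For TF-IDF weight vectors a common choice is cosine similarity, and the sketch below assumes that measure for the fallback lookup of the most similar corpus question; both `cosine_similarity` and `best_answer` are hypothetical helpers, and the actual formula (5) may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_answer(question_vector, corpus):
    """Return the answer whose question vector is most similar to the input.

    corpus: non-empty list of (question_vector, answer) tuples.
    """
    best = max(corpus, key=lambda item: cosine_similarity(question_vector, item[0]))
    return best[1]
```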

In this embodiment, a historical question-answer pair corresponding to a user identifier is obtained from an intelligent question-answer system in a specific service field; the historical answer in the historical question-answer pair is determined by matching the labels of the historical question against the initial question-answer pairs in the initial corpus; and a historical question-answer portrait corresponding to the user identifier is generated according to the historical question-answer pair corresponding to the user identifier and the tags of the historical questions in the historical question-answer pair. After a user inputs a question, the answer is obtained in a retrieval-type question-answering mode or a task-type question-answering mode. In the retrieval-type mode, the initial corpus is searched, according to the question labels, for the question-answer pair corresponding to those labels, and the answer of that pair is returned. In the task-type mode, the answer is refined over multiple rounds of question answering, and the best answer is finally output to the user. In this way, the historical question-answer pairs corresponding to the user identifier can be obtained accurately, the answer accuracy of the intelligent question-answer system in the specific service field is improved, and the problems that user consultation questions in the specific service field are semantically unclear and loosely phrased in professional terms are solved. Then, according to the historical question-answer pairs corresponding to the user identifier and the labels of the historical questions in the historical question-answer pairs, the historical question-answer portrait corresponding to the user identifier in the specific service field can be accurately obtained, which lays a foundation for updating the initial corpus of the specific service field.
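
As an illustration of the data involved, the historical question-answer portrait can be pictured as a per-user collection of question-answer pairs together with the tags of each question. The following is a minimal sketch under that reading; the class name `HistoricalQAPair`, the function `build_portraits` and the sample records are hypothetical and do not come from the application.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoricalQAPair:
    """One historical question-answer pair plus the tags of its question."""
    question: str
    answer: str
    tags: tuple


def build_portraits(records):
    """Group (user_id, question, answer, tags) records by user identifier."""
    portraits = defaultdict(list)
    for user_id, question, answer, tags in records:
        portraits[user_id].append(HistoricalQAPair(question, answer, tuple(tags)))
    return dict(portraits)


# Hypothetical sample records, for illustration only.
records = [
    ("u1", "How do I reset my card PIN?", "Reset it in the mobile app.", ["card", "pin"]),
    ("u1", "What is the transfer limit?", "It depends on the account type.", ["transfer", "limit"]),
    ("u2", "How do I reset my card PIN?", "Reset it in the mobile app.", ["card", "pin"]),
]
portraits = build_portraits(records)
```

Here `portraits["u1"]` holds two pairs and `portraits["u2"]` holds one, each carrying the tags of its question.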

In a specific embodiment, as shown in fig. 10, there is provided a corpus updating method applied to the server 104, including:

step 1002, acquiring a historical question-answer pair corresponding to a user identifier from an intelligent question-answer system in a specific service field; the historical answers in the historical question-answer pairs are determined based on the fact that labels of the historical questions are matched with the initial question-answer pairs in the initial corpus;

step 1004, generating a historical question-answer portrait corresponding to the user identifier according to the historical question-answer pair corresponding to the user identifier and the tags of the historical questions in the historical question-answer pair;

step 1006, obtaining scoring data of the user identifier for the historical question-answer pair, and if no scoring data of the user identifier for the historical question-answer pair is obtained, obtaining, from the intelligent question-answer system, the number of questions asked by the user identifier for the question-answer pair;

step 1008, generating the scoring data of the historical question-answer pair according to the number of questions asked for the question-answer pair;

step 1010, generating the accuracy of the historical question-answer pairs according to the grading data of the historical question-answer pairs;

step 1012, generating historical behavior data corresponding to the user identifier according to the historical question-answer pair corresponding to the user identifier and the accuracy of the historical question-answer pair;

step 1014, acquiring an initial corpus of a specific service field, historical behavior data corresponding to a user identifier in the specific service field, and a historical question-answer portrait; the historical behavior data, the historical question-answer portrait and the initial corpus comprise question-answer pairs;

step 1016, determining a target question-answer pair meeting a preset condition from the initial corpus, historical behavior data corresponding to the user identification and a historical question-answer portrait; the preset conditions comprise that the accuracy of the question-answer pairs is higher than a preset accuracy threshold value;

step 1018, updating the initial corpus of the specific business field according to the target question-answer pair meeting the preset condition;

step 1020, obtaining, from the historical behavior data and the historical question-answer portrait, the corresponding relations among the historical question-answer pairs corresponding to the user identifier, the tags of the historical questions in the historical question-answer pairs, and the accuracy of the historical question-answer pairs;

step 1022, generating a first matrix according to the corresponding relationship among the historical question-answer pair corresponding to the user identifier, the label of the historical question in the historical question-answer pair, and the accuracy of the historical question-answer pair; the first matrix is used for representing the corresponding relation between historical behavior data corresponding to the user identification and historical question and answer pairs in the historical question and answer portrait;

step 1024, acquiring initial question-answer pairs from the initial corpus, and acquiring corresponding relations between the initial question-answer pairs and historical question-answer pairs corresponding to the user identifications in the historical question-answer portrait;

step 1026, generating a second matrix according to the corresponding relation between the initial question-answer pairs and the historical question-answer pairs corresponding to the user identifications in the historical question-answer portrait; the second matrix is used for representing the corresponding relation between the historical question-answer pairs in the historical question-answer portrait and the initial question-answer pairs in the initial corpus;

step 1028, sorting the historical question-answer pairs and the initial question-answer pairs in the first matrix and the second matrix by using a collaborative filtering algorithm according to the accuracy of the historical question-answer pairs, and generating a sorting result;

step 1030, determining a target question-answer pair with accuracy higher than a preset accuracy threshold from the sequencing result;

step 1032, updating the initial corpus of the specific service field according to the target question-answer pair meeting the preset condition.
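
Steps 1006 to 1012 can be sketched as follows. The application does not state how the number of questions is converted into scoring data, nor how scoring data is converted into accuracy, so both rules here are assumptions made for illustration: a 1-to-5 rating scale, one point deducted for each repeated question, and accuracy taken as the score divided by the maximum score. All function names are hypothetical.

```python
MAX_SCORE = 5.0  # assumed rating scale of 1 to 5


def score_from_question_count(question_count):
    """Fallback score when the user gave no explicit rating (assumed rule).

    A single question means the first answer sufficed, so it gets the top
    score; each repeated question lowers the score by one, down to 1.
    """
    if question_count < 1:
        raise ValueError("question_count must be at least 1")
    return max(1.0, MAX_SCORE - (question_count - 1))


def accuracy_from_score(score):
    """Normalise a score to an accuracy between 0 and 1 (assumed mapping)."""
    return score / MAX_SCORE


def build_behavior_record(qa_pair_id, explicit_score, question_count):
    """Steps 1006 to 1012 for a single historical question-answer pair."""
    if explicit_score is not None:
        score = float(explicit_score)
    else:
        score = score_from_question_count(question_count)
    return {
        "qa_pair": qa_pair_id,
        "score": score,
        "accuracy": accuracy_from_score(score),
    }


rated = build_behavior_record("qa1", 4, question_count=3)
unrated = build_behavior_record("qa2", None, question_count=3)
```

In this sketch an explicit rating always takes precedence, and the question count is consulted only when no rating was obtained, which mirrors the conditional in step 1006.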

Specifically, fig. 11 is a schematic flowchart of a method for performing question answering by using an intelligent question answering system in a specific embodiment, and fig. 12 is a schematic flowchart of a method for performing corpus update by using a collaborative filtering algorithm in a specific embodiment. First, as shown in fig. 11, a user inputs a question about a specific service field on the system interface layer and submits it as a query. After receiving the question input by the user, the system interface layer sends the question to the system service layer.

Secondly, after receiving the questions consulted by the user, the system service layer takes the question texts of the user as single documents and carries out word segmentation processing on the question texts, thereby carrying out text analysis on the questions, extracting the labels in the question texts, sending the corresponding question and label information to the system data layer, and caching the question and label information in a period of time in the question-answer portrait in the system service layer.
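
The label extraction described above can be sketched as follows. The application does not specify the word segmentation or text analysis method, so this sketch makes several assumptions: English text split with a regular expression, a small hand-written stop-word list, and the most frequent remaining tokens taken as labels. Chinese question text would need a dedicated word segmenter instead of the regular expression. The names `STOPWORDS` and `extract_tags` are hypothetical.

```python
import re
from collections import Counter

# Assumed stop-word list; a real system would use a domain-specific one.
STOPWORDS = frozenset(
    {"a", "an", "the", "how", "do", "i", "my", "to", "is", "of", "for", "what", "can"}
)


def extract_tags(question, max_tags=3):
    """Treat the question as a single document and return its top tokens as tags."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    counts = Counter(token for token in tokens if token not in STOPWORDS)
    # most_common keeps first-seen order among tokens with equal counts.
    return [token for token, _ in counts.most_common(max_tags)]


tags = extract_tags("How do I reset my card PIN?")
```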

Thirdly, after receiving the question and the labels sent by the system service layer, the system data layer matches them against the initial question-answer pairs in the initial corpus. If a question-answer pair corresponding to the question input by the user can be retrieved from the initial corpus, the answer in that question-answer pair is used as the answer to the question input by the user. If no such question-answer pair can be retrieved, the question input by the user is matched against the questions in the initial corpus, the similarity between the question input by the user and each question in the initial corpus is calculated, and the answer of the question with the highest similarity is used as the answer to the question input by the user. The answer is then returned to the system service layer. The system service layer caches the obtained answer in the question-answer portrait, which stores the corresponding relations among the questions, the labels of the questions and the answers to the questions; that is, the question-answer portrait comprises the historical question-answer pairs corresponding to the user identifier and the tags of the historical questions in the historical question-answer pairs. The answer is then sent back to the system interface layer. The initial corpus is created from initial question-answer pairs, and labels for the questions, provided by domain experts.
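
The two-stage lookup described above (retrieve a matching pair first, otherwise fall back to the most similar question) can be sketched as follows. The application does not name a similarity measure; Jaccard similarity over tag sets is used here purely as an assumption, and the corpus layout and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_answer(question_tags, corpus):
    """Return an answer from a corpus of dicts with 'question', 'tags', 'answer'."""
    if not corpus:
        return None
    query = set(question_tags)
    # First attempt: an initial pair whose labels match the question exactly.
    for pair in corpus:
        if set(pair["tags"]) == query:
            return pair["answer"]
    # Fallback: the initial pair most similar to the question.
    best = max(corpus, key=lambda pair: jaccard(query, set(pair["tags"])))
    return best["answer"]


# Hypothetical initial corpus, for illustration only.
corpus = [
    {"question": "How do I reset my card PIN?",
     "tags": ["reset", "card", "pin"],
     "answer": "Reset the PIN in the mobile app."},
    {"question": "What is the daily transfer limit?",
     "tags": ["daily", "transfer", "limit"],
     "answer": "The limit depends on the account type."},
]
exact = find_answer(["reset", "card", "pin"], corpus)
closest = find_answer(["transfer", "limit", "increase"], corpus)
```

The second call finds no exact match, so it falls back to the pair sharing the tags "transfer" and "limit".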

Fourthly, the system interface layer receives the answer sent by the system data layer and displays it to the user. The user judges whether the obtained answer is the desired answer; if so, the question answering ends and the question-answering interface is exited; if not, the user continues to input questions. As shown in fig. 12, because user tracking points are set in the system interface layer in advance, the number of questions asked for the question-answer pair in the current question-answering flow can be counted through the user tracking points. A user scoring module is also provided in the system interface layer and is used for receiving the user's scoring data for the question-answering process. The system interface layer then sends the question-answer pair, the number of questions asked for the question-answer pair and the scoring data of the question-answer pair to a historical behavior data module in the system service layer. Based on the number of questions asked for the question-answer pair and the scoring data of the question-answer pair, the historical behavior data module obtains the scoring data of the user identifier for the historical question-answer pair, and thereby generates the historical behavior data corresponding to the user identifier. The system service layer then sends the historical behavior data corresponding to the user identifier and the corresponding question-answer portrait to the system data layer.

Fifthly, as shown in fig. 12, after receiving the historical behavior data corresponding to the user identifier and the corresponding question-answer portrait, the system data layer obtains a first matrix from the corresponding relation between the historical behavior data corresponding to the user identifier and the question-answer portrait, and obtains a second matrix from the corresponding relation between the initial question-answer pairs in the initial corpus and the historical question-answer pairs corresponding to the user identifier in the question-answer portrait. Then, according to the scoring data of the historical question-answer pairs, that is, according to the accuracy of the historical question-answer pairs, a collaborative filtering algorithm is used to sort the historical question-answer pairs and the initial question-answer pairs in the first matrix and the second matrix, generating a sorting result of the question-answer pairs. A target question-answer pair with accuracy higher than a preset accuracy threshold is determined from the sorting result, and the initial corpus of the specific service field is updated according to that target question-answer pair.
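
The two matrices and the ranking step can be sketched as follows. The application does not disclose the collaborative filtering algorithm itself, so this sketch substitutes a much simpler stand-in: the accuracy of each historical pair is averaged over the users in the first matrix, then propagated to the initial pairs as a weighted average through the second matrix. The sparse dictionary layout, the identifiers and the function names are all assumptions.

```python
def build_first_matrix(behavior_data):
    """Rows are users, columns are historical pairs, values are accuracies."""
    matrix = {}
    for user_id, hist_id, accuracy in behavior_data:
        matrix.setdefault(user_id, {})[hist_id] = accuracy
    return matrix


def build_second_matrix(correspondences):
    """Rows are historical pairs, columns are initial pairs, values are weights."""
    matrix = {}
    for hist_id, init_id, weight in correspondences:
        matrix.setdefault(hist_id, {})[init_id] = weight
    return matrix


def rank_pairs(first, second):
    """Rank historical and initial pairs by aggregated accuracy, best first."""
    totals, counts = {}, {}
    for row in first.values():
        for hist_id, accuracy in row.items():
            totals[hist_id] = totals.get(hist_id, 0.0) + accuracy
            counts[hist_id] = counts.get(hist_id, 0) + 1
    hist_scores = {h: totals[h] / counts[h] for h in totals}

    # Propagate accuracies to initial pairs as a weighted average.
    weighted, weights = {}, {}
    for hist_id, row in second.items():
        if hist_id not in hist_scores:
            continue
        for init_id, weight in row.items():
            weighted[init_id] = weighted.get(init_id, 0.0) + weight * hist_scores[hist_id]
            weights[init_id] = weights.get(init_id, 0.0) + weight
    init_scores = {i: weighted[i] / weights[i] for i in weights if weights[i] > 0}

    scores = {**hist_scores, **init_scores}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def select_targets(ranking, threshold):
    """Keep the pairs whose accuracy is strictly above the preset threshold."""
    return [pair_id for pair_id, score in ranking if score > threshold]


first = build_first_matrix([("u1", "h1", 0.75), ("u2", "h1", 1.0), ("u1", "h2", 0.5)])
second = build_second_matrix([("h1", "i1", 1.0), ("h2", "i2", 1.0)])
ranking = rank_pairs(first, second)
targets = select_targets(ranking, threshold=0.6)
```

With these hypothetical inputs, pairs `h1` and `i1` score 0.875 and pass a threshold of 0.6, while `h2` and `i2` score 0.5 and are left out. A real collaborative filtering step would typically also use similarity between users or between pairs, which this stand-in omits.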

In this embodiment, an initial corpus of a specific service field, historical behavior data corresponding to a user identifier in the specific service field, and a historical question-answer portrait are obtained; the historical behavior data, the historical question-answer portrait and the initial corpus comprise question-answer pairs. A target question-answer pair meeting a preset condition is determined from the initial corpus, the historical behavior data corresponding to the user identifier and the historical question-answer portrait; the preset condition comprises that the accuracy of the question-answer pair is higher than a preset accuracy threshold. The initial corpus of the specific service field is updated according to the target question-answer pair meeting the preset condition. Because the target question-answer pairs are determined from the acquired initial corpus, historical behavior data and historical question-answer portrait, the whole process updates the initial corpus automatically. An expert in the specific service field does not need to build a dictionary of that field in order to update the corpus, which avoids the problem in the traditional method that such experts need to spend a large amount of time building the dictionary. Therefore, the cost of corpus updating is reduced, and the efficiency of corpus updating is improved.

It should be understood that, although the steps in the flowcharts related to the above embodiments are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and described, and may be performed in other orders. Moreover, at least a part of the steps in the flowcharts related to the above embodiments may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiment of the present application further provides a corpus updating apparatus for implementing the above-mentioned corpus updating method. The solution of the problem provided by the apparatus is similar to the solution described in the above method, so the specific limitations in one or more embodiments of the corpus updating apparatus provided below can be referred to the limitations of the corpus updating method in the above, and are not described herein again.

In one embodiment, as shown in fig. 13, there is provided a corpus updating apparatus 1300, including: a data acquisition module 1320, a target question-answer pair determination module 1340, and an initial corpus update module 1360, wherein:

The data obtaining module 1320 is configured to obtain an initial corpus of a specific service field, and historical behavior data and a historical question-answer portrait corresponding to a user identifier in the specific service field; the historical behavior data, the historical question-answer portrait and the initial corpus comprise question-answer pairs.

The target question-answer pair determining module 1340 is configured to determine a target question-answer pair meeting a preset condition from the initial corpus, historical behavior data corresponding to the user identifier, and a historical question-answer portrait; the preset condition comprises that the accuracy of the question-answer pair is higher than a preset accuracy threshold value.

The initial corpus updating module 1360 is configured to update the initial corpus in the specific service field according to the target question-answer pair satisfying the preset condition.
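
The division of the apparatus into three modules can be sketched as a small class that wires them together. This is only an illustration of the module boundaries: the class name, the callables passed in, the representation of a question-answer pair as a (question, answer, accuracy) tuple, and the rule that an update adds or overwrites entries keyed by question are all assumptions.

```python
class CorpusUpdatingApparatus:
    """Wires together stand-ins for modules 1320, 1340 and 1360."""

    def __init__(self, acquire_data, determine_targets, threshold):
        self.acquire_data = acquire_data            # role of module 1320
        self.determine_targets = determine_targets  # role of module 1340
        self.threshold = threshold                  # preset accuracy threshold

    def update(self, corpus):
        """Return an updated copy of the corpus; the input is left unchanged."""
        candidates = self.acquire_data()
        targets = self.determine_targets(candidates, self.threshold)
        updated = dict(corpus)                      # role of module 1360
        for question, answer, _accuracy in targets:
            updated[question] = answer
        return updated


def filter_by_accuracy(candidates, threshold):
    """Keep (question, answer, accuracy) tuples above the threshold."""
    return [pair for pair in candidates if pair[2] > threshold]


# Hypothetical candidate pairs, for illustration only.
apparatus = CorpusUpdatingApparatus(
    acquire_data=lambda: [("q1", "a1-new", 0.9), ("q2", "a2", 0.3)],
    determine_targets=filter_by_accuracy,
    threshold=0.6,
)
original = {"q1": "a1"}
updated = apparatus.update(original)
```

Passing the modules in as callables keeps each one replaceable, for example swapping `filter_by_accuracy` for a ranking-based selection.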

In one embodiment, the target question-answer pair determining module 1340 includes:

and the target question-answer pair determining unit is used for determining a target question-answer pair meeting preset conditions from the initial corpus, historical behavior data corresponding to the user identification and a historical question-answer portrait through a collaborative filtering algorithm.

In one embodiment, the target question-answer pair determining unit includes:

the first matrix generation subunit is used for generating a first matrix according to the historical behavior data and the historical question-answer portrait corresponding to the user identification; the first matrix is used for representing the corresponding relation between historical behavior data corresponding to the user identification and historical question and answer pairs in the historical question and answer portrait;

the second matrix generation subunit is used for generating a second matrix according to the historical question answering portrait corresponding to the user identification and the initial corpus; the second matrix is used for representing the corresponding relation between the historical question-answer pairs in the historical question-answer portrait and the initial question-answer pairs in the initial corpus;

and the target question-answer pair generation subunit is used for determining a target question-answer pair meeting a preset condition from the first matrix and the second matrix through a collaborative filtering algorithm.

In one embodiment, the historical behavior data comprises historical question-answer pairs corresponding to the user identifications and accuracy rates of the historical question-answer pairs; the historical question-answer portrait comprises a historical question-answer pair corresponding to the user identification and tags of historical questions in the historical question-answer pair; the first matrix generation subunit comprises:

the first corresponding relation obtaining subunit is used for obtaining, from the historical behavior data and the historical question-answer portrait, corresponding relations among historical question-answer pairs corresponding to the user identifiers, tags of historical questions in the historical question-answer pairs and accuracy of the historical question-answer pairs;

and the first matrix obtaining subunit is used for generating a first matrix according to the corresponding relation among the historical question-answer pairs corresponding to the user identification, the labels of the historical questions in the historical question-answer pairs and the accuracy of the historical question-answer pairs.

In one embodiment, the second matrix generation subunit includes:

a second corresponding relation obtaining subunit, configured to obtain an initial question-answer pair from the initial corpus, and obtain a corresponding relation between the initial question-answer pair and a historical question-answer pair corresponding to the user identifier in the historical question-answer portrait;

and the second matrix obtaining subunit is used for generating a second matrix according to the corresponding relation between the initial question-answer pairs and the historical question-answer pairs corresponding to the user identifications in the historical question-answer portrait.

In one embodiment, the target question-answer pair generation subunit includes:

the sequencing result generating subunit is used for sequencing the historical question-answer pairs and the initial question-answer pairs in the first matrix and the second matrix by adopting a collaborative filtering algorithm according to the accuracy of the historical question-answer pairs to generate a sequencing result;

and the target question-answer pair determining subunit is used for determining a target question-answer pair with the accuracy higher than a preset accuracy threshold value from the sequencing result.

In one embodiment, the data obtaining module 1320 includes:

the historical question-answer pair acquisition unit is used for acquiring a historical question-answer pair corresponding to the user identification from an intelligent question-answer system in the specific service field;

the score data acquisition unit is used for acquiring the score data of the historical question-answer pair of the user identifier and generating the accuracy of the historical question-answer pair according to the score data of the historical question-answer pair;

and the historical behavior data generating unit is used for generating the historical behavior data corresponding to the user identification according to the historical question-answer pair corresponding to the user identification and the accuracy of the historical question-answer pair.

In one embodiment, the score data acquiring unit includes:

the question number obtaining subunit is used for obtaining the question number of the user identifier for the question-answer pair from the intelligent question-answer system if the grading data of the user identifier for the historical question-answer pair is not obtained;

and the scoring data generating subunit is used for generating scoring data of historical question-answer pairs according to the number of times of questions of the question-answer pairs.

In one embodiment, the data obtaining module 1320 includes:

the historical question-answer pair generating unit is used for acquiring a historical question-answer pair corresponding to the user identification from an intelligent question-answer system in a specific service field; historical answers in the historical question-answer pairs are determined based on the fact that labels of historical questions are matched with the initial question-answer pairs in the initial corpus;

and the historical question-answer portrait generating unit is used for generating a historical question-answer portrait corresponding to the user identifier according to the historical question-answer pair corresponding to the user identifier and the tags of the historical questions in the historical question-answer pair.

The various modules in the corpus updating apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, and the internal structure thereof may be as shown in fig. 14. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used to store corpus update data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a corpus update method.

Those skilled in the art will appreciate that the architecture shown in fig. 14 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory having a computer program stored therein and a processor that when executing the computer program performs the steps of:

acquiring an initial corpus of a specific service field, and historical behavior data and a historical question-answer portrait corresponding to a user identifier in the specific service field; the historical behavior data, the historical question-answer portrait and the initial corpus comprise question-answer pairs;

determining a target question-answer pair meeting preset conditions from the initial corpus, historical behavior data corresponding to the user identification and a historical question-answer portrait; the preset conditions comprise that the accuracy of the question-answer pairs is higher than a preset accuracy threshold value;

and updating the initial corpus of the specific business field according to the target question-answer pair meeting the preset condition.

In one embodiment, a target question-answer pair satisfying a preset condition is determined from an initial corpus, historical behavior data corresponding to a user identifier, and a historical question-answer portrait, and the processor executes the computer program to further implement the following steps:

determining, through a collaborative filtering algorithm, a target question-answer pair meeting a preset condition from the initial corpus, historical behavior data corresponding to the user identification and a historical question-answer portrait.

In one embodiment, a collaborative filtering algorithm is used to determine a target question-answer pair satisfying a preset condition from an initial corpus, historical behavior data corresponding to a user identifier, and a historical question-answer portrait, and the processor executes the computer program to further implement the following steps:

generating a first matrix according to the historical behavior data and the historical question-answer portrait corresponding to the user identification; the first matrix is used for representing the corresponding relation between historical behavior data corresponding to the user identification and historical question-answer pairs in the historical question-answer portrait;

generating a second matrix according to the historical question-answer portrait corresponding to the user identification and the initial corpus; the second matrix is used for representing the corresponding relation between the historical question-answer pairs in the historical question-answer portrait and the initial question-answer pairs in the initial corpus;

and determining a target question-answer pair meeting a preset condition from the first matrix and the second matrix through a collaborative filtering algorithm.

In one embodiment, the historical behavior data includes historical question-answer pairs corresponding to the user identifications and accuracy rates of the historical question-answer pairs; the historical question-answer portrait comprises a historical question-answer pair corresponding to the user identification and tags of historical questions in the historical question-answer pair;

generating a first matrix according to historical behavior data and a historical question-answer portrait corresponding to the user identification, wherein the processor further realizes the following steps when executing the computer program:

acquiring, from the historical behavior data and the historical question-answer portrait, corresponding relations among historical question-answer pairs corresponding to user identifications, tags of historical questions in the historical question-answer pairs and accuracy rates of the historical question-answer pairs;

and generating a first matrix according to the corresponding relations among the historical question-answer pairs corresponding to the user identification, the tags of the historical questions in the historical question-answer pairs and the accuracy of the historical question-answer pairs.

In one embodiment, the second matrix is generated according to the historical question-answering portrait corresponding to the user identifier and the initial corpus, and the processor, when executing the computer program, further implements the following steps:

acquiring an initial question-answer pair from the initial corpus, and acquiring a corresponding relation between the initial question-answer pair and a historical question-answer pair corresponding to a user identifier in a historical question-answer portrait;

and generating a second matrix according to the corresponding relation between the initial question-answer pairs and the historical question-answer pairs corresponding to the user identifications in the historical question-answer portrait.

In one embodiment, a target question-answer pair satisfying a preset condition is determined from the first matrix and the second matrix through a collaborative filtering algorithm, and the processor, when executing the computer program, further implements the following steps:

sequencing the historical question-answer pairs and the initial question-answer pairs in the first matrix and the second matrix by adopting a collaborative filtering algorithm according to the accuracy of the historical question-answer pairs, to generate a sequencing result;

and determining a target question-answer pair with the accuracy higher than a preset accuracy threshold value from the sequencing result.

In one embodiment, historical behavior data corresponding to a user identifier in a specific business domain is obtained, and the processor executes the computer program to further perform the following steps:

acquiring a historical question-answer pair corresponding to a user identification from an intelligent question-answer system in a specific service field;

obtaining scoring data of the user identification to the historical question-answer pair, and generating the accuracy of the historical question-answer pair according to the scoring data of the historical question-answer pair;

and generating historical behavior data corresponding to the user identification according to the historical question-answer pairs corresponding to the user identification and the accuracy of the historical question-answer pairs.

In one embodiment, scoring data of the user identifier for the historical question-answer pair is obtained, and the processor, when executing the computer program, further performs the steps of:

if the grading data of the user identification to the historical question-answer pair is not obtained, obtaining the question times of the user identification to the question-answer pair from the intelligent question-answer system;

and generating scoring data of the historical question-answer pairs according to the number of times of questions of the question-answer pairs.

In one embodiment, a historical question answering portrait corresponding to a user identifier in a specific business domain is obtained, and the processor, when executing the computer program, further implements the following steps:

acquiring a historical question-answer pair corresponding to a user identifier from an intelligent question-answer system in a specific service field; historical answers in the historical question-answer pairs are determined based on the fact that labels of historical questions are matched with the initial question-answer pairs in the initial corpus;

and generating a historical question-answer portrait corresponding to the user identification according to the historical question-answer pair corresponding to the user identification and the tags of the historical questions in the historical question-answer pair.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, the computer program, when executed by the processor, further implements the following steps:

In one embodiment, the historical behavior data includes historical question-answer pairs corresponding to the user identification and accuracy rates of the historical question-answer pairs; the historical question-answer portrait comprises a historical question-answer pair corresponding to the user identification and tags of historical questions in the historical question-answer pair;

generating a first matrix according to historical behavior data and a historical question-answer portrait corresponding to the user identification, wherein when the computer program is executed by the processor, the following steps are further realized:

In one embodiment, the second matrix is generated based on the historical question-answer portrait corresponding to the user identifier and the initial corpus, and the computer program when executed by the processor further implements the following steps:

and generating a second matrix according to the corresponding relation between the initial question-answer pair and the historical question-answer pair corresponding to the user identification in the historical question-answer portrait.

In one embodiment, the target question-answer pair satisfying the preset condition is determined from the first matrix and the second matrix through a collaborative filtering algorithm, and the computer program further realizes the following steps when executed by the processor:

In one embodiment, the historical behavior data corresponding to the user identifier in the specific business domain is obtained, and the computer program when executed by the processor further performs the steps of:

acquiring a historical question-answer pair corresponding to a user identifier from an intelligent question-answer system in a specific service field;

acquiring the grading data of the user identification on the historical question-answer pairs, and generating the accuracy of the historical question-answer pairs according to the grading data of the historical question-answer pairs;

and generating historical behavior data corresponding to the user identification according to the historical question-answer pairs corresponding to the user identification and the accuracy of the historical question-answer pairs.

In one embodiment, scoring data of the user identifier for the historical question-answer pair is obtained, and the computer program when executed by the processor further performs the steps of:

if the grading data of the user identification on the historical question-answer pairs are not obtained, obtaining the question times of the user identification on the question-answer pairs from the intelligent question-answer system;

In one embodiment, a historical question-answer portrait corresponding to a user identifier in a specific service field is obtained, and the computer program when executed by the processor further performs the steps of:

In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of:

In one embodiment, a target question-answer pair satisfying a preset condition is determined from an initial corpus and historical behavior data and historical question-answer portraits corresponding to a user identifier, and when executed by a processor, the computer program further implements the following steps:

In one embodiment, a collaborative filtering algorithm is used to determine a target question-answer pair satisfying a preset condition from an initial corpus, historical behavior data corresponding to a user identifier, and a historical question-answer portrait, and when executed by a processor, the computer program further implements the following steps:

acquiring initial question-answer pairs from an initial corpus, and acquiring corresponding relations between the initial question-answer pairs and historical question-answer pairs corresponding to user identifications in historical question-answer portraits;

acquiring a historical question-answer pair corresponding to a user identifier from an intelligent question-answer system in a specific service field; the historical answers in the historical question-answer pairs are determined by matching the labels of the historical questions against the initial question-answer pairs in the initial corpus;

It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, databases, or other media used in the embodiments provided herein can include at least one of non-volatile and volatile memory. The non-volatile memory may include a Read-Only Memory (ROM), a magnetic tape, a floppy disk, a flash memory, an optical memory, a high-density embedded non-volatile memory, a Resistive Random Access Memory (ReRAM), a Magnetic Random Access Memory (MRAM), a Ferroelectric Random Access Memory (FRAM), a Phase Change Memory (PCM), a graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others. The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like, without limitation.

The technical features of the above embodiments can be arbitrarily combined. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of the present specification.

The above examples express only several embodiments of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims

1. A method for corpus update, the method comprising:

acquiring an initial corpus of a specific service field, historical behavior data and a historical question-answer portrait corresponding to a user identifier in the specific service field; the historical behavior data, the historical question-answer portrait and the initial corpus comprise question-answer pairs;

and updating the initial corpus of the specific business field according to the target question-answer pair meeting the preset conditions.

2. The method according to claim 1, wherein the determining, from the initial corpus and the historical behavior data and the historical question-answer portrait corresponding to the user identifier, a target question-answer pair that meets a preset condition includes:

3. The method according to claim 2, wherein the determining, by using a collaborative filtering algorithm, a target question-answer pair satisfying a preset condition from the initial corpus and the historical behavior data and the historical question-answer portrait corresponding to the user identifier includes:

4. The method of claim 3, wherein the historical behavior data comprises historical question-answer pairs corresponding to the user identifications and accuracy rates of the historical question-answer pairs; the historical question-answer portrait comprises a historical question-answer pair corresponding to the user identification and tags of historical questions in the historical question-answer pair;

generating a first matrix according to the historical behavior data and the historical question-answer portrait corresponding to the user identifier, wherein the generating of the first matrix comprises the following steps:

obtaining a corresponding relation between a historical question-answer pair corresponding to the user identification, a label of a historical question in the historical question-answer pair and the accuracy of the historical question-answer pair from the historical behavior data and the historical question-answer portrait;

5. The method according to claim 4, wherein generating a second matrix according to the historical question-answer portrait corresponding to the user identifier and the initial corpus comprises:

6. The method according to any one of claims 3-5, wherein determining, from the first matrix and the second matrix, a target question-answer pair satisfying a preset condition through a collaborative filtering algorithm comprises:

7. The method according to claim 4, wherein the obtaining historical behavior data corresponding to the user identifier in the specific business field comprises:

obtaining the scoring data of the historical question-answer pairs by the user identification, and generating the accuracy of the historical question-answer pairs according to the scoring data of the historical question-answer pairs;

8. The method of claim 7, wherein obtaining scoring data for the historical question-answer pairs for the user identification comprises:

if the scoring data of the user identification on the historical question-answer pair is not obtained, obtaining the number of times of questions asked by the user identification on the question-answer pair from the intelligent question-answer system;

and generating the scoring data of the historical question-answer pairs according to the number of times the questions of the question-answer pairs were asked.

9. The method according to claim 7, wherein said obtaining the historical question-answer portrait corresponding to the user identifier in the specific business field comprises:

10. A corpus updating apparatus, the apparatus comprising:

the data acquisition module is used for acquiring an initial corpus of a specific service field and historical behavior data and a historical question-answer portrait corresponding to a user identifier in the specific service field; the historical behavior data, the historical question-answer portrait and the initial corpus comprise question-answer pairs;

11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 9 when executing the computer program.

12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 9.

13. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 9 when executed by a processor.