CN114358023B - Intelligent question-answer recall method, intelligent question-answer recall device, computer equipment and storage medium

Publication number: CN114358023B (granted); earlier publication: CN114358023A
Application number: CN202210028532.0A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 颜泽龙, 王健宗
Applicant and assignee: Ping An Technology (Shenzhen) Co., Ltd.
Legal status: Active
Prior art keywords: question, vector, coding, layer, attention
Classifications

    • G06F40/35 - Handling natural language data: semantic analysis; discourse or dialogue representation
    • G06F18/214 - Pattern recognition: design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F40/289 - Handling natural language data: natural language analysis; recognition of textual entities; phrasal analysis, e.g. finite state techniques or chunking


Abstract

The embodiments of the application belong to the field of artificial intelligence and relate to an intelligent question-answer recall method, which comprises: segmenting training questions and candidate answers to obtain a first word set and a second word set; encoding the first word set and the second word set with a basic question-answering model to obtain a first coding vector and a second coding vector; computing a question coding vector from the first coding vector and a candidate answer characterization vector from the second coding vector; computing a question characterization vector from the question coding vector and the candidate answer characterization vector, calculating the similarity of the question characterization vector and the candidate answer characterization vector, and training on that similarity to obtain a target question-answering model; and calculating the questions to be processed and the candidate answers according to the target question-answering model to obtain target recall sentences. The application also provides an intelligent question-answering recall device, computer equipment and a storage medium. The target recall sentence may be stored in a blockchain. The application improves the accuracy and efficiency of question-answer recall.

Description

Intelligent question-answer recall method, intelligent question-answer recall device, computer equipment and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to an intelligent question-answer recall method, an intelligent question-answer recall device, computer equipment and a storage medium.
Background
Through question-and-answer interaction, an intelligent question-answering system can accurately locate the knowledge that website users need and provide them with personalized information services. Its applications, such as pushing related questions and answers, automatically ranking popular questions, and online customer service, all hinge on computing sentence similarity: candidate sentences are scored by their similarity, and the scores drive answer pushing, ranking, online replies, and similar tasks.
In recent years, intelligent question-answering systems have mainly been realized by pre-training a language model on a corpus and then fine-tuning it for a specific question-answer sentence selection task. However, current fine-tuning approaches that guarantee question-answer matching efficiency often fall short on accuracy, while approaches that guarantee accuracy rely on cross-encoding and therefore cannot match questions and answers efficiently.
Disclosure of Invention
The embodiments of the application aim to provide an intelligent question-answer recall method, device, computer equipment and storage medium, so as to solve the technical problem that current question-answer matching systems cannot guarantee matching accuracy and improve recall efficiency at the same time.
In order to solve the technical problems, the embodiment of the application provides an intelligent question-answer recall method, which adopts the following technical scheme:
acquiring a preset data set, performing word segmentation on training questions in the preset data set to obtain a first word set, and performing word segmentation on candidate answers in the preset data set to obtain a second word set;
constructing a basic question-answering model, wherein the basic question-answering model comprises a left coding layer, a right coding layer, a first attention layer, an aggregation layer and a second attention layer; inputting the first word set to the left coding layer of the basic question-answering model and coding it to obtain a first coding vector, and inputting the second word set to the right coding layer of the basic question-answering model and coding it to obtain a second coding vector;
performing attention calculation on the first coding vector based on a first attention layer of the basic question-answering model to obtain a question coding vector corresponding to the training question, and performing calculation on the second coding vector based on an aggregation function in an aggregation layer of the basic question-answering model to obtain a candidate answer characterization vector corresponding to the candidate answer;
performing attention calculation on the question coding vector and the candidate answer characterization vector according to a second attention layer of the basic question-answering model to obtain a question characterization vector, calculating the similarity of the question characterization vector and the candidate answer characterization vector, and training the basic question-answering model based on the similarity to obtain a target question-answering model;
and when the to-be-processed question is received, inputting the to-be-processed question and the candidate answer into the target question-answering model to obtain a target recall sentence corresponding to the to-be-processed question.
Further, the step of calculating the first coding vector based on the first attention layer of the basic question-answering model to obtain a question coding vector corresponding to the training question, and calculating the second coding vector based on an aggregation function in the aggregation layer of the basic question-answering model to obtain a candidate answer characterization vector corresponding to the candidate answer includes:
acquiring an enumeration vector corresponding to the first coding vector, and performing attention calculation according to the first coding vector and the enumeration vector to obtain the question coding vector;
and acquiring a preset aggregation function, and aggregating the second coding vector according to the aggregation function to obtain the candidate answer characterization vector corresponding to the second coding vector.
Further, the step of obtaining the enumeration vector corresponding to the first coding vector includes:
acquiring a random initial value and the dimension of the first coding vector;
and selecting, from the first coding vector, a number of vectors equal to the random initial value as the enumeration vectors, and taking the dimension of the first coding vector as the vector dimension of the enumeration vectors.
Further, the step of constructing the basic question-answering model includes:
acquiring a pre-trained pre-coding layer, and taking the pre-coding layer as a left coding layer and a right coding layer of the basic question-answering model;
acquiring a preset first attention function and an activation function, constructing a first attention layer according to the first attention function, and constructing an aggregation layer according to the activation function;
and acquiring a second attention function, constructing a second attention layer according to the second attention function, and combining the left coding layer, the right coding layer, the first attention layer, the aggregation layer and the second attention layer to form the basic question-answering model.
Further, the step of obtaining the pre-trained pre-encoded layer includes:
acquiring an online corpus and a basic pre-training language model, training the basic pre-training language model according to the online corpus to obtain a target pre-training language model, and taking a coding layer of the target pre-training language model as a pre-coding layer after pre-training is completed.
Further, the step of training the basic question-answering model based on the similarity to obtain a target question-answering model includes:
acquiring a matching degree label of the training question and the candidate answer according to the preset training set;
and calculating a loss function of the basic question-answering model according to the matching degree label and the similarity, and determining, when the loss function converges, that the basic question-answering model is trained, so as to obtain the target question-answering model.
Further, the step of inputting the to-be-processed question and the candidate answer into the target question-answering model to obtain a target recall sentence corresponding to the to-be-processed question includes:
splitting the to-be-processed question into a word set composed of a plurality of words, and encoding the word set according to a target encoding layer of the target question-answering model to obtain a third coding vector;
selecting a question enumeration vector of the third coding vector, and performing attention calculation on the third coding vector and the question enumeration vector according to a first attention layer in the target question-answering model to obtain a to-be-processed coding vector;
obtaining all stored candidate answer characterization vectors, and calculating the to-be-processed coding vector and the candidate answer characterization vectors according to a second attention layer in the target question-answering model to obtain a to-be-processed characterization vector of the to-be-processed question;
and calculating the target similarity of the to-be-processed characterization vector and the candidate answer characterization vector, and determining the candidate answer corresponding to the maximum target similarity as the target recall sentence of the to-be-processed question.
In order to solve the technical problems, the embodiment of the application also provides an intelligent question-answering recall device, which adopts the following technical scheme:
the word segmentation module is used for acquiring a preset data set, performing word segmentation on training questions in the preset data set to obtain a first word set, and performing word segmentation on candidate answers in the preset data set to obtain a second word set;
the coding module is used for constructing a basic question-answering model, wherein the basic question-answering model comprises a left coding layer, a right coding layer, a first attention layer, an aggregation layer and a second attention layer, inputting the first word set to the left coding layer of the basic question-answering model and coding it to obtain a first coding vector, and inputting the second word set to the right coding layer of the basic question-answering model and coding it to obtain a second coding vector;
the calculation module is used for carrying out attention calculation on the first coding vector based on a first attention layer of the basic question-answering model to obtain a question coding vector corresponding to the training question, and carrying out calculation on the second coding vector based on an aggregation function in an aggregation layer of the basic question-answering model to obtain a candidate answer characterization vector corresponding to the candidate answer;
the training module is used for carrying out attention calculation on the question coding vector and the candidate answer characterization vector according to the second attention layer of the basic question-answering model to obtain a question characterization vector, calculating the similarity of the question characterization vector and the candidate answer characterization vector, and training the basic question-answering model based on the similarity to obtain a target question-answering model;
and the recall module is used for, when the to-be-processed question is received, inputting the to-be-processed question and the candidate answer into the target question-answering model, and calculating the to-be-processed question and the candidate answer according to the target question-answering model to obtain a target recall sentence corresponding to the to-be-processed question.
In order to solve the above technical problems, the embodiment of the present application further provides a computer device, which adopts the following technical scheme:
acquiring a preset data set, performing word segmentation on training questions in the preset data set to obtain a first word set, and performing word segmentation on candidate answers in the preset data set to obtain a second word set;
constructing a basic question-answering model, wherein the basic question-answering model comprises a left coding layer, a right coding layer, a first attention layer, an aggregation layer and a second attention layer; inputting the first word set to the left coding layer of the basic question-answering model and coding it to obtain a first coding vector, and inputting the second word set to the right coding layer of the basic question-answering model and coding it to obtain a second coding vector;
performing attention calculation on the first coding vector based on a first attention layer of the basic question-answering model to obtain a question coding vector corresponding to the training question, and performing calculation on the second coding vector based on an aggregation function in an aggregation layer of the basic question-answering model to obtain a candidate answer characterization vector corresponding to the candidate answer;
performing attention calculation on the question coding vector and the candidate answer characterization vector according to a second attention layer of the basic question-answering model to obtain a question characterization vector, calculating the similarity of the question characterization vector and the candidate answer characterization vector, and training the basic question-answering model based on the similarity to obtain a target question-answering model;
and when the to-be-processed question is received, inputting the to-be-processed question and the candidate answer into the target question-answering model to obtain a target recall sentence corresponding to the to-be-processed question.
In order to solve the above technical problems, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical scheme:
acquiring a preset data set, performing word segmentation on training questions in the preset data set to obtain a first word set, and performing word segmentation on candidate answers in the preset data set to obtain a second word set;
constructing a basic question-answering model, wherein the basic question-answering model comprises a left coding layer, a right coding layer, a first attention layer, an aggregation layer and a second attention layer; inputting the first word set to the left coding layer of the basic question-answering model and coding it to obtain a first coding vector, and inputting the second word set to the right coding layer of the basic question-answering model and coding it to obtain a second coding vector;
performing attention calculation on the first coding vector based on a first attention layer of the basic question-answering model to obtain a question coding vector corresponding to the training question, and performing calculation on the second coding vector based on an aggregation function in an aggregation layer of the basic question-answering model to obtain a candidate answer characterization vector corresponding to the candidate answer;
performing attention calculation on the question coding vector and the candidate answer characterization vector according to a second attention layer of the basic question-answering model to obtain a question characterization vector, calculating the similarity of the question characterization vector and the candidate answer characterization vector, and training the basic question-answering model based on the similarity to obtain a target question-answering model;
and when the to-be-processed question is received, inputting the to-be-processed question and the candidate answer into the target question-answering model to obtain a target recall sentence corresponding to the to-be-processed question.
According to the intelligent question-answering recall method, a preset data set is obtained; word segmentation is performed on the training questions in the preset data set to obtain a first word set, and on the candidate answers to obtain a second word set. A basic question-answering model is constructed, comprising a left coding layer, a right coding layer, a first attention layer, an aggregation layer and a second attention layer; the first word set is input to the left coding layer and coded to obtain a first coding vector, and the second word set is input to the right coding layer and coded to obtain a second coding vector, so that the words can be accurately represented by the coding layers. Then, attention calculation is performed on the first coding vector by the first attention layer to obtain a question coding vector corresponding to the training question, and the second coding vector is calculated with an aggregation function in the aggregation layer to obtain a candidate answer characterization vector corresponding to the candidate answer; through the attention calculation, the resulting vectors can carry more semantic information. Next, attention calculation is performed on the question coding vector and the candidate answer characterization vector by the second attention layer to obtain a question characterization vector, the similarity of the question characterization vector and the candidate answer characterization vector is calculated, and the basic question-answering model is trained on that similarity to obtain a target question-answering model. When the to-be-processed question is received, the to-be-processed question and the candidate answer are input into the target question-answering model to obtain the target recall sentence corresponding to the to-be-processed question, thereby realizing efficient question-answer recall: the accuracy of question answering is ensured while the recall duration is reduced, the efficiency of question answering is improved, and intelligent machine question answering is further realized.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings required for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present application, and that a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of an intelligent question-and-answer recall method according to the present application;
FIG. 3 is a schematic diagram of one embodiment of an intelligent question-and-answer recall apparatus according to the present application;
FIG. 4 is a schematic structural diagram of one embodiment of a computer device in accordance with the present application.
Reference numerals: intelligent question-and-answer recall apparatus 300, word segmentation module 301, encoding module 302, computing module 303, training module 304, and recall module 305.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the intelligent question-answer recall method provided by the embodiment of the application is generally executed by a server/terminal device, and accordingly, the intelligent question-answer recall device is generally arranged in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow chart of one embodiment of a method of intelligent question-and-answer recall according to the present application is shown. The intelligent question-answer recall method comprises the following steps:
Step S201, a preset data set is obtained, word segmentation processing is carried out on training questions in the preset data set to obtain a first word set, word segmentation processing is carried out on candidate answers in the preset data set to obtain a second word set;
In this embodiment, the preset data set is assembled from data acquired in advance and contains training questions, candidate answers and label data; a training question X, a candidate answer Y and label data L form one triple, denoted (X, Y, L), and the preset data set comprises a plurality of such triples. When the preset data set is obtained, word segmentation is performed, triple by triple, on the training question of each triple to obtain the first word set corresponding to that triple, and on the candidate answer corresponding to the training question in each triple to obtain the second word set corresponding to that triple.
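For concreteness, the following minimal sketch shows how such (X, Y, L) triples might be segmented in Python. The jieba segmenter and the example sentences are illustrative assumptions and are not part of the disclosure.

```python
# Illustrative sketch: segmenting (question, answer, label) triples.
# jieba is an assumed Chinese word segmenter; the embodiment does not
# prescribe a specific segmentation tool.
import jieba

preset_data_set = [
    ("如何重置登录密码", "请在登录页点击忘记密码并按提示操作", 1),  # (X, Y, L)
    ("如何重置登录密码", "我们的营业时间为每天九点到十八点", 0),
]

first_word_sets = [list(jieba.cut(x)) for x, _, _ in preset_data_set]   # question tokens, per triple
second_word_sets = [list(jieba.cut(y)) for _, y, _ in preset_data_set]  # answer tokens, per triple
```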
Step S202, a basic question-answering model is constructed, wherein the basic question-answering model comprises a left coding layer, a right coding layer, a first attention layer, an aggregation layer and a second attention layer, the first word set is input to the left coding layer of the basic question-answering model, a first coding vector is obtained through coding, and the second word set is input to the right coding layer of the basic question-answering model, and a second coding vector is obtained through coding;
In this embodiment, the basic question-answering model is the initially configured question-answering model, comprising a left coding layer, a right coding layer, a first attention layer, an aggregation layer and a second attention layer, where the parameters of each layer are initial parameters that may be random or default values. When the first word set is obtained, it is input to the left coding layer of the basic question-answering model; the left coding layer and the right coding layer have the same structure, both adopting the coding structure of BERT (a pre-training language model). The words in the first word set are coded by the left coding layer to obtain the first coding vector, and the words in the second word set are coded by the right coding layer to obtain the second coding vector.
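A minimal PyTorch sketch of the five-part structure just described follows. The hidden size, head count, number of enumeration vectors and the bert-base-chinese checkpoint are assumptions chosen for illustration, not values fixed by the embodiment.

```python
# Structural sketch of the basic question-answering model (assumed sizes/names).
import torch.nn as nn
from transformers import BertModel  # assumed BERT-style backbone for both coding layers

class BasicQAModel(nn.Module):
    def __init__(self, hidden=768, heads=8, num_enum=16):
        super().__init__()
        self.left_coding = BertModel.from_pretrained("bert-base-chinese")   # encodes the question
        self.right_coding = BertModel.from_pretrained("bert-base-chinese")  # encodes the candidate answer
        self.first_attention = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.second_attention = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.num_enum = num_enum  # enumeration vectors drawn from the question encoding
```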
Step S203, performing attention calculation on the first coding vector based on a first attention layer of the basic question-answering model to obtain a question coding vector corresponding to the training question, and performing calculation on the second coding vector based on an aggregation function in an aggregation layer of the basic question-answering model to obtain a candidate answer characterization vector corresponding to the candidate answer;
In this embodiment, the first attention layer applies attention to the first coding vector corresponding to the training question in order to obtain the question coding vector corresponding to that training question. Specifically, three different preset weight parameters are obtained and multiplied with the first coding vector respectively, yielding three different weight parameters: a first weight parameter Q, a second weight parameter K and a third weight parameter V. The dot product between the first weight parameter Q and the second weight parameter K is calculated, and the dot-product result is divided by a preset value to obtain an output result; the output result is then normalized and multiplied with the third weight parameter V, finally yielding the sub-coding vector of the current attention head. The sub-coding vector of each attention head is calculated, and all the sub-coding vectors are concatenated to obtain the final question coding vector. Alternatively, a preset number of vectors of matching dimension may be selected from the first coding vector as enumeration vectors and used as the first weight parameter Q corresponding to the training question, while the second weight parameter K and the third weight parameter V are still obtained by calculating the first coding vector with different preset weight parameters. In parallel, the second coding vector of the candidate answer corresponding to the training question is aggregated by an aggregation function (such as an Agg function) in the aggregation layer of the basic question-answering model to obtain the candidate answer characterization vector corresponding to the candidate answer.
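The computation just described is standard scaled dot-product attention. The single-head sketch below shows it with the enumeration vectors optionally standing in for Q; the sqrt(d) scaling constant and the tensor shapes are assumptions consistent with common practice, and the multi-head split is omitted.

```python
# Single-head sketch of the first attention layer; concatenation of the
# per-head sub-coding vectors is omitted for brevity.
import torch.nn.functional as F

def first_attention(first_code, W_q, W_k, W_v, enum=None):
    # first_code: (seq_len, d); W_q/W_k/W_v: (d, d) preset weight parameters
    Q = enum if enum is not None else first_code @ W_q  # enumeration vectors may replace Q
    K = first_code @ W_k
    V = first_code @ W_v
    scores = Q @ K.T / K.shape[-1] ** 0.5  # dot product divided by a preset value
    weights = F.softmax(scores, dim=-1)    # normalize the output result
    return weights @ V                     # question coding vector(s)
```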
Step S204, performing attention calculation on the question coding vector and the candidate answer characterization vector according to a second attention layer of the basic question-answering model to obtain a question characterization vector, calculating the similarity of the question characterization vector and the candidate answer characterization vector, and training the basic question-answering model based on the similarity to obtain a target question-answering model;
In this embodiment, when the question coding vector and the candidate answer characterization vector are obtained, the candidate answer characterization vector is used as the first information parameter q for the attention calculation of the second attention layer, and the weight parameters obtained by calculating the question coding vector with two different preset weight parameters are used as the second information parameter k and the third information parameter v. The dot product between the first information parameter q and the second information parameter k is calculated, and the dot-product result is divided by a preset value to obtain an output result; dividing by the preset value gives the output a more stable gradient. The output result is then normalized and multiplied with the third information parameter v, finally yielding the sub-coding vector of the current attention head; the sub-coding vector of each attention head is calculated, and all the sub-coding vectors are concatenated to obtain the final question characterization vector. The similarity of the question characterization vector and the candidate answer characterization vector is then calculated, and the basic question-answering model is trained according to that similarity, where the similarity may be obtained by calculating the cosine of the angle between the question characterization vector and the candidate answer characterization vector.
The basic question-answering model is trained according to the similarity of the question characterization vector and the candidate answer characterization vector. Specifically, a preset loss function is obtained, which may be a cross-entropy function, a mean squared error loss function, a mean absolute error loss function, or the like. A loss value is calculated through the loss function from the label data of the training questions and candidate answers in the preset data set and the similarity; all initial parameters of the basic question-answering model are adjusted according to the loss value to obtain an adjusted basic question-answering model, the data in the preset data set are then input into the adjusted model, and its loss value is calculated in the same way. These steps are repeated, computing the loss value after each adjustment, until the loss value reaches its minimum; the model with the minimum loss value is determined to be the target question-answering model. The target question-answering model has the same structure as the basic question-answering model, only with different parameter values.
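As one concrete reading of this training procedure, the sketch below pairs the cosine similarity with a binary cross-entropy loss. The loss choice, and the assumption that the model returns the two characterization vectors directly, are illustrative; the text leaves both open.

```python
# Sketch of one training step: cosine similarity between the two
# characterization vectors, then a loss against the (X, Y, L) label.
import torch.nn.functional as F

def train_step(model, optimizer, question, answer, label):
    q_repr, a_repr = model(question, answer)           # question / candidate answer characterization vectors
    sim = F.cosine_similarity(q_repr, a_repr, dim=-1)  # cosine of the angle between them
    loss = F.binary_cross_entropy((sim + 1) / 2, label.float())  # map [-1, 1] into [0, 1]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```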
Step S205, when the to-be-processed question is received, inputting the to-be-processed question and the candidate answer into the target question-answering model to obtain a target recall sentence corresponding to the to-be-processed question.
In this embodiment, when the to-be-processed question is received, the matching degree between the to-be-processed question and each candidate answer is calculated according to the target question-answering model, and the candidate answer with the highest matching degree is selected as the target recall sentence of the to-be-processed question.
It is emphasized that, to further ensure the privacy and security of the target recall sentence, the target recall sentence may also be stored in a node of a blockchain.
The blockchain referred to in the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association by cryptographic means, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may comprise a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The embodiment realizes the efficient recall of the questions and the answers, reduces the duration of the questions and the answers recall while ensuring the accuracy of the questions and the answers, improves the efficiency of the questions and the answers, and further realizes the intelligent questions and the answers of the machine.
In some optional implementations of this embodiment, the step of performing attention calculation on the first code vector based on the first attention layer of the basic question-answering model to obtain a question code vector corresponding to the training question, and performing calculation on the second code vector based on an aggregation function in an aggregation layer of the basic question-answering model to obtain a candidate answer characterization vector corresponding to the candidate answer includes:
acquiring an enumeration vector corresponding to the first coding vector, and performing attention calculation according to the first coding vector and the enumeration vector to obtain the question coding vector;
and acquiring a preset aggregation function, and aggregating the second coding vector according to the aggregation function to obtain the candidate answer characterization vector corresponding to the second coding vector.
In this embodiment, the enumeration vector is a vector selected from the first coding vector; when the first coding vector is obtained, a preset number of vectors of matching dimension may be randomly selected from it as the enumeration vectors. Alternatively, the dimension and number of the first coding vector may be calculated and the enumeration count treated as a trainable parameter, with the final trained count taken as the number of enumeration vectors selected from the first coding vector each time. When the enumeration vectors are obtained, they are used as the first weight parameter Q of the training question; a similarity result is computed from the first weight parameter Q and the second weight parameter K, and a weighted sum over the third weight parameter V according to that similarity result yields the question coding vector corresponding to the training question. The second weight parameter K and the third weight parameter V are calculated from the first coding vector and different preset weight parameters.
Because the basic question-answering model in this embodiment has a double-tower structure, the candidate answer can be processed at the same time as the training question. Therefore, while attention calculation is performed on the enumeration vectors and the first coding vector to obtain the question coding vector corresponding to the training question, a preset aggregation function is obtained and the second coding vector corresponding to the candidate answer is aggregated according to it: the individual vectors of the second coding vector are aggregated into a single vector, yielding the candidate answer characterization corresponding to the second coding vector.
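The text names an Agg function without fixing its form; masked mean pooling, sketched below, is one plausible assumption for collapsing the second coding vector into a single candidate answer characterization vector.

```python
# Sketch of a possible aggregation layer: mean-pool the answer's second
# coding vectors into one candidate answer characterization vector.
import torch

def aggregate(second_code, mask):
    # second_code: (seq_len, d); mask: (seq_len,) with 1 marking real tokens
    mask = mask.unsqueeze(-1).float()
    return (second_code * mask).sum(dim=0) / mask.sum().clamp(min=1.0)
```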
In this embodiment, the question coding vector is obtained by calculating the first coding vector with the enumeration vectors, and the candidate answer characterization is obtained by calculating the second coding vector with the aggregation function, so that the training question and the candidate answer each receive their own semantic representation; the resulting question coding vector and candidate answer characterization vector can express the semantic information of the training questions and candidate answers more accurately, further improving the accuracy and efficiency of question-answer recall.
In some optional implementations of this embodiment, the step of obtaining an enumeration vector corresponding to the first coding vector includes:
acquiring a random initial value and the dimension of the first coding vector;
and selecting, from the first coding vector, a number of vectors equal to the random initial value as the enumeration vectors, and taking the dimension of the first coding vector as the vector dimension of the enumeration vectors.
In this embodiment, when the enumeration vectors corresponding to the first coding vector are computed, a random initial value and the dimension of the first coding vector are obtained, where the random initial value is a randomly selected initial value and the dimension is the vector dimension of the first coding vector. The random initial value is taken as the total number of enumeration vectors; that many vectors are selected in order (from front to back or from back to front) from the first coding vector as the enumeration vectors, and the dimension of the first coding vector is taken as the vector dimension of the enumeration vectors.
In this embodiment, selecting the enumeration vectors corresponding to the first coding vector allows the attention calculation on the first coding vector to be performed accurately, so that the question coding vector obtained through the attention calculation can express the semantic information of the training question more accurately.
In some optional implementations of this embodiment, the step of constructing the basic question-answer model includes:
acquiring a pre-trained pre-coding layer, and taking the pre-coding layer as the left coding layer and the right coding layer of the basic question-answering model;
acquiring a preset first attention function and an activation function, constructing a first attention layer according to the first attention function, and constructing an aggregation layer according to the activation function;
and acquiring a second attention function, constructing a second attention layer according to the second attention function, and combining the left coding layer, the right coding layer, the first attention layer, the aggregation layer and the second attention layer to form the basic question-answering model.
In this embodiment, when the basic question-answering model is constructed, a pre-trained pre-coding layer is obtained and used as both the left coding layer and the right coding layer of the basic question-answering model, where the two are parallel coding layers that can process received data simultaneously. For example, while the left coding layer encodes a training question, the right coding layer can encode the candidate answer at the same time. Then, a preset first attention function and an activation function are obtained; the first attention function serves as the first attention layer and the activation function as the aggregation layer, so that the second coding vector is calculated by the aggregation layer while the first coding vector is calculated by the first attention layer. Finally, a second attention function is obtained and used as the second attention layer, where the second attention function and the first attention function may be the same function or different functions; the second attention layer calculates the question characterization vector from the question coding vector produced by the first attention layer and the candidate answer characterization vector produced by the aggregation layer. The left coding layer, the right coding layer, the first attention layer, the aggregation layer and the second attention layer are combined to obtain the basic question-answering model.
In this embodiment, constructing the basic question-answering model in this way ensures that the target question-answering model, obtained by training it on the training questions and candidate answers, can accurately perform recall calculation for questions, improving the efficiency and accuracy of question-answer recall.
In some optional implementations of this embodiment, the step of obtaining the pre-trained pre-encoded layer includes:
acquiring an online corpus and a basic pre-training language model, training the basic pre-training language model according to the online corpus to obtain a target pre-training language model, and taking a coding layer of the target pre-training language model as a pre-coding layer after pre-training is completed.
In this embodiment, the pre-coding layer is a pre-trained coding layer that can be obtained by training a basic pre-training language model. Specifically, an online corpus (such as Reddit) and a basic pre-training language model (a BERT model) are obtained, and the basic pre-training language model is trained on the online corpus; when its training is complete, the trained model is determined to be the target pre-training language model. The coding layer of the target pre-training language model is then taken as the pre-coding layer of the basic question-answering model; that is, the coding layer of the target pre-training language model is the pre-coding layer obtained after pre-training.
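As an illustration, the sketch below pulls the coding layer out of a BERT checkpoint through the Hugging Face transformers API. The checkpoint name stands in for the corpus-trained target pre-training language model; the further training on an online corpus such as Reddit is not shown.

```python
# Sketch of obtaining the pre-coding layer from an (assumed already
# corpus-trained) BERT-style target pre-training language model.
from transformers import BertModel

target_plm = BertModel.from_pretrained("bert-base-chinese")  # placeholder for the trained model
pre_coding_layer = target_plm.encoder  # reused as the left and right coding layers
```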
In this embodiment, the basic pre-training language model is trained on the online corpus to obtain the pre-coding layer, so that received training questions and candidate answers can be coded more accurately by the pre-coding layer; the coding result then represents the training questions and candidate answers more accurately, improving the accuracy of question-answer recall.
In some optional implementations of this embodiment, the step of training the basic question-answering model based on the similarity to obtain the target question-answering model includes:
acquiring a matching degree label of the training question and the candidate answer according to the preset training set;
and calculating a loss function of the basic question-answer model according to the matching degree label and the similarity, and determining that the basic question-answer model is trained to obtain the target question-answer model when the loss function converges.
In this embodiment, when the similarity between the question characterization vector and the candidate answer characterization vector has been calculated, a preset training set is obtained; it contains the matching degree labels of the training questions and candidate answers, and the matching degree labels are extracted from it. The loss function of the basic question-answering model is calculated according to the matching degree labels and the similarity, and when the loss function converges, the training of the basic question-answering model is determined to be complete; the trained basic question-answering model is the target question-answering model.
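One way to read "until the loss function converges" is sketched below, reusing the hypothetical train_step helper from the earlier sketch; the tolerance and patience window are assumptions, as the text does not fix a convergence criterion.

```python
# Sketch of training until the loss converges; train_step is the
# hypothetical helper defined in the earlier sketch.
def train_until_converged(model, optimizer, batches, tol=1e-4, patience=3):
    prev, stable = float("inf"), 0
    while stable < patience:
        epoch_loss = sum(train_step(model, optimizer, q, a, l) for q, a, l in batches)
        stable = stable + 1 if abs(prev - epoch_loss) < tol else 0
        prev = epoch_loss
    return model  # the converged model serves as the target question-answering model
```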
In this embodiment, training the basic question-answering model in this way ensures that the trained model can accurately recall answers to questions.
In some optional implementations of this embodiment, the step of inputting the to-be-processed question and the candidate answer into the target question-answering model to obtain a target recall sentence corresponding to the to-be-processed question includes:
splitting the to-be-processed question into a word set composed of a plurality of words, and encoding the word set according to a target encoding layer of the target question-answering model to obtain a third coding vector;
selecting a question enumeration vector of the third coding vector, and performing attention calculation on the third coding vector and the question enumeration vector according to a first attention layer in the target question-answering model to obtain a to-be-processed coding vector;
obtaining all stored candidate answer characterization vectors, and calculating the to-be-processed coding vector and the candidate answer characterization vectors according to a second attention layer in the target question-answering model to obtain a to-be-processed characterization vector of the to-be-processed question;
and calculating the target similarity of the to-be-processed characterization vector and the candidate answer characterization vector, and determining the candidate answer corresponding to the maximum target similarity as the target recall sentence of the to-be-processed question.
In this embodiment, the to-be-processed question is a received question for which recall is required. When the to-be-processed question is received, word segmentation is performed on it to obtain a word set composed of a plurality of words corresponding to the to-be-processed question. The word set is input to the target coding layer (namely the left coding layer) of the target question-answering model and encoded according to the target coding layer to obtain a third coding vector. A plurality of vectors are selected from the third coding vector as the question enumeration vectors corresponding to the to-be-processed question, selected in the same way as the enumeration vectors corresponding to the first coding vector; attention calculation is performed on the third coding vector and the question enumeration vectors according to the first attention layer in the target question-answering model to obtain the to-be-processed coding vector. The stored candidate answer characterization vectors corresponding to all the candidate answers are obtained; alternatively, the candidate answers may be input into the target question-answering model together with the to-be-processed question and calculated by the right coding layer and aggregation layer of the target question-answering model to obtain the candidate answer characterization vectors. The to-be-processed coding vector and the candidate answer characterization vectors are calculated according to the second attention layer in the target question-answering model to obtain the to-be-processed characterization vector of the to-be-processed question; the target similarity between the to-be-processed characterization vector and each candidate answer characterization vector is calculated, and the candidate answer corresponding to the maximum target similarity is the target recall sentence of the to-be-processed question.
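Putting the pieces together, the sketch below scores all stored candidate answer characterization vectors against one incoming question. It reuses the hypothetical select_enumeration and first_attention helpers from the earlier sketches; encode_left and the simplified per-candidate second attention are assumptions.

```python
# End-to-end recall sketch: encode the question, apply the first attention
# with enumeration vectors, let each stored answer vector attend over the
# question encoding (second attention), then pick the most similar answer.
import torch
import torch.nn.functional as F

def recall(model, question_tokens, answer_reprs):
    # answer_reprs: (num_candidates, d) stored candidate answer characterization vectors
    third_code = model.encode_left(question_tokens)  # assumed hook onto the target coding layer
    enum = select_enumeration(third_code)            # question enumeration vectors
    pending = first_attention(third_code, model.W_q, model.W_k, model.W_v, enum=enum)
    attn = F.softmax(answer_reprs @ pending.T / pending.shape[-1] ** 0.5, dim=-1)
    q_reprs = attn @ pending                         # one to-be-processed characterization per candidate
    sims = F.cosine_similarity(q_reprs, answer_reprs, dim=-1)
    return int(sims.argmax())                        # index of the target recall sentence
```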
In this embodiment, the to-be-processed question is calculated through the target question-answering model, achieving accurate recall for the to-be-processed question and improving the efficiency of question-answer recall.
Those skilled in the art will appreciate that implementing all or part of the methods described above may be accomplished by computer-readable instructions stored in a computer-readable storage medium which, when executed, may include the steps of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or a random access memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an intelligent question-and-answer recall apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 3, the intelligent question-and-answer recall apparatus 300 according to the present embodiment includes: word segmentation module 301, encoding module 302, calculation module 303, training module 304, and recall module 305. Wherein:
the word segmentation module 301 is configured to obtain a preset data set, perform word segmentation on training questions in the preset data set to obtain a first word set, and perform word segmentation on candidate answers in the preset data set to obtain a second word set;
In this embodiment, the preset data set is assembled from data acquired in advance and contains training questions, candidate answers and label data; a training question X, a candidate answer Y and label data L form one triple, denoted (X, Y, L), and the preset data set comprises a plurality of such triples. When the preset data set is obtained, word segmentation is performed, triple by triple, on the training question of each triple to obtain the first word set corresponding to that triple, and on the candidate answer corresponding to the training question in each triple to obtain the second word set corresponding to that triple.
The coding module 302 is configured to construct a basic question-answering model, where the basic question-answering model includes a left coding layer, a right coding layer, a first attention layer, an aggregation layer, and a second attention layer, input the first word set to the left coding layer of the basic question-answering model, code to obtain a first coding vector, and input the second word set to the right coding layer of the basic question-answering model, and code to obtain a second coding vector;
in some alternative implementations of the present embodiment, the encoding module 302 includes:
the first acquisition unit is used for acquiring a pre-trained pre-coding layer, and taking the pre-coding layer as a left coding layer and a right coding layer of the basic question-answering model;
the construction unit is used for acquiring a preset first attention function and an activation function, constructing a first attention layer according to the first attention function and constructing an aggregation layer according to the activation function;
and the combination unit is used for acquiring a second attention function, constructing a second attention layer according to the second attention function, and combining the left coding layer, the right coding layer, the first attention layer, the aggregation layer and the second attention layer to form the basic question-answering model.
In some optional implementations of the present embodiment, the first obtaining unit includes:
a training subunit, configured to obtain an online corpus and a basic pre-training language model, train the basic pre-training language model according to the online corpus to obtain a target pre-training language model, and take the coding layer of the target pre-training language model as the pre-coding layer after the pre-training is completed.
In this embodiment, the basic question-answering model is the initially configured question-answering model, comprising a left coding layer, a right coding layer, a first attention layer, an aggregation layer and a second attention layer, where the parameters of each layer are initial parameters that may be random or default values. When the first word set is obtained, it is input to the left coding layer of the basic question-answering model; the left coding layer and the right coding layer have the same structure, both adopting the coding structure of BERT (a pre-training language model). The words in the first word set are coded by the left coding layer to obtain the first coding vector, and the words in the second word set are coded by the right coding layer to obtain the second coding vector.
The calculation module 303 is configured to perform attention calculation on the first coding vector based on the first attention layer of the basic question-answering model to obtain a question coding vector corresponding to the training question, and to calculate the second coding vector based on the aggregation function in the aggregation layer of the basic question-answering model to obtain a candidate answer characterization vector corresponding to the candidate answer;
In some optional implementations of the present embodiment, the calculation module 303 includes:
the second acquisition unit is used for acquiring an enumeration vector corresponding to the first coding vector, and performing attention calculation according to the first coding vector and the enumeration vector to obtain the question coding vector;
the aggregation unit is used for acquiring a preset aggregation function, and aggregating the second coding vector according to the aggregation function to obtain the candidate answer characterization vector of the second coding vector.
In some optional implementations of the present embodiment, the second obtaining unit includes:
an acquisition subunit, configured to acquire a random initial value and the dimension of the first coding vector;
the confirmation unit is used for selecting, from the first coding vector, a number of vectors equal to the random initial value as the enumeration vectors, and taking the dimension of the first coding vector as the vector dimension of the enumeration vectors.
In this embodiment, the first attention layer attends over the first coding vector corresponding to the training question to obtain the question coding vector. Specifically, three different preset weight parameters are obtained and each is multiplied by the first coding vector, yielding three different weight parameters: a first weight parameter Q, a second weight parameter K and a third weight parameter V. The dot product of Q and K is computed, the result is divided by a preset value to obtain an output result, the output result is normalized, and the normalized result is multiplied by the third weight parameter to obtain the sub-coding vector of the current attention head; the sub-coding vector of each attention head is computed, and all sub-coding vectors are concatenated to obtain the final question coding vector. Alternatively, a preset number of vectors of the given dimension can be selected from the first coding vector as enumeration vectors, and the enumeration vectors used as the first weight parameter Q corresponding to the training question; the second weight parameter K and the third weight parameter V are still obtained by multiplying the first coding vector by different preset weight parameters. Meanwhile, the second coding vector of the candidate answer corresponding to the training question is aggregated based on the aggregation function (such as an Agg function) in the aggregation layer of the basic question-answering model to obtain the candidate answer characterization vector corresponding to the candidate answer.
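The computation just described can be sketched as follows. The random tensors stand in for the preset weight parameters, the first n_enum rows stand in for the randomly selected enumeration vectors, and mean pooling is an assumed example of the Agg aggregation function; none of these choices is mandated by the embodiment:

```python
import torch
import torch.nn.functional as F

def first_attention(first_coding, n_enum=8, n_heads=4):
    """Sketch of the first attention layer.

    first_coding: (seq_len, d) coding of one training question, with d
    divisible by n_heads. The enumeration vectors serve as Q; K and V
    come from multiplying the first coding vector by preset weights.
    """
    seq_len, d = first_coding.shape
    d_head = d // n_heads
    q = first_coding[:n_enum]                        # enumeration vectors as Q
    w_k, w_v = torch.randn(d, d), torch.randn(d, d)  # stand-ins for preset weights
    k, v = first_coding @ w_k, first_coding @ w_v

    heads = []
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        scores = q[:, s] @ k[:, s].T / d_head ** 0.5       # dot product / preset value
        heads.append(F.softmax(scores, dim=-1) @ v[:, s])  # normalize, then weight V
    return torch.cat(heads, dim=-1)                  # question coding vector

def agg(second_coding):
    """Mean pooling as an assumed instance of the Agg function."""
    return second_coding.mean(dim=0)                 # candidate answer characterization
```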
The training module 304 is configured to perform attention calculation on the question coding vector and the candidate answer characterization vector according to the second attention layer of the basic question-answering model to obtain a question characterization vector, to calculate the similarity between the question characterization vector and the candidate answer characterization vector, and to train the basic question-answering model based on the similarity to obtain a target question-answering model;
in some optional implementations of the present embodiment, the training module 304 includes:
the third acquisition unit is used for acquiring the matching degree label of the training question and the candidate answer according to a preset training set;
and the first calculation unit is used for calculating a loss function of the basic question-answering model according to the matching degree label and the similarity, and determining that the basic question-answering model is trained when the loss function converges to obtain the target question-answering model.
In this embodiment, when the question coding vector and the candidate answer characterization vector are obtained, the candidate answer characterization vector is used as the first information parameter q of the attention calculation in the second attention layer, while the second information parameter k and the third information parameter v are obtained by multiplying the question coding vector by two different preset weight parameters. The dot product of the first information parameter q and the second information parameter k is computed to obtain a dot product result, which is divided by a preset value to obtain an output result; the preset value gives the output a more stable gradient. The output result is then normalized and the normalized result multiplied by the third information parameter v, yielding the sub-coding vector of the current attention head; the sub-coding vector of each attention head is computed, and all sub-coding vectors are concatenated to obtain the final question characterization vector. The similarity between the question characterization vector and the candidate answer characterization vector is then calculated, and the basic question-answering model is trained according to this similarity; the similarity can be obtained by computing the cosine of the angle between the two vectors.
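A sketch of the second attention layer and the cosine similarity, with the candidate answer characterization vector as q, and k, v derived from the question coding vector (a single head for brevity; the random tensors are again stand-ins for the preset weight parameters):

```python
import torch
import torch.nn.functional as F

def second_attention(question_coding, answer_repr):
    """question_coding: (seq_len, d); answer_repr: (d,).

    The candidate answer characterization vector acts as q; k and v are
    obtained from the question coding vector and two preset weights.
    """
    d = question_coding.shape[-1]
    w_k, w_v = torch.randn(d, d), torch.randn(d, d)  # stand-ins for preset weights
    k, v = question_coding @ w_k, question_coding @ w_v
    scores = answer_repr @ k.T / d ** 0.5            # dot product / preset value
    return F.softmax(scores, dim=-1) @ v             # question characterization vector

def similarity(question_repr, answer_repr):
    """Similarity as the cosine of the angle between the two vectors."""
    return F.cosine_similarity(question_repr, answer_repr, dim=0)
```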
The basic question-answering model is trained according to the similarity between the question characterization vector and the candidate answer characterization vector. Specifically, a preset loss function is obtained; the loss function may be a cross-entropy function, a mean squared error loss function, a mean absolute error loss function, or the like. A loss value is computed through the loss function from the label data in the preset data set and the similarity between the training question and the candidate answer; all initial parameters of the basic question-answering model are adjusted according to the loss value to obtain an adjusted basic question-answering model, the data in the preset data set are input to the adjusted model, and its loss value is computed in the same way as for the basic question-answering model. These steps are repeated, computing the loss value after each adjustment, until the loss value reaches its minimum; the model with the minimum loss value is then determined to be the target question-answering model. The target question-answering model has the same structure as the basic question-answering model but different parameter values.
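A sketch of that training loop, assuming a `model` that maps a (question, answer) pair to a similarity score; mean squared error is used as one of the loss functions the embodiment allows, and the optimizer and hyperparameters are illustrative:

```python
import torch
import torch.nn.functional as F

def train(model, dataset, epochs=10, lr=1e-5):
    """Adjust the model parameters from the loss between the computed
    similarity and the (X, Y, L) label, repeating until the loss stops
    decreasing (approximated here by a fixed number of epochs)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for question_ids, answer_ids, label in dataset:
            sim = model(question_ids, answer_ids)            # predicted similarity
            loss = F.mse_loss(sim, torch.tensor(float(label)))
            optimizer.zero_grad()
            loss.backward()   # adjust all parameters according to the loss value
            optimizer.step()
```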
And the recall module 305 is configured to, when a question to be processed is received, input the question to be processed and the candidate answers into the target question-answering model to obtain a target recall sentence corresponding to the question to be processed.
In some optional implementations of the present embodiment, the recall module 305 includes:
the splitting unit is used for splitting the question to be processed into a word set composed of a plurality of words, and encoding the word set according to a target coding layer of the target question-answering model to obtain a third coding vector;
the second calculation unit is used for selecting a question enumeration vector of the third coding vector, and performing attention calculation on the third coding vector and the question enumeration vector according to the first attention layer in the target question-answering model to obtain a to-be-processed coding vector;
the third calculation unit is used for acquiring all stored candidate answer characterization vectors, and calculating the to-be-processed coding vector against the candidate answer characterization vectors according to the second attention layer in the target question-answering model to obtain a to-be-processed characterization vector of the question to be processed;
and the fourth calculation unit is used for calculating the target similarity between the to-be-processed characterization vector and the candidate answer characterization vectors, and determining the candidate answer corresponding to the maximum target similarity as the target recall sentence of the question to be processed.
In this embodiment, when a question to be processed is received, the matching degree between the question and each candidate answer is calculated by the target question-answering model, and the candidate answer with the highest matching degree is selected as the target recall sentence of the question to be processed.
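A sketch of this recall step, assuming the question to be processed has already been encoded into its characterization vector and all candidate answer characterization vectors are stored in one tensor (names are illustrative):

```python
import torch
import torch.nn.functional as F

def recall(query_repr, answer_reprs, answers):
    """query_repr: (d,) to-be-processed characterization vector;
    answer_reprs: (N, d) stored candidate answer characterization vectors;
    answers: the N candidate answer sentences.
    Returns the candidate answer with the highest target similarity,
    i.e. the target recall sentence."""
    sims = F.cosine_similarity(query_repr.unsqueeze(0), answer_reprs, dim=-1)
    return answers[int(torch.argmax(sims))]
```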
It is emphasized that, to further ensure the privacy and security of the target recall sentence, the target recall sentence may also be stored in a node of a blockchain.
The blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association by cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may comprise a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The intelligent question-answer recall device provided by this embodiment achieves efficient question-answer recall: it ensures question-answering accuracy while reducing recall latency, improves recall efficiency, and thereby realizes intelligent machine question answering.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 6 comprises a memory 61, a processor 62 and a network interface 63, which are communicatively connected to one another via a system bus. It is noted that only a computer device 6 having components 61-63 is shown in the figure, but it should be understood that not all of the illustrated components need be implemented, and more or fewer components may be implemented instead. As will be appreciated by those skilled in the art, the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs), embedded devices, and the like.
The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server or other computing device, and can interact with a user through a keyboard, a mouse, a remote control, a touch pad, a voice-control device or the like.
The memory 61 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disks, optical disks, and the like. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card or Flash Card provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit and an external storage device of the computer device 6. In this embodiment, the memory 61 is typically used to store the operating system and various application software installed on the computer device 6, such as the computer readable instructions of the intelligent question-answer recall method. The memory 61 may further be used to temporarily store various types of data that have been output or are to be output.
The processor 62 may in some embodiments be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to execute the computer readable instructions stored in the memory 61 or to process data, for example to execute the computer readable instructions of the intelligent question-answer recall method.
The network interface 63 may comprise a wireless network interface or a wired network interface, which network interface 63 is typically used for establishing a communication connection between the computer device 6 and other electronic devices.
The computer device provided by this embodiment achieves efficient question-answer recall: it reduces recall latency while ensuring question-answering accuracy, improves recall efficiency, and thereby realizes intelligent machine question answering.
The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the intelligent question-answer recall method as described above.
The computer readable storage medium provided by this embodiment achieves efficient question-answer recall: it ensures question-answering accuracy while reducing recall latency, improves recall efficiency, and thereby realizes intelligent machine question answering.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is preferred. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods of the embodiments of the present application.
The above-described embodiments are only some, not all, of the embodiments of the present application; the preferred embodiments shown in the drawings do not limit the scope of the claims. This application may be embodied in many different forms; these embodiments are provided so that the disclosure of the present application will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their features. Any equivalent structure made using the contents of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of this application.

Claims (8)

1. An intelligent question-answer recall method is characterized by comprising the following steps:
acquiring a preset data set, performing word segmentation on training questions in the preset data set to obtain a first word set, and performing word segmentation on candidate answers in the preset data set to obtain a second word set;
constructing a basic question-answering model, wherein the basic question-answering model comprises a left coding layer, a right coding layer, a first attention layer, an aggregation layer and a second attention layer, the first word set is input to the left coding layer of the basic question-answering model, a first coding vector is obtained by coding, and the second word set is input to the right coding layer of the basic question-answering model, and a second coding vector is obtained by coding;
performing attention calculation on the first coding vector based on a first attention layer of the basic question-answering model to obtain a question coding vector corresponding to the training question, and performing calculation on the second coding vector based on an aggregation function in an aggregation layer of the basic question-answering model to obtain a candidate answer characterization vector corresponding to the candidate answer;
performing attention calculation on the question coding vector and the candidate answer characterization vector according to a second attention layer of the basic question-answering model to obtain a question characterization vector, calculating the similarity of the question characterization vector and the candidate answer characterization vector, and training the basic question-answering model based on the similarity to obtain a target question-answering model;
when a question to be processed is received, inputting the question to be processed and the candidate answer into the target question-answering model to obtain a target recall sentence corresponding to the question to be processed;
the step of calculating the second coding vector based on the aggregation function in the aggregation layer of the basic question-answering model to obtain the candidate answer characterization vector corresponding to the candidate answer comprises the following steps:
acquiring a random initial value and the dimension of the first coding vector;
selecting, from the first coding vector, a number of vectors equal to the random initial value as the enumeration vectors, and taking the dimension of the first coding vector as the vector dimension of the enumeration vectors;
determining the enumeration vectors as the first weight parameter corresponding to the training question, and multiplying the first coding vector by different preset weight parameters to obtain a second weight parameter and a third weight parameter;
calculating the dot product of the first weight parameter and the second weight parameter to obtain a dot product result;
dividing the dot product result by a preset value to obtain an output result, and normalizing the output result to obtain a normalized result;
multiplying the normalized result by the third weight parameter to obtain a sub-coding vector of the current attention head;
calculating the sub-coding vector of each attention head, and concatenating all the sub-coding vectors to obtain a question coding vector;
acquiring a preset aggregation function, and aggregating the second coding vector according to the aggregation function to obtain a candidate answer characterization vector of the second coding vector.
2. The intelligent question-answer recall method according to claim 1, wherein the step of constructing a basic question-answering model comprises:
acquiring a pre-trained pre-coding layer, and taking the pre-coding layer as a left coding layer and a right coding layer of the basic question-answering model;
acquiring a preset first attention function and an activation function, constructing a first attention layer according to the first attention function, and constructing an aggregation layer according to the activation function;
and acquiring a second attention function, constructing a second attention layer according to the second attention function, and combining the left coding layer, the right coding layer, the first attention layer, the aggregation layer and the second attention layer to form the basic question-answering model.
3. The intelligent question-answer recall method according to claim 2, wherein the step of acquiring a pre-trained pre-coding layer comprises:
acquiring an online corpus and a basic pre-training language model, training the basic pre-training language model according to the online corpus to obtain a target pre-training language model, and taking a coding layer of the target pre-training language model as a pre-coding layer after pre-training is completed.
4. The intelligent question-answer recall method according to claim 1, wherein the step of training the basic question-answering model based on the similarity to obtain a target question-answering model comprises:
acquiring a matching degree label of the training question and the candidate answer according to a preset training set;
and calculating a loss function of the basic question-answer model according to the matching degree label and the similarity, and determining that the basic question-answer model is trained to obtain the target question-answer model when the loss function converges.
5. The intelligent question-answer recall method according to claim 1, wherein the step of inputting the question to be processed and the candidate answer into the target question-answering model to obtain a target recall sentence corresponding to the question to be processed comprises:
splitting the question to be processed into a word set composed of a plurality of words, and encoding the word set according to a target coding layer of the target question-answering model to obtain a third coding vector;
selecting a question enumeration vector of the third coding vector, and performing attention calculation on the third coding vector and the question enumeration vector according to the first attention layer in the target question-answering model to obtain a to-be-processed coding vector;
obtaining all stored candidate answer characterization vectors, and calculating the to-be-processed coding vector against the candidate answer characterization vectors according to the second attention layer in the target question-answering model to obtain a to-be-processed characterization vector of the question to be processed;
and calculating the target similarity between the to-be-processed characterization vector and the candidate answer characterization vectors, and determining the candidate answer corresponding to the maximum target similarity as the target recall sentence of the question to be processed.
6. An intelligent question-answering recall device, comprising:
the word segmentation module is used for acquiring a preset data set, performing word segmentation on training questions in the preset data set to obtain a first word set, and performing word segmentation on candidate answers in the preset data set to obtain a second word set;
The coding module is used for constructing a basic question-answering model, the basic question-answering model comprises a left coding layer, a right coding layer, a first attention layer, an aggregation layer and a second attention layer, the first word set is input to the left coding layer of the basic question-answering model, a first coding vector is obtained through coding, and the second word set is input to the right coding layer of the basic question-answering model, and a second coding vector is obtained through coding;
the calculation module is used for carrying out attention calculation on the first coding vector based on a first attention layer of the basic question-answering model to obtain a question coding vector corresponding to the training question, and carrying out calculation on the second coding vector based on an aggregation function in an aggregation layer of the basic question-answering model to obtain a candidate answer characterization vector corresponding to the candidate answer;
the training module is used for carrying out attention calculation on the question coding vector and the candidate answer characterization vector according to the second attention layer of the basic question-answering model to obtain a question characterization vector, calculating the similarity of the question characterization vector and the candidate answer characterization vector, and training the basic question-answering model based on the similarity to obtain a target question-answering model;
The recall module is used for, when a question to be processed is received, inputting the question to be processed and the candidate answers into the target question-answering model to obtain a target recall sentence corresponding to the question to be processed;
the calculation module is further used for acquiring a random initial value and the dimension of the first coding vector; selecting, from the first coding vector, a number of vectors equal to the random initial value as the enumeration vectors, and taking the dimension of the first coding vector as the vector dimension of the enumeration vectors; determining the enumeration vectors as the first weight parameter corresponding to the training question, and multiplying the first coding vector by different preset weight parameters to obtain a second weight parameter and a third weight parameter; calculating the dot product of the first weight parameter and the second weight parameter to obtain a dot product result; dividing the dot product result by a preset value to obtain an output result, and normalizing the output result to obtain a normalized result; multiplying the normalized result by the third weight parameter to obtain a sub-coding vector of the current attention head; calculating the sub-coding vector of each attention head, and concatenating all the sub-coding vectors to obtain a question coding vector; and acquiring a preset aggregation function, and aggregating the second coding vector according to the aggregation function to obtain the candidate answer characterization vector of the second coding vector.
7. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, implement the steps of the intelligent question-answer recall method of any one of claims 1 to 5.
8. A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, implement the steps of the intelligent question-answer recall method of any one of claims 1 to 5.
CN202210028532.0A 2022-01-11 2022-01-11 Intelligent question-answer recall method, intelligent question-answer recall device, computer equipment and storage medium Active CN114358023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210028532.0A CN114358023B (en) 2022-01-11 2022-01-11 Intelligent question-answer recall method, intelligent question-answer recall device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114358023A CN114358023A (en) 2022-04-15
CN114358023B true CN114358023B (en) 2023-08-22

Family

ID=81108595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210028532.0A Active CN114358023B (en) 2022-01-11 2022-01-11 Intelligent question-answer recall method, intelligent question-answer recall device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114358023B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116303975B (en) * 2023-05-11 2023-07-21 腾讯科技(深圳)有限公司 Training method of recall model, recall method and related equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328767A (en) * 2020-11-11 2021-02-05 重庆邮电大学 Question-answer matching method based on BERT model and comparative aggregation framework
CN112749262A (en) * 2020-07-24 2021-05-04 腾讯科技(深圳)有限公司 Question and answer processing method and device based on artificial intelligence, electronic equipment and storage medium
CN112765306A (en) * 2020-12-30 2021-05-07 金蝶软件(中国)有限公司 Intelligent question answering method and device, computer equipment and storage medium
WO2021093755A1 (en) * 2019-11-14 2021-05-20 华为技术有限公司 Matching method and apparatus for questions, and reply method and apparatus for questions
CN113886550A (en) * 2021-10-11 2022-01-04 平安国际智慧城市科技股份有限公司 Question-answer matching method, device, equipment and storage medium based on attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Application of Answer Selection Algorithms Based on Deep Learning; Wang Wei; China Master's Theses Full-text Database (Electronic Journals), Information Science and Technology Series; No. 8; pp. 1-79 *

Also Published As

Publication number Publication date
CN114358023A (en) 2022-04-15

Similar Documents

Publication Publication Date Title
CN112732911B (en) Semantic recognition-based speaking recommendation method, device, equipment and storage medium
CN112287069B (en) Information retrieval method and device based on voice semantics and computer equipment
CN112395390B (en) Training corpus generation method of intention recognition model and related equipment thereof
CN113947095A (en) Multilingual text translation method and device, computer equipment and storage medium
CN112836521A (en) Question-answer matching method and device, computer equipment and storage medium
CN111368551A (en) Method and device for determining event subject
CN112699213A (en) Speech intention recognition method and device, computer equipment and storage medium
CN114358023B (en) Intelligent question-answer recall method, intelligent question-answer recall device, computer equipment and storage medium
CN112598039B (en) Method for obtaining positive samples in NLP (non-linear liquid) classification field and related equipment
CN114780701A (en) Automatic question-answer matching method, device, computer equipment and storage medium
CN112085091B (en) Short text matching method, device, equipment and storage medium based on artificial intelligence
CN116127925B (en) Text data enhancement method and device based on destruction processing of text
CN116881446A (en) Semantic classification method, device, equipment and storage medium thereof
CN113420869B (en) Translation method based on omnidirectional attention and related equipment thereof
CN115309865A (en) Interactive retrieval method, device, equipment and storage medium based on double-tower model
CN113505595A (en) Text phrase extraction method and device, computer equipment and storage medium
CN113657104A (en) Text extraction method and device, computer equipment and storage medium
CN112396111A (en) Text intention classification method and device, computer equipment and storage medium
CN113515931B (en) Text error correction method, device, computer equipment and storage medium
CN112132367A (en) Modeling method and device for enterprise operation management risk identification
CN115640896B (en) Household user power load prediction method under multi-user scene and related equipment
CN112732913B (en) Method, device, equipment and storage medium for classifying unbalanced samples
CN116911304B (en) Text recommendation method and device
CN118015639A (en) Table relation analysis method, apparatus, computer device and storage medium
CN113157896B (en) Voice dialogue generation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant