CN110825857B - Multi-round question and answer identification method and device, computer equipment and storage medium - Google Patents


Info

Publication number: CN110825857B (application CN201910906819.7A; also published as CN110825857A)
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 邓悦, 金戈, 徐亮
Current and original assignee: Ping An Technology Shenzhen Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Priority: CN201910906819.7A; PCT/CN2019/116924 (WO2021056710A1)
Prior art keywords: vector, features, target, question, positive
Legal status: Active, granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F 16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and provides a multi-round question-answer recognition method, apparatus, computer device and storage medium. The multi-round question-answer recognition method comprises the following steps: importing the acquired user history questions, user history answers and user current question into a pre-trained target multi-round question-answering model; performing vector feature conversion with the coding unit of the target multi-round question-answering model to obtain a first vector feature, a second vector feature and a third vector feature; importing the first, second and third vector features into the long short-term memory (LSTM) unit for semantic feature extraction to obtain target semantic features; and importing the target semantic features into the fully-connected unit for similarity calculation and outputting the recognition result with the greatest similarity. The technical scheme of the invention improves the accuracy and efficiency with which a user acquires information through the target multi-round question-answering model.

Description

Multi-round question and answer identification method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method and apparatus for identifying multiple rounds of questions and answers, a computer device, and a storage medium.
Background
A traditional multi-round question-answering model mainly concatenates the dialogue information of the preceding rounds directly and treats it as a single sentence of input. Because the relations between sentences are not considered, the model can learn only word-level semantic information, not grammar-level or sentence-level semantic information. The semantic information the model can express is therefore incomplete, the recognition accuracy of the multi-round question-answering model is low, and the accuracy and efficiency with which a user acquires information through the model suffer accordingly.
Disclosure of Invention
The embodiments of the invention provide a multi-round question-answer recognition method, apparatus, computer device and storage medium, to solve the problem that the recognition accuracy of the traditional multi-round question-answering model is low, which impairs the accuracy and efficiency with which a user acquires information through the model.
A multi-round question and answer recognition method comprises the following steps:
acquiring a user history question, a user history answer and a user current question from a user database;
importing the user history questions, the user history answers and the user current questions into a pre-trained target multi-round question-answering model, wherein the target multi-round question-answering model comprises a coding unit, a long short-term memory unit and a fully-connected unit;
performing vector feature conversion on the user history questions, the user history answers and the user current questions through the coding unit to obtain first vector features corresponding to the user history questions, second vector features corresponding to the user history answers and third vector features corresponding to the user current questions;
importing the first vector feature, the second vector feature and the third vector feature into the long short-term memory unit for semantic feature extraction to obtain target semantic features;
and importing the target semantic features into the fully-connected unit for similarity calculation, and outputting the recognition result with the greatest similarity.
A multi-round question and answer identification device comprising:
the first acquisition module is used for acquiring user history questions, user history answers and user current questions from a user database;
the importing module is used for importing the user history questions, the user history answers and the user current questions into a pre-trained target multi-round question-answering model, wherein the target multi-round question-answering model comprises a coding unit, a long short-term memory unit and a fully-connected unit;
the conversion module is used for performing vector feature conversion on the user history questions, the user history answers and the user current questions through the coding unit to obtain first vector features corresponding to the user history questions, second vector features corresponding to the user history answers and third vector features corresponding to the user current questions;
the extraction module is used for importing the first vector feature, the second vector feature and the third vector feature into the long short-term memory unit for semantic feature extraction to obtain target semantic features;
and the output module is used for importing the target semantic features into the fully-connected unit for similarity calculation and outputting the recognition result with the greatest similarity.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the multi-round question-answer identification method described above when the computer program is executed.
A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the multi-round question-answer identification method described above.
With the above multi-round question-answer recognition method, apparatus, computer device and storage medium, in this embodiment the acquired user history questions, user history answers and user current question are imported into a pre-trained target multi-round question-answering model. Vector feature conversion is performed by the coding unit of the target multi-round question-answering model to obtain a first vector feature, a second vector feature and a third vector feature; semantic feature extraction is performed on the three vector features by the long short-term memory unit to obtain target semantic features; similarity calculation is performed on the target semantic features by the fully-connected unit, and the recognition result with the greatest similarity is output. The pre-trained target multi-round question-answering model can quickly and accurately determine the recognition result corresponding to the user's current question from the user history questions, the user history answers and the current question. Extracting semantic features with the long short-term memory unit of the pre-trained model strengthens the information interaction among the user history questions, the user history answers and the current question, so the recognition accuracy of the target multi-round question-answering model is higher, which in turn improves the accuracy and efficiency with which a user acquires information through the model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a multi-round question and answer identification method provided by an embodiment of the invention;
FIG. 2 is a flowchart for training a target multi-round question-answering model in the multi-round question-answering recognition method provided by the embodiment of the present invention;
fig. 3 is a flowchart of step S2 in the multi-round question-answering recognition method according to the embodiment of the present invention;
fig. 4 is a flowchart of step S21 in the multi-round question and answer recognition method provided by the embodiment of the present invention;
fig. 5 is a flowchart of step S3 in the multi-round question-answering recognition method according to the embodiment of the present invention;
fig. 6 is a flowchart of step S6 in the multi-round question-answering recognition method according to the embodiment of the present invention;
fig. 7 is a schematic diagram of a multi-round question and answer recognition device according to an embodiment of the present invention;
fig. 8 is a block diagram of the basic structure of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The multi-round question and answer identification method is applied to a server, and the server can be realized by an independent server or a server cluster formed by a plurality of servers. In one embodiment, as shown in fig. 1, a multi-round question-answer recognition method is provided, which includes the following steps:
s101: user history questions, user history answers, and user current questions are obtained from a user database.
In the embodiment of the invention, the user history questions, the user history answers and the user current questions are obtained directly from the user database, wherein the user database is a database specially used for storing the user history questions, the user history answers and the user current questions.
S102: and importing the user history questions, the user history answers and the user current questions into a pre-trained target multi-round question-answering model, wherein the target multi-round question-answering model comprises a coding unit, a long short-term memory unit and a fully-connected unit.
In the embodiment of the invention, the pre-trained target multi-round question-answering model is a neural network model obtained by training a neural network on a training data set prepared by the user. In a multi-round question-and-answer setting, it can quickly identify, for the user's current question posed after several rounds of questions and answers, the user current answer corresponding to that question.
Specifically, the user history questions, the user history answers and the user current questions obtained in the step S101 are directly imported into a pre-trained target multi-round question-answering model.
S103: and carrying out vector feature conversion processing on the user history questions, the user history answers and the user current questions through the coding unit to obtain first vector features corresponding to the user history questions, second vector features corresponding to the user history answers and third vector features corresponding to the user current questions.
In the embodiment of the invention, the coding unit contains a vector conversion port for performing vector feature conversion on the user history questions, the user history answers and the user current questions. The user history questions, user history answers and user current questions are imported directly into this vector conversion port for vector feature conversion, yielding the first vector features corresponding to the user history questions, the second vector features corresponding to the user history answers and the third vector features corresponding to the user current questions.
S104: and importing the first vector feature, the second vector feature and the third vector feature into the long short-term memory unit for semantic feature extraction, so as to obtain target semantic features.
Specifically, the long short-term memory unit contains a semantic feature port for extracting semantic features from the first, second and third vector features. The three vector features are imported together directly into this semantic feature port for semantic feature extraction, yielding the target semantic features.
S105: and importing the target semantic features into the fully-connected unit for similarity calculation, and outputting the recognition result with the greatest similarity.
Specifically, the fully-connected unit comprises a preset classifier. The target semantic features are imported into the fully-connected unit; when the fully-connected unit receives them, it performs similarity calculation on them with the preset classifier and outputs the recognition result with the greatest similarity, i.e. the answer corresponding to the user's current question. The classifier is dedicated to similarity calculation.
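As a concrete illustration of the similarity calculation in the fully-connected unit, the sketch below scores candidate answer vectors against the target semantic feature and returns the best match. The patent does not specify the similarity metric or the classifier's internals; cosine similarity and the dictionary of candidate answers are assumptions for illustration.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pick_answer(target_semantic_feature, candidate_answers):
    # candidate_answers maps answer text -> answer vector; the answer
    # with the greatest similarity is output as the recognition result.
    return max(candidate_answers,
               key=lambda ans: cosine_similarity(target_semantic_feature,
                                                 candidate_answers[ans]))
```

With `pick_answer`, outputting the recognition result with the greatest similarity reduces to an arg-max over the candidate set.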
In this embodiment, the acquired user history questions, user history answers and user current question are imported into a pre-trained target multi-round question-answering model. Vector feature conversion is performed by the coding unit to obtain the first, second and third vector features; semantic feature extraction is performed on them by the long short-term memory unit to obtain target semantic features; similarity calculation is performed on the target semantic features by the fully-connected unit, and the recognition result with the greatest similarity is output. The pre-trained model can thus quickly and accurately determine the recognition result corresponding to the user's current question from the history questions, history answers and current question. Extracting semantic features with the long short-term memory unit strengthens the information interaction among the user history questions, the user history answers and the current question, so the recognition accuracy of the target multi-round question-answering model is higher, further improving the accuracy and efficiency with which a user acquires information through the model.
In one embodiment, as shown in fig. 2, before step S101, the multi-round question-answer identification method further includes the following steps:
s1: and acquiring the historical questions, the historical answers and the current questions from a preset sample library to serve as positive samples, and acquiring the current answers to serve as negative samples.
In the embodiment of the invention, the tag information in the preset sample library is examined. When the first tag, the second tag and the third tag are detected, the history question corresponding to the first tag, the history answer corresponding to the second tag and the current question corresponding to the third tag are acquired, and all three are determined to be positive samples; when the fourth tag is detected, the current answer corresponding to the fourth tag is acquired and determined to be a negative sample.
The preset sample library is a database dedicated to storing different tag information and the data information corresponding to each tag. The tag information comprises a first tag, a second tag, a third tag and a fourth tag; the data information comprises history questions, history answers, current questions and current answers. The data corresponding to the first tag are history questions, the data corresponding to the second tag are history answers, the data corresponding to the third tag are current questions, and the data corresponding to the fourth tag are current answers.
It should be noted that there is a mapping relation between history questions and history answers, i.e. each history question has its corresponding history answer, and there are at least 5 history questions with their history answers.
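The tag-based sample selection above can be pictured as a simple filter over tagged records. The record layout `(tag, text)` and the integer tag values are assumptions for illustration, not the patent's storage format.

```python
def build_samples(sample_library):
    # sample_library: list of (tag, text) records. Tags 1, 2 and 3 mark
    # history questions, history answers and the current question (all
    # positive samples); tag 4 marks the current answer (negative sample).
    positive = [text for tag, text in sample_library if tag in (1, 2, 3)]
    negative = [text for tag, text in sample_library if tag == 4]
    return positive, negative
```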
S2: and respectively introducing the positive samples and the negative samples into a coding layer in an initial multi-round question-answering model to perform vector feature conversion processing to obtain positive vector features corresponding to the positive samples and negative vector features corresponding to the negative samples, wherein the initial multi-round question-answering model comprises the coding layer, a long-short-term memory network and a convolution network.
In the embodiment of the invention, the initial multi-round question-answering model comprises a coding layer, a long short-term memory network and a convolution network. The coding layer contains a conversion database for performing vector feature conversion on positive and negative samples. The positive samples and negative samples obtained in step S1 are respectively imported into this conversion database for vector feature conversion, yielding the positive vector features corresponding to the positive samples and the negative vector features corresponding to the negative samples.
S3: and extracting semantic features from the positive vector features and the negative vector features through the long-short term memory network to obtain first semantic features corresponding to the positive vector features and second semantic features corresponding to the negative vector features.
In the embodiment of the invention, the long short-term memory network contains a semantic feature library for extracting semantic features from positive and negative vector features. The positive vector features and negative vector features obtained in step S2 are respectively imported into this semantic feature library for semantic feature extraction, yielding the first semantic features corresponding to the positive vector features and the second semantic features corresponding to the negative vector features.
Long Short-Term Memory (LSTM) is a recurrent neural network designed specifically to solve the long-range dependence problem of ordinary recurrent neural networks; like all recurrent neural networks, it has the form of a chain of repeating neural network modules.
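To make the gate structure of those repeating modules concrete, a single LSTM step for scalar inputs can be sketched in plain Python. The scalar state and the weight layout are simplifications for illustration; real LSTM layers use vector states and learned weight matrices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, weights):
    # One LSTM step. weights maps each gate name to (input weight,
    # hidden weight, bias). The forget gate decides how much of the old
    # cell state to keep, which is what lets the chain carry long-range
    # dependencies across the repeated modules.
    f = sigmoid(weights["f"][0] * x + weights["f"][1] * h_prev + weights["f"][2])
    i = sigmoid(weights["i"][0] * x + weights["i"][1] * h_prev + weights["i"][2])
    o = sigmoid(weights["o"][0] * x + weights["o"][1] * h_prev + weights["o"][2])
    g = math.tanh(weights["g"][0] * x + weights["g"][1] * h_prev + weights["g"][2])
    c = f * c_prev + i * g      # keep part of the old state, add new content
    h = o * math.tanh(c)        # emitted hidden state (the semantic feature)
    return h, c
```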
S4: and inquiring standard questions matched with the first semantic features from a preset standard library, and acquiring standard answer vectors corresponding to the standard questions.
Specifically, according to the first semantic feature obtained in step S3, a legal semantic feature identical to the first semantic feature is queried from the preset standard library. When such a legal semantic feature is found, the legal question corresponding to it is taken as the standard question, and the standard answer vector corresponding to the target legal question identical to the standard question is extracted from the preset vector library.
The preset standard library is a database specially used for storing different legal semantic features and legal problems corresponding to the legal semantic features, and legal semantic features identical to the first semantic features are preset in the preset standard library.
The preset vector library is a database which is specially used for storing target legal questions identical to legal questions in the preset standard library and standard answer vectors corresponding to the target legal questions.
S5: and importing the second semantic features into a convolution network for convolution processing to obtain a target vector.
In the embodiment of the invention, the convolution network comprises a preset convolution kernel, and the second semantic features obtained in step S3 are convolved with this preset convolution kernel to obtain the target vector. The preset convolution kernel is a kernel function, set according to the user's actual requirement, for converting the second semantic features into the target vector.
S6: and carrying out loss calculation according to the standard answer vector and the target vector to obtain a loss value.
Specifically, the standard answer vector and the target vector are imported into a preset loss calculation port for loss calculation, and the resulting loss value is output. The preset loss calculation port is a processing port dedicated to loss calculation.
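The patent does not name the loss function used at the loss calculation port. As one plausible stand-in, a mean squared error between the standard answer vector and the target vector can be sketched as follows; MSE here is an illustrative assumption, not the patent's stated choice.

```python
def loss_value(standard_answer_vec, target_vec):
    # Mean squared error between the standard answer vector (from the
    # preset vector library) and the target vector (from the convolution
    # network). A smaller value means the model's output is closer to
    # the standard answer.
    assert len(standard_answer_vec) == len(target_vec)
    return sum((a - t) ** 2
               for a, t in zip(standard_answer_vec, target_vec)) / len(target_vec)
```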
S7: and comparing the loss value with a preset threshold, if the loss value is larger than the preset threshold, iteratively updating the initial multi-round question-answering model until the loss value is smaller than or equal to the preset threshold, and taking the updated initial multi-round question-answering model as a target multi-round question-answering model.
Specifically, the loss value obtained in step S6 is compared with the preset threshold. If the loss value is greater than the preset threshold, iterative updating is performed with a preset loss function, adjusting the initial parameters of each network layer in the initial multi-round question-answering model; once the loss value is less than or equal to the preset threshold, iteration stops and the initial multi-round question-answering model corresponding to that loss value is determined as the target multi-round question-answering model.
It should be noted that the initial parameters are merely values preset so that the initial multi-round question-answering model can run, so an error necessarily exists between the standard answer vector and the target vector obtained from the positive and negative samples. This error information is propagated back, layer by layer, to each network layer of the initial multi-round question-answering model, and each layer adjusts its preset initial parameters, yielding a target multi-round question-answering model with a better recognition effect.
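The iterate-until-threshold procedure of step S7 can be sketched as a generic loop. Here `train_step` stands for one round of parameter adjustment on the initial model and returns the current loss; both that callback and the iteration cap are illustrative placeholders.

```python
def train_until_threshold(train_step, threshold, max_iters=1000):
    # train_step() performs one iterative update of the initial model's
    # parameters and returns the current loss value; iteration stops as
    # soon as the loss is less than or equal to the preset threshold.
    loss = float("inf")
    for _ in range(max_iters):
        loss = train_step()
        if loss <= threshold:
            break
    return loss
```

A caller would pass in a closure over the model state, e.g. one that runs a forward pass, computes the loss against the standard answer vector, and adjusts parameters.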
In this embodiment, the history questions, history answers and current question are acquired as positive samples, and the current answer is acquired as a negative sample. Vector feature conversion is performed on the positive and negative samples by the coding layer of the initial multi-round question-answering model to obtain positive and negative vector features, and semantic feature extraction is performed on them by the long short-term memory network to obtain the first and second semantic features. The standard answer vector corresponding to the standard question matched with the first semantic feature is acquired, and the second semantic feature is imported into the convolution network for convolution processing to obtain the target vector. Loss calculation based on the standard answer vector and the target vector yields the loss value, which is compared with the preset threshold; if the loss value is greater than the preset threshold, the initial multi-round question-answering model is updated iteratively until the loss value is less than or equal to the preset threshold, giving the target multi-round question-answering model.
Extracting the first and second semantic features with the long short-term memory network strengthens the information interaction between the context information in the positive and negative samples, which effectively improves the accuracy and efficiency of model training. Comparing the loss value with a preset threshold further improves the training accuracy, and hence the training efficiency and recognition accuracy of the target multi-round question-answering model, guaranteeing the accuracy and efficiency with which a user acquires information through the model.
In one embodiment, as shown in fig. 3, in step S2, importing the positive samples and the negative samples respectively into the coding layer of the initial multi-round question-answering model for vector feature conversion, to obtain positive vector features corresponding to the positive samples and negative vector features corresponding to the negative samples, includes the following steps:
s21: and performing word segmentation on the positive sample and the negative sample to obtain a first word segmentation result corresponding to the positive sample and a second word segmentation result corresponding to the negative sample.
In the embodiment of the invention, word segmentation refers to recombining a continuous character sequence into word sequences according to certain specifications; for example, the continuous sequence "ABCD" is segmented into "AB" and "CD".
Specifically, according to the positive sample and the negative sample obtained in the step S1, word segmentation processing is performed on both the positive sample and the negative sample by using a mechanical word segmentation method, and a first word segmentation result obtained after word segmentation processing of the positive sample and a second word segmentation result obtained after word segmentation processing of the negative sample are obtained.
The mechanical word segmentation method mainly comprises four methods of forward maximum matching, forward minimum matching, reverse maximum matching and reverse minimum matching. Preferably, the present proposal employs a forward maximum matching algorithm.
It should be noted that, because the positive sample includes the history questions, the history answers and the current question, word segmentation of the positive sample is performed on each history question, each history answer and the current question individually. The resulting first word segmentation result therefore comprises a plurality of word segmentation results: the result corresponding to each history question, the result corresponding to each history answer and the result corresponding to the current question.
S22: and carrying out vector feature conversion processing on the first word segmentation result and the second word segmentation result by using the coding layer to obtain positive vector features and negative vector features.
In the embodiment of the present invention, according to step S2, since a conversion database for performing vector feature conversion processing on the positive sample and the negative sample exists in the coding layer, the conversion database includes a preset processing library for performing vector feature conversion processing on the first word segmentation result and the second word segmentation result.
Specifically, the first word segmentation result and the second word segmentation result are directly and respectively imported into the preset processing library for vector feature conversion processing, so as to obtain the positive vector feature corresponding to the first word segmentation result and the negative vector feature corresponding to the second word segmentation result.
The preset processing library specifically uses a word2vec model to perform vector feature conversion processing on the first word segmentation result and the second word segmentation result.
In this embodiment, the positive sample and the negative sample can be quickly and accurately converted into the first word segmentation result and the second word segmentation result by way of word segmentation processing, and the first word segmentation result and the second word segmentation result are then converted into the positive vector features and the negative vector features. This realizes accurate acquisition of the positive vector features and the negative vector features, and improves the accuracy of the subsequent semantic feature extraction that uses them.
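As an illustration only, the vector feature conversion performed by the preset processing library can be sketched as a lookup from segmented tokens to dense vectors. The embedding table below is a hypothetical stand-in for a trained word2vec model; the token names, the vector values, and the dimension of 3 are assumptions for the sketch, not part of this disclosure.

```python
# A minimal sketch of vector feature conversion: word segmentation results
# (token lists) are mapped to dense vectors. The table is a hypothetical
# stand-in for a trained word2vec model.
EMBEDDINGS = {
    "nanjing_city": [0.2, 0.7, 0.1],       # hypothetical 3-dimensional vectors
    "changjiang_bridge": [0.9, 0.1, 0.4],
}

def tokens_to_vectors(tokens, table=EMBEDDINGS, dim=3):
    # Tokens missing from the table fall back to a zero vector of the
    # same dimension, so the output always aligns with the input tokens.
    return [table.get(token, [0.0] * dim) for token in tokens]
```

In a real deployment the table would be replaced by the vectors produced by the word2vec model in the preset processing library.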
In an embodiment, each historical question, each historical answer and the current question in the positive sample are respectively used as a corpus, and the current answer in the negative sample is used as a corpus. As shown in fig. 4, in step S21, performing word segmentation on the positive sample and the negative sample to obtain the first word segmentation result corresponding to the positive sample and the second word segmentation result corresponding to the negative sample includes the following steps:
s211: and setting a character string index value and a maximum length value of the word segmentation according to preset requirements.
In the embodiment of the present invention, the character string index value is used to locate the character position at which scanning starts; for example, a character string index value of 0 indicates that scanning starts at the position of the first character. The maximum length value is the maximum range of characters scanned; for example, a maximum length value of 2 means that at most 2 characters are scanned, and a maximum length value of 3 means that at most 3 characters are scanned.
Specifically, the character string index value and the maximum length value of the word segmentation are set according to preset requirements. The preset requirements may be, for example, that the character string index value is set to 0 and the maximum length value is set to 2; the specific settings can be configured according to the actual requirements of users and are not limited herein.
S212: and extracting target characters from the corpus according to the character string index value and the maximum length value aiming at each corpus in the positive sample and the negative sample.
Specifically, for each corpus in the positive sample and the negative sample, the corpus is scanned from left to right according to the character string index value and the maximum length value obtained in step S211. When the number of characters equal to the maximum length value has been scanned, the characters from the scanning start position up to the maximum length value are identified as the target character, and the target character is extracted.
For example, if the corpus is "Nanjing city Changjiang bridge" (南京市长江大桥), the maximum length value is 3 and the initial character string index value is 0, the corpus is scanned from left to right; the characters scanned up to the maximum length value are "Nanjing city" (南京市), which is identified as the target character and extracted.
S213: and matching the target character with legal characters in a preset dictionary library.
Specifically, the target character obtained in step S212 is matched with a legal character in a preset dictionary library. The preset dictionary library is a database specially used for storing legal characters set by a user.
S214: if the matching is successful, the target character is determined to be the target word, the character string index value is updated to be the current character string index value plus the current maximum length value, and the target character is extracted from the corpus based on the updated character string index value and the maximum length value for matching until the word segmentation operation of the corpus is completed.
Specifically, the target character obtained in step S212 is matched against the legal characters in the preset dictionary library. When the target character matches a legal character in the preset dictionary library, the matching is successful and the target character is determined to be a target word. The character string index value is then updated to the current character string index value of step S212 plus the current maximum length value of step S212, and a new target character is extracted from the corpus for matching based on the updated character string index value and the maximum length value, until the word segmentation operation of the corpus is completed.
For example, following the example in step S212, if the target character "Nanjing city" matches a character in the preset dictionary library, the target character "Nanjing city" is confirmed as a target word, and the character string index value is updated to the current character string index value 0 plus the current maximum length value 3, that is, to 3. A new target character is then extracted from the corpus based on the updated character string index value 3 and the maximum length value 3 for matching; that is, for the corpus "Nanjing city Changjiang bridge", scanning restarts from the character "Chang" (长), until the word segmentation operation of the corpus is completed.
S215: if the matching fails, the maximum length value is decremented, and the target character is extracted from the corpus based on the updated maximum length value and the character string index value for matching until the word segmentation operation of the corpus is completed.
Specifically, the target character obtained in step S212 is matched against the legal characters in the preset dictionary library. When the target character does not match any legal character in the preset dictionary library, the maximum length value is updated to the current maximum length value of step S212 minus 1, and a new target character is extracted from the corpus for matching based on the updated maximum length value and the character string index value, until the word segmentation operation of the corpus is completed.
It should be noted that when no target character with a length greater than 1 matches any character in the preset dictionary library, the remaining single character is confirmed as a target word.
For example, following the example in step S212, if the target character "Nanjing city" does not match any character in the preset dictionary library, the maximum length value is updated to the current maximum length value 3 minus 1, that is, to 2, and a new target character is extracted from the corpus based on the updated maximum length value 2 and the character string index value 0 for matching, until the word segmentation operation of the corpus is completed.
S216: if the corpus in the positive sample completes word segmentation operation, a first word segmentation result corresponding to the positive sample is obtained, and if the corpus in the negative sample completes word segmentation operation, a second word segmentation result corresponding to the negative sample is obtained.
Specifically, when each corpus in the positive sample completes word segmentation operation, the word segmentation result corresponding to each corpus is used as a first word segmentation result corresponding to the positive sample, and when the corpus in the negative sample completes word segmentation operation, the word segmentation result corresponding to the corpus is used as a second word segmentation result corresponding to the negative sample.
In this embodiment, word segmentation processing is performed on each corpus in the positive sample and the negative sample by setting the character string index value and the maximum length value of the word segmentation, and the first word segmentation result and the second word segmentation result are obtained by matching the extracted target characters with the legal characters. This realizes accurate word segmentation of each corpus in the positive sample and the negative sample, and improves the accuracy of the subsequent vector feature conversion processing of the first word segmentation result and the second word segmentation result.
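The word segmentation procedure of steps S211 to S216 (forward maximum matching: scan from the string index value, try the longest candidate first, decrement the length on a failed match) can be sketched as follows. The dictionary contents and the maximum length value used in the usage note are illustrative assumptions only.

```python
def forward_max_match(corpus, dictionary, max_len):
    """Minimal sketch of steps S211-S216: starting at the character string
    index value, try the longest candidate first, decrement the maximum
    length value on a failed match (S215), and fall back to a single
    character when nothing longer matches."""
    tokens = []
    index = 0                                # character string index value (S211)
    while index < len(corpus):
        length = min(max_len, len(corpus) - index)
        # S215: decrement the length while the candidate fails to match
        while length > 1 and corpus[index:index + length] not in dictionary:
            length -= 1
        tokens.append(corpus[index:index + length])  # S214: confirm target word
        index += length                      # update the character string index value
    return tokens
```

For the corpus of the example in step S212, with an assumed dictionary containing "南京市" and "长江大桥" and a maximum length value of 4, the sketch yields ["南京市", "长江大桥"].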
In an embodiment, the long-short-term memory network includes n+1 first long-short-term memory network layers and 2 second long-short-term memory network layers, where n is a positive integer greater than 1. As shown in fig. 5, in step S3, performing semantic feature extraction on the positive vector features and the negative vector features to obtain the first semantic features corresponding to the positive vector features and the second semantic features corresponding to the negative vector features includes the following steps:
S31: respectively importing the n positive vector features and the negative vector features into the n+1 first long-short-term memory network layers for semantic recognition, so as to obtain n first recognition results corresponding to the n positive vector features and a second recognition result corresponding to the negative vector features.
In the embodiment of the invention, the first long-short-term memory (LSTM) network layer is a network structure used for performing semantic recognition on the positive vector features and the negative vector features. The n positive vector features and the negative vector features are respectively imported into the n+1 first long-short-term memory network layers for semantic recognition; that is, each positive vector feature and each negative vector feature is imported into one first long-short-term memory network layer, and the n first recognition results and the second recognition result output by the n+1 first long-short-term memory network layers are obtained. The first recognition results correspond to the positive vector features, and the second recognition result corresponds to the negative vector features.
By importing each positive vector feature and each negative vector feature into its own first long-short-term memory network layer for semantic recognition, the n positive vector features and the negative vector features can undergo semantic recognition simultaneously, which improves the efficiency of semantic recognition.
S32: respectively importing n first recognition results and second recognition results into 2 second long-term and short-term memory network layers to extract semantic features, and obtaining first semantic features and second semantic features.
In the embodiment of the invention, the second LSTM network layer refers to a network structure specifically configured to perform semantic feature extraction on the first recognition results and the second recognition result, and the second LSTM network layer is a bidirectional LSTM. The bidirectional LSTM consists of two LSTMs running in opposite directions: one LSTM reads the data from front to back according to the word order of the sentence, and the other reads the data from back to front in the reverse of the sentence's word order. The former thereby obtains the preceding context information and the latter the following context information; taken together, the two LSTMs capture the context information of the whole sentence.
After bidirectional LSTM encoding, only the vectors marking the positions corresponding to entities are output at the hidden layer of the bidirectional LSTM neurons, instead of the encoded vector of the whole sentence. The advantage of this is that the interference of redundant information with relation classification is removed and only the most critical information is retained; the semantic features corresponding to the sentence are then output through the extraction of the bidirectional LSTM.
Specifically, inputting n first recognition results into a second long-term and short-term memory network layer for semantic feature extraction, and outputting first semantic features obtained after semantic feature extraction for the n first recognition results; inputting the second recognition result into another second long-term and short-term memory network layer for semantic feature extraction, and outputting second semantic features obtained by extracting the semantic features of the second recognition result.
It should be noted that, because the first recognition result is obtained based on the historical question, the historical answer and the current question, when the semantic feature extraction is performed, the combination of the historical question, the historical answer and the current question can enable the semantic feature extraction to be more accurate.
In this embodiment, semantic recognition is performed on positive vector features and negative vector features through a first long-short-term memory network layer to obtain a first recognition result and a second recognition result, and semantic feature extraction is performed on the first recognition result and the second recognition result by using a second long-short-term memory network layer to obtain a first semantic feature and a second semantic feature. Therefore, the accurate extraction of the first semantic features and the second semantic features is realized, the accuracy of calculation by using the first semantic features and the second semantic features is improved, and the accuracy of model training is further improved.
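As a purely illustrative sketch (not the actual LSTM computation), the two reading directions of the bidirectional second long-short-term memory network layer can be pictured as follows: one pass accumulates the context preceding each position, the other the context following it, and the two are combined per position. The function name and the use of raw tokens in place of hidden states are assumptions for the illustration.

```python
def bidirectional_context(tokens):
    """Toy illustration of the two reading directions of a bidirectional LSTM:
    for each position, collect the context seen by the left-to-right pass and
    the context seen by the right-to-left pass. The real layer carries hidden
    states rather than raw tokens; this only demonstrates the reading order."""
    left, seen = [], []
    for token in tokens:                 # front-to-back pass (preceding context)
        left.append(tuple(seen))
        seen.append(token)
    right, seen = [], []
    for token in reversed(tokens):       # back-to-front pass (following context)
        right.append(tuple(seen))
        seen.append(token)
    right.reverse()                      # re-align with original positions
    return list(zip(left, right))
```

For a three-word sentence, the middle position sees one word of preceding context from one pass and one word of following context from the other, mirroring how the two LSTMs jointly cover the whole sentence.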
In one embodiment, as shown in fig. 6, in step S6, performing a loss calculation according to the standard answer vector and the target vector to obtain a loss value includes the following steps:
s61: and carrying out cosine similarity calculation on the standard answer vector and the target vector to obtain a cosine calculation result.
Specifically, according to the standard answer vector and the target vector, the cosine calculation result is calculated according to formula (1):

X = (A · B) / (|A| × |B|)    (1)

wherein X is the cosine calculation result, A is the standard answer vector, and B is the target vector.
S62: and carrying out loss calculation according to the cosine calculation result and the cross entropy loss function to obtain a loss value.
In the embodiment of the invention, the cosine calculation result represents the probability, predicted by the initial multi-round question-answering model, that the current question matches the current answer. When the predicted probability reaches a preset target value, the current question and the current answer are considered matched; when it does not, they are considered unmatched. The preset target value may specifically be 0.8, or may be set according to the actual requirements of users, which is not limited herein.
Specifically, from the cosine calculation result, the loss value is calculated using the cross entropy loss function of formula (2):

H(p, q) = -∑ₓ p(x) log q(x)    (2)

wherein H(p, q) is the loss value and x takes the value 0 or 1; p(x) is the actual state corresponding to x: if x is 0, indicating that the current question and the current answer are not matched, p(x) is 0, and if x is 1, indicating that the current question and the current answer are matched, p(x) is 1; q(x) is the cosine calculation result.
In this embodiment, the cosine calculation result between the standard answer vector and the target vector can be quickly and accurately calculated through formula (1), and the corresponding loss value can be quickly and accurately calculated from the cosine calculation result through formula (2), which further ensures the accuracy of subsequently determining the target multi-round question-answering model by using the loss value.
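A minimal numeric sketch of formulas (1) and (2) — cosine similarity between the standard answer vector and the target vector, followed by the cross entropy loss — might look like the following. Treating the x = 0 term as the complementary probabilities (p(0) = 1 − label, q(0) = 1 − q) is an interpretive assumption made so the example is well defined.

```python
import math

def cosine_similarity(a, b):
    # Formula (1): X = (A · B) / (|A| × |B|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def cross_entropy_loss(q, label):
    # Formula (2): H(p, q) = -sum over x in {0, 1} of p(x) * log q(x),
    # assuming p(1) = label, p(0) = 1 - label, q(1) = q, q(0) = 1 - q.
    return -(label * math.log(q) + (1 - label) * math.log(1 - q))
```

For a matched pair (label 1) with a cosine result of 0.5, the loss is log 2 ≈ 0.693; as the cosine result approaches 1, the loss approaches 0.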
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation process of the embodiments of the present invention.
In one embodiment, a multi-round question and answer recognition apparatus is provided, which corresponds to the multi-round question and answer recognition method in the above embodiment one by one. As shown in fig. 7, the multi-round question and answer recognition apparatus includes a first acquisition module 71, an import module 72, a conversion module 73, an extraction module 74, and an output module 75.
The functional modules are described in detail as follows:
a first obtaining module 71, configured to obtain a user history question, a user history answer, and a user current question from a user database;
an importing module 72, configured to import a user history question, a user history answer, and a user current question into a pre-trained target multi-round question-answering model, where the target multi-round question-answering model includes a coding unit, a long-short-term memory unit, and a full-connection unit;
the conversion module 73 is configured to perform vector feature conversion processing on the user history question, the user history answer, and the user current question through the encoding unit, so as to obtain a first vector feature corresponding to the user history question, a second vector feature corresponding to the user history answer, and a third vector feature corresponding to the user current question;
an extracting module 74, configured to introduce the first vector feature, the second vector feature, and the third vector feature into the long-short-term memory unit for extracting semantic features, so as to obtain target semantic features;
and the output module 75 is used for importing the target semantic features into the full-connection unit to perform similarity calculation and outputting the recognition result with the maximum similarity.
Further, the multi-round question and answer recognition device further includes:
The second acquisition module is used for acquiring historical questions, historical answers and current questions from a preset sample library to serve as positive samples, and acquiring current answers to serve as negative samples;
the vector feature conversion module is used for respectively leading the positive samples and the negative samples into the coding layers in the initial multi-round question-answering model to carry out vector feature conversion treatment to obtain positive vector features corresponding to the positive samples and negative vector features corresponding to the negative samples, wherein the initial multi-round question-answering model comprises the coding layers, a long-term memory network and a convolution network;
the semantic feature extraction module is used for extracting semantic features of the positive vector features and the negative vector features through the long-term and short-term memory network to obtain first semantic features corresponding to the positive vector features and second semantic features corresponding to the negative vector features;
the query module is used for querying the standard questions matched with the first semantic features from a preset standard library and obtaining standard answer vectors corresponding to the standard questions;
the convolution module is used for importing the second semantic features into a convolution network to carry out convolution processing to obtain a target vector;
the loss calculation module is used for carrying out loss calculation according to the standard answer vector and the target vector to obtain a loss value;
And the iteration updating module is used for comparing the loss value with a preset threshold value, if the loss value is larger than the preset threshold value, carrying out iteration updating on the initial multi-round question-answering model until the loss value is smaller than or equal to the preset threshold value, and taking the updated initial multi-round question-answering model as a target multi-round question-answering model.
Further, the vector feature conversion module includes:
the word segmentation sub-module is used for carrying out word segmentation on the positive sample and the negative sample to obtain a first word segmentation result corresponding to the positive sample and a second word segmentation result corresponding to the negative sample;
and the initial conversion sub-module is used for carrying out vector feature conversion processing on the first word segmentation result and the second word segmentation result by utilizing the coding layer to obtain positive vector features and negative vector features.
Further, the word segmentation sub-module includes:
the setting unit is used for setting a character string index value and a maximum length value of the word segmentation according to preset requirements;
the character extraction unit is used for extracting target characters from the corpus according to the character string index value and the maximum length value aiming at each corpus in the positive sample and the negative sample;
the matching unit is used for matching the target character with legal characters in a preset dictionary library;
The matching success unit is used for determining the target character as the target word, updating the character string index value to the current character string index value plus the current maximum length value if the matching is successful, and extracting the target character from the corpus based on the updated character string index value and the maximum length value for matching until the word segmentation operation of the corpus is completed;
the matching failure unit is used for decrementing the maximum length value if the matching fails, extracting target characters from the corpus based on the updated maximum length value and the character string index value, and matching the target characters until the word segmentation operation of the corpus is completed;
the word segmentation operation completion unit is used for obtaining a first word segmentation result corresponding to the positive sample if each corpus in the positive sample completes word segmentation operation, and obtaining a second word segmentation result corresponding to the negative sample if the corpus in the negative sample completes word segmentation operation.
Further, the semantic feature extraction module includes:
the semantic recognition sub-module is used for respectively importing n positive vector features and negative vector features into n+1 first long-term and short-term memory network layers for semantic recognition to obtain n first recognition results corresponding to the n positive vector features and second recognition results corresponding to the negative vector features;
The feature extraction sub-module is used for respectively importing n first recognition results and second recognition results into 2 second long-term and short-term memory network layers to extract semantic features, so as to obtain first semantic features and second semantic features.
Further, the loss calculation module includes:
the cosine calculation sub-module is used for obtaining a cosine calculation result by carrying out cosine similarity calculation on the standard answer vector and the target vector;
and the loss value acquisition sub-module is used for carrying out loss calculation according to the cosine calculation result and the cross entropy loss function to obtain a loss value.
Some embodiments of the present application disclose a computer device. Referring specifically to FIG. 8, a basic block diagram of a computer device 90 in one embodiment of the present application is shown.
As illustrated in fig. 8, the computer device 90 includes a memory 91, a processor 92, and a network interface 93 communicatively coupled to each other via a system bus. It should be noted that only a computer device 90 having components 91-93 is shown in fig. 8, but it should be understood that not all of the illustrated components need be implemented, and more or fewer components may alternatively be implemented. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 91 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 91 may be an internal storage unit of the computer device 90, such as a hard disk or memory of the computer device 90. In other embodiments, the memory 91 may also be an external storage device of the computer device 90, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device 90. Of course, the memory 91 may also include both an internal storage unit and an external storage device of the computer device 90. In this embodiment, the memory 91 is generally used to store the operating system and various application software installed on the computer device 90, such as the program code of the multi-round question and answer recognition method. Further, the memory 91 may be used to temporarily store various types of data that have been output or are to be output.
The processor 92 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 92 is generally used to control the overall operation of the computer device 90. In this embodiment, the processor 92 is configured to execute a program code stored in the memory 91 or process data, for example, a program code for executing the multi-round question-answer recognition method.
The network interface 93 may include a wireless network interface or a wired network interface, the network interface 93 typically being used to establish communication connections between the computer device 90 and other electronic devices.
The present application also provides another embodiment, namely a computer readable storage medium storing a user current question information input program, the program being executable by at least one processor so that the at least one processor performs the steps of any one of the multi-round question and answer recognition methods described above.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware, though in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk), which includes several instructions for causing a computer device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
Finally, it should be noted that the above-described embodiments are merely some, but not all, of the embodiments of the present application, and the preferred embodiments of the present application shown in the drawings do not limit the scope of the patent. This application may be embodied in many different forms; these embodiments are provided so that the present disclosure will be understood more thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their elements. Any equivalent structure made using the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the protection scope of the present application.

Claims (8)

1. The multi-round question and answer recognition method is characterized by comprising the following steps of:
acquiring a user history question, a user history answer and a user current question from a user database;
importing the user history questions, the user history answers and the user current questions into a pre-trained target multi-round question-answering model, wherein the target multi-round question-answering model comprises a coding unit, a long-short-term memory unit and a full-connection unit;
Performing vector feature conversion processing on the user history questions, the user history answers and the user current questions through the coding unit to obtain first vector features corresponding to the user history questions, second vector features corresponding to the user history answers and third vector features corresponding to the user current questions;
importing the first vector feature, the second vector feature and the third vector feature into the long-term and short-term memory unit for semantic feature extraction to obtain target semantic features;
importing the target semantic features into the fully-connected unit for similarity calculation, and outputting a recognition result with the maximum similarity;
before the step of acquiring the user history question, the user history answer and the user current question from the user database, the multi-round question-answer recognition method further comprises:
acquiring history questions, history answers and a current question from a preset sample library as a positive sample, and acquiring a current answer as a negative sample;
respectively importing the positive sample and the negative sample into a coding layer in an initial multi-round question-answering model for vector feature conversion processing to obtain positive vector features corresponding to the positive sample and negative vector features corresponding to the negative sample, wherein the initial multi-round question-answering model comprises the coding layer, a long short-term memory network and a convolution network;
extracting semantic features from the positive vector features and the negative vector features through the long short-term memory network to obtain first semantic features corresponding to the positive vector features and second semantic features corresponding to the negative vector features;
querying a standard question matched with the first semantic features in a preset standard library, and acquiring a standard answer vector corresponding to the standard question;
importing the second semantic features into the convolution network for convolution processing to obtain a target vector;
performing loss calculation according to the standard answer vector and the target vector to obtain a loss value;
and comparing the loss value with a preset threshold; if the loss value is greater than the preset threshold, iteratively updating the initial multi-round question-answering model until the loss value is less than or equal to the preset threshold, and taking the updated initial multi-round question-answering model as the target multi-round question-answering model.
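The final training step (compare the loss to a preset threshold and keep updating until it falls at or below the threshold) is a standard iterate-until-converged loop. A minimal sketch; the scalar toy model, learning rate and threshold below are illustrative, not the patent's network:

```python
def train_until_threshold(loss_fn, update_fn, params, threshold, max_iters=10_000):
    """Iteratively update the model while the loss value exceeds the
    preset threshold, as in the final training step of the claim."""
    loss = loss_fn(params)
    iters = 0
    while loss > threshold and iters < max_iters:
        params = update_fn(params)
        loss = loss_fn(params)
        iters += 1
    return params, loss

# Toy model: a single scalar parameter fitted by gradient descent
# (stands in for updating the multi-round question-answering model).
loss_fn = lambda p: (p - 3.0) ** 2
update_fn = lambda p: p - 0.1 * 2.0 * (p - 3.0)
params, final_loss = train_until_threshold(loss_fn, update_fn, 0.0, threshold=1e-6)
print(final_loss <= 1e-6)  # True
```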
2. The multi-round question-answer recognition method according to claim 1, wherein the step of respectively importing the positive sample and the negative sample into the coding layer in the initial multi-round question-answering model for vector feature conversion processing to obtain the positive vector features corresponding to the positive sample and the negative vector features corresponding to the negative sample comprises:
performing word segmentation processing on the positive sample and the negative sample to obtain a first word segmentation result corresponding to the positive sample and a second word segmentation result corresponding to the negative sample;
and performing vector feature conversion processing on the first word segmentation result and the second word segmentation result by using the coding layer to obtain the positive vector features and the negative vector features.
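The second step of this claim, converting a word segmentation result into vector features, can be sketched with a bag-of-words count vector. The fixed vocabulary below is an illustrative assumption; the patent does not specify the coding layer's internals:

```python
def vectorize(tokens, vocab):
    """Toy coding layer: bag-of-words count vector over a fixed vocabulary,
    standing in for the claimed vector feature conversion processing."""
    index = {w: i for i, w in enumerate(vocab)}
    vec = [0] * len(vocab)
    for tok in tokens:
        if tok in index:  # out-of-vocabulary tokens are simply dropped
            vec[index[tok]] += 1
    return vec

first_result = ["reset", "password", "password"]  # a word segmentation result
print(vectorize(first_result, ["reset", "password", "email"]))  # [1, 2, 0]
```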
3. The multi-round question-answer recognition method according to claim 2, wherein each of the history questions, each of the history answers and the current question in the positive sample is taken as one corpus, the current answer in the negative sample is taken as one corpus, and the step of performing word segmentation processing on the positive sample and the negative sample to obtain the first word segmentation result corresponding to the positive sample and the second word segmentation result corresponding to the negative sample comprises:
setting a character string index value and a maximum length value for word segmentation according to preset requirements;
for each corpus in the positive sample and the negative sample, extracting target characters from the corpus according to the character string index value and the maximum length value;
matching the target characters with legal characters in a preset dictionary library;
if the matching succeeds, determining the target characters as a target word, updating the character string index value to the current character string index value plus the current maximum length value, and extracting new target characters from the corpus based on the updated character string index value and the maximum length value for matching, until the word segmentation operation on the corpus is completed;
if the matching fails, decrementing the maximum length value, and extracting target characters from the corpus based on the updated maximum length value and the character string index value for matching, until the word segmentation operation on the corpus is completed;
when every corpus in the positive sample has completed the word segmentation operation, obtaining the first word segmentation result corresponding to the positive sample, and when the corpus in the negative sample has completed the word segmentation operation, obtaining the second word segmentation result corresponding to the negative sample.
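The index/length loop in this claim is forward maximum matching: try the longest candidate substring first, shrink the length on a failed dictionary match, and advance the index on success. A minimal sketch; the dictionary contents and `max_len` are illustrative:

```python
def forward_max_match(corpus, dictionary, max_len=4):
    """Forward maximum matching as described in claim 3."""
    words, i = [], 0  # i is the character string index value
    while i < len(corpus):
        length = min(max_len, len(corpus) - i)
        # matching failed: decrement the maximum length value and retry
        while length > 1 and corpus[i:i + length] not in dictionary:
            length -= 1
        words.append(corpus[i:i + length])  # single characters pass through
        i += length  # update the index past the matched target word
    return words

dictionary = {"多轮", "问答", "识别", "方法"}
print(forward_max_match("多轮问答识别方法", dictionary, max_len=2))
# ['多轮', '问答', '识别', '方法']
```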
4. The multi-round question-answer recognition method according to claim 1, wherein the long short-term memory network comprises n+1 first long short-term memory network layers and 2 second long short-term memory network layers, there are n positive vector features, n being a positive integer greater than 1, and the step of extracting semantic features from the positive vector features and the negative vector features through the long short-term memory network to obtain the first semantic features corresponding to the positive vector features and the second semantic features corresponding to the negative vector features comprises:
respectively importing the n positive vector features and the negative vector features into the n+1 first long short-term memory network layers for semantic recognition to obtain n first recognition results corresponding to the n positive vector features and a second recognition result corresponding to the negative vector features;
and respectively importing the n first recognition results and the second recognition result into the 2 second long short-term memory network layers for semantic feature extraction to obtain the first semantic features and the second semantic features.
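The layer topology of this claim (n+1 first layers feeding 2 second layers) can be sketched with a toy recurrence standing in for real LSTM cells; the running-average update below is an illustrative assumption, not an actual LSTM:

```python
def toy_layer(seq):
    """Stand-in for one long short-term memory network layer:
    an exponential running average instead of real LSTM gates."""
    state, out = 0.0, []
    for x in seq:
        state = 0.5 * state + 0.5 * x
        out.append(state)
    return out

def extract_features(pos_seqs, neg_seq):
    """Claim-4 topology: n+1 first layers handle the n positive vector
    features and the negative one; 2 second layers then yield the first
    and second semantic features."""
    first_results = [toy_layer(s) for s in pos_seqs]    # n first layers
    second_result = toy_layer(neg_seq)                  # the (n+1)-th first layer
    merged = [x for seq in first_results for x in seq]  # n results into one second layer
    first_semantic = toy_layer(merged)[-1]
    second_semantic = toy_layer(second_result)[-1]      # the other second layer
    return first_semantic, second_semantic

print(extract_features([[1.0], [2.0]], [4.0]))  # (0.625, 1.0)
```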
5. The multi-round question-answer recognition method according to claim 1, wherein the step of performing loss calculation according to the standard answer vector and the target vector to obtain the loss value comprises:
performing cosine similarity calculation on the standard answer vector and the target vector to obtain a cosine calculation result;
and performing loss calculation according to the cosine calculation result and a cross-entropy loss function to obtain the loss value.
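One plausible reading of the cosine-plus-cross-entropy combination in this claim is to rescale the cosine calculation result into a probability and apply binary cross-entropy; the rescaling and the label convention below are assumptions, since the claim does not fix them:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def loss_value(standard_answer_vec, target_vec, label=1.0, eps=1e-7):
    """Rescale the cosine calculation result from [-1, 1] to (0, 1) and
    apply binary cross-entropy (assumed combination, not in the claim)."""
    p = (cosine(standard_answer_vec, target_vec) + 1.0) / 2.0
    p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

print(loss_value([1.0, 0.0], [1.0, 0.0]) < 1e-5)   # aligned vectors: tiny loss
print(loss_value([1.0, 0.0], [-1.0, 0.0]) > 10.0)  # opposite vectors: large loss
```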
6. A multi-round question-answer recognition apparatus, characterized in that the multi-round question-answer recognition apparatus comprises:
a first acquisition module, configured to acquire a user history question, a user history answer and a user current question from a user database;
an importing module, configured to import the user history question, the user history answer and the user current question into a pre-trained target multi-round question-answering model, wherein the target multi-round question-answering model comprises a coding unit, a long short-term memory unit and a fully-connected unit;
a conversion module, configured to perform vector feature conversion processing on the user history question, the user history answer and the user current question through the coding unit to obtain a first vector feature corresponding to the user history question, a second vector feature corresponding to the user history answer and a third vector feature corresponding to the user current question;
an extraction module, configured to import the first vector feature, the second vector feature and the third vector feature into the long short-term memory unit for semantic feature extraction to obtain target semantic features;
an output module, configured to import the target semantic features into the fully-connected unit for similarity calculation and output the recognition result with the maximum similarity;
a second acquisition module, configured to acquire history questions, history answers and a current question from a preset sample library as a positive sample, and acquire a current answer as a negative sample;
a vector feature conversion module, configured to respectively import the positive sample and the negative sample into a coding layer in an initial multi-round question-answering model for vector feature conversion processing to obtain positive vector features corresponding to the positive sample and negative vector features corresponding to the negative sample, wherein the initial multi-round question-answering model comprises the coding layer, a long short-term memory network and a convolution network;
a semantic feature extraction module, configured to extract semantic features from the positive vector features and the negative vector features through the long short-term memory network to obtain first semantic features corresponding to the positive vector features and second semantic features corresponding to the negative vector features;
a query module, configured to query a standard question matched with the first semantic features in a preset standard library, and acquire a standard answer vector corresponding to the standard question;
a convolution module, configured to import the second semantic features into the convolution network for convolution processing to obtain a target vector;
a loss calculation module, configured to perform loss calculation according to the standard answer vector and the target vector to obtain a loss value;
and an iterative updating module, configured to compare the loss value with a preset threshold; if the loss value is greater than the preset threshold, iteratively update the initial multi-round question-answering model until the loss value is less than or equal to the preset threshold, and take the updated initial multi-round question-answering model as the target multi-round question-answering model.
7. A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the multi-round question-answer recognition method of any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the multi-round question-answer recognition method of any one of claims 1 to 5.
CN201910906819.7A 2019-09-24 2019-09-24 Multi-round question and answer identification method and device, computer equipment and storage medium Active CN110825857B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910906819.7A CN110825857B (en) 2019-09-24 2019-09-24 Multi-round question and answer identification method and device, computer equipment and storage medium
PCT/CN2019/116924 WO2021056710A1 (en) 2019-09-24 2019-11-10 Multi-round question-and-answer identification method, device, computer apparatus, and storage medium

Publications (2)

Publication Number Publication Date
CN110825857A CN110825857A (en) 2020-02-21
CN110825857B true CN110825857B (en) 2023-07-21

Family

ID=69548255

Country Status (2)

Country Link
CN (1) CN110825857B (en)
WO (1) WO2021056710A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259668B (en) * 2020-05-07 2020-08-18 腾讯科技(深圳)有限公司 Reading task processing method, model training device and computer equipment
CN112183105A (en) * 2020-08-28 2021-01-05 华为技术有限公司 Man-machine interaction method and device
CN113204633B (en) * 2021-06-01 2022-12-30 吉林大学 Semantic matching distillation method and device
CN113590783B (en) * 2021-07-28 2023-10-03 复旦大学 NLP natural language processing-based traditional Chinese medicine health preserving intelligent question-answering system
CN114003708A (en) * 2021-11-05 2022-02-01 中国平安人寿保险股份有限公司 Automatic question answering method and device based on artificial intelligence, storage medium and server
CN113934824B (en) * 2021-12-15 2022-05-06 之江实验室 Similar medical record matching system and method based on multi-round intelligent question answering
CN114757208B (en) * 2022-06-10 2022-10-21 荣耀终端有限公司 Question and answer matching method and device
CN117332823B (en) * 2023-11-28 2024-03-05 浪潮电子信息产业股份有限公司 Automatic target content generation method and device, electronic equipment and readable storage medium
CN117688163B (en) * 2024-01-29 2024-04-23 杭州有赞科技有限公司 Online intelligent question-answering method and device based on instruction fine tuning and retrieval enhancement generation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345585A (en) * 2018-01-11 2018-07-31 浙江大学 A kind of automatic question-answering method based on deep learning
CN108733703A (en) * 2017-04-20 2018-11-02 北京京东尚科信息技术有限公司 The answer prediction technique and device of question answering system, electronic equipment, storage medium
CN109344236A (en) * 2018-09-07 2019-02-15 暨南大学 One kind being based on the problem of various features similarity calculating method
CN109783617A (en) * 2018-12-11 2019-05-21 平安科技(深圳)有限公司 For replying model training method, device, equipment and the storage medium of problem
CN110222163A (en) * 2019-06-10 2019-09-10 福州大学 A kind of intelligent answer method and system merging CNN and two-way LSTM

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572734B (en) * 2013-10-23 2019-04-30 腾讯科技(深圳)有限公司 Method for recommending problem, apparatus and system
CN107766320A (en) * 2016-08-23 2018-03-06 中兴通讯股份有限公司 A kind of Chinese pronoun resolution method for establishing model and device
US10572595B2 (en) * 2017-04-13 2020-02-25 Baidu Usa Llc Global normalized reader systems and methods
US11250038B2 (en) * 2018-01-21 2022-02-15 Microsoft Technology Licensing, Llc. Question and answer pair generation using machine learning
CN108595629B (en) * 2018-04-24 2021-08-06 北京慧闻科技发展有限公司 Data processing method and application for answer selection system
CN109376222B (en) * 2018-09-27 2021-05-25 国信优易数据股份有限公司 Question-answer matching degree calculation method, question-answer automatic matching method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering; Chenguang Zhu et al.; arXiv:1812.03593v5 [cs.CL]; pp. 1-7 *

Also Published As

Publication number Publication date
CN110825857A (en) 2020-02-21
WO2021056710A1 (en) 2021-04-01

Similar Documents

Publication Publication Date Title
CN110825857B (en) Multi-round question and answer identification method and device, computer equipment and storage medium
CN111814466A (en) Information extraction method based on machine reading understanding and related equipment thereof
CN109033068B (en) Method and device for reading and understanding based on attention mechanism and electronic equipment
CN111241304B (en) Answer generation method based on deep learning, electronic device and readable storage medium
CN110825949A (en) Information retrieval method based on convolutional neural network and related equipment thereof
WO2021051513A1 (en) Chinese-english translation method based on neural network, and related devices thereof
WO2022121178A1 (en) Training method and apparatus and recognition method and apparatus for text error correction model, and computer device
CN111460081B (en) Answer generation method based on deep learning, electronic device and readable storage medium
CN111797217B (en) Information query method based on FAQ matching model and related equipment thereof
CN112085091B (en) Short text matching method, device, equipment and storage medium based on artificial intelligence
CN112270325A (en) Character verification code recognition model training method, recognition method, system, device and medium
CN115169342A (en) Text similarity calculation method and device, electronic equipment and storage medium
CN111079437A (en) Entity identification method, electronic equipment and storage medium
CN112989829B (en) Named entity recognition method, device, equipment and storage medium
CN114358205A (en) Model training method, model training device, terminal device, and storage medium
CN111126056B (en) Method and device for identifying trigger words
CN116665878A (en) Intelligent inquiry method, device, equipment and storage medium for improving accumulated errors
CN114662668A (en) Neural network training method, semantic similarity calculation method and semantic retrieval system
CN110928987B (en) Legal provision retrieval method and related equipment based on neural network hybrid model
CN115437620B (en) Natural language programming method, device, equipment and storage medium
CN118093885B (en) Data processing method, device and equipment, medium and product
CN117009532B (en) Semantic type recognition method and device, computer readable medium and electronic equipment
CN112199953B (en) Method and device for extracting information in telephone call and computer equipment
CN112989810B (en) Text information identification method and device, server and storage medium
CN117076614B (en) Cross-language text retrieval method and terminal equipment based on transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant