CN112507078B - Semantic question and answer method and device, electronic equipment and storage medium - Google Patents

Semantic question and answer method and device, electronic equipment and storage medium

Info

Publication number
CN112507078B
CN112507078B (application CN202011476868.0A)
Authority
CN
China
Prior art keywords
data
question
neural network
text
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011476868.0A
Other languages
Chinese (zh)
Other versions
CN112507078A (en)
Inventor
李利娟
王梦婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Nuonuo Network Technology Co ltd
Original Assignee
Zhejiang Nuonuo Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Nuonuo Network Technology Co ltd
Priority to CN202011476868.0A
Publication of CN112507078A
Application granted
Publication of CN112507078B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/319 Inverted lists
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F40/216 Parsing using statistical methods
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a semantic question-answering method, which comprises the following steps: creating an inverted index library corresponding to the preset question data in a question bank; generating question-pair sample data by using the inverted index library, and converting the question-pair sample data into a character encoding sequence; determining a loss function of a neural network model according to the character encoding sequence and a part-of-speech encoding sequence; determining, with the neural network model, the preset question vector corresponding to each piece of preset question data; determining, according to the inverted index library, a set of candidate question data in the question bank that is similar to the user question data; calculating the text vector similarity between the user question data and the candidate question data; and setting the candidate question data with the highest vector similarity as the target question data, and outputting the preset answer data corresponding to the target question data. The method and device can improve the accuracy of semantic question answering. The application also discloses a semantic question-answering device, an electronic device, and a storage medium, which have the same beneficial effects.

Description

Semantic question and answer method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of pattern recognition and machine learning technologies, and in particular, to a semantic question answering method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of artificial intelligence, neural network models have achieved great success in question-answering tasks. Continuous innovation in question-answering technology has brought it wide attention in information retrieval, text matching, and retrieval-based dialogue.
Question-answering algorithms can generally be classified into generation-based methods and retrieval-based methods. Generation-based methods aim to learn the relations between question-answer pairs from large-scale corpora and to predict an answer from the preceding context. Retrieval-based methods aim to learn the correlations between texts and to retrieve, from a question-answer library, the question-answer pair most relevant to the user's question. However, existing question-answering algorithms suffer from problems such as obvious differences in text semantic representation during transfer learning, poor semantic recognition of domain-specific text, and under-fitting of deep models on small sample sets.
Therefore, how to improve the accuracy of semantic question answering is a technical problem that needs to be solved by those skilled in the art at present.
Disclosure of Invention
The application aims to provide a semantic question-answering method and apparatus, an electronic device, and a storage medium that can improve the accuracy of semantic question answering.
In order to solve the above technical problem, the present application provides a semantic question-answering method, which includes:
creating an inverted index library corresponding to the preset question data in a question bank;
receiving user question data, generating question-pair sample data by using the inverted index library, and converting the question-pair sample data into a character encoding sequence;
extracting part-of-speech information of the text corresponding to the preset question data, and converting the part-of-speech information into a part-of-speech encoding sequence;
determining a loss function of a neural network model according to the character encoding sequence and the part-of-speech encoding sequence, wherein the neural network model is based on a combination of a Transformer and an IDCNN;
training the neural network model with the samples in the question-pair sample data and their labels, and determining, with the trained model, the preset question vector corresponding to each piece of preset question data;
determining, according to the inverted index library, a set of candidate question data in the question bank that is similar to the user question data;
calculating, with the neural network model, the text vector similarity between the user question data and the candidate question data;
and setting the candidate question data with the highest vector similarity as the target question data, and outputting the preset answer data corresponding to the target question data.
Optionally, creating the inverted index library corresponding to the preset question data in the question bank includes:
performing a preprocessing operation on the preset question data in the question bank, where the preprocessing operation includes any one or a combination of a synonym conversion operation, a word segmentation operation, and a stop-word removal operation;
counting the inverse document frequency (IDF), term frequency (TF), and part-of-speech weight of each word in all the preset question data;
and creating the inverted index library corresponding to the preset question data in the question bank according to the inverse document frequency, the term frequency, and the part-of-speech weight.
Optionally, generating question-pair sample data by using the inverted index library includes:
determining first question combinations of preset question data corresponding to the same answer in the question bank, and setting the first question combinations as positive samples of the question-pair sample data;
calculating the similarity between each piece of preset question data and the user question data according to the inverted index library, and removing the preset question data identical to the user question data to obtain a similarity ranking;
and combining each of the top-N preset question data in the similarity ranking with the user question data as a second question combination, and setting the second question combinations as negative samples of the question-pair sample data.
Optionally, after setting the first question combination as a positive sample of the question-pair sample data, the method further includes:
labeling the first question combination with the label 1;
correspondingly, after setting the second question combination as a negative sample of the question-pair sample data, the method further includes:
labeling the second question combination with the label 0.
Optionally, converting the question-pair sample data into a character encoding sequence includes:
converting the characters of the question-pair sample data into character codes according to a character dictionary, and padding all the character code sequences to the same length to obtain the character encoding sequence.
Optionally, converting the part-of-speech information into a part-of-speech encoding sequence includes:
converting the part-of-speech information of the question-pair sample data into part-of-speech codes according to a part-of-speech dictionary, and padding all the part-of-speech code sequences to the same length to obtain the part-of-speech encoding sequence.
Optionally, determining the loss function of the neural network model according to the character encoding sequence and the part-of-speech encoding sequence includes:
generating text vectors corresponding to the character encoding sequences with the Embedding layer and Transformer network of the neural network model, and calculating a text relevance loss according to the distance between the text vectors;
generating feature vectors of the part-of-speech encoding sequences with the Attention network and multilayer convolutional neural network of the neural network model, and determining the Softmax loss of the feature vectors;
and weighting the text relevance loss and the Softmax loss to obtain the loss function of the neural network model.
The application also provides a semantic question-answering device, which comprises:
an inverted index library creation module, configured to create an inverted index library corresponding to the preset question data in a question bank;
a character encoding sequence generation module, configured to receive user question data, generate question-pair sample data with the inverted index library, and convert the question-pair sample data into a character encoding sequence;
a part-of-speech encoding sequence generation module, configured to extract part-of-speech information of the text corresponding to the preset question data and convert the part-of-speech information into a part-of-speech encoding sequence;
a loss function determination module, configured to determine a loss function of the neural network model according to the character encoding sequence and the part-of-speech encoding sequence, the neural network model being based on a combination of a Transformer and an IDCNN;
a vector determination module, configured to train the neural network model with the samples in the question-pair sample data and their labels, and determine, with the neural network model, the preset question vector corresponding to the preset question data;
a similar question determination module, configured to determine, according to the inverted index library, a set of candidate question data in the question bank that is similar to the user question data;
a vector similarity calculation module, configured to calculate, with the neural network model, the text vector similarity between the user question data and the candidate question data;
and an answer output module, configured to set the candidate question data with the highest vector similarity as the target question data and output the preset answer data corresponding to the target question data.
The application also provides a storage medium on which a computer program is stored, the computer program, when executed, implementing the steps of the above semantic question-answering method.
The application also provides an electronic device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the above semantic question-answering method when calling the computer program in the memory.
The application provides a semantic question-answering method comprising: creating an inverted index library corresponding to the preset question data in a question bank; receiving user question data, generating question-pair sample data with the inverted index library, and converting the question-pair sample data into a character encoding sequence; extracting part-of-speech information of the text corresponding to the preset question data, and converting the part-of-speech information into a part-of-speech encoding sequence; determining a loss function of a neural network model according to the character encoding sequence and the part-of-speech encoding sequence, the neural network model being based on a combination of a Transformer and an IDCNN; training the neural network model with the samples in the question-pair sample data and their labels, and determining, with the trained model, the preset question vector corresponding to each piece of preset question data; determining, according to the inverted index library, a set of candidate question data in the question bank that is similar to the user question data; calculating, with the neural network model, the text vector similarity between the user question data and the candidate question data; and setting the candidate question data with the highest vector similarity as the target question data, and outputting the preset answer data corresponding to the target question data.
The application thus creates an inverted index library corresponding to the preset question data in the question bank, generates question-pair sample data based on the inverted index library, and converts the question-pair sample data into a character encoding sequence. The application also converts the part-of-speech information of the preset question data into a part-of-speech encoding sequence. The loss function of the neural network model is determined from the character encoding sequence and the part-of-speech encoding sequence, and the neural network model is then used to determine the preset answer data corresponding to the user question. By determining the loss function from both the character encoding sequence and the part-of-speech encoding sequence, the application realizes Siamese-based dynamic semantic question answering and improves its accuracy. The application also provides a semantic question-answering device, an electronic device, and a storage medium with the same beneficial effects, which are not repeated here.
Drawings
In order to more clearly illustrate the embodiments of the present application, the drawings needed for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a semantic question answering method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of the neural network model preparation part provided by an embodiment of the present application;
fig. 3 is a diagram of a neural network model structure based on the combination of a Transformer and IDCNN according to an embodiment of the present application;
FIG. 4 is a flow chart of a data prediction method according to an embodiment of the present application;
fig. 5 is a flowchart of the data preprocessing for creating the inverted index library according to an embodiment of the present application;
fig. 6 is a flowchart of deep learning sample data preprocessing according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a semantic question answering method according to an embodiment of the present disclosure.
The specific steps may include:
s101: creating an inverted sorting index library corresponding to preset problem data in a problem library;
before this step, there may also be an operation of sorting the question and answer data to create a question bank and an answer bank, where preset question data in the question bank may be represented as Q and preset answer data in the answer bank may be represented as a. The embodiment can record the corresponding relationship between the preset question data in the question bank and the preset answer data in the answer bank in advance.
The embodiment may perform a preprocessing operation on the preset question data in the question bank to create the inverted index library; specifically, the preprocessing operation may include any one or a combination of a synonym conversion operation, a word segmentation operation, and a stop-word removal operation. The embodiment may also create the inverted index library with the BM25 algorithm, as follows: count the inverse document frequency (IDF), term frequency (TF), and part-of-speech weight of every word in the preset question data, and create the inverted index library corresponding to the preset question data according to the IDF, TF, and part-of-speech weight. The part-of-speech weight is the weight of each word in the preset question data and depends on the word's part of speech; the correspondence between part of speech and weight may be preset, for example 0.5 for nouns, 0.8 for verbs, and 0.1 for modal particles. The IDF, TF, and part-of-speech weight of each piece of preset question data may all be stored in the inverted index library.
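For illustration only, a minimal Python sketch of this statistics step, assuming the jieba tokenizer, an illustrative part-of-speech weight table, and an illustrative stop-word list (none of these names come from the patent):

```python
import math
from collections import Counter, defaultdict

import jieba.posseg as pseg  # jieba is an assumed tokenizer choice

# Assumed part-of-speech -> weight table (the patent only gives examples:
# noun 0.5, verb 0.8, modal particle 0.1).
POS_WEIGHTS = {"n": 0.5, "v": 0.8, "y": 0.1}
STOP_WORDS = {"的", "了", "吗"}  # illustrative stop-word list

def build_inverted_index(questions):
    """Tokenize each preset question and record IDF, TF, and POS weight per word."""
    docs = []
    doc_freq = defaultdict(int)
    for q in questions:
        words = [(w, flag) for w, flag in pseg.cut(q) if w not in STOP_WORDS]
        docs.append(words)
        for w in {w for w, _ in words}:
            doc_freq[w] += 1

    n = len(questions)
    index = []
    for words in docs:
        tf = Counter(w for w, _ in words)
        entry = {}
        for w, flag in words:
            entry[w] = {
                # BM25-style IDF; the exact formula here is an assumption
                "idf": math.log((n - doc_freq[w] + 0.5) / (doc_freq[w] + 0.5) + 1),
                "tf": tf[w],
                "pos_weight": POS_WEIGHTS.get(flag[:1], 0.3),  # 0.3 default is assumed
            }
        index.append(entry)
    return index
```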
S102: receiving user question data, generating question-pair sample data by using the inverted index library, and converting the question-pair sample data into a character encoding sequence;
After the user question data is received, question-pair sample data is generated with the inverted index library. The question-pair sample data consists of combinations of question data and may include positive samples and negative samples. Specifically, the question-pair sample data may be generated as follows: determine first question combinations of preset question data corresponding to the same answer in the question bank, and set them as positive samples; calculate the similarity between each piece of preset question data and the user question data according to the inverted index library, and remove the preset question data identical to the user question data to obtain a similarity ranking; combine each of the top-N preset question data in the ranking with the user question data as a second question combination, and set these as negative samples.
Further, after generating the question pairs from the question bank, the embodiment may add a label to each question pair according to whether the two questions are related. Specifically, after the first question combination is set as a positive sample, it may be labeled 1; correspondingly, after the second question combination is set as a negative sample, it may be labeled 0.
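For illustration, a sketch of this sample-generation logic, assuming a question-bank object that can group questions by answer and a bm25_score similarity helper (both hypothetical; a word-level variant of such a helper is sketched under step B3 below):

```python
from itertools import combinations

def build_question_pairs(question_bank, user_question, bm25_score, top_n=5):
    """Return (question_a, question_b, label) triples for training."""
    pairs = []
    # Positive samples: questions that map to the same answer, label 1.
    for answer_id, questions in question_bank.groups_by_answer():  # hypothetical API
        for qa, qb in combinations(questions, 2):
            pairs.append((qa, qb, 1))
    # Negative samples: top-N similar but non-identical questions, label 0.
    scored = [(q, bm25_score(user_question, q))
              for q in question_bank.all_questions() if q != user_question]
    scored.sort(key=lambda x: x[1], reverse=True)
    for q, _ in scored[:top_n]:
        pairs.append((user_question, q, 0))
    return pairs
```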
After the question-pair sample data is obtained, the embodiment may preprocess it into a character encoding sequence, i.e., a dictionary-ID sequence representation the algorithm can recognize. Specifically, the characters of the question-pair sample data are converted into character codes according to a character dictionary, and all character code sequences are padded to the same length to obtain the character encoding sequence.
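A minimal sketch of this character-encoding conversion, assuming a char2id dictionary with an "<UNK>" entry (both names are illustrative):

```python
def encode_chars(question_pairs, char2id, pad_id=0):
    """Map each question of each pair to character ids, padded to the batch maximum."""
    seqs = [[char2id.get(ch, char2id["<UNK>"]) for ch in q]
            for pair in question_pairs for q in pair[:2]]
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]
```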
S103: extracting part-of-speech information of the text corresponding to the preset question data, and converting the part-of-speech information into a part-of-speech encoding sequence;
In this step, the text corresponding to the preset question data is determined and segmented into words to obtain the part-of-speech information of each word, which is then converted into a part-of-speech encoding sequence. Specifically, the embodiment may extract the part-of-speech information of the question text, convert it into OneHot sparse features, and use these as the input features of the convolutional network. For example, the part-of-speech information of the question-pair sample data may be converted into part-of-speech codes according to a part-of-speech dictionary, and all part-of-speech code sequences padded to the same length to obtain the part-of-speech encoding sequence.
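A sketch of this part-of-speech OneHot conversion, assuming jieba for tagging and a pos2id dictionary with an "<UNK>" entry (all names illustrative):

```python
import numpy as np
import jieba.posseg as pseg

def encode_pos_onehot(questions, pos2id, max_len):
    """Convert each question's part-of-speech tags to a padded one-hot matrix.

    Output shape: (batch size, padded length, part-of-speech dictionary size),
    matching the layout described in step B5 below.
    """
    batch = np.zeros((len(questions), max_len, len(pos2id)), dtype=np.float32)
    for i, q in enumerate(questions):
        flags = [flag for _, flag in pseg.cut(q)][:max_len]
        for j, flag in enumerate(flags):
            batch[i, j, pos2id.get(flag, pos2id["<UNK>"])] = 1.0
    return batch
```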
S104: determining a loss function of the neural network model according to the character encoding sequence and the part-of-speech encoding sequence;
The neural network model in this embodiment is based on a combination of a Transformer and an IDCNN (Iterated Dilated CNN); before this step, the network model may be built. In this step, the character encoding sequence and the part-of-speech encoding sequence serve as the inputs of the neural network model; the network is trained and the model parameters are saved.
The embodiment may build the neural network model based on the combination of a Transformer and an IDCNN as follows. Reserve placeholders for the input data of the network, the input data comprising character information, word information, and label information. Create the character feature learning part: after Embedding, the character codes are fed into several Transformer layers, and the processed output of the last Transformer layer is taken as the text feature representation; the correlation between the text pair is computed from these text features, and a correlation loss is calculated according to whether the label marks the pair as related. Create the word feature learning part: an Attention mechanism first computes and weights the importance of each word across the text pair, and the vector of each word is reconstructed according to this importance; the reconstructed word features are fed into a multilayer convolutional network, where dilated convolutions expand the receptive field and several convolution kernels extract multi-dimensional text features; finally, the multi-dimensional features are fed into a fully connected layer for dimensionality reduction, and a classification loss on whether the text pair matches is computed from the resulting features. The weighted sum of the two losses is the final loss function.
Specifically, the embodiment may determine the loss function as follows: generate the text vectors corresponding to the character encoding sequences with the Embedding layer and Transformer network of the neural network model, and calculate the text relevance loss according to the distance between the text vectors; generate the feature vectors of the part-of-speech encoding sequences with the Attention network and multilayer convolutional neural network of the neural network model, and determine the Softmax loss of the feature vectors; and weight the text relevance loss and the Softmax loss to obtain the loss function of the neural network model.
S105: training the neural network model with the samples in the question-pair sample data and their labels, and determining, with the neural network model, the preset question vector corresponding to the preset question data;
In this step, the neural network model is trained with the samples in the question-pair sample data and their labels. Once trained, the preset question data in the question bank is fed into the model to obtain the corresponding preset question vectors, yielding a question vector library for the question bank, which may then be stored. Specifically, the embodiment may load the trained neural network model, extract the output of its penultimate layer as the text vector representation, traverse the question data, and convert each question into a vector to obtain the question vector library Q_Vectors.
S106: determining, according to the inverted index library, a set of candidate question data in the question bank that is similar to the user question data;
In this embodiment, the inverted index library is used to calculate the text similarity between the user question data and each piece of preset question data in the question bank, and the preset question data whose similarity exceeds a preset value and ranks in the top K is taken as the candidate question data similar to the user question data, yielding a candidate question data set. That is, the user question is matched against the data in the inverted index library, and the K highest-scoring questions are taken as the candidate question set Q_candidate.
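A sketch of this candidate screening, with the threshold, K, and helper names as assumptions:

```python
def candidate_questions(user_question, question_bank, bm25_score, k=10, threshold=0.0):
    """Rank preset questions by index similarity and keep the top-K above a threshold."""
    scored = [(q, bm25_score(user_question, q))
              for q in question_bank.all_questions()]  # hypothetical API
    scored.sort(key=lambda x: x[1], reverse=True)
    return [q for q, s in scored[:k] if s > threshold]
```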
S107: calculating, with the neural network model, the text vector similarity between the user question data and the candidate question data;
In this embodiment, the user question data is combined with each piece of candidate question data into a question pair, and the neural network model then calculates the text vector similarity between the user question data and the candidate question data in each pair.
S108: setting the candidate question data with the highest vector similarity as the target question data, and outputting the preset answer data corresponding to the target question data.
The method may calculate the similarity between the user question vector and the vectors of the candidate question set, use the neural network to predict the probability that the two texts are related, judge from the similarity and relevance probability whether the two questions are of the same type, and, if so, extract the answer corresponding to the question and return it to the user.
In this embodiment, an inverted index library corresponding to the preset question data in the question bank is created first; question-pair sample data is generated based on this library and then converted into a character encoding sequence. The embodiment also converts the part-of-speech information of the preset question data into a part-of-speech encoding sequence. The loss function of the neural network model is determined from the character encoding sequence and the part-of-speech encoding sequence, and the model is then used to determine the preset answer data corresponding to the user question. Determining the loss function from both sequences enables Siamese-based dynamic semantic question answering and improves its accuracy.
Referring to fig. 2, fig. 2 is a flowchart of the neural network model preparation part provided in an embodiment of the present application. As shown in fig. 2, the preparation part mainly covers two aspects: on the one hand, the training of the deep network model based on the combination of a Transformer and an IDCNN, in which the preprocessed question sequences are fed into the neural network model to learn the data features and network parameters and obtain the question vector library; on the other hand, the question sequences in the question bank are preprocessed and fed into the inverted indexing algorithm to create the inverted index library.
The embodiment provides a method for building the neural network model based on the combination of a Transformer and an IDCNN; combined with the embodiment corresponding to fig. 1, it yields a further embodiment comprising the following steps:
Step A1: set placeholders for the inputs of the neural network model.
The inputs of the neural network model include: the character encoding sequence, the part-of-speech encoding sequence, and a label indicating whether the text pair is related.
Step A2: after Embedding, feed the character codes into several Transformer layers, and take the processed output of the last Transformer layer as the text feature representation.
Specifically, the implementation of step A2 may include: feed the character encoding sequences of the text pair into an Embedding layer to obtain the text character vector matrices; feed the matrices into the multilayer Transformer network, using the deep network to learn their dynamic semantic features; feed the output of the last Transformer layer into an Average Pooling layer for dimensionality reduction, and extract the pooled output as the text vector representation.
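A minimal PyTorch sketch of this character branch, with the layer sizes as assumed hyperparameters; in the Siamese setup, both questions of a pair pass through the same branch with shared parameters:

```python
import torch
import torch.nn as nn

class CharFeatureBranch(nn.Module):
    """Embedding -> stacked Transformer encoder -> average-pooled text vector."""

    def __init__(self, vocab_size, dim=128, layers=4, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.pool = nn.AdaptiveAvgPool1d(1)  # Average Pooling over the sequence

    def forward(self, char_ids):              # (batch, seq_len) of character codes
        x = self.encoder(self.embed(char_ids))
        return self.pool(x.transpose(1, 2)).squeeze(-1)  # (batch, dim) text vector
```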
Step A3: calculate the text vector distance and define the text relevance loss.
Specifically, the embodiment calculates the distance between the text vectors obtained after the text pair passes through the Embedding layer and the Transformer layers; the distance may be the Euclidean distance, the Manhattan distance, the cosine distance, or the like. Taking the Euclidean distance as an example, the distance is:

$$d_i=\left\|u_i-v_i\right\|_2=\sqrt{\sum_{j=1}^{t}\left(u_{ij}-v_{ij}\right)^{2}}$$

where u_i and v_i are the vectors of the two texts of text pair i, and d_i is the text vector distance.
After the distances between the texts and the labels of the text pairs are obtained, the text relevance loss Loss1 can be calculated as a contrastive loss:

$$Loss_1=\frac{1}{N}\sum_{i=1}^{N}\left[y_i\,d_i^{2}+\left(1-y_i\right)\max\left(margin-d_i,\,0\right)^{2}\right]$$

where N is the total number of samples, y_i is the label of sample i, d_i is the vector distance between the two texts in sample i, and margin is a preset value.
Step A4: learn the part-of-speech features.
The embodiment may learn part-of-speech features as follows: feed the part-of-speech encoding sequence into an Attention network and calculate the importance of each word in the text; derive a new representation of each word from this importance; feed the new feature representation into a multilayer convolutional neural network, which expands the receptive field and extracts more context information; concatenate the outputs of the convolution layers and feed them into a fully connected layer, which extracts the more important features and reduces the dimensionality, yielding the reduced-dimension vector.
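A minimal PyTorch sketch of this word-feature branch; the dilation rates, dimensions, and pooling choice are assumptions, since the patent fixes only the Attention, dilated convolution, and fully connected structure:

```python
import torch
import torch.nn as nn

class WordFeatureBranch(nn.Module):
    """Attention over part-of-speech features, then dilated convolutions (IDCNN-style)."""

    def __init__(self, pos_vocab, dim=64, kernel=3):
        super().__init__()
        self.proj = nn.Linear(pos_vocab, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        # Iterated dilated convolutions widen the receptive field layer by layer.
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=kernel, dilation=d,
                      padding=d * (kernel - 1) // 2)
            for d in (1, 2, 4)
        )
        self.fc = nn.Linear(dim * 3, dim)  # fully connected dimensionality reduction

    def forward(self, pos_onehot):          # (batch, seq_len, pos_vocab)
        x = self.proj(pos_onehot)
        x, _ = self.attn(x, x, x)           # reweight each word by importance
        x = x.transpose(1, 2)               # (batch, dim, seq_len) for Conv1d
        feats = [conv(x).max(dim=-1).values for conv in self.convs]
        return self.fc(torch.cat(feats, dim=-1))  # reduced-dimension vector
```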
Step A5: calculate the Softmax loss of the reduced-dimension vector.
The Softmax loss is the cross-entropy over the softmax of the reduced features:

$$Loss_2=-\frac{1}{N}\sum_{i=1}^{N}y_i\log\big(\mathrm{softmax}(a_i)\big)$$

where N is the total number of samples, y_i is the label of sample i, and a_i are the features of sample i after dimensionality reduction.
Step A6: take the weighted sum of the two losses obtained in step A3 and step A5 as the loss function of the neural network model.
The loss function is calculated as: Loss = w1·Loss1 + w2·Loss2, where w1 and w2 are weighting factors.
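A PyTorch sketch of this combined loss, with margin and the weights w1, w2 as assumed hyperparameters:

```python
import torch
import torch.nn.functional as F

def combined_loss(u, v, logits, labels, margin=1.0, w1=0.5, w2=0.5):
    """Weighted sum of contrastive distance loss and softmax classification loss.

    u, v:   text vectors of the two questions in each pair, shape (batch, dim)
    logits: match/non-match scores from the word-feature branch, shape (batch, 2)
    labels: LongTensor of 0/1, 1 for related pairs, shape (batch,)
    """
    d = F.pairwise_distance(u, v)            # Euclidean distance d_i
    y = labels.float()
    loss1 = (y * d.pow(2) + (1 - y) * torch.clamp(margin - d, min=0).pow(2)).mean()
    loss2 = F.cross_entropy(logits, labels)  # softmax (cross-entropy) loss
    return w1 * loss1 + w2 * loss2
```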
Referring to fig. 3, fig. 3 is a structure diagram of the neural network model based on the combination of a Transformer and an IDCNN according to an embodiment of the present application. As shown in fig. 3, the diagram shows the network structure of the text matching model, which mainly includes two parts: a classification loss network built by convolving the part-of-speech features, and a semantic loss network built on the character vector features. Both parts adopt a parameter sharing mechanism, which better mines the correlation features between text pairs and effectively reduces the number of model parameters. In addition, the loss of the whole network is the sum of the two loss functions, and this combined loss calculation better learns the correlation between text pairs. As shown in fig. 3, to determine the loss function for a pair of question texts such as "is salary deducted for being late to work", each question text is first segmented and the part of speech of each word is determined (v denotes a verb, vn a verbal noun, and y a modal particle); the vector representations u1~u5 and v1~v5 of the two texts are obtained through the Attention network, and these vectors are processed through the CNN network (layers 3 to 5), the Pooling layer, the Concatenate layer, and the Dense layer to obtain Loss1. The character encoding sequences of the text pair are also fed into an Embedding layer to obtain the text character vector matrices, which are fed into the multilayer Transformer network to learn the dynamic semantic features; the output of the last Transformer layer is fed into an Average Pooling layer (Avg Pooling in the figure) for dimensionality reduction, and the pooled output is extracted as the text vector representation to obtain Loss2.
Referring to fig. 4, fig. 4 is a flowchart of the data prediction method according to an embodiment of the present application. As shown in fig. 4, the data prediction part covers the rough flow from the user inputting a question to the question matching model producing a result. In this flow, the user question first passes through the inverted indexing algorithm to obtain a candidate question set; this both screens the texts by character features and reduces the amount of computation between text vectors. After the candidate question set is obtained, the text vectors produced by the deep network are used to calculate the similarity between the texts and the probability that the texts match.
Based on the scheme shown in fig. 4, the embodiment of the present application further provides a Siamese-based dynamic semantic question-answering scheme, implemented in the following steps:
Step B1: sort the data in the question-answer library and create a question bank and an answer bank;
Step B2: use the question data obtained in step B1 to create an inverted index library with the BM25 algorithm;
Specifically, the embodiment may load common stop words, together with words in the question set that have little influence on the result, as the stop-word library, then read and preprocess the question sequence data. Referring to fig. 5, fig. 5 is a flowchart of the data preprocessing for creating the inverted index library according to an embodiment of the present application; the processing steps mainly include: performing synonym conversion on the text, segmenting the text with a word segmentation tool, and removing the stop words according to the stop-word library. The IDF value of every word in the question bank is then counted; in BM25 form the IDF is:

$$IDF(q_i)=\log\frac{N-n(q_i)+0.5}{n(q_i)+0.5}$$

where q_i is any word in the question set, N is the total number of question texts, and n(q_i) is the number of question texts in which q_i occurs.
The embodiment may traverse the question texts, segment each one, and count the TF value of each word, where tf_i is the frequency with which word i appears in text d. A weight pw_i (namely the part-of-speech weight) is set for each word i according to its part of speech.
Step B3: generate question-pair sample data from the data in the question bank, and add the corresponding label to each question pair.
Specifically, step B3 is implemented as follows:
Load the question data, combine the questions corresponding to each same answer as positive samples, and set the corresponding labels to 1.
Traverse the question data, and use the inverted index library created in step B2 to calculate the similarity Score(Q, q) between the user question and each question in the question bank. In BM25 form the score is:

$$Score(Q,q)=\sum_{i}IDF(q_i)\cdot\frac{tf_i\,(k_1+1)}{tf_i+k_1\left(1-b+b\cdot\frac{|d|}{avgdl}\right)}\cdot\frac{qf_i\,(k_2+1)}{qf_i+k_2}$$

where k_1, k_2, and b are adjustment factors, tf_i is the frequency of word i in the candidate question, qf_i is the frequency of word i in the current question, |d| is the candidate question length, and avgdl is the average question length.
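A word-level sketch of this scoring function under the standard BM25 formulation, with the parameter defaults as assumptions:

```python
from collections import Counter

def bm25_score(query_words, doc_words, idf, k1=1.5, k2=1.0, b=0.75, avgdl=10.0):
    """Score one candidate question against the user query, BM25-style."""
    tf, qf = Counter(doc_words), Counter(query_words)
    score = 0.0
    for w in qf:
        if w not in tf:
            continue
        doc_part = tf[w] * (k1 + 1) / (tf[w] + k1 * (1 - b + b * len(doc_words) / avgdl))
        query_part = qf[w] * (k2 + 1) / (qf[w] + k2)
        score += idf.get(w, 0.0) * doc_part * query_part
    return score
```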
Sort the scores, remove the data belonging to the same type of question as the current question, select the top-K questions, combine each with the current question as a negative sample, and set the corresponding labels to 0.
Step B4: process the sample data obtained in step B3 into a representation the neural network can understand;
Specifically, step B4 is implemented as follows: load the sample data and the character dictionary; traverse the sample data, split each question into characters, and convert the characters into their codes in the character dictionary; count the character lengths of the texts in each batch to obtain the maximum, and pad all character encoding sequences to that maximum so the sequences in a batch have the same length.
Step B5: extract the part-of-speech attributes of the question texts and convert them into a representation the network can recognize;
Specifically, referring to fig. 6, fig. 6 is a flowchart of the deep learning sample data preprocessing provided in the embodiment of the present application, and step B5 is implemented as follows: segment each text, record the part of speech of each word, and convert it into the corresponding code according to the part-of-speech dictionary; count the text lengths in each batch, take the maximum, and pad the part-of-speech encoding sequences to this fixed length; then apply OneHot processing to the padded sequences, converting the part-of-speech features into a matrix of dimensions (batch size, padded character length, part-of-speech dictionary size).
Step B6: build the neural network model based on the combination of the Transformer and the IDCNN, specifically as in steps A1 to A6 of the previous embodiment.
Step B7: feed the processed data into the neural network model, train the parameters of the network, and save the trained parameters and model.
Step B8: convert the question bank into a question vector library with the neural network model and store it;
Specifically, the embodiment may convert the preset question data in the question bank into question vectors as follows: load the trained model and the question set, traverse the data in the question bank, and process it into the input form required by the network model; feed the data into the network model and extract the output of the Average Pooling layer as the text representation; and store the question texts, the network input data, the text vector library, and related data.
Step B9: obtain a set of similar candidate questions by matching the user question;
Specifically, the embodiment may preprocess the user question, calculate the similarity between the preprocessed user question text and the data in the inverted index library, and take the topK most similar questions as the candidate question set.
Step B10: extract the user question vector and calculate its similarity to the candidate questions;
Specifically, the embodiment may extract the neural network input data corresponding to each candidate question from the question vector library and combine it with the user question data as the input of the neural network. The network then predicts the vector representations of the question pair and the probability that the pair is related. The similarity S_i between the text vectors in each question pair is calculated, and the pairs are sorted by similarity; the cosine similarity is:
$$S_i=\frac{\sum_{j=1}^{t}u_{ij}\,v_{ij}}{\sqrt{\sum_{j=1}^{t}u_{ij}^{2}}\,\sqrt{\sum_{j=1}^{t}v_{ij}^{2}}}$$

where i denotes the i-th text pair, u_i and v_i are the vector representations of the two texts in the pair, t is the vector dimension, and u_ij and v_ij are the values of the j-th dimension of u_i and v_i.
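A minimal sketch of this cosine similarity:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two text vectors of equal dimension t."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```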
Step B11: according to the similarity ranking and the corresponding relevance probabilities obtained in step B10, judge whether the user question and a similar question point to the same answer, determine the answer to the user question according to this judgment logic, and return it.
The above embodiments propose a new Siamese network model in which text information is learned from character features and word features separately. The character feature learning stage uses a Transformer network to mine the dynamic semantic features of the text, while the word feature learning part uses an Attention network to learn the importance of each word. In addition, the model adopts the IDCNN idea: a multilayer dilated convolutional network expands the receptive field and learns the contextual relations of the text. The loss calculation part designs two objective functions; this combined loss trains both the similarity of the two text representations and their correlation.
The embodiment thus provides a Siamese-based dynamic semantic question-answering method and system comprising: acquiring question-answer data and creating an inverted index library; fine-tuning the Siamese-based deep neural network from a pre-trained network and extracting the vector representation of the text with a dynamic semantic representation method; preliminarily screening a candidate question set with a ranking algorithm; obtaining the semantic text vector of the user question with the deep neural network; and calculating the similarity between the candidate question vectors and the user question to determine the final answer. In addition, the Siamese-based deep network model adopts a dual-task training method, with the dual-loss calculation learning the features between texts better; part-of-speech features are added to improve the network's performance, so the model performs well even on small datasets; and the parameter sharing mechanism improves the learning of correlation features between text pairs while effectively reducing the number of parameters.
The embodiment of the present application further provides a semantic question-answering device, which may include:
an inverted index library creation module, configured to create an inverted index library corresponding to the preset question data in a question bank;
a character encoding sequence generation module, configured to receive user question data, generate question-pair sample data with the inverted index library, and convert the question-pair sample data into a character encoding sequence;
a part-of-speech encoding sequence generation module, configured to extract part-of-speech information of the text corresponding to the preset question data and convert the part-of-speech information into a part-of-speech encoding sequence;
a loss function determination module, configured to determine a loss function of the neural network model according to the character encoding sequence and the part-of-speech encoding sequence, the neural network model being based on a combination of a Transformer and an IDCNN;
a vector determination module, configured to train the neural network model with the samples in the question-pair sample data and their labels, and determine, with the neural network model, the preset question vector corresponding to the preset question data;
a similar question determination module, configured to determine, according to the inverted index library, a set of candidate question data in the question bank that is similar to the user question data;
a vector similarity calculation module, configured to calculate, with the neural network model, the text vector similarity between the user question data and the candidate question data;
and an answer output module, configured to set the candidate question data with the highest vector similarity as the target question data and output the preset answer data corresponding to the target question data.
In this embodiment, an inverted index library corresponding to the preset question data in the question bank is created first; question-pair sample data is generated based on this library and converted into a character encoding sequence. The embodiment also converts the part-of-speech information of the preset question data into a part-of-speech encoding sequence. The loss function of the neural network model is determined from the character encoding sequence and the part-of-speech encoding sequence, and the model is then used to determine the preset answer data corresponding to the user question. Determining the loss function from both sequences enables Siamese-based dynamic semantic question answering and improves its accuracy.
Further, the inverted index library creation module comprises:
a preprocessing unit, configured to perform a preprocessing operation on the preset question data in the question bank, where the preprocessing operation includes any one or a combination of a synonym conversion operation, a word segmentation operation, and a stop-word removal operation;
a statistics unit, configured to count the inverse document frequency, term frequency, and part-of-speech weight of each word in all the preset question data;
and an index library creation unit, configured to create the inverted index library corresponding to the preset question data in the question bank according to the inverse document frequency, the term frequency, and the part-of-speech weight.
Further, the character encoding sequence generation module comprises:
a positive sample determination unit, configured to determine first question combinations of preset question data corresponding to the same answer in the question bank, and set the first question combinations as positive samples of the question-pair sample data;
a similarity ranking unit, configured to calculate the similarity between each piece of preset question data and the user question data according to the inverted index library, and remove the preset question data identical to the user question data to obtain a similarity ranking;
and a negative sample determination unit, configured to combine each of the top-N preset question data in the similarity ranking with the user question data as a second question combination, and set the second question combinations as negative samples of the question-pair sample data.
Further, the device also comprises:
a first labeling module, configured to label the first question combination with the label 1 after the first question combination is set as a positive sample of the question-pair sample data;
and a second labeling module, configured to label the second question combination with the label 0 after the second question combination is set as a negative sample of the question-pair sample data.
Further, the character encoding sequence generation module is configured to convert the characters of the question-pair sample data into character codes according to the character dictionary, and pad all the character code sequences to the same length to obtain the character encoding sequence.
Further, the part-of-speech encoding sequence generation module is configured to convert the part-of-speech information of the question-pair sample data into part-of-speech codes according to the part-of-speech dictionary, and pad all the part-of-speech code sequences to the same length to obtain the part-of-speech encoding sequence.
Further, the loss function determination module is configured to generate the text vectors corresponding to the character encoding sequences with the Embedding layer and Transformer network of the neural network model and calculate the text relevance loss according to the distance between the text vectors; to generate the feature vectors of the part-of-speech encoding sequences with the Attention network and multilayer convolutional neural network of the neural network model and determine the Softmax loss of the feature vectors; and to weight the text relevance loss and the Softmax loss to obtain the loss function of the neural network model.
Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.
The present application also provides a storage medium having a computer program stored thereon, which when executed, may implement the steps provided by the above-described embodiments. The storage medium may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The present application further provides an electronic device, which may include a memory and a processor. A computer program is stored in the memory, and the processor may implement the steps provided by the foregoing embodiments when calling the computer program in the memory. Of course, the electronic device may also include various network interfaces, a power supply, and the like.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts among the embodiments can be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief, and the relevant points can be found in the description of the method. It should be noted that those skilled in the art can make several improvements and modifications to the present application without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present application.
It should also be noted that, in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

Claims (10)

1. A semantic question answering method is characterized by comprising the following steps:
creating an inverted index library corresponding to preset question data in a question library;
receiving user question data, generating question pair sample data by using the inverted index library, and converting the question pair sample data into a character coding sequence;
extracting part-of-speech information of the text corresponding to the preset question data, and converting the part-of-speech information into a part-of-speech coding sequence;
determining a loss function of a neural network model according to the character coding sequence and the part-of-speech coding sequence; wherein the neural network model is based on a combination of a Transformer and an IDCNN;
training the neural network model by using the samples in the question pair sample data and the labels of the samples, and determining a preset question vector corresponding to the preset question data by using the neural network model;
determining a set of candidate question data similar to the user question data in the question library according to the inverted index library;
calculating the text vector similarity between the user question data and the candidate question data by using the neural network model;
setting the candidate question data with the highest vector similarity as target question data, and outputting preset answer data corresponding to the target question data;
the construction process of the neural network model based on the combination of the Transformer and the IDCNN comprises the following steps:
step A1: setting placeholders for the inputs of the neural network model, and taking the character coding sequence, the part-of-speech coding sequence, and a label indicating whether a text pair is related as the inputs of the neural network model;
step A2: passing the character coding sequence through an Embedding layer and then into a multi-layer Transformer network, feeding the output of the last Transformer layer into an Average Pooling layer to reduce the feature dimension, and taking the output of the Average Pooling layer as the text vector representation;
step A3: calculating the distance between the text vectors, and calculating the text relevance loss according to the text vector distance and the label indicating whether the text pair is related;
step A4: feeding the part-of-speech coding sequence into an Attention network, calculating the importance of each word in the text pair, and obtaining a new feature representation of each word according to its importance; feeding the new feature representations into a multilayer convolutional neural network so that the network expands its receptive field and extracts context information; and concatenating the output features of the multilayer convolutional neural network and feeding them into a fully connected layer, so as to extract features and reduce their dimensionality, obtaining a dimension-reduced vector;
step A5: calculating the Softmax loss of the dimension-reduced vector;
step A6: accumulating the text relevance loss and the Softmax loss to obtain the loss function of the neural network model.
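Steps A1 to A6 could be prototyped along the following lines. The claim's placeholder-style inputs suggest a TensorFlow 1.x graph; a PyTorch sketch is given here instead for brevity, and every dimension, layer count, and dilation rate is an illustrative assumption rather than the claimed configuration:

```python
import torch
import torch.nn as nn

class TransformerIDCNNModel(nn.Module):
    def __init__(self, char_vocab=8000, pos_vocab=64, dim=128, n_classes=2):
        super().__init__()
        # A2: Embedding followed by a multi-layer Transformer network.
        self.char_emb = nn.Embedding(char_vocab, dim, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=3)
        # A4: Attention over part-of-speech codes, then dilated convolutions
        # (IDCNN-style) with growing dilation to widen the receptive field.
        self.pos_emb = nn.Embedding(pos_vocab, dim, padding_idx=0)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=3, dilation=d, padding=d)
            for d in (1, 2, 4))
        self.fc = nn.Linear(dim * 3, n_classes)   # concatenate conv outputs, reduce

    def forward(self, chars, pos_tags):
        # A2: text vector = average-pooled output of the last Transformer layer.
        h = self.transformer(self.char_emb(chars))
        text_vec = h.mean(dim=1)                   # Average Pooling
        # A4: re-weight each word by attention, widen context with dilated convs.
        p = self.pos_emb(pos_tags)
        p, _ = self.attn(p, p, p)                  # importance of each word
        p = p.transpose(1, 2)                      # (batch, dim, seq_len)
        feats = [conv(p).max(dim=-1).values for conv in self.convs]
        logits = self.fc(torch.cat(feats, dim=-1)) # fully connected layer
        return text_vec, logits
```

Training would then apply a distance-based text relevance loss to pairs of `text_vec` outputs (step A3) and a cross-entropy Softmax loss to `logits` (step A5), accumulating the two terms as in step A6.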
2. The semantic question answering method according to claim 1, wherein creating the inverted index library corresponding to the preset question data in the question library comprises:
performing a preprocessing operation on the preset question data in the question library, wherein the preprocessing operation comprises any one or a combination of a synonym conversion operation, a word segmentation operation, and a stop-word removal operation;
counting the inverse text frequency index, the word frequency, and the part-of-speech weight of each word in all the preset question data;
and creating the inverted index library corresponding to the preset question data in the question library according to the inverse text frequency index, the word frequency, and the part-of-speech weight.
3. The semantic question answering method according to claim 1, wherein generating the question pair sample data by using the inverted index library comprises:
determining a first question combination of preset question data corresponding to the same answer in the question library, and setting the first question combination as a positive sample of the question pair sample data;
calculating the question similarity between each preset question data and the user question data according to the inverted index library, and removing the preset question data identical to the user question data to obtain a similarity ranking;
and taking the top-N preset question data in the similarity ranking together with the user question data as a second question combination, and setting the second question combination as a negative sample of the question pair sample data.
4. The semantic question answering method according to claim 3, further comprising, after setting the first question combination as a positive sample of the question pair sample data:
marking the label corresponding to the first question combination as 1;
and correspondingly, after setting the second question combination as a negative sample of the question pair sample data, further comprising:
marking the label corresponding to the second question combination as 0.
5. The semantic question answering method according to claim 1, wherein converting the question pair sample data into a character coding sequence comprises:
converting the characters corresponding to the question pair sample data into character codes according to a character dictionary, and padding all the character codes to the same length to obtain the character coding sequence.
6. The semantic question answering method according to claim 1, wherein converting the part-of-speech information into a part-of-speech coding sequence comprises:
converting the part-of-speech information corresponding to the question pair sample data into part-of-speech codes according to a part-of-speech dictionary, and padding all the part-of-speech codes to the same length to obtain the part-of-speech coding sequence.
7. The semantic question answering method according to claim 1, wherein determining the loss function of the neural network model according to the character coding sequence and the part-of-speech coding sequence comprises:
generating text vectors corresponding to the character coding sequences by using the Embedding layer and the Transformer network of the neural network model, and calculating the text relevance loss according to the distances between the text vectors;
generating feature vectors of the part-of-speech coding sequences by using the Attention network and the multilayer convolutional neural network of the neural network model, and determining the Softmax loss of the feature vectors;
and performing a weighted calculation on the text relevance loss and the Softmax loss to obtain the loss function of the neural network model.
8. A semantic question answering device, comprising:
an inverted index library creating module, configured to create an inverted index library corresponding to preset question data in a question library;
a character coding sequence generation module, configured to receive user question data, generate question pair sample data by using the inverted index library, and convert the question pair sample data into a character coding sequence;
a part-of-speech coding sequence generation module, configured to extract part-of-speech information of the text corresponding to the preset question data and convert the part-of-speech information into a part-of-speech coding sequence;
a loss function determining module, configured to determine a loss function of a neural network model according to the character coding sequence and the part-of-speech coding sequence; wherein the neural network model is based on a combination of a Transformer and an IDCNN;
a vector determination module, configured to train the neural network model by using the samples in the question pair sample data and the labels of the samples, and determine a preset question vector corresponding to the preset question data by using the neural network model;
a similar question determining module, configured to determine, according to the inverted index library, a set of candidate question data similar to the user question data in the question library;
a vector similarity calculation module, configured to calculate the text vector similarity between the user question data and the candidate question data by using the neural network model;
and an answer output module, configured to set the candidate question data with the highest vector similarity as target question data and output preset answer data corresponding to the target question data;
the construction process of the neural network model based on the combination of the Transformer and the IDCNN comprises the following steps:
step A1: setting placeholders for the inputs of the neural network model, and taking the character coding sequence, the part-of-speech coding sequence, and a label indicating whether a text pair is related as the inputs of the neural network model;
step A2: passing the character coding sequence through an Embedding layer and then into a multi-layer Transformer network, feeding the output of the last Transformer layer into an Average Pooling layer to reduce the feature dimension, and taking the output of the Average Pooling layer as the text vector representation;
step A3: calculating the distance between the text vectors, and calculating the text relevance loss according to the text vector distance and the label indicating whether the text pair is related;
step A4: feeding the part-of-speech coding sequence into an Attention network, calculating the importance of each word in the text pair, and obtaining a new feature representation of each word according to its importance; feeding the new feature representations into a multilayer convolutional neural network so that the network expands its receptive field and extracts context information; and concatenating the output features of the multilayer convolutional neural network and feeding them into a fully connected layer, so as to extract features and reduce their dimensionality, obtaining a dimension-reduced vector;
step A5: calculating the Softmax loss of the dimension-reduced vector;
step A6: accumulating the text relevance loss and the Softmax loss to obtain the loss function of the neural network model.
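At answer time, the similar question determining, vector similarity calculation, and answer output modules reduce to one vector comparison; a sketch, with cosine similarity assumed as the text vector similarity measure:

```python
import torch
import torch.nn.functional as F

def answer(user_vec, candidate_vecs, candidate_answers):
    """user_vec: (dim,) vector of the user question from the trained model.
    candidate_vecs: (n, dim) preset question vectors retrieved via the
    inverted index library; candidate_answers: the matching preset answers."""
    sims = F.cosine_similarity(user_vec.unsqueeze(0), candidate_vecs, dim=1)
    best = int(torch.argmax(sims))        # candidate with highest similarity
    return candidate_answers[best]        # output the preset answer data
```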
9. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when calling the computer program in the memory, implements the steps of the semantic question answering method according to any one of claims 1 to 7.
10. A storage medium having stored thereon computer-executable instructions that, when loaded and executed by a processor, perform the steps of the semantic question answering method according to any one of claims 1 to 7.
CN202011476868.0A 2020-12-15 2020-12-15 Semantic question and answer method and device, electronic equipment and storage medium Active CN112507078B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011476868.0A CN112507078B (en) 2020-12-15 2020-12-15 Semantic question and answer method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011476868.0A CN112507078B (en) 2020-12-15 2020-12-15 Semantic question and answer method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112507078A CN112507078A (en) 2021-03-16
CN112507078B true CN112507078B (en) 2022-05-10

Family

ID=74973481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011476868.0A Active CN112507078B (en) 2020-12-15 2020-12-15 Semantic question and answer method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112507078B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139043B (en) * 2021-04-29 2023-08-04 北京百度网讯科技有限公司 Question-answer sample generation method and device, electronic equipment and storage medium
CN114840648A (en) * 2022-03-21 2022-08-02 阿里巴巴(中国)有限公司 Answer generation method and device and computer program product
CN115374765B (en) * 2022-10-27 2023-06-02 浪潮通信信息系统有限公司 Computing power network 5G data analysis system and method based on natural language processing


Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20200134449A1 (en) * 2018-10-26 2020-04-30 Naver Corporation Training of machine reading and comprehension systems

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN108491433A (en) * 2018-02-09 2018-09-04 平安科技(深圳)有限公司 Chat answer method, electronic device and storage medium
CN109740077A (en) * 2018-12-29 2019-05-10 北京百度网讯科技有限公司 Answer searching method, device and its relevant device based on semantic indexing
CN111858859A (en) * 2019-04-01 2020-10-30 北京百度网讯科技有限公司 Automatic question-answering processing method, device, computer equipment and storage medium

Non-Patent Citations (1)

Title
Research on Query Recommendation for Question-and-Answer Platforms Based on Deep Learning; Ding Heng et al.; Data Analysis and Knowledge Discovery; 2020-10-31 (Issue 46); pp. 37-46 *

Also Published As

Publication number Publication date
CN112507078A (en) 2021-03-16

Similar Documents

Publication Publication Date Title
CN111310438B (en) Chinese sentence semantic intelligent matching method and device based on multi-granularity fusion model
CN112507078B (en) Semantic question and answer method and device, electronic equipment and storage medium
CN111259127B (en) Long text answer selection method based on transfer learning sentence vector
CN110134946B (en) Machine reading understanding method for complex data
CN110737763A (en) Chinese intelligent question-answering system and method integrating knowledge map and deep learning
CN110263325B (en) Chinese word segmentation system
CN111427995A (en) Semantic matching method and device based on internal countermeasure mechanism and storage medium
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN111291188A (en) Intelligent information extraction method and system
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN108256968A (en) A kind of electric business platform commodity comment of experts generation method
CN113297364A (en) Natural language understanding method and device for dialog system
CN114428850B (en) Text retrieval matching method and system
CN114595306B (en) Text similarity calculation system and method based on distance perception self-attention mechanism and multi-angle modeling
CN110597968A (en) Reply selection method and device
CN111694927A (en) Automatic document review method based on improved word-shifting distance algorithm
CN117236410B (en) Trusted electronic file large language model training and reasoning method and device
CN113239666A (en) Text similarity calculation method and system
CN113836896A (en) Patent text abstract generation method and device based on deep learning
CN113962228A (en) Long document retrieval method based on semantic fusion of memory network
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
Alyoubi et al. A deep crnn-based sentiment analysis system with hybrid bert embedding
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
Sabharwal et al. Introduction to word embeddings
Reddy et al. Automatic text summarization for conversational chatbot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant