CN113988079A - Low-data-oriented dynamic enhanced multi-hop text reading recognition processing method - Google Patents

Low-data-oriented dynamic enhanced multi-hop text reading recognition processing method Download PDF

Info

Publication number
CN113988079A
CN113988079A (application CN202111144082.3A)
Authority
CN
China
Prior art keywords
model
answer
sentence
data
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111144082.3A
Other languages
Chinese (zh)
Inventor
伍赛
任雪峰
陈刚
陈珂
寿黎但
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University (ZJU)
Priority to CN202111144082.3A
Publication of CN113988079A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a low-data-oriented dynamically enhanced multi-hop text reading recognition processing method. A document data set is first corrected and preprocessed; a dynamically enhanced answer prediction model is constructed; the model is trained on a training set and used as a teacher model; a part of the unlabeled data is randomly selected and fed to the teacher model, whose predicted labels serve as pseudo labels, and the pseudo-labeled data are added to the training set to form a new training set; the teacher model is retrained on the new training set to obtain a student model; these steps are repeated iteratively until the model accuracy on the verification set meets a preset threshold; finally, the student model predicts the reading document to be tested and outputs its answer. The invention expands the data with a dynamic enhancement method, reduces the input length, solves the multi-hop reading comprehension problem when little labeled data is available, and enhances the generalization capability of the model.

Description

Low-data-oriented dynamic enhanced multi-hop text reading recognition processing method
Technical Field
The invention belongs to the field of text data processing methods within natural language processing, and particularly relates to a low-data-oriented dynamically enhanced multi-hop text reading recognition processing method.
Background
The machine reading comprehension task requires a machine to answer questions from a given context. It can be used in search engines, intelligent assistants and similar applications to provide users with high-quality information services. Early reading comprehension systems were small in scale, limited to specific domains, and not well suited for practical use. With the excellent performance of deep learning at capturing contextual information and the release of many large benchmark data sets, some machine reading comprehension models have greatly improved performance on single-hop data sets, but these models still lack the ability to reason across multiple sentences. In recent years, multi-hop reading comprehension data sets have been proposed that require a model to reason across several non-contiguous sentences or even documents; the current practice is to fine-tune a large pre-trained model, used as a feature extractor, on a specific reading comprehension task. This approach requires a large amount of data to drive the training of the model. However, labeling data in the real world is very time consuming and laborious, and in some fields there are not enough samples to label.
At present, most multi-hop reading comprehension work relies on large amounts of data, and the low-data setting has rarely been studied. Yet when the annotation cost is excessive, directly using a conventional reading comprehension model does not achieve good results. Data enhancement is a good choice in low-data situations. Current data enhancement methods in the text field focus on text classification tasks, and none has been shown to have a significant effect on reading comprehension tasks. A sliding window, as a data enhancement means for reading comprehension, is mostly applied to single-hop tasks; however, a sliding window cannot guarantee that all supporting sentences fall inside the window, so it is not suitable for the multi-hop case.
Disclosure of Invention
Training a neural network requires large-scale data sets for support; however, labeling data sets is time consuming and labor intensive. To address the problems of the multi-hop reading recognition task and of the background art in low-data scenarios with insufficient data, the invention focuses on the low-data setting, provides a low-data-oriented dynamic context enhanced multi-hop text reading recognition processing method, introduces external knowledge through a self-training method, and adds an auxiliary data set to improve model performance.
As shown in fig. 1, the object of the present invention is achieved by the following technical solution:
Step 1: carrying out correction preprocessing on the data set of a document to eliminate the semantic ambiguity of samples caused by desensitization;
Step 2: constructing a dynamically enhanced answer prediction model;
Step 3: taking the data set with known labels processed in step 1 as a training set, training the dynamically enhanced answer prediction model with the training set, and taking the trained answer prediction model as a teacher model;
Step 4: randomly selecting a part of the unlabeled data set, inputting it into the teacher model obtained in step 3 to predict label results, using the predicted labels as pseudo labels attached to the unlabeled data to form a pseudo-labeled data set, and adding the pseudo-labeled data set to the training set to form a new training set;
Step 5: retraining the teacher model obtained in step 3 with the new training set obtained in step 4, and taking the retrained model as a student model;
Step 6: adopting another data set with known labels as a verification set and inputting it into the student model obtained in step 5 to test the accuracy of the model:
if the model accuracy meets the preset threshold requirement, proceeding to the next step;
if the model accuracy does not meet the preset threshold requirement, returning to step 3, iterating with the student model as the new teacher model, and repeating steps 3-5 until the accuracy on the verification set meets the preset threshold requirement;
Step 7: predicting the reading document to be tested with the student model finally obtained in step 6, and outputting the label of the reading document to be tested and the answer in that label. A sketch of this self-training loop is given below.
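A minimal Python sketch of the self-training loop of steps 3-6 follows. The helper functions train_fn, predict_fn and eval_fn, the sampling ratio and the accuracy threshold are illustrative placeholders, not part of the claimed method.

```python
import random

def self_training_loop(labeled_set, unlabeled_set, val_set,
                       train_fn, predict_fn, eval_fn,
                       threshold=0.80, sample_ratio=0.2):
    """Iterate teacher -> pseudo labels -> student until the verification
    metric reaches a preset threshold (steps 3-6 of the method)."""
    teacher = train_fn(labeled_set)                                      # step 3
    while True:
        sampled = random.sample(unlabeled_set,
                                int(len(unlabeled_set) * sample_ratio))
        pseudo_labeled = [(x, predict_fn(teacher, x)) for x in sampled]  # step 4
        new_train_set = labeled_set + pseudo_labeled
        student = train_fn(new_train_set)                                # step 5
        if eval_fn(student, val_set) >= threshold:                       # step 6
            return student
        teacher = student   # iterate: the student becomes the next teacher
```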
The data set comprises documents, questions and labels; a label comprises an answer, supporting sentences and an answer category, and the answer is defined by an answer start position and an answer end position.
The question is the question posed over the text data in reading comprehension, and the answer is the result corresponding to the question, specifically characters appearing in the document.
A supporting sentence is a sentence that supports answer reasoning in reading comprehension and appears at some position in the document.
Answer categories are generally divided into two, three or four classes, but are not limited thereto.
In step 2, the dynamically enhanced answer prediction model specifically includes:
2.1, splitting the documents of the data set into sentences, extracting some or all of the sentences, and concatenating them with the question corresponding to the document to form a context text;
2.2, inputting the context text into a pre-trained answer prediction model (which may be a BERT model) to obtain the overall feature vector CLS_ALL of the context text and the decoded feature vectors of each word of the question and of each sentence; the decoded feature vectors of the words of the question form the question vector Q, and the decoded feature vectors of the words of all sentences form the context vector C;
2.3.
A. the overall feature vector CLS_ALL is passed through a linear layer to obtain the predicted answer type, expressed as:
type = softmax(Linear(CLS_ALL)/τ) ∈ R^{1×4}
wherein softmax() denotes the softmax activation function and τ denotes a hyper-parameter;
B. the question vector Q and the context vector C are processed as follows to obtain the context feature vector C', and from C' a multilayer perceptron extracts two results serving respectively as the predicted answer start position start and answer end position end:
Q′ = Attention_pooling(Q) ∈ R^{1×d}
C′ = Norm(w1·CLS_ALL + w2·Q′ + w3·C) ∈ R^{l×d}
ŷ^{start}, ŷ^{end} = MLP(C′) ∈ R^{l×2}
in the formulas, w1, w2 and w3 respectively denote the first, second and third weights, d denotes the dimension of the hidden vector, l denotes the length of the context, Q′ denotes the dimension-reduced question vector, Norm() denotes a normalization function, MLP() denotes a multilayer perceptron, and R^{1×4} denotes four dimensions corresponding to the four answer types; ŷ^{start} and ŷ^{end} denote the probability of each position in the context being the answer start position or the answer end position, corresponding respectively to the answer start position start and the answer end position end;
C. the supporting sentences of the predicted output are obtained jointly from the overall feature vector CLS_ALL, the question vector Q and the context vector C as follows:
SFeature = W·AttentionPooling(C, CLS_ALL, Q)
ŷ^{sf} = sigmoid(SFeature)
wherein W denotes a weight matrix, sigmoid() denotes the sigmoid activation function, AttentionPooling() denotes an attention pooling operation, and ŷ^{sf} denotes the supporting sentences of the predicted output.
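The following PyTorch sketch illustrates how the three prediction heads (A: answer type, B: answer span, C: supporting sentences) could sit on top of the BERT decodings. The hidden size, the attention-pooling layer and the per-sentence feature input are assumptions for illustration only, not the exact architecture of the patent.

```python
import torch
import torch.nn as nn

class AnswerPredictionHeads(nn.Module):
    """Sketch of the three output heads on top of BERT decodings.
    cls_all: (1, d) overall feature; Q: (len_q, d) question words;
    C: (len_c, d) context words; sent_repr: (n, d) per-sentence features."""
    def __init__(self, d=768, num_types=4, tau=1.0):
        super().__init__()
        self.tau = tau
        self.type_linear = nn.Linear(d, num_types)          # A: answer type
        self.q_pool = nn.Linear(d, 1)                       # attention pooling over Q
        self.w = nn.Parameter(torch.ones(3))                # w1, w2, w3
        self.norm = nn.LayerNorm(d)
        self.span_mlp = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, 2))
        self.sent_mlp = nn.Linear(d, 1)                     # C: supporting-sentence score

    def forward(self, cls_all, Q, C, sent_repr):
        # A. answer-type prediction from the overall feature vector CLS_ALL
        type_prob = torch.softmax(self.type_linear(cls_all) / self.tau, dim=-1)

        # B. fuse CLS_ALL, pooled question Q' and context C, then predict start/end
        attn = torch.softmax(self.q_pool(Q), dim=0)                   # (len_q, 1)
        q_prime = (attn * Q).sum(dim=0, keepdim=True)                 # (1, d)
        c_prime = self.norm(self.w[0] * cls_all + self.w[1] * q_prime + self.w[2] * C)
        start_end = self.span_mlp(c_prime)                            # (len_c, 2)
        start_prob = torch.softmax(start_end[:, 0], dim=0)
        end_prob = torch.softmax(start_end[:, 1], dim=0)

        # C. supporting-sentence prediction from per-sentence features
        sf_prob = torch.sigmoid(self.sent_mlp(sent_repr)).squeeze(-1) # (n,)
        return type_prob, start_prob, end_prob, sf_prob
```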
The step 7 specifically includes:
step 7.1, splitting the reading document to be tested into sentences, screening each obtained sentence together with the question through a screening model to obtain the relevance between each sentence and the question, and arranging the K sentences with the highest relevance in the order in which they appear in the reading document to be tested to form a new document;
step 7.2, inputting the new document into the student model finally obtained in step 6, outputting the predicted label of the new document, and extracting the answer in the label as the answer of the reading document to be tested.
The screening model specifically uses a Chinese BERT pre-trained model (a RoBERTa variant) to encode the question-sentence pairs; a sketch of this screening stage is given below.
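A hedged sketch of the screening stage follows. The model name bert-base-chinese, the single-logit scoring head and K=15 are assumptions, not the exact configuration of the patent.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

def select_top_k(question, sentences, k=15, model_name="bert-base-chinese"):
    """Score each (question, sentence) pair and keep the K most relevant
    sentences in their original document order (step 7.1)."""
    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertForSequenceClassification.from_pretrained(model_name, num_labels=1)
    model.eval()
    scores = []
    with torch.no_grad():
        for sent in sentences:
            enc = tokenizer(question, sent, truncation=True,
                            max_length=512, return_tensors="pt")
            scores.append(torch.sigmoid(model(**enc).logits).item())
    # take the K highest-scoring indices, then restore document order
    top_idx = sorted(sorted(range(len(sentences)),
                            key=lambda i: scores[i], reverse=True)[:k])
    return [sentences[i] for i in top_idx]
```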
The method of the invention first trains a two-stage reading model as the teacher network. The first stage is a sentence screening model: since sentences other than the real supporting sentences do not participate in the reasoning process, the screening model selects the Top-K sentences that are strongly related to the question, which reduces the interference of irrelevant sentences and overcomes the problem that the full text is too long to be fed directly into a pre-trained model. The second stage, the answer prediction model, reasons over the answer and the supporting sentences and adds an answer-type prediction task; the answer extraction step is only performed when the answer lies in the text. Because the training data set is small, a dynamically enhanced answer prediction model is proposed: during training, each batch of training data is composed of the real supporting sentences plus several sentences randomly extracted from the context, and dynamically updating the input in this way increases the generalization capability of the model; during inference, the sentences selected by the screening model are used directly as the input of the answer prediction model. In addition, the invention uses self-training to label unlabeled data and thereby expand the data set.
Compared with the prior art, the invention has the following beneficial effects:
the method establishes a low-data-oriented dynamically enhanced multi-hop reading recognition processing model and, for the problem of scarce labeled data, expands the data set with a dynamic enhancement method during the training of the answer prediction model;
for the problem that all documents cannot be fed into the model under low-resource conditions, non-supporting sentences are sampled randomly in the training stage, and in the prediction stage a screening model selects the sentences relevant to the question's answer to reduce the input length; the supporting-sentence prediction no longer uses a mainstream graph network for encoding, but re-encodes on the basis of an improved Transformer to learn the relations between sentences; and an external data set is introduced, with a self-training pseudo-labeling method used to enhance the generalization capability of the model.
Experiments show that the invention greatly improves reading comprehension on the CAIL 2020 data set in the Chinese legal field, which demonstrates the effectiveness of the solution; it therefore has application value and practical significance for text data on the network.
Drawings
FIG. 1 is a block diagram of the self-training learning method employed in the present invention;
FIG. 2 is a sentence screening model architecture diagram according to the present invention;
FIG. 3 is a diagram of an answer prediction model architecture according to the present invention;
FIG. 4 is a diagram of a contextual sentence feature extraction architecture in accordance with the present invention;
fig. 5 is a sample of data used by the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, the present invention shall be described in further detail with reference to the following detailed description and accompanying drawings.
As shown in fig. 1, the embodiment of the present invention and its implementation process are as follows:
the embodiment of the invention is implemented and tested on a CAIL 2020 reading comprehension data set in the field of Chinese law.
The CAIL 2020 reading comprehension data set is a reading comprehension data set in the Chinese judicial field. Part of its training data comes from the CJRC training set and part consists of 5100 newly annotated question-answer pairs; the CJRC portion covers the civil, criminal and administrative fields. The verification set and test set contain 1900 and 2600 question-answer pairs respectively, so the amount of data is small. The external data set CJRC used in the experiments is provided by the CAIL 2019 reading comprehension competition; its data mainly come from published judgment documents and cover the criminal and civil fields, with 40000 questions in the training set and 5000 questions each in the verification set and test set.
Step 1: carrying out correction preprocessing on a data set of a document to eliminate the semantic ambiguity of a sample caused by desensitization;
in the step 1, for each document in the data set, traversing each sentence of the document, matching the triples by adopting a regular matching expression, adding the extracted triples into a name list, then traversing the whole document once to split the name and the digital part, and matching from short to long during matching. Therefore, disambiguation processing aiming at the document is realized, and the aim of semantic ambiguity when word segmentation is brought by data desensitization is fulfilled.
Step 2: constructing a dynamically enhanced answer prediction model;
As shown in fig. 3, the dynamically enhanced answer prediction model specifically includes:
2.1, splitting the documents of the data set into sentences, extracting some or all of the sentences, and concatenating them with the question corresponding to the document to form a context text; as shown in fig. 3, SEP in the figure denotes a separator.
2.2, inputting the context text into a pre-trained BERT model to obtain the overall feature vector CLS_ALL of the context text and the decoded feature vectors of each word of the question and of each sentence; the decoded feature vectors of the words of the question form the question vector Q, and the decoded feature vectors of the words of all sentences form the context vector C;
2.3.
A. the overall feature vector CLS_ALL is passed through a linear layer to obtain the predicted answer type, expressed as:
type = softmax(Linear(CLS_ALL)/τ) ∈ R^{1×4}
wherein softmax() denotes the softmax activation function and τ denotes a hyper-parameter;
B. the question vector Q and the context vector C are processed as follows to obtain the context feature vector C', and from C' a multilayer perceptron extracts two results serving respectively as the predicted answer start position start and answer end position end:
Q′ = Attention_pooling(Q) ∈ R^{1×d}
C′ = Norm(w1·CLS_ALL + w2·Q′ + w3·C) ∈ R^{l×d}
ŷ^{start}, ŷ^{end} = MLP(C′) ∈ R^{l×2}
in the formulas, w1, w2 and w3 respectively denote the first, second and third weights, d denotes the dimension of the hidden vector, l denotes the length of the context, Q′ denotes the dimension-reduced question vector, Norm() denotes a normalization function, MLP() denotes a multilayer perceptron, and R^{1×4} denotes four dimensions corresponding to the four answer types; ŷ^{start} and ŷ^{end} denote the probability of each position in the context being the answer start position or the answer end position, corresponding respectively to the answer start position start and the answer end position end;
C. as shown in fig. 4, the supporting sentences of the predicted output are obtained jointly from the overall feature vector CLS_ALL, the question vector Q and the context vector C as follows:
SFeature = W·AttentionPooling(C, CLS_ALL, Q)
ŷ^{sf} = sigmoid(SFeature)
wherein W denotes a weight matrix, initialized to a constant; sigmoid() denotes the sigmoid activation function, AttentionPooling() denotes an attention pooling operation, and ŷ^{sf} denotes the supporting sentences of the predicted output.
As can be seen from the above, the answer-category prediction of the invention is a four-class classification task; the hyper-parameter τ scales the output values, and the class with the highest probability is taken as the answer category.
If the answer lies in the context, the answer extraction task is used to extract it from the document. The answer extraction task performs, in every dimension of C, a weighted summation of the re-encoded, dimension-reduced Q and of CLS_ALL, then uses a multilayer perceptron to map the output to 2 dimensions, predicting the answer start position and end position respectively.
In a specific implementation, the dimension of q and k used in the attention computation is enlarged to re-encode the sentence vectors, a parameter matrix is used to superpose the sentence information obtained from each re-encoding for sentence feature fusion, a multilayer perceptron then converts the sentence information into an n×1 output, and every sentence whose value exceeds the threshold is judged to be a supporting sentence, as shown in fig. 4.
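A sketch of this sentence re-encoding follows. The enlarged q/k dimension, the number of re-encoding layers and the fusion weights are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SentenceReencoder(nn.Module):
    """Re-encode sentence vectors with enlarged query/key projections,
    fuse the per-layer sentence features with a parameter vector, and
    score each sentence with an MLP (values above 0.5 -> supporting sentence).
    All sizes below are illustrative."""
    def __init__(self, d=768, d_qk=1024, num_layers=2):
        super().__init__()
        self.q_proj = nn.ModuleList(nn.Linear(d, d_qk) for _ in range(num_layers))
        self.k_proj = nn.ModuleList(nn.Linear(d, d_qk) for _ in range(num_layers))
        self.v_proj = nn.ModuleList(nn.Linear(d, d) for _ in range(num_layers))
        self.fuse = nn.Parameter(torch.ones(num_layers) / num_layers)  # fusion weights
        self.mlp = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, 1))

    def forward(self, sent_vecs):                       # (n, d) sentence vectors
        layer_feats, x = [], sent_vecs
        for q_l, k_l, v_l in zip(self.q_proj, self.k_proj, self.v_proj):
            q, k, v = q_l(x), k_l(x), v_l(x)
            attn = torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)
            x = attn @ v                                # re-encoded sentence vectors
            layer_feats.append(x)
        fused = sum(w * f for w, f in zip(self.fuse, layer_feats))
        scores = torch.sigmoid(self.mlp(fused)).squeeze(-1)   # (n,) probabilities
        return scores > 0.5, scores                     # supporting-sentence mask, scores
```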
Step 3: taking the data set with known labels processed in step 1 as a training set, training the dynamically enhanced answer prediction model with the training set, and taking the trained answer prediction model as a teacher model;
Step 4: randomly selecting a part of the unlabeled data set, inputting it into the teacher model obtained in step 3 to predict label results, using the predicted labels as pseudo labels attached to the unlabeled data to form a pseudo-labeled data set, and adding the pseudo-labeled data set to the training set to form a new training set;
In this way, the self-training method attaches pseudo labels to the unlabeled data and adds them to the labeled data, which increases the size of the training set, improves the training effect, and yields a more accurate model and predictions.
Step 5: retraining the teacher model obtained in step 3 with the new training set obtained in step 4, and taking the retrained model as a student model;
Step 6: adopting another data set with known labels as a verification set and inputting it into the student model obtained in step 5 to test the accuracy of the model:
if the model accuracy meets the preset threshold requirement, proceeding to the next step;
if the model accuracy does not meet the preset threshold requirement, returning to step 3, iterating with the student model as the new teacher model, and repeating steps 3-5 until the accuracy on the verification set meets the preset threshold requirement;
in the concrete implementation of the invention, a teacher model and a student model use models with the same topological structure, dropout noise is added in the stage of the student model to increase the learning difficulty, and noise is not added when a pseudo label is generated in the stage of the teacher model.
Step 7: predicting the reading document to be tested with the student model finally obtained in step 6, and outputting the predicted label of the reading document to be tested and the answer contained in that label.
The reading document to be tested is also subjected to the correction preprocessing of step 1 before being input into the student model.
Step 7.1, splitting the reading document to be tested into sentences, screening each obtained sentence together with the question through a screening model to obtain the relevance between each sentence and the question, and arranging the K sentences with the highest relevance in the order in which they appear in the reading document to be tested to form a new document, as shown in fig. 2;
Step 7.2, inputting the new document into the student model finally obtained in step 6, outputting the predicted label of the new document, and extracting the answer in the label as the answer of the reading document to be tested.
The screening model as implemented uses a Chinese BERT pre-trained model that encodes the question-sentence pairs. The Chinese BERT pre-trained model is loaded with pre-trained weights, and its training always proceeds by fine-tuning. The loss is a binary cross-entropy loss function, optimized with an Adam adaptive-learning-rate gradient descent optimizer with a warmup mechanism.
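A sketch of this fine-tuning setup follows. The model name, learning rate and warmup ratio are illustrative assumptions.

```python
import torch
from torch.optim import AdamW
from transformers import BertForSequenceClassification, get_linear_schedule_with_warmup

def build_screening_trainer(num_training_steps,
                            model_name="bert-base-chinese",
                            lr=2e-5, warmup_ratio=0.1):
    """Fine-tuning setup for the screening model: binary cross-entropy loss
    and an Adam-style optimizer with a warmup learning-rate schedule."""
    model = BertForSequenceClassification.from_pretrained(model_name, num_labels=1)
    criterion = torch.nn.BCEWithLogitsLoss()
    optimizer = AdamW(model.parameters(), lr=lr)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(warmup_ratio * num_training_steps),
        num_training_steps=num_training_steps)
    return model, criterion, optimizer, scheduler
```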
Thus, by processing the reading document to be tested through the screening model, the method better reduces the interference of irrelevant noise on the case to be tested, exploiting the useful information in long texts while suppressing irrelevant information.
Based on the idea of multi-task learning, the answer prediction model is trained with the dynamic enhancement method over all context sentences, covering three subtasks: answer-category prediction, answer extraction and supporting-sentence prediction.
During the training of the answer prediction model, a part of the sentences other than the supporting sentences is randomly selected from the documents of the data set and combined with the supporting sentences to form the context text, serving as the dynamically enhanced context sentences (a sketch is given below);
in the testing process, the answer prediction model takes the new document obtained by the screening model in step 7.1 as the context text.
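A sketch of the dynamic context construction used during training follows. The number of extra sentences and the separator token are illustrative assumptions.

```python
import random

def dynamic_context(sentences, support_idx, question, num_extra=5, sep="[SEP]"):
    """Build a dynamically enhanced training context: keep every real
    supporting sentence, add a random subset of the remaining sentences
    in document order, and concatenate with the question."""
    others = [i for i in range(len(sentences)) if i not in support_idx]
    sampled = random.sample(others, min(num_extra, len(others)))
    keep = sorted(set(support_idx) | set(sampled))       # preserve document order
    context = f" {sep} ".join(sentences[i] for i in keep)
    return f"{question} {sep} {context}"
```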
All random seeds are fixed during training, and hyper-parameters such as the batch size are kept the same. The loss function L is composed of three parts: the answer category, the span prediction and the supporting sentences:
L = L_ans + α·CE(ŷ^{type}, y^{type}) + β·BCE(ŷ^{sf}, y^{sf})
L_ans = (CE(ŷ^{start}, y^{start}) + CE(ŷ^{end}, y^{end})) / 2
wherein L denotes the overall loss function, α denotes the weight of the answer-category loss, β denotes the weight of the supporting-sentence prediction loss, ŷ^{type} denotes the predicted probability of the answer belonging to each category, y^{type} denotes the true answer category, ŷ^{sf} denotes the predicted probability of each sentence being a supporting sentence, y^{sf} denotes the label of the real supporting sentences, ŷ^{start} denotes the predicted probability of each position in the context being the answer start position, ŷ^{end} denotes the predicted probability of each position in the context being the answer end position, y^{start} denotes the position of the real answer start in the context, y^{end} denotes the position of the real answer end in the context, CE() denotes the multi-class cross-entropy loss function, BCE() denotes the binary cross-entropy loss function, and L_ans denotes the answer loss function.
In the loss function, the start position and the end position each compute a cross-entropy loss, which are averaged and then added into the overall loss. Because each task has a different learning difficulty, the two weights α and β are added to control the differences; in the experiments their values are 0.5 and 0.8 respectively.
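A sketch of the combined loss follows, using the α = 0.5 and β = 0.8 values given above; the tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def total_loss(type_logits, y_type, sf_logits, y_sf,
               start_logits, y_start, end_logits, y_end,
               alpha=0.5, beta=0.8):
    """L = L_ans + alpha * CE(answer type) + beta * BCE(supporting sentences),
    with L_ans the mean of the start- and end-position cross entropies."""
    l_ans = 0.5 * (F.cross_entropy(start_logits, y_start)
                   + F.cross_entropy(end_logits, y_end))
    l_type = F.cross_entropy(type_logits, y_type)
    l_sf = F.binary_cross_entropy_with_logits(sf_logits, y_sf.float())
    return l_ans + alpha * l_type + beta * l_sf
```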
The example conditions were as follows:
fig. 5 is a sample data in the CAIL 2020 reading understanding data set, and the sample is taken as an example to describe the dynamic enhanced multi-hop reading understanding method facing low data according to the present invention.
(1) The document is preprocessed and semantically disambiguated.
(2) The labeled data in the training set are used to train a sentence screening model and an answer prediction model, which serve as the teacher model; the teacher model labels a randomly selected part of the unlabeled data with pseudo labels by the self-training method, and these are added to the labeled data; the student model is trained with the resulting new training set and then iterated into the teacher model; this process is repeated until the metrics no longer rise on the verification set; finally the test samples are tested.
(3) For the example shown in fig. 5, after the document is split into sentences by punctuation, the desensitized names wu x0, wu x2, chen x3, wu x6, wu x9 and chang x17 are obtained; disambiguation is then performed, according to the obtained names, on "wu x22.1 mu, chen x31.95 mu, wu x61.98 mu, wu x90.99 mu and chang x171.47 mu" in sentence [5], yielding the sentence "in 2001 the plaintiff wu x0 subcontracted its land to wu x2 2.1 mu, chen x3 1.95 mu, wu x6 1.98 mu, wu x9 0.99 mu and chang x17 1.47 mu";
(4) Each sentence of the context and the question are screened by the screening model to obtain a relevance score between each clause and the question, and the K sentences most relevant to the question's answer are kept in their original sentence order. K is set to 15 for this data set, and the selected sentences in the example are [2], [3], [4], [5], [9], [10], [11], [12], [13], [14], [15], [17], [18], [19] and [21].
(5) The question and the sentences screened in step (4) are concatenated and input into the answer prediction model to obtain the overall feature vector CLS_ALL, the question vector Q and the context vector C; the CLS_ALL vector is used for a four-class classification task to predict the answer type, and the answer type here is the extraction type, so the answer needs to be extracted from the document. In every dimension of C, a weighted summation of the re-encoded, dimension-reduced question vector Q and of CLS_ALL is performed, and a multilayer perceptron then maps the output to 2 dimensions to predict the answer start and end positions respectively.
type = softmax(Linear(CLS_ALL)/τ) ∈ R^{1×4}
Q′ = Attention_pooling(Q) ∈ R^{1×d}
C′ = Norm(w1·CLS_ALL + w2·Q′ + w3·C) ∈ R^{l×d}
ŷ^{start}, ŷ^{end} = MLP(C′) ∈ R^{l×2}
in the formulas, w denotes a weight, d denotes the dimension of the hidden vector, and l denotes the length of the context.
(6) The 10 positions with the highest scores are taken for the start position and for the end position respectively, the positions are paired, pairs whose span does not lie in the context part of the input or whose end position precedes the start position are removed, and the start/end pair with the highest score is taken as the final answer span, giving the answer "wu x6" to the question.
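A sketch of this span decoding follows. Combining the start and end scores by addition and capping the answer length are assumptions made for illustration.

```python
import torch

def decode_answer_span(start_prob, end_prob, context_start, top_n=10, max_len=64):
    """Pick the best (start, end) pair from the top-N start and end positions,
    discarding pairs outside the context portion or with end before start."""
    starts = torch.topk(start_prob, top_n).indices.tolist()
    ends = torch.topk(end_prob, top_n).indices.tolist()
    best, best_score = None, float("-inf")
    for s in starts:
        for e in ends:
            if s < context_start or e < s or e - s + 1 > max_len:
                continue
            score = start_prob[s].item() + end_prob[e].item()
            if score > best_score:
                best, best_score = (s, e), score
    return best
```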
(7) On the basis of the Transformer structure, the dimension of q and k used in the attention computation is enlarged to re-encode the sentence vectors, a parameter matrix superposes the sentence information obtained from each re-encoding for sentence feature fusion, an MLP then converts the sentence information into an n×1 output, and every dimension above the threshold 0.5 is judged to be a supporting sentence:
SFeature = W·AttentionPooling(C, CLS_ALL, Question)
ŷ^{sf} = sigmoid(SFeature)
in the formula, W denotes a weight matrix, initialized to a constant.
(8) Supporting sentences [3], [8], [9] and [11] are obtained from step (7) and mapped back to [5], [13], [14] and [17], giving the final supporting sentences.
The foregoing description of the embodiments is provided to enable a person of ordinary skill in the art to make and use the invention. It will be readily apparent to those skilled in the art that other modifications of these embodiments, and the generic principles defined herein, may be applied to other embodiments without inventive effort. Therefore, the present invention is not limited to the above embodiments; improvements and modifications made by those skilled in the art based on this disclosure fall within the protection scope of the present invention.

Claims (5)

1. A low-data-oriented dynamically enhanced multi-hop text reading recognition processing method, characterized by comprising the following steps:
step 1: carrying out correction preprocessing on the data set of a document;
step 2: constructing a dynamically enhanced answer prediction model;
step 3: taking the data set with known labels processed in step 1 as a training set, training the dynamically enhanced answer prediction model with the training set, and taking the trained answer prediction model as a teacher model;
step 4: randomly selecting a part of the unlabeled data set, inputting it into the teacher model obtained in step 3 to predict label results, using the predicted labels as pseudo labels attached to the unlabeled data to form a pseudo-labeled data set, and adding the pseudo-labeled data set to the training set to form a new training set;
step 5: retraining the teacher model obtained in step 3 with the new training set obtained in step 4, and taking the retrained model as a student model;
step 6: adopting another data set with known labels as a verification set and inputting it into the student model obtained in step 5 to test the accuracy of the model:
if the model accuracy meets the preset threshold requirement, proceeding to the next step;
if the model accuracy does not meet the preset threshold requirement, returning to step 3, iterating with the student model as the new teacher model, and repeating steps 3-5 until the accuracy on the verification set meets the preset threshold requirement;
step 7: predicting the reading document to be tested with the student model finally obtained in step 6, and outputting the label of the reading document to be tested and the answer in that label.
2. The low-data-oriented dynamically enhanced multi-hop text reading recognition processing method as claimed in claim 1, wherein: the data set includes documents, questions and labels, the labels including answers, supporting sentences and answer categories.
3. The low-data-oriented dynamically enhanced multi-hop text reading recognition processing method as claimed in claim 1, wherein in step 2 the dynamically enhanced answer prediction model specifically includes:
2.1, splitting the documents of the data set into sentences, extracting some or all of the sentences, and concatenating them with the question corresponding to the document to form a context text;
2.2, inputting the context text into a pre-trained answer prediction model to obtain the overall feature vector CLS_ALL of the context text and the decoded feature vectors of each word of the question and of each sentence; the decoded feature vectors of the words of the question form the question vector Q, and the decoded feature vectors of the words of all sentences form the context vector C;
2.3.
A. the overall feature vector CLS_ALL is passed through a linear layer to obtain the predicted answer type, expressed as:
type = softmax(Linear(CLS_ALL)/τ) ∈ R^{1×4}
wherein softmax() denotes the softmax activation function and τ denotes a hyper-parameter;
B. the question vector Q and the context vector C are processed as follows to obtain the context feature vector C', and from C' a multilayer perceptron extracts two results serving respectively as the predicted answer start position start and answer end position end:
Q′ = Attention_pooling(Q) ∈ R^{1×d}
C′ = Norm(w1·CLS_ALL + w2·Q′ + w3·C) ∈ R^{l×d}
ŷ^{start}, ŷ^{end} = MLP(C′) ∈ R^{l×2}
in the formulas, w1, w2 and w3 respectively denote the first, second and third weights, d denotes the dimension of the hidden vector, l denotes the length of the context, Q′ denotes the dimension-reduced question vector, Norm() denotes a normalization function, MLP() denotes a multilayer perceptron, and R^{1×4} denotes four dimensions corresponding to the four answer types; ŷ^{start} and ŷ^{end} denote the probability of each position in the context being the answer start position or the answer end position, corresponding respectively to the answer start position start and the answer end position end;
C. the supporting sentences of the predicted output are obtained jointly from the overall feature vector CLS_ALL, the question vector Q and the context vector C as follows:
SFeature = W·AttentionPooling(C, CLS_ALL, Q)
ŷ^{sf} = sigmoid(SFeature)
wherein W denotes a weight matrix, sigmoid() denotes the sigmoid activation function, AttentionPooling() denotes an attention pooling operation, and ŷ^{sf} denotes the supporting sentences of the predicted output.
4. The low-data-oriented dynamically enhanced multi-hop text reading recognition processing method as claimed in claim 1, wherein step 7 specifically includes:
step 7.1, splitting the reading document to be tested into sentences, screening each obtained sentence together with the question through a screening model to obtain the relevance between each sentence and the question, and arranging the K sentences with the highest relevance in the order in which they appear in the reading document to be tested to form a new document;
step 7.2, inputting the new document into the student model finally obtained in step 6, outputting the predicted label of the new document, and extracting the answer in the label as the answer of the reading document to be tested.
5. The low-data-oriented dynamically enhanced multi-hop text reading recognition processing method as claimed in claim 4, wherein: the screening model specifically uses a Chinese BERT pre-trained model, which encodes the question-sentence pairs.
CN202111144082.3A 2021-09-28 2021-09-28 Low-data-oriented dynamic enhanced multi-hop text reading recognition processing method Pending CN113988079A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111144082.3A CN113988079A (en) 2021-09-28 2021-09-28 Low-data-oriented dynamic enhanced multi-hop text reading recognition processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111144082.3A CN113988079A (en) 2021-09-28 2021-09-28 Low-data-oriented dynamic enhanced multi-hop text reading recognition processing method

Publications (1)

Publication Number Publication Date
CN113988079A true CN113988079A (en) 2022-01-28

Family

ID=79737069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111144082.3A Pending CN113988079A (en) 2021-09-28 2021-09-28 Low-data-oriented dynamic enhanced multi-hop text reading recognition processing method

Country Status (1)

Country Link
CN (1) CN113988079A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114691827A (en) * 2022-03-17 2022-07-01 南京大学 Machine reading understanding method based on iterative screening and pre-training enhancement
CN114969343A (en) * 2022-06-07 2022-08-30 重庆邮电大学 Weak supervision text classification method combining relative position information
CN114969343B (en) * 2022-06-07 2024-04-19 重庆邮电大学 Weak supervision text classification method combined with relative position information
CN114936296A (en) * 2022-07-25 2022-08-23 达而观数据(成都)有限公司 Indexing method, system and computer equipment for super-large-scale knowledge map storage
CN117313732A (en) * 2023-11-29 2023-12-29 南京邮电大学 Medical named entity identification method, device and storage medium
CN117313732B (en) * 2023-11-29 2024-03-26 南京邮电大学 Medical named entity identification method, device and storage medium

Similar Documents

Publication Publication Date Title
CN111291185B (en) Information extraction method, device, electronic equipment and storage medium
CN110188358B (en) Training method and device for natural language processing model
CN108984526B (en) Document theme vector extraction method based on deep learning
CN110765775B (en) Self-adaptive method for named entity recognition field fusing semantics and label differences
CN110609891A (en) Visual dialog generation method based on context awareness graph neural network
CN111414461B (en) Intelligent question-answering method and system fusing knowledge base and user modeling
CN113988079A (en) Low-data-oriented dynamic enhanced multi-hop text reading recognition processing method
CN111985239A (en) Entity identification method and device, electronic equipment and storage medium
CN113626589B (en) Multi-label text classification method based on mixed attention mechanism
CN109977199B (en) Reading understanding method based on attention pooling mechanism
CN111858878B (en) Method, system and storage medium for automatically extracting answer from natural language text
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
CN113536801A (en) Reading understanding model training method and device and reading understanding method and device
CN110851594A (en) Text classification method and device based on multi-channel deep learning model
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
Dai et al. Hybrid deep model for human behavior understanding on industrial internet of video things
CN115270797A (en) Text entity extraction method and system based on self-training semi-supervised learning
CN115203507A (en) Event extraction method based on pre-training model and oriented to document field
CN115391520A (en) Text emotion classification method, system, device and computer medium
US20230121404A1 (en) Searching for normalization-activation layer architectures
CN117708324A (en) Text topic classification method, device, chip and terminal
CN111813938A (en) Record question-answer classification method based on ERNIE and DPCNN
CN116680407A (en) Knowledge graph construction method and device
CN115934944A (en) Entity relation extraction method based on Graph-MLP and adjacent contrast loss
CN113342964B (en) Recommendation type determination method and system based on mobile service

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination