CN113901990A - Case and news correlation analysis method based on multi-view ensemble learning - Google Patents
- Publication number
- CN113901990A (application CN202111078776.1A)
- Authority
- CN
- China
- Prior art keywords
- news
- case
- information
- learner
- cases
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/22—Matching criteria, e.g. proximity measures (pattern recognition, analysing)
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking (handling natural language data)
- G06N20/20—Ensemble learning (machine learning)
- G06N3/044—Recurrent networks, e.g. Hopfield networks (neural network architecture)
- G06N3/045—Combinations of networks (neural network architecture)
- G06N3/088—Non-supervised learning, e.g. competitive learning (neural network learning methods)
Abstract
The invention relates to a case and news correlation analysis method based on multi-view ensemble learning, belonging to the technical field of natural language processing. The method comprises the following steps: taking a Siamese network as the basic framework, a CNN network, a Transformer encoding network, and a topic model are combined to realize feature extraction of local information, structural information, and topic information. Considering that case elements contain the key semantic information of a case, the case elements are used to guide three pre-trained base learners to obtain view-specific information; a weight learner constructed through a multi-head self-attention mechanism then combines the three kinds of information and calculates a Manhattan distance, finally obtaining a more balanced and reasonable similarity relation. Experimental results show that, compared with methods based on semantic similarity alone, the multi-view ensemble method disclosed by the invention improves the F1 value by 2.5%.
Description
Technical Field
The invention relates to a case and news correlation analysis method based on multi-view ensemble learning, belonging to the technical field of natural language processing.
Background
In recent years, the application of deep learning to text similarity calculation has attracted wide attention. Common methods can be roughly divided into three categories: representation-based networks, interaction-based networks, and pre-trained language models. Representation-based networks include the Siamese network framework, a neural network built from a pair of sub-networks that share the same parameters. Interaction-based networks, represented by ESIM, are mainly characterized by frameworks that capture richer interactive features between two sentences; however, the matching operations used to capture this interaction information add computational cost. In addition, the pre-trained language models that have emerged in recent years, represented by BERT, RoBERTa, and the like, achieve excellent results. BERT is one of the key innovations in the recent evolution of contextual representation learning; it adopts a fine-tuning approach that requires little or no task-specific architecture and achieves state-of-the-art performance on many NLP tasks.
The above analysis shows that, because the text content of cases and that of news differs greatly, multiple perspectives are needed to jointly model their similarity. Ensemble learning can combine several different individual learners to obtain better results, and heterogeneous individual learners can represent multiple angles of similarity, making the approach well suited to case and news correlation analysis. Therefore, drawing on previous work, the invention follows the idea of ensemble learning and selects three individual learners to represent three different views, so as to explore the similarity problem of cases and news in depth.
Disclosure of Invention
The invention provides a case and news correlation analysis method based on multi-view ensemble learning, which is used for improving the accuracy of case and news correlation analysis. The invention uses a Siamese network framework as the base and, following the idea of ensemble learning, selects three network structures with different characteristics to represent three views, constructing a local information learner, a structural information learner, and a topic information learner, so that each obtains semantic features while keeping a different emphasis. The three individual learners are pre-trained separately so that each learner reaches its best result. Finally, a weight learner constructed with a multi-head attention mechanism combines the three kinds of information to obtain the final similarity measure.
The technical scheme of the invention is as follows: the case and news correlation analysis method based on multi-view ensemble learning comprises the following specific steps:
step1, constructing a local information learner by using a CNN network, and acquiring local information similarity between cases and news;
step2, constructing a structural information learner by using a Transformer network, and acquiring the structural information similarity between cases and news;
step3, constructing a topic information learner by using a pre-trained topic model, and acquiring the topic information similarity between cases and news;
step4, constructing a weight learner by using a multi-head attention mechanism, and jointly judging the similarity degree from multiple angles.
As a preferred embodiment of the present invention, the Step1 specifically comprises the following steps:
step1.1, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and obtaining weighted case and news feature vectors;
step1.2, extracting the local information of the case and news feature vectors by using a CNN network; after a pooling operation, weight learning is performed on the output channels of the CNN through a self-attention mechanism to increase the weight of important local information.
Step1.3, performing a Manhattan distance calculation on the extracted local information encoding vectors of the case and the news to obtain the final similarity relation.
As a preferable scheme of the invention, the Step2 comprises the following specific steps:
step2.1, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and adding absolute position coding information of case and news texts to obtain weighted case and news feature vectors.
Step2.2, extracting the structural information of the case and news feature vectors containing position coding information by using a Transformer network layer.
Step2.3, performing a Manhattan distance calculation on the extracted structural information encoding vectors of the case and the news to obtain the final similarity relation.
As a preferred embodiment of the present invention, the Step3 specifically comprises the following steps:
step3.1, using a variational self-encoder (VAE) to perform unsupervised pre-training on all data of cases and news to obtain an unsupervised topic model.
Step3.2, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and adding case and news topic vectors extracted by a topic model to obtain weighted case and news feature vectors.
Step3.3, extracting the topic information of the case and news feature vectors containing the topic vectors by using a bidirectional LSTM network layer.
Step3.4, performing a Manhattan distance calculation on the extracted topic information encoding vectors of the case and the news to obtain the final similarity relation.
As a preferred embodiment of the present invention, the Step4 specifically comprises the following steps:
after the individual learners have each learned a single view and their pre-training effect has been optimized, in order to better balance the importance of the three representations, the representations obtained in Step1, Step2, and Step3 are combined for weight learning: a multi-head self-attention mechanism acquires the weight information under different angles, a feed-forward neural network is applied, and a distance calculation finally yields the similarity y.
output_c = f_n(MultiHead([Part_c; Composition_c; Topic_c])) (1)
output_x = f_n(MultiHead([Part_x; Composition_x; Topic_x])) (2)
y = 1 - Sigmoid(Manhattan(output_c, output_x)) (3)
where output_c and output_x denote the outputs of the feed-forward neural network, i.e., the final representations of the case and the news.
The invention has the beneficial effects that:
the invention discloses a case and news correlation analysis method for multi-view integrated learning, and aims to improve the accuracy of case and news correlation analysis. Aiming at the problems of unbalanced distribution and overlarge content difference between case description texts and news texts, the invention provides the judgment of the similarity between cases and news from multiple perspectives, and the validity of the invention is verified through experiments.
Drawings
FIG. 1 is a block diagram of a specific process of the present invention;
FIG. 2 is a diagram of a local information learner in accordance with the present invention;
FIG. 3 is a diagram of a structural information learner in accordance with the present invention;
FIG. 4 is a schematic diagram of a VAE topic learner in accordance with the present invention;
FIG. 5 is a diagram of a weight learner model constructed based on a multi-head self-attention mechanism in accordance with the present invention.
Detailed Description
Example 1: as shown in figs. 1 to 5, the case and news correlation analysis method based on multi-view ensemble learning of the present invention comprises:
step1, constructing a local information learner by using a CNN network, and acquiring local information similarity between cases and news;
as shown in the local information learner in fig. 2, case elements are used as external guidance, and the output channels of the CNN network are weighted by using a self-attention mechanism, so that the capturing capability of the network on local information is improved. The local information learner is pre-trained and uses cross entropy loss as a loss function and the Adam algorithm as an optimizer.
Step2, constructing a structural information learner by using a Transformer network, and acquiring the structural information similarity between cases and news;
as shown in the structural information learner in FIG. 3, the encoding layer of the Transformer is a self-attention-based network structure with a residual connection after each sub-layer, which helps the network acquire the global information of cases and news. Here, case element external guidance is again used, and position coding information is added in the Embedding layer to enhance the network's ability to capture global structural information; cross-entropy loss is used as the loss function and the Adam algorithm as the optimizer for pre-training.
Step3, constructing a topic information learner by using a pre-trained topic model, and acquiring the topic information similarity between cases and news;
as shown in the VAE topic learner of fig. 4, the topic model based on the variational auto-encoder (VAE) is an unsupervised document generation model that aims to extract latent topic features from the word vector space of a document and generate the corresponding document. Suchi et al. used a VAE to extract topic information to assist text classification tasks; following this prior work, the present invention uses a pre-trained VAE to obtain topic information and uses it to help build the topic information learner.
Step4, constructing a weight learner by using a multi-head attention mechanism, and jointly judging the similarity degree from multiple angles. After the individual learners have each learned a single view and their pre-training effect has been optimized, in order to better balance the importance of the three representations, the representations obtained in Step1, Step2, and Step3 are combined for weight learning: a multi-head self-attention mechanism acquires the weight information under different angles, a feed-forward neural network is applied, and a distance calculation finally yields the similarity.
As shown in fig. 5, which is a structure diagram of the weight learner constructed on the multi-head self-attention mechanism, after the three individual learners are pre-trained, the weight learner is trained using their output results as a training set. Different from traditional ensemble learning combination strategies, the method selects the final output representation of each individual learner as the training set, so that the weight learner can still learn information from the original text.
Step5, crawling original news corpora from news websites such as Weibo (microblog) through XPath, and performing data preprocessing, data set partitioning, and similar operations on the original corpora.
Step6, taking accuracy (Acc.), precision (P), recall (R), the F1 value, and the Q statistic as evaluation indexes to measure the experimental effectiveness of the invention.
Step7, the invention mainly adopts six classic text similarity calculation models as baseline models to carry out comparison experiments, and the baseline models comprise a twin network model, an aggregation-matching model and a pre-training model.
Step8, in order to verify the effectiveness of the method of the invention on the case and news correlation analysis task, 6 baseline models are adopted to carry out comparison experiments, and the experimental results are analyzed.
As a preferred embodiment of the present invention, the Step1 specifically comprises the following steps:
step1.1, firstly, the case descriptions are manually annotated to obtain the case elements, and the news texts are compressed according to their titles to remove redundant information. After encoding through the shared Embedding layer, the case description is C ∈ R^{m×v}, the news text is X ∈ R^{n×v}, and the case elements are E ∈ R^{k×v}, where m denotes the length of the case description text, n the news text length, k the number of case elements, and v the embedding dimension. After the encoding matrices of the three kinds of information are obtained, the case element encoding matrix E is used to guide and weight the news encoding matrix X, with the specific operations:
H = XWE ∈ R^{n×v}, W ∈ R^{v×k} (1)
X' = tanh(H) ∈ R^{n×v} (2)
X' denotes the news encoding matrix guided by the case elements. The operations in equations (1) and (2) can be understood as a simplified attention mechanism: the trainable weight matrix W acts as a weighting function between the case element matrix E and the news encoding matrix X, so the model can autonomously learn how E weights each element of X; the weighted matrix elements are then mapped into the (-1, 1) interval by the tanh activation function, which improves the convergence speed.
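The guided weighting of equations (1)-(2) can be sketched in a few lines of pure Python; the function name and explicit matrix arguments are illustrative, and in the patent W would be a trained parameter rather than an input:

```python
import math

def guide_news_by_case_elements(X, W, E):
    """Sketch of equations (1)-(2): H = XWE, X' = tanh(H).

    X: news encoding matrix (n x v), W: trainable weights (v x k),
    E: case element matrix (k x v); returns the guided matrix X' (n x v).
    """
    def matmul(A, B):
        inner, cols = len(B), len(B[0])
        return [[sum(row[t] * B[t][j] for t in range(inner))
                 for j in range(cols)] for row in A]

    H = matmul(matmul(X, W), E)  # (n x k) times (k x v) -> (n x v)
    return [[math.tanh(h) for h in row] for row in H]
```

Note that tanh squashes every entry of H into (-1, 1), which matches the convergence remark above.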
Step1.2, the case description C and the news matrix X' generated under the guidance of the case elements are respectively input into a CNN layer with shared parameters for convolution, obtaining windowed local information. A pooling operation then reduces the spatial size of the data to alleviate overfitting, with the specific operations:
hidden_x = MaxPooling(CNN(X')) (3)
hidden_c = MaxPooling(CNN(C)) (4)
where MaxPooling denotes the pooling operation, and hidden_x and hidden_c denote the local information extracted by the CNN network.
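As a rough sketch of the pooling step in equations (3)-(4), the following pure-Python helper (an illustrative stand-in for MaxPooling, using a hypothetical non-overlapping window) shows how pooling shrinks a 1-D convolution output:

```python
def max_pooling_1d(feature_map, window=2):
    """Max-pool a 1-D feature map with a non-overlapping window,
    keeping the strongest activation in each window (cf. eqs. (3)-(4))."""
    return [max(feature_map[i:i + window])
            for i in range(0, len(feature_map) - window + 1, window)]
```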
hidden_x and hidden_c are respectively weighted through a self-attention mechanism and then passed through a feed-forward neural network layer to obtain the local information representation vectors of the news text and the case text, Part_x and Part_c, as shown in equations (5) and (6).
Part_x = f_n(Attention(hidden_x)) (5)
Part_c = f_n(Attention(hidden_c)) (6)
It should be noted that the self-attention mechanism here weights the output channels of the CNN network: each channel is the local information obtained by the convolution operation of one convolution kernel, so applying self-attention over the output channels lets the network autonomously learn which convolution kernels produce the more important local information. The self-attention mechanism is implemented as follows.
Q, K, and V are all vectors, and here the three are equal (self-attention). The dot product of Q and K is calculated and divided by √d_k, so that the inner product of Q and K does not become too large:
Attention(Q, K, V) = softmax(QK^T / √d_k)V (7)
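This scaled dot-product attention can be sketched in pure Python (list-of-lists matrices, no batching or multiple heads), with Q = K = V in the self-attention case described above:

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, the standard scaled dot-product attention."""
    d_k = len(Q[0])
    # Q K^T scaled by sqrt(d_k) to keep the inner products small
    scores = [[sum(q[t] * k[t] for t in range(d_k)) / math.sqrt(d_k)
               for k in K] for q in Q]

    def softmax(row):
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        return [e / z for e in exps]

    weights = [softmax(row) for row in scores]
    # each output row is a convex combination of the rows of V
    return [[sum(w[j] * V[j][t] for j in range(len(V)))
             for t in range(len(V[0]))] for w in weights]
```

With a single position the attention weight is 1, so the input vector is returned unchanged.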
Step1.3, the local information learner of the invention selects the Manhattan distance to calculate the distance between the news text and the case text, so as to measure the similarity difference between the two.
y_P = 1 - Sigmoid(Manhattan(Part_x, Part_c)) (8)
Manhattan denotes the Manhattan distance calculation; this function gives the Manhattan distance between the two local information representation vectors of the case and the news, and the distance value is then mapped by the Sigmoid function. Since a closer distance means a higher similarity, i.e., distance and similarity are inversely related, one minus the Sigmoid output is used as the final similarity y_P.
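A minimal sketch of the similarity in equation (8); note that because the Manhattan distance is non-negative, 1 - Sigmoid(d) lies in (0, 0.5], reaching 0.5 exactly for identical vectors:

```python
import math

def manhattan_similarity(u, v):
    """Equation (8): y = 1 - Sigmoid(Manhattan(u, v))."""
    d = sum(abs(a - b) for a, b in zip(u, v))  # Manhattan (L1) distance
    return 1.0 - 1.0 / (1.0 + math.exp(-d))   # invert: closer -> higher similarity
```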
As a preferable scheme of the invention, the Step2 comprises the following specific steps:
step2.1, the structural information learner has a structure similar to that of the local information learner, except that it adds position information in the Embedding layer and replaces the middle shared network layer with a Transformer encoding layer. This is because the first half of the whole network performs encoding and weighting, which can be seen as the collection of information, while the middle shared network layer is the part that performs feature extraction. The position information used here is the absolute position information of the native Transformer, i.e., the absolute position of each word is calculated, resulting in a fixed position matrix:
PE(pos, 2i) = sin(pos / 10000^{2i/d_model}) (9)
PE(pos, 2i+1) = cos(pos / 10000^{2i/d_model}) (10)
where pos is the position index of each word, d_model is the encoding dimension, and i denotes the dimension index, with 2i < d_model.
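The fixed position matrix of equations (9)-(10) can be generated directly; the sketch below builds it for a sequence of seq_len words and encoding dimension d_model:

```python
import math

def positional_encoding(seq_len, d_model):
    """Absolute sinusoidal position matrix of equations (9)-(10)."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)          # even dimensions: sine
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd dimensions: cosine
    return pe
```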
Step2.2, the case and the news pass through the same Embedding process as the local information learner in Step1 to obtain the word vectors C and X', which are combined with the position codes C_pos and X_pos to form the complete word vectors C_whole and X_whole; text structure information is then obtained through the Transformer encoding layer. The specific process is shown in equations (11) to (14):
X_attn = Norm([MultiHead(X_whole); X_whole]) (11)
C_attn = Norm([MultiHead(C_whole); C_whole]) (12)
Composition_x = Norm([f_n(X_attn); X_attn]) (13)
Composition_c = Norm([f_n(C_attn); C_attn]) (14)
where X_attn and C_attn denote the normalized outputs of the multi-head self-attention mechanism, and Composition_x and Composition_c denote the structural information of the case and the news.
Step2.3, finally, the final similarity y_c is calculated through the Manhattan distance.
y_c = 1 - Sigmoid(Manhattan(Composition_x, Composition_c)) (15)
As a preferred embodiment of the present invention, the Step3 specifically comprises the following steps:
the step3.1, VAE architecture is an encoder-decoder architecture. In the encoder, the input is compressed to the underlying subject Z, and the decoder reconstructs the input signal D from the distribution of Z in the underlying space of data by sampling.
Where Z represents a potential topic and P (D | Z) describes the probability of generating D from Z.
Typically, the VAE model assumes that the posterior probabilities of the underlying subject Z of the input data D approximately satisfy a gaussian distribution, i.e.:
logP(Z|d(i))=logN(z;μ(i),δ2(i)I) (17)
wherein d is(i)Representing a real sample in D, each of μ and δ2Are all formed by(i)Generated by a neural network.
Passing through mu(i)And delta2(i)Further obtain each d(i)Corresponding distribution P (Z)(i)|d(i)) Then through a decoding networkIs reconstructed to obtain
μ(i)=f1(d(i)) (18)
logδ2(i)=f2(d(i)) (19)
In order to make the reconstructed data as close to the original data as possible, the final optimization goal of the VAE is to maximize d(i)Generation probability P (d)(i)) At the same time, the posterior probability P (Z) obtained from the data is used by utilizing KL divergence(i)|d(i)) As close as possible to its theoretical variational probability, i.e., N (0, I). The expression of this optimization objective is shown in equation (20).
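The KL term of the objective above has a closed form for a diagonal Gaussian posterior against the N(0, I) prior; a small sketch, operating on the μ^(i) and log δ^{2(i)} vectors produced by f_1 and f_2, is:

```python
import math

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I))
    = 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1),
    the regularization term of the VAE objective; it is zero exactly
    when the posterior already equals the standard normal prior."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))
```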
Step3.2, as shown in FIG. 4, D denotes the input case text and news text, and D requires two different processes. First, the case and news texts in D are encoded in the same way as in the local learner of Step1, with the case elements used for guidance, obtaining the case representation C and the element-guided news representation X'. In addition, the case text and the news text are respectively input into the pre-trained VAE topic model to obtain the latent topic vectors Z_C and Z_X of the case and the news, as shown in equations (21) and (22).
Z_C = PreTrainedVAE(C) ∈ R^{topic_size} (21)
Z_X = PreTrainedVAE(X) ∈ R^{topic_size} (22)
where topic_size denotes the preset number of latent topics. The topic vectors are concatenated with the case vector and the news vector, the topic information and the text information then interact through a bidirectional LSTM, and the topic information representations of the case and the news are finally obtained through a fully connected network. The specific operations are shown in equations (23) to (25):
Topic_x = MLP(BiLSTM([Z_X; X'])) (23)
Topic_c = MLP(BiLSTM([Z_C; C])) (24)
y_T = 1 - Sigmoid(Manhattan(Topic_x, Topic_c)) (25)
where Topic_x and Topic_c denote the topic information vectors of the news text and the case text, respectively.
Finally, the final similarity y_T is obtained through the Manhattan distance calculation, normalized by the Sigmoid function. The pre-training of this learner uses cross-entropy as the loss function and the Adam algorithm as the optimizer.
As a preferred embodiment of the present invention, the Step4 specifically comprises the following steps:
step4.1, after the individual learners have each learned a single view and their pre-training effect has been optimized, the output representations of the three individual learners are combined for weight learning in order to measure the importance of the three kinds of features: a multi-head self-attention mechanism acquires the weight information under different angles, a feed-forward neural network is applied, and a distance calculation yields the final similarity y.
output_c = f_n(MultiHead([Part_c; Composition_c; Topic_c])) (26)
output_x = f_n(MultiHead([Part_x; Composition_x; Topic_x])) (27)
y = 1 - Sigmoid(Manhattan(output_c, output_x)) (28)
where output_c and output_x denote the outputs of the feed-forward neural network, i.e., the final representations of the case and the news.
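A toy sketch of the combination in equations (26)-(28): the three per-view representations are concatenated and scored with the Manhattan-based similarity. The multi-head attention and feed-forward layers of the weight learner are replaced here by an identity map, so this only illustrates the data flow, not the learned weighting:

```python
import math

def ensemble_similarity(case_views, news_views):
    """Concatenate [Part; Composition; Topic] for case and news and
    apply y = 1 - Sigmoid(Manhattan(output_c, output_x)).

    case_views / news_views: three vectors (local, structural, topic).
    """
    out_c = [x for view in case_views for x in view]  # stand-in for eq. (26)
    out_x = [x for view in news_views for x in view]  # stand-in for eq. (27)
    d = sum(abs(a - b) for a, b in zip(out_c, out_x))
    return 1.0 - 1.0 / (1.0 + math.exp(-d))           # eq. (28)
```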
As a preferred embodiment of the present invention, the Step5 specifically comprises the following steps:
step5.1, by analyzing recent hot cases, the invention selects 15 representative hot cases and crawls 6049 news items related to these cases. According to the association between the crawled cases and news, triples of the case-news similarity relation are established in the form (case, news, similarity relation), yielding 6049 related case-news data pairs; using a data augmentation method, 6000 unrelated case-news data pairs are obtained, giving 12049 triples in total. The specific partitioning of the data set is shown in Table 1.
TABLE 1 case and News data distribution Table
As a preferred embodiment of the present invention, the Step6 specifically comprises the following steps:
step6.1, the evaluation indexes of the invention mainly adopt accuracy (Acc.), precision (P), recall (R), and the F1 value, and the invention selects the Q statistic as the diversity measure of the individual learners. The value range of the Q statistic is [-1, 1], where -1 indicates negative correlation, 1 indicates positive correlation, and 0 indicates no correlation.
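The pairwise Q statistic can be computed from the two learners' correct/incorrect contingency counts; a sketch (with illustrative function and variable names) is:

```python
def q_statistic(preds_a, preds_b, labels):
    """Yule's Q between two learners: with N11/N00 the samples both
    classify correctly/incorrectly and N10/N01 the disagreements,
    Q = (N11*N00 - N01*N10) / (N11*N00 + N01*N10)."""
    n11 = n00 = n10 = n01 = 0
    for a, b, y in zip(preds_a, preds_b, labels):
        right_a, right_b = a == y, b == y
        if right_a and right_b:
            n11 += 1
        elif right_a:
            n10 += 1
        elif right_b:
            n01 += 1
        else:
            n00 += 1
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0
```

Learners that err on the same samples give Q near 1; diverse learners give lower or negative Q.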
As a preferred embodiment of the present invention, the Step7 specifically comprises the following steps:
step7.1, the invention mainly adopts six classical text similarity calculation models as baseline models for comparison, including a Siamese network model, an aggregation-matching model, and a pre-trained model, as follows:
● Siamese-CNN model: shen et al enhanced the ability of the model to capture window features by using CNN. The model mainly comprises a convolution layer and a pooling layer, and similarity calculation is carried out through a full-connection layer.
● Siamese-LSTM model: neculoiu et al use two layers of LSTM for feature extraction and similarity calculation through a fully connected layer.
● Siamese-Transformer model: feature extraction is carried out with a single Transformer encoding layer, followed by similarity calculation through a fully connected layer.
● ESIM model: qian et al uses attention-based LSTM to capture high-order interaction information between two sentences. The model mainly comprises the following components: inputting codes, local reasoning modeling and reasoning combination, and then carrying out similarity calculation through a full connection layer.
● BiMPM model: wang et al propose four matching functions, and obtain matching results after interactive fusion. The model mainly comprises the following components: inputting codes, a matching layer and a feature fusion layer, and carrying out similarity calculation through two layers of feedforward networks.
● BERT model: the pre-trained language model proposed by Google; it is fine-tuned mainly by adding a fully connected layer after BERT to obtain the text similarity.
As a preferred embodiment of the present invention, the Step8 specifically comprises the following steps:
Step8.1, comparative experiment of the invention against the six baseline models: this experimental part verifies the effectiveness of the invention on the case and news correlation analysis task. The six baseline models are used for comparison experiments, and the results are shown in Table 2.
TABLE 2 comparison of the invention with the baseline model test results
Analysis of Table 2 shows that the Acc., P, R and F1 values of the invention all exceed those of the other baseline models, with Acc. improved by 3.2% and F1 by 2.5%. This demonstrates the effectiveness of the ensemble-learning-based multi-view similarity calculation method on the case and news correlation analysis task. Moreover, the proposed method is an ensemble system built on three of the baseline models (Siamese-CNN, Siamese-LSTM and Siamese-Transformer); compared with these three baselines, its F1 value improves by 3.9%, which strongly supports the rationality of the approach. In addition, among the other baselines the BiMPM model achieves the best F1 value, because BiMPM uses a multi-feature matching scheme; combined with the experimental effect of the proposed method, this indicates that similarity matching from multiple perspectives is an effective solution to matching unbalanced text pairs such as cases and news.
Step8.2, individual learner diversity analysis experiment: this experimental part verifies the diversity of the individual learners proposed by the invention. Table 3 is a contingency table of the individual learners' predictions. In the table, the horizontal and vertical axes denote the respective learners: "G1" denotes the local information learner, "G2" the structure information learner, "G3" the topic information learner, "+" the number of samples an individual learner judged relevant, and "-" the number judged irrelevant. The results are shown in Table 3.
TABLE 3 prediction results tabulation
The Q statistics were computed on the results in Table 3, yielding the results shown in Table 4, where "Q12" denotes the diversity measure between individual learners G1 and G2 (and similarly for "Q13" and "Q23"), and "Q" denotes the diversity measure of the entire ensemble system.
TABLE 4 Individual learner diversity metric results
As can be seen from Table 4, Q12, Q13 and Q23 all lie in the range 0-1, so each pair of learners is positively correlated while still exhibiting a certain diversity. The Q value likewise indicates diversity across the whole ensemble system. These results demonstrate that the three individual learners each learn information from a different side, and that the weight learner integrates the three kinds of information well.
Step8.3, ensemble strategy utility analysis experiment: this experimental part verifies the effectiveness of the weight learner. The averaging method, the voting method and the logistic regression algorithm, all common ensemble strategies, are selected as comparison experiments, with the final outputs of the individual learners used as the training set for the comparisons. The results are shown in Table 5 below.
TABLE 5 integration strategy experimental results
Compared with the other ensemble strategies, the method of the invention achieves the best effect, exceeding them by about 1 percentage point in F1 value, which fully demonstrates the superiority of the invention's ensemble strategy on this task. The results of the averaging and voting methods are not improved but slightly reduced, showing that these two strategies fail to provide an ensemble benefit: because the number of individual learners is small and the structure information learner outperforms the others, its result dominates the outcome (result coverage). Logistic regression does exceed the individual learners, so a learned ensemble strategy takes effect; combined with the results of the proposed method, this shows that learning-based ensemble strategies integrate better than the other strategies.
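For contrast with the learned weight strategy, the averaging and voting baselines compared in this experiment can be sketched in a few lines. This is pure illustration of the two strategies, not the patent's experimental code; in the patent the comparison is trained on the individual learners' final outputs.

```python
from collections import Counter

def average_ensemble(scores):
    """Averaging baseline: mean of the individual learners'
    similarity scores for one case-news pair."""
    return sum(scores) / len(scores)

def vote_ensemble(labels):
    """Voting baseline: majority vote over the individual learners'
    0/1 relevance decisions for one case-news pair."""
    return Counter(labels).most_common(1)[0][0]
```

With only three learners, a single dominant learner can control the vote whenever the other two disagree, which is one way the result-coverage effect described above arises.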
Step8.4, utility analysis experiment for the key modules of each individual learner: this experimental part verifies the validity of the key modules of each individual learner in the method. The results are shown in Table 6, where "(-) case" indicates that the individual learners do not use case elements as external guidance, "(-) position" that the structure information learner does not use position information, "(-) self-Attention" that the local information learner does not use the self-attention mechanism, and "(-) topic" that the topic information learner does not use topic information.
Table 6 shows the results of the validation experiment of the characteristics of each part
Analysis of Table 6 shows that under "(-) case" the effects of all three individual learners drop significantly compared with the method of the invention; comparison with Table 2 shows that, thanks to the guidance of case elements, the structurally simple twin networks originally outperform ESIM, a complex matching-aggregation network, which fully demonstrates the effectiveness of using case elements as external guidance. Under "(-) position" the result of the structure information learner drops sharply, indicating that this learner has effectively learned structural information. Under "(-) self-Attention" the drop of the local information learner shows the effectiveness of self-attention over the CNN channels, which represent different local feature information, and indirectly verifies the importance of this learner in discriminating local information. Finally, under "(-) topic" the drop of the topic information learner shows that it can effectively use topic information for similarity calculation.
Step8.5, news example test analysis experiment: this experiment verifies the improvement of the proposed method on the accuracy of case and news correlation analysis. The news items and case shown in Tables 7 and 8 below are selected to construct case-news triples with a "similar" relation.
Table 7 news text examples
Table 8 example of case description text
The triples constructed from Tables 7 and 8 take the following form:
(case description, News 1, similar)
(case description, news 2, similar)
(case description, news 3, similar)
The representative baseline models Siamese-LSTM, BiMPM and BERT are selected for the experiments. The results are as follows, where 0 denotes dissimilar and 1 denotes similar.
TABLE 9 News example test results
As shown in Table 9 above, the method of the invention correctly determines the similarity relation between all three news items and the corresponding case, whereas Siamese-LSTM, BiMPM and BERT each fail to judge all three examples correctly at the same time. This demonstrates that the invention makes good use of similarity relations from multiple perspectives, effectively alleviates the unbalanced-text problem between cases and news, and improves the accuracy of case and news similarity calculation.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, it is not limited to these embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention.
Claims (5)
1. The case and news correlation analysis method for multi-view ensemble learning is characterized by comprising the following steps of: the method comprises the following specific steps:
step1, constructing a local information learner by using a CNN network, and acquiring local information similarity between cases and news;
step2, constructing a structure information learner by using a Transformer network, and acquiring structural information similarity between cases and news;
step3, constructing a topic information learner by using a pre-training topic model, and acquiring topic information similarity between cases and news;
step4, constructing a weight learner by using a multi-head attention mechanism, and jointly judging the similarity degree from multiple angles.
2. The case and news relevance analysis method for multi-view ensemble learning according to claim 1, wherein: the specific steps of Step1 are as follows:
step1.1, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and obtaining weighted case and news feature vectors;
step1.2, extracting local information of the feature vectors of cases and news by using a CNN network, and, after the pooling operation, performing weight learning on the output channels of the CNN by using a self-attention mechanism to increase the weight of important local information;
step1.3, performing Manhattan distance calculation on the extracted local information of the case and the news to obtain a final similarity relation.
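A minimal PyTorch sketch of a local-information learner of the kind claim 2 describes is given below. The layer sizes, the channel-wise self-attention, and the exp(-Manhattan) similarity head are illustrative assumptions, not the patented implementation; case-element weighting of the embeddings is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalInfoLearner(nn.Module):
    """Sketch: Conv1d over word embeddings captures local (window)
    features, max-pooling removes sequence length, self-attention
    re-weights the output channels, and an exp(-Manhattan-distance)
    head maps the pair to a similarity in (0, 1]."""

    def __init__(self, emb_dim=128, channels=64, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, channels, kernel, padding=1)
        self.attn = nn.Linear(channels, channels)  # channel-wise attention weights

    def encode(self, x):                            # x: (batch, seq_len, emb_dim)
        h = F.relu(self.conv(x.transpose(1, 2)))    # (batch, channels, seq_len)
        h = h.max(dim=2).values                     # max-pool over time
        w = torch.softmax(self.attn(h), dim=-1)     # weight each local-feature channel
        return h * w

    def forward(self, case_x, news_x):
        c, n = self.encode(case_x), self.encode(news_x)
        # Manhattan-distance similarity, as used in Siamese text matching
        return torch.exp(-torch.sum(torch.abs(c - n), dim=1))
```

Because pooling removes the time dimension, case and news inputs may have different lengths, which fits the unbalanced case/news text setting.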
3. The case and news relevance analysis method for multi-view ensemble learning according to claim 1, wherein: the specific steps of Step2 are as follows:
step2.1, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and adding absolute position coding information of case and news texts to obtain weighted case and news feature vectors;
step2.2, extracting structure information of case and news characteristic vectors containing position coding information by using a Transformer network layer;
and Step2.3, performing Manhattan distance calculation on the extracted structural information coding vectors of the cases and the news to obtain a final similarity relation.
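A minimal PyTorch sketch of a structure-information learner of the kind claim 3 describes: absolute sinusoidal position encodings are added to the embeddings, one Transformer encoder layer extracts structural information, and an exp(-Manhattan) head yields the similarity. All sizes and the pooling choice are illustrative assumptions; case-element guidance is omitted.

```python
import torch
import torch.nn as nn

def sinusoidal_positions(seq_len, dim):
    """Absolute sinusoidal position encodings (even dim assumed)."""
    pos = torch.arange(seq_len).unsqueeze(1).float()
    i = torch.arange(0, dim, 2).float()
    angles = pos / torch.pow(torch.tensor(10000.0), i / dim)
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

class StructureInfoLearner(nn.Module):
    """Sketch: position-enriched embeddings pass through a single
    Transformer encoder layer, are mean-pooled, and the two encodings
    are compared with an exp(-Manhattan-distance) similarity head."""

    def __init__(self, emb_dim=128, heads=4):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(emb_dim, heads, batch_first=True)

    def encode(self, x):                            # x: (batch, seq_len, emb_dim)
        x = x + sinusoidal_positions(x.size(1), x.size(2))
        return self.encoder(x).mean(dim=1)          # (batch, emb_dim)

    def forward(self, case_x, news_x):
        c, n = self.encode(case_x), self.encode(news_x)
        return torch.exp(-torch.sum(torch.abs(c - n), dim=1))
```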
4. The case and news relevance analysis method for multi-view ensemble learning according to claim 1, wherein: the specific steps of Step3 are as follows:
step3.1, carrying out unsupervised pre-training on all data of cases and news by using a variational autoencoder (VAE) to obtain an unsupervised topic model;
step3.2, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and adding case and news topic vectors extracted by a topic model to obtain weighted case and news characteristic vectors;
step3.3, extracting the theme information of case and news characteristic vectors containing the theme information by using a bidirectional LSTM network layer;
and Step3.4, performing Manhattan distance calculation on the extracted subject information coding vectors of the cases and the news to obtain a final similarity relation.
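A minimal PyTorch sketch of a topic-information learner of the kind claim 4 describes: a document-level topic vector (produced by the separately pre-trained VAE topic model, which is not shown) is concatenated to every word embedding, a bidirectional LSTM encodes the result, and an exp(-Manhattan) head yields the similarity. All dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TopicInfoLearner(nn.Module):
    """Sketch: word embeddings enriched with a pre-trained topic
    vector, encoded by a bidirectional LSTM, compared with an
    exp(-Manhattan-distance) similarity head."""

    def __init__(self, emb_dim=128, topic_dim=32, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim + topic_dim, hidden,
                            batch_first=True, bidirectional=True)

    def encode(self, x, topic_vec):                 # x: (batch, seq_len, emb_dim)
        # broadcast the document-level topic vector over every time step
        t = topic_vec.unsqueeze(1).expand(-1, x.size(1), -1)
        out, _ = self.lstm(torch.cat([x, t], dim=-1))
        return out.mean(dim=1)                      # (batch, 2 * hidden)

    def forward(self, case_x, case_topic, news_x, news_topic):
        c = self.encode(case_x, case_topic)
        n = self.encode(news_x, news_topic)
        return torch.exp(-torch.sum(torch.abs(c - n), dim=1))
```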
5. The case and news relevance analysis method for multi-view ensemble learning according to claim 1, wherein: the Step4 includes:
A single view is learned by each individual learner and the pre-training effect is optimized; weight learning is then performed by combining the three representations obtained in Step1, Step2 and Step3: a multi-head self-attention mechanism acquires weight information from different angles, a feedforward neural network performs the distance calculation, and the final similarity is obtained.
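A minimal PyTorch sketch of a weight learner of the kind claim 5 describes: the three individual learners' representations are stacked as a length-3 sequence, re-weighted by multi-head self-attention, and reduced to a similarity score by a small feedforward network. The head sizes and the sigmoid output are illustrative assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn

class WeightLearner(nn.Module):
    """Sketch: multi-head self-attention over the three view
    representations (local, structure, topic), followed by a
    feedforward network producing a similarity in (0, 1)."""

    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(),
                                 nn.Linear(dim // 2, 1), nn.Sigmoid())

    def forward(self, local_r, struct_r, topic_r):  # each: (batch, dim)
        views = torch.stack([local_r, struct_r, topic_r], dim=1)  # (batch, 3, dim)
        fused, _ = self.attn(views, views, views)   # re-weight the three views
        return self.ffn(fused.mean(dim=1)).squeeze(-1)
```

Because the attention weights are learned jointly with the similarity head, this strategy can avoid the result-coverage problem that plain averaging or voting exhibits with few learners.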
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111078776.1A CN113901990A (en) | 2021-09-15 | 2021-09-15 | Case and news correlation analysis method for multi-view integrated learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113901990A true CN113901990A (en) | 2022-01-07 |
Family
ID=79028500
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111078776.1A Pending CN113901990A (en) | 2021-09-15 | 2021-09-15 | Case and news correlation analysis method for multi-view integrated learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113901990A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180018757A1 (en) * | 2016-07-13 | 2018-01-18 | Kenji Suzuki | Transforming projection data in tomography by means of machine learning |
CN109885673A (en) * | 2019-02-13 | 2019-06-14 | 北京航空航天大学 | A kind of Method for Automatic Text Summarization based on pre-training language model |
CN110717332A (en) * | 2019-07-26 | 2020-01-21 | 昆明理工大学 | News and case similarity calculation method based on asymmetric twin network |
CN110766065A (en) * | 2019-10-18 | 2020-02-07 | 山东浪潮人工智能研究院有限公司 | Hash learning method based on deep hyper-information |
CN111368087A (en) * | 2020-03-23 | 2020-07-03 | 中南大学 | Chinese text classification method based on multi-input attention network |
CN112231472A (en) * | 2020-09-18 | 2021-01-15 | 昆明理工大学 | Judicial public opinion sensitive information identification method integrated with domain term dictionary |
CN112287687A (en) * | 2020-09-17 | 2021-01-29 | 昆明理工大学 | Case tendency extraction type summarization method based on case attribute perception |
CN112732916A (en) * | 2021-01-11 | 2021-04-30 | 河北工业大学 | BERT-based multi-feature fusion fuzzy text classification model |
CN112925877A (en) * | 2019-12-06 | 2021-06-08 | 中国科学院软件研究所 | One-person multi-case association identification method and system based on depth measurement learning |
2021-09-15 CN CN202111078776.1A patent/CN113901990A/en active Pending
Non-Patent Citations (2)
Title |
---|
赵承鼎; 郭军军; 余正涛; 黄于欣; 刘权; 宋燃: "Correlation Analysis between News and Cases Based on an Asymmetric Siamese Network", Journal of Chinese Information Processing (中文信息学报), no. 03, 15 March 2020 (2020-03-15) *
陈佳伟; 韩芳; 王直杰: "Aspect-Based Sentiment Analysis Based on a Self-Attention Gated Graph Convolutional Network", Journal of Computer Applications (计算机应用), no. 08, 10 August 2020 (2020-08-10) *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114817501A (en) * | 2022-04-27 | 2022-07-29 | 马上消费金融股份有限公司 | Data processing method, data processing device, electronic equipment and storage medium |
CN114926206A (en) * | 2022-05-18 | 2022-08-19 | 阿里巴巴(中国)有限公司 | Prediction model training method, and article sales information prediction method and apparatus |
CN117056874A (en) * | 2023-08-17 | 2023-11-14 | 国网四川省电力公司营销服务中心 | Unsupervised electricity larceny detection method based on deep twin autoregressive network |
CN117236323A (en) * | 2023-10-09 | 2023-12-15 | 青岛中企英才集团商业管理有限公司 | Information processing method and system based on big data |
CN117236323B (en) * | 2023-10-09 | 2024-03-29 | 京闽数科(北京)有限公司 | Information processing method and system based on big data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113901990A (en) | Case and news correlation analysis method for multi-view integrated learning | |
CN111274398B (en) | Method and system for analyzing comment emotion of aspect-level user product | |
CN111259127B (en) | Long text answer selection method based on transfer learning sentence vector | |
CN111414461B (en) | Intelligent question-answering method and system fusing knowledge base and user modeling | |
CN110717334A (en) | Text emotion analysis method based on BERT model and double-channel attention | |
CN111930887B (en) | Multi-document multi-answer machine reading and understanding system based on joint training mode | |
CN112000772B (en) | Sentence-to-semantic matching method based on semantic feature cube and oriented to intelligent question and answer | |
CN112667818A (en) | GCN and multi-granularity attention fused user comment sentiment analysis method and system | |
CN113255366B (en) | Aspect-level text emotion analysis method based on heterogeneous graph neural network | |
CN113254604B (en) | Reference specification-based professional text generation method and device | |
CN111651558A (en) | Hyperspherical surface cooperative measurement recommendation device and method based on pre-training semantic model | |
CN112527993B (en) | Cross-media hierarchical deep video question-answer reasoning framework | |
CN114398976A (en) | Machine reading understanding method based on BERT and gate control type attention enhancement network | |
CN113901847A (en) | Neural machine translation method based on source language syntax enhanced decoding | |
Lin et al. | PS-mixer: A polar-vector and strength-vector mixer model for multimodal sentiment analysis | |
CN113157919A (en) | Sentence text aspect level emotion classification method and system | |
CN115810351A (en) | Controller voice recognition method and device based on audio-visual fusion | |
Wang et al. | EfficientTDNN: Efficient architecture search for speaker recognition | |
Zhang et al. | TS-GCN: Aspect-level sentiment classification model for consumer reviews | |
CN117539999A (en) | Cross-modal joint coding-based multi-modal emotion analysis method | |
CN117972434A (en) | Training method, training device, training equipment, training medium and training program product for text processing model | |
CN117648469A (en) | Cross double-tower structure answer selection method based on contrast learning | |
Fajcik et al. | Pruning the index contents for memory efficient open-domain qa | |
CN116663523A (en) | Semantic text similarity calculation method for multi-angle enhanced network | |
Zhao et al. | Improving stability and performance of spiking neural networks through enhancing temporal consistency |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||