CN113901990A - Case and news correlation analysis method for multi-view integrated learning


Info

Publication number
CN113901990A
Authority
CN
China
Prior art keywords
news
case
information
learner
cases
Prior art date
Legal status
Pending
Application number
CN202111078776.1A
Other languages
Chinese (zh)
Inventor
余正涛
汪翠
黄于欣
毛存礼
张玉
Current Assignee
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date: 2021-09-15
Filing date: 2021-09-15
Publication date: 2022-01-07
Application filed by Kunming University of Science and Technology
Priority to CN202111078776.1A
Publication of CN113901990A

Classifications

    • G06F18/22 Pattern recognition; matching criteria, e.g. proximity measures
    • G06F40/279 Natural language analysis; recognition of textual entities
    • G06F40/289 Natural language analysis; phrasal analysis, e.g. finite state techniques or chunking
    • G06N20/20 Machine learning; ensemble learning
    • G06N3/044 Neural networks; recurrent networks, e.g. Hopfield networks
    • G06N3/045 Neural networks; combinations of networks
    • G06N3/088 Neural network learning methods; non-supervised learning, e.g. competitive learning


Abstract

The invention relates to a case and news correlation analysis method for multi-view integrated learning, belonging to the technical field of natural language processing. The method comprises the following steps: with a twin network as the basic framework, a CNN network, a Transformer coding network and a topic model are combined to extract local information, structural information and topic information. Considering that case elements carry the key semantic information of a case, the case elements are used to guide three pre-trained base learners to obtain directional information. A weight learner is then constructed through a multi-head self-attention mechanism to combine the three kinds of directional information and calculate the Manhattan distance, finally yielding a more balanced and reasonable similarity relation. Experimental results show that, compared with methods based on semantic similarity, the multi-view integration method of the invention improves the F1 value by 2.5%.

Description

Case and news correlation analysis method for multi-view integrated learning
Technical Field
The invention relates to a case and news correlation analysis method for multi-view integrated learning, belonging to the technical field of natural language processing.
Background
In recent years, the application of deep learning to text similarity calculation has attracted great interest, and common methods can be roughly divided into three categories: representation-based networks, interaction-based networks, and pre-trained language models. Representation-based networks include the twin network framework (Siamese Network), a neural network built from a group of sub-networks sharing the same parameters. Interaction-based networks, represented by ESIM, are characterized by frameworks that capture richer interactive features between two sentences; however, the matching operations needed to capture this interaction information are time-consuming. In addition, the pre-trained language models that have emerged in recent years, represented by BERT, RoBERTa and the like, achieve excellent results. BERT is one of the key innovations in the recent evolution of contextual representation learning; it employs a fine-tuning approach that requires little or no task-specific architecture and achieves state-of-the-art performance on many NLP tasks.
The above analysis shows that, because the text content of cases and news differs so greatly, multiple perspectives are needed to jointly model their similarity. Ensemble learning can combine several different individual learners to obtain better results, and heterogeneous individual learners can represent several angles of similarity, which suits case and news correlation analysis well. Therefore, drawing on previous work, the invention follows the ensemble learning idea and selects three individual learners to represent three different perspectives, so as to explore the similarity problem between cases and news in depth.
Disclosure of Invention
The invention provides a case and news correlation analysis method for multi-view integrated learning, which improves the accuracy of case and news correlation analysis. With a twin network framework as the base and following the idea of ensemble learning, three network structures with different characteristics are selected to represent three perspectives, and a local information learner, a structural information learner and a topic information learner are constructed so that each obtains semantic features while keeping a different emphasis. The three individual learners are pre-trained separately so that each reaches its best result. Finally, a weight learner constructed with a multi-head attention mechanism combines the three kinds of information to obtain the final similarity measurement.
The technical scheme of the invention is as follows: the case and news correlation analysis method for multi-view ensemble learning comprises the following specific steps:
step1, constructing a local information learner by using a CNN network, and acquiring local information similarity between cases and news;
step2, constructing a structural information learner by using a Transformer network, and acquiring structural information similarity between cases and news;
step3, constructing a topic information learner by using a pre-trained topic model, and acquiring topic information similarity between cases and news;
step4, constructing a weight learner by using a multi-head attention mechanism, and jointly judging the similarity degree from multiple angles.
As a preferred embodiment of the present invention, the Step1 specifically comprises the following steps:
step1.1, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and obtaining weighted case and news feature vectors;
step1.2, extracting local information of the feature vectors of cases and news by using a CNN network, and after pooling operation, performing weight learning on an output channel of the CNN by using a self-attention mechanism for improving the weight of important local information.
Step1.3, performing Manhattan distance calculation on the extracted local information coding vectors of the cases and the news to obtain a final similarity relation.
As a preferable scheme of the invention, the Step2 comprises the following specific steps:
step2.1, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and adding absolute position coding information of case and news texts to obtain weighted case and news feature vectors.
Step2.2, extracting structural information from the case and news feature vectors containing position coding information by using a Transformer network layer.
Step2.3, performing Manhattan distance calculation on the extracted structural information coding vectors of the cases and the news to obtain a final similarity relation.
As a preferred embodiment of the present invention, the Step3 specifically comprises the following steps:
step3.1, using a variational auto-encoder (VAE) to perform unsupervised pre-training on all case and news data to obtain an unsupervised topic model.
Step3.2, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and adding case and news topic vectors extracted by a topic model to obtain weighted case and news feature vectors.
Step3.3, using a bidirectional LSTM network layer to extract topic information from the case and news feature vectors containing the topic information.
Step3.4, performing Manhattan distance calculation on the extracted topic information coding vectors of the cases and the news to obtain a final similarity relation.
As a preferred embodiment of the present invention, the Step4 specifically comprises the following steps:
after each individual learner has learned its single perspective and the pre-training effect has been optimized, in order to balance the importance of the three representations, weight learning is performed by combining the representations obtained in Step1, Step2 and Step3: weight information under different angles is acquired with a multi-head self-attention mechanism, distance calculation is then performed through a feedforward neural network, and the final similarity y is obtained.
output_c = f_n(MultiHead([Part_c; Composition_c; Topic_c]))   (1)

output_x = f_n(MultiHead([Part_x; Composition_x; Topic_x]))   (2)

y = 1 - Sigmoid(Manhattan(output_c, output_x))   (3)

where output_c and output_x denote the outputs of the feedforward neural network, representing the final characterizations of the case and the news.
The invention has the beneficial effects that:
the invention discloses a case and news correlation analysis method for multi-view integrated learning, and aims to improve the accuracy of case and news correlation analysis. Aiming at the problems of unbalanced distribution and overlarge content difference between case description texts and news texts, the invention provides the judgment of the similarity between cases and news from multiple perspectives, and the validity of the invention is verified through experiments.
Drawings
FIG. 1 is a block diagram of a specific process of the present invention;
FIG. 2 is a diagram of a local information learner in accordance with the present invention;
FIG. 3 is a diagram of a structural information learner in accordance with the present invention;
FIG. 4 is a schematic diagram of a VAE topic learner in accordance with the present invention;
FIG. 5 is a diagram of a weight learner model constructed based on a multi-head self-attention mechanism in accordance with the present invention.
Detailed Description
Example 1: as shown in fig. 1 to 5, the method for analyzing case and news relevance of multi-view ensemble learning of the present invention comprises:
step1, constructing a local information learner by using a CNN network, and acquiring local information similarity between cases and news;
as shown in the local information learner in fig. 2, case elements are used as external guidance, and the output channels of the CNN network are weighted by using a self-attention mechanism, so that the capturing capability of the network on local information is improved. The local information learner is pre-trained and uses cross entropy loss as a loss function and the Adam algorithm as an optimizer.
Step2, constructing a structural information learner by using a Transformer network, and acquiring structural information similarity between cases and news;
as shown in the structural information learner in FIG. 3, the coding layer of the Transformer is a self-attention-based network structure with a residual connection after each processing step, which helps the network acquire global information of cases and news. Case element external guidance is used here as well, position coding information is added in the Embedding layer to enhance the network's ability to capture global structural information, and pre-training uses cross entropy loss as the loss function with the Adam algorithm as the optimizer.
Step3, constructing a topic information learner by using a pre-trained topic model, and acquiring topic information similarity between cases and news;
as shown in the VAE topic learner of fig. 4, the topic model based on the variational auto-encoder (VAE) is an unsupervised document generation model that aims to extract latent topic features from the word vector space of a document and generate a corresponding document. Suchi et al used a VAE to extract topic information to assist text classification tasks; drawing on this prior work, the invention uses a pre-trained VAE to obtain topic information and employs it to help build the topic information learner.
Step4, constructing a weight learner by using a multi-head attention mechanism, and jointly judging the degree of similarity from multiple angles. After each individual learner has learned its single perspective and the pre-training effect has been optimized, in order to balance the importance of the three characterizations, weight learning is performed by combining the characterizations obtained in Step1, Step2 and Step3: weight information under different angles is acquired with a multi-head self-attention mechanism, distance calculation is then performed through a feedforward neural network, and the final similarity is obtained.
As shown in fig. 5, a diagram of the weight learner model constructed based on the multi-head self-attention mechanism, after the three individual learners are pre-trained, the weight learner is trained using their output results as a training set. Different from traditional ensemble learning combination strategies, the method selects the final output characterization of each individual learner as the training set, so that the weight learner can learn information from the original text.
Step5, crawling original news corpora from news platforms such as Weibo (microblog) through XPath, and performing data preprocessing, data set division and similar operations on the original corpora.
Step6, taking accuracy (Acc.), precision (P), recall (R), the F1 value and the Q statistic as evaluation indexes to measure the experimental effectiveness of the invention.
Step7, the invention mainly adopts six classical text similarity calculation models as baseline models for comparison experiments, including twin network models, matching-aggregation models and a pre-trained model.
Step8, in order to verify the effectiveness of the method of the invention on the case and news correlation analysis task, 6 baseline models are adopted to carry out comparison experiments, and the experimental results are analyzed.
As a preferred embodiment of the present invention, the Step1 specifically comprises the following steps:
step1.1, firstly, case descriptions are manually annotated to obtain case elements, and news texts are compressed according to their titles to remove redundant information. After encoding through the shared Embedding layer, the case description is C = {c_1, c_2, ..., c_m}, the news text is X = {x_1, x_2, ..., x_n}, and the case elements are E = {e_1, e_2, ..., e_k}, where m is the length of the case description text, n is the length of the news text, and k is the number of case elements. After the coding matrices of the three kinds of information are obtained, the case element coding matrix E is used to guide and weight the news coding matrix X, as follows:
H = XWE ∈ R^{n×v}, W ∈ R^{v×k}   (1)

X' = tanh(H) ∈ R^{n×v}   (2)

where X' denotes the news coding matrix guided by the case elements. The operations in equations (1) and (2) can be understood as a simplified attention mechanism: the weight matrix W is trainable and acts as a weighting function between the case element matrix E and the news coding matrix X, so the model autonomously learns how E weights each element of X; the weighted matrix elements are then mapped into the (-1, 1) interval by the tanh activation function, which improves convergence speed.
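To make the guidance operation concrete, the following PyTorch sketch implements equations (1) and (2). The module name, initialization scale and toy dimensions are assumptions made for illustration only:

```python
import torch
import torch.nn as nn

class CaseElementGuidance(nn.Module):
    def __init__(self, embed_dim: int, num_elements: int):
        super().__init__()
        # W in R^{v x k}: trainable weighting between news words and case elements
        self.W = nn.Parameter(torch.randn(embed_dim, num_elements) * 0.02)

    def forward(self, X: torch.Tensor, E: torch.Tensor) -> torch.Tensor:
        # X: (n, v) news word embeddings; E: (k, v) case-element embeddings
        H = X @ self.W @ E        # (n, v), eq. (1): H = XWE
        return torch.tanh(H)      # (n, v), eq. (2): X' = tanh(H)

n, k, v = 30, 5, 300              # news length, number of case elements, embedding dim
guide = CaseElementGuidance(v, k)
X_prime = guide(torch.randn(n, v), torch.randn(k, v))
print(X_prime.shape)              # torch.Size([30, 300])
```

In this reading, W plays the role of a learned compatibility map: X @ W scores each news word against each case element, and multiplying by E projects those scores back into the embedding space before the tanh squashing.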
Step1.2, the case description C and the news matrix X' generated under case-element guidance are respectively input into a CNN layer with shared parameters for convolution, yielding windowed local information. A pooling operation then reduces the spatial size of the data to alleviate overfitting, as follows:
hidden_x = MaxPooling(CNN(X'))   (3)

hidden_c = MaxPooling(CNN(C))   (4)

where MaxPooling denotes the pooling operation, and hidden_x and hidden_c denote the local information extracted by the CNN network.
hidden_x and hidden_c are then weighted separately through a self-attention mechanism, and after a feedforward neural network layer the local information representation vectors of the news text and the case text, Part_x and Part_c, are obtained, as shown in equations (5) and (6):

Part_x = f_n(Attention(hidden_x))   (5)

Part_c = f_n(Attention(hidden_c))   (6)
It should be noted that the self-attention mechanism here weights the output channels of the CNN network: each channel represents the local information obtained by one convolution kernel, so applying self-attention to the output channels lets the network autonomously learn which convolution kernels yield the more important local information. The self-attention mechanism is implemented as follows:

Attention(Q, K, V) = softmax(QK^T / √d_k) V   (7)

where Q, K and V are vectors and the three are equal here (self-attention); the dot product of Q and K is divided by √d_k to keep the inner product from growing too large.
Step1.3, the local information learner of the invention selects the Manhattan distance to calculate the distance between the news text and the case text, measuring the similarity difference between the two:

y_P = 1 - Sigmoid(Manhattan(Part_x, Part_c))   (8)

where Manhattan denotes the Manhattan distance calculation; the distance between the two local information representation vectors of the case and the news is obtained through this function, and the distance value is then mapped into the 0-1 range by a Sigmoid function. Since a smaller distance means higher similarity, i.e., distance is inversely related to similarity, the result of subtracting the Sigmoid output from 1 is taken as the final similarity y_P.
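The following condensed PyTorch sketch wires equations (3) to (8) together: a shared CNN with max pooling, self-attention over the output channels, a feedforward layer f_n, and the 1 - Sigmoid(Manhattan) similarity. The channel count, kernel size and adaptive pooled length are assumptions; the patent does not fix these hyperparameters:

```python
import torch
import torch.nn as nn

class LocalInfoLearner(nn.Module):
    def __init__(self, embed_dim=300, channels=128, kernel=3, pooled_len=16):
        super().__init__()
        self.conv = nn.Conv1d(embed_dim, channels, kernel, padding=1)  # CNN, eq. (3)/(4)
        self.pool = nn.AdaptiveMaxPool1d(pooled_len)                   # MaxPooling
        self.fn = nn.Linear(channels * pooled_len, 128)                # feedforward f_n

    def channel_attention(self, h):
        # Self-attention over CNN output channels, eq. (7): Q = K = V = h.
        # h: (batch, channels, pooled_len); each channel row is one kernel's output.
        scores = torch.softmax(h @ h.transpose(1, 2) / h.size(-1) ** 0.5, dim=-1)
        return scores @ h

    def encode(self, seq):                          # seq: (batch, len, embed_dim)
        h = self.pool(self.conv(seq.transpose(1, 2)))
        return self.fn(self.channel_attention(h).flatten(1))   # Part_x / Part_c

    def forward(self, news, case):
        px, pc = self.encode(news), self.encode(case)
        dist = torch.sum(torch.abs(px - pc), dim=-1)            # Manhattan distance
        return 1 - torch.sigmoid(dist)                          # y_P, eq. (8)

y_P = LocalInfoLearner()(torch.randn(2, 30, 300), torch.randn(2, 20, 300))
```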
As a preferable scheme of the invention, the Step2 comprises the following specific steps:
step2.1, the structural information learner has a similar structure to the local information learner, except that position information is added in the Embedding layer and the middle shared network layer is replaced with a Transformer coding layer. This is because the first half of the network performs encoding and weighting, which can be seen as information collection, while the middle shared network layer is the part performing feature extraction. The position information used here is the absolute position information of the native Transformer: the absolute position of each word is calculated, yielding a fixed position matrix.
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))   (9)

PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))   (10)

where pos is the position index of each word, d_model is the coding dimension, and i denotes the dimension index, with 2i < d_model.
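A small sketch of equations (9) and (10), following the original Transformer; the maximum length and dimension below are illustrative values:

```python
import math
import torch

def positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    pe = torch.zeros(max_len, d_model)
    pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)   # word positions
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))             # 1 / 10000^(2i/d_model)
    pe[:, 0::2] = torch.sin(pos * div)   # eq. (9): even dimensions
    pe[:, 1::2] = torch.cos(pos * div)   # eq. (10): odd dimensions
    return pe                            # fixed, non-trainable position matrix

C_pos = positional_encoding(50, 300)     # added to word embeddings to form C_whole
```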
Step2.2, the case and the news pass through the same Embedding process as the local information learner in Step1 to obtain the word vectors C and X', which are combined with the position codes C_pos and X_pos to form the complete word vectors C_whole and X_whole; the text structure information is then obtained through the Transformer coding layer. The specific process is shown in equations (11) to (14):

X_attn = Norm([MultiHead(X_whole); X_whole])   (11)

C_attn = Norm([MultiHead(C_whole); C_whole])   (12)

Composition_x = Norm([f_n(X_attn); X_attn])   (13)

Composition_c = Norm([f_n(C_attn); C_attn])   (14)

where X_attn and C_attn denote the normalized outputs of the multi-head self-attention mechanism, and Composition_x and Composition_c denote the structural information of the case and the news.
Step2.3, the final similarity y_C is then calculated through the Manhattan distance:

y_C = 1 - Sigmoid(Manhattan(Composition_x, Composition_c))   (15)
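A minimal sketch of equations (11) to (15): one Transformer-style block with residual connections and layer normalization, followed by the Manhattan-distance similarity. The head count, feedforward width and the mean-pooling used to collapse the token sequence are assumptions, since the patent does not specify how the sequence is reduced before the distance:

```python
import torch
import torch.nn as nn

class StructureInfoLearner(nn.Module):
    def __init__(self, d_model=300, heads=6, d_ff=512):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.fn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def encode(self, x):                    # x = word embeddings + position codes
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)               # eq. (11)/(12): Norm([MultiHead(x); x])
        x = self.norm2(x + self.fn(x))      # eq. (13)/(14): Norm([f_n(x); x])
        return x.mean(dim=1)                # Composition_x / Composition_c

    def forward(self, news_whole, case_whole):
        cx, cc = self.encode(news_whole), self.encode(case_whole)
        return 1 - torch.sigmoid(torch.sum(torch.abs(cx - cc), dim=-1))  # y_C, eq. (15)
```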
As a preferred embodiment of the present invention, the Step3 specifically comprises the following steps:
the step3.1, VAE architecture is an encoder-decoder architecture. In the encoder, the input is compressed to the underlying subject Z, and the decoder reconstructs the input signal D from the distribution of Z in the underlying space of data by sampling.
Figure BDA0003263093460000071
Where Z represents a potential topic and P (D | Z) describes the probability of generating D from Z.
Typically, the VAE model assumes that the posterior distribution of the latent topic Z given the input data D approximately satisfies a Gaussian distribution, i.e.:

log P(Z|d^(i)) = log N(z; μ^(i), δ^2(i) I)   (17)

where d^(i) denotes a real sample in D, and μ^(i) and δ^2(i) are each generated from d^(i) by a neural network:

μ^(i) = f_1(d^(i))   (18)

log δ^2(i) = f_2(d^(i))   (19)

From μ^(i) and δ^2(i), the distribution P(Z^(i)|d^(i)) corresponding to each d^(i) is obtained, and a decoding network then reconstructs the corresponding sample d̂^(i).

To make the reconstructed data as close as possible to the original data, the final optimization goal of the VAE is to maximize the generation probability P(d^(i)) of each d^(i), while using the KL divergence to keep the posterior P(Z^(i)|d^(i)) obtained from the data as close as possible to its theoretical variational prior, i.e., N(0, I). This objective is shown in equation (20):

L = E_{Z~P(Z|d^(i))}[log P(d^(i)|Z)] - KL(P(Z^(i)|d^(i)) || N(0, I))   (20)
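The following compact VAE sketch mirrors equations (16) to (20): f_1 and f_2 produce μ and log δ², sampling uses the reparameterization trick, and the loss combines the reconstruction term with the KL divergence toward N(0, I). Layer sizes are assumptions, and the input d is assumed to be a bag-of-words vector normalized to [0, 1]:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicVAE(nn.Module):
    def __init__(self, vocab_size=5000, hidden=256, topic_size=50):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.f1 = nn.Linear(hidden, topic_size)        # mu^(i), eq. (18)
        self.f2 = nn.Linear(hidden, topic_size)        # log delta^2(i), eq. (19)
        self.dec = nn.Linear(topic_size, vocab_size)   # decoding network

    def forward(self, d):
        h = self.enc(d)
        mu, logvar = self.f1(h), self.f2(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # sample Z ~ P(Z|d)
        return self.dec(z), mu, logvar, z

def vae_loss(recon, d, mu, logvar):
    # eq. (20): reconstruction likelihood plus KL(P(Z|d) || N(0, I))
    rec = F.binary_cross_entropy_with_logits(recon, d, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```

After pre-training, the latent z (or μ) serves as the topic vector that PreTrainedVAE(·) returns in Step3.2.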
Step3.2, as shown in FIG. 4, D denotes the input case text and news text, and D undergoes two different processes. First, the case and news texts in D are encoded in the same way as in the local learner of Step1, with case elements used for guidance, yielding the case representation C and the case-element-guided news representation X'. In addition, the case text and the news text are separately fed into the pre-trained VAE topic model to obtain the latent topic vectors Z_C and Z_X of the case and the news, as shown in equations (21) and (22):

Z_C = PreTrainedVAE(C) ∈ R^{topic_size}   (21)

Z_X = PreTrainedVAE(X) ∈ R^{topic_size}   (22)

where topic_size is the preset number of latent topics. The topic vectors are spliced with the case vector and the news vector, the topic information and the text information then interact through a bidirectional LSTM, and the topic information representations of the case and the news are finally obtained through a fully connected network, as shown in equations (23) to (25):
Topic_x = MLP(BiLSTM([Z_X; X']))   (23)

Topic_c = MLP(BiLSTM([Z_C; C]))   (24)

y_T = 1 - Sigmoid(Manhattan(Topic_x, Topic_c))   (25)

where Topic_x and Topic_c denote the topic information vectors of the news text and the case text, respectively. The final similarity y_T is obtained through the Manhattan distance calculation followed by Sigmoid normalization. Pre-training of this learner uses cross entropy as the loss function and the Adam algorithm as the optimizer.
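A sketch of equations (23) to (25) under assumed dimensions: the pre-trained topic vector is broadcast along the token axis and concatenated with the text representation, a BiLSTM fuses the two, and an MLP plus the Manhattan distance yields y_T. Broadcasting the topic vector per token is one plausible reading of the splicing [Z; X']:

```python
import torch
import torch.nn as nn

class TopicInfoLearner(nn.Module):
    def __init__(self, embed_dim=300, topic_size=50, hidden=128):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim + topic_size, hidden,
                              batch_first=True, bidirectional=True)
        self.mlp = nn.Linear(2 * hidden, hidden)

    def encode(self, seq, z):              # seq: (b, len, v); z: (b, topic_size)
        z_rep = z.unsqueeze(1).expand(-1, seq.size(1), -1)
        fused, _ = self.bilstm(torch.cat([z_rep, seq], dim=-1))  # [Z; X'], eq. (23)/(24)
        return self.mlp(fused.mean(dim=1))                       # Topic_x / Topic_c

    def forward(self, news, z_news, case, z_case):
        tx, tc = self.encode(news, z_news), self.encode(case, z_case)
        return 1 - torch.sigmoid(torch.sum(torch.abs(tx - tc), dim=-1))  # y_T, eq. (25)
```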
As a preferred embodiment of the present invention, the Step4 specifically comprises the following steps:
step4.1, after each individual learner has learned its single perspective and the pre-training effect has been optimized, the output characterizations of the three individual learners are combined for weight learning in order to measure the importance of the three kinds of features: weight information under different angles is acquired with a multi-head self-attention mechanism, and distance calculation is performed after a feedforward neural network to obtain the final similarity y.
output_c = f_n(MultiHead([Part_c; Composition_c; Topic_c]))   (26)

output_x = f_n(MultiHead([Part_x; Composition_x; Topic_x]))   (27)

y = 1 - Sigmoid(Manhattan(output_c, output_x))   (28)

where output_c and output_x denote the outputs of the feedforward neural network, representing the final characterizations of the case and the news.
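A sketch of equations (26) to (28) with an assumed head count: the three pre-trained representations of one text are stacked as a three-step sequence, weighted by multi-head self-attention, flattened through the feedforward layer f_n, and the two fused outputs are compared with 1 - Sigmoid(Manhattan):

```python
import torch
import torch.nn as nn

class WeightLearner(nn.Module):
    def __init__(self, d=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.fn = nn.Linear(3 * d, d)                 # feedforward f_n

    def fuse(self, part, comp, topic):                # each: (batch, d)
        x = torch.stack([part, comp, topic], dim=1)   # (batch, 3, d): the three views
        a, _ = self.attn(x, x, x)                     # MultiHead([Part; Composition; Topic])
        return self.fn(a.flatten(1))                  # output_c / output_x, eq. (26)/(27)

    def forward(self, case_views, news_views):
        oc, ox = self.fuse(*case_views), self.fuse(*news_views)
        return 1 - torch.sigmoid(torch.sum(torch.abs(oc - ox), dim=-1))  # y, eq. (28)
```

Training only this module on the frozen learners' output characterizations matches the text's departure from traditional combination strategies.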
As a preferred embodiment of the present invention, the Step5 specifically comprises the following steps:
step5.1, by analyzing recent hot cases, the invention selects 15 representative hot cases and crawls 6,049 news items related to them. According to the association between the crawled cases and news, triples of case-news similarity relations are built in the form (case, news, similarity relation), giving 6,049 relevant case-news data pairs; a data augmentation method then yields 6,000 unrelated case-news data pairs, for a final total of 12,049 triples. The specific partitioning of the data set is shown in Table 1.
Table 1. Case and news data distribution

(Table content is not reproduced in this text record.)
As a preferred embodiment of the present invention, the Step6 specifically comprises the following steps:
step6.1, the evaluation indexes of the invention mainly adopt the accuracy (Acc.), precision (P), recall (R) and F1 values, and the Q statistic is selected as the diversity measure of the individual learners. The value range of the Q statistic is [-1, 1], where -1 indicates negative correlation, 1 indicates positive correlation, and 0 indicates no correlation.
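For two learners, the pairwise Q statistic can be computed from a 2x2 contingency table over correct and incorrect decisions; the plain-Python sketch below (function and variable names are illustrative) follows the standard definition:

```python
def q_statistic(pred_a, pred_b, labels):
    # N11/N00: both learners correct/incorrect; N10/N01: exactly one correct.
    n11 = n00 = n10 = n01 = 0
    for a, b, y in zip(pred_a, pred_b, labels):
        ca, cb = a == y, b == y
        if ca and cb:
            n11 += 1
        elif ca:
            n10 += 1
        elif cb:
            n01 += 1
        else:
            n00 += 1
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0

# Identical learners give Q = 1 (fully positively correlated, no diversity).
print(q_statistic([1, 0, 1, 1], [1, 0, 1, 1], [1, 0, 0, 1]))   # 1.0
```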
As a preferred embodiment of the present invention, the Step7 specifically comprises the following steps:
step7.1, the invention mainly adopts six classical text similarity calculation models as baseline models for comparison, including twin network models, matching-aggregation models and a pre-trained model, as follows:
● Siamese-CNN model: shen et al enhanced the ability of the model to capture window features by using CNN. The model mainly comprises a convolution layer and a pooling layer, and similarity calculation is carried out through a full-connection layer.
● Siamese-LSTM model: neculoiu et al use two layers of LSTM for feature extraction and similarity calculation through a fully connected layer.
● Siamese-Transformer model: feature extraction is carried out with a single Transformer coding layer, followed by similarity calculation through a fully connected layer.
● ESIM model: Qian et al use an attention-based LSTM to capture high-order interaction information between two sentences. The model mainly comprises input encoding, local inference modeling and inference composition, followed by similarity calculation through a fully connected layer.
● BiMPM model: Wang et al propose four matching functions and obtain matching results after interactive fusion. The model mainly comprises input encoding, a matching layer and a feature fusion layer, with similarity calculated through two feedforward network layers.
● BERT model: the pre-trained language model proposed by Google; it is mainly fine-tuned by adding a fully connected layer after BERT to obtain text similarity.
As a preferred embodiment of the present invention, the Step8 specifically comprises the following steps:
step8.1, comparative experiments of the invention with six baseline models: this part verifies the effectiveness of the invention on the case and news correlation analysis task. Six baseline models are adopted for comparison experiments, and the results are shown in Table 2.
Table 2. Comparison of the invention with the baseline models

(Table content is not reproduced in this text record.)
Analysis of Table 2 shows that the Acc., P, R and F1 values of the invention all exceed those of the other baseline models, with Acc. improved by 3.2% and F1 by 2.5%. This proves the effectiveness of the ensemble-learning-based multi-view similarity calculation method on the case and news correlation analysis task. Moreover, the method is an integrated system built on the three baseline models Siamese-CNN, Siamese-LSTM and Siamese-Transformer, and its F1 value improves on those three baselines by 3.9%, which strongly supports the rationality of the method. In addition, the BiMPM model achieves the best F1 value among the other baseline models because it uses multi-feature matching; combined with the experimental effect of the invention, this suggests that similarity matching from multiple perspectives is an effective solution to matching unbalanced texts such as cases and news.
Step8.2, individual learner diversity analysis experiment: this part verifies the diversity of the individual learners proposed by the invention. Table 3 tabulates the individual learner predictions. In the table, the horizontal and vertical axes indicate the learners: "G1" is the local information learner, "G2" the structural information learner, and "G3" the topic information learner; "+" denotes the number of samples judged relevant by a learner and "-" the number judged irrelevant. The results are shown in Table 3.
Table 3. Prediction results

(Table content is not reproduced in this text record.)
The Q statistic was calculated on the results in Table 3, giving the results shown in Table 4, where "Q12" denotes the diversity measure between individual learners G1 and G2 (and similarly for the other pairs), and "Q" denotes the diversity measure of the entire integrated system.
Table 4. Individual learner diversity measure results

(Table content is not reproduced in this text record.)
As can be seen from Table 4, Q12, Q13 and Q23 all lie in the range 0-1, so the learners are positively correlated while retaining a certain diversity, and the overall Q value likewise illustrates the diversity of the whole integrated system. The experimental results prove that the three different individual learners each learn information from a different side, and that the weight learner integrates the three kinds of information well.
Step8.3, integration strategy utility analysis experiment: this part verifies the effectiveness of the weight learner. The averaging method, the voting method and the logistic regression algorithm, all common integration strategies, are selected as comparison experiments, with the final outputs of each individual learner used as their training set. The results are shown in Table 5 below.
Table 5. Integration strategy experimental results

(Table content is not reproduced in this text record.)
Compared with the other integration strategies, the method of the invention achieves the best effect, exceeding the next best F1 value by about 1%, which fully demonstrates the superiority of the invention's integration strategy on this task. The results of the averaging and voting methods are not improved but slightly reduced, showing that these two strategies do not play an integrating role: the number of individual learners is small, and since the structural information learner outperforms the others, its results cover the final outcome. The logistic regression integration does exceed the individual learners, so that integration strategy takes effect; together with the results of the invention's method, this shows that learning-based integration strategies integrate better than the other strategies.
Step8.4, utility analysis experiment for the key modules of each individual learner: this part verifies the validity of the key modules of each individual learner in the method. The results are shown in Table 6. Specifically, "(-) case" indicates that an individual learner does not use case elements as external guidance, "(-) position" that the structural information learner does not use position information, "(-) self-Attention" that the local information learner does not use the self-attention mechanism, and "(-) topic" that the topic information learner does not use topic information.
Table 6. Validation results for the features of each part

(Table content is not reproduced in this text record.)
Analysis of Table 6 shows that under "(-) case" the effects of all three individual learners drop significantly compared with the method of the invention; comparison with Table 2 shows that, thanks to the guidance of case elements, a structurally simple twin network can outperform ESIM, a complex matching-aggregation network. This fully demonstrates the effectiveness of the invention's external guidance through case elements. Under "(-) position" there is a large drop in the structural information learner's results, indicating that the learner effectively learns structural information. Under "(-) self-Attention", the drop in the local information learner's effect indicates the effectiveness of self-attention over the CNN channels, which represent different local features, and indirectly verifies the learner's importance in discriminating local information. Finally, under "(-) topic", the drop in the topic information learner's effect shows that the learner can effectively use topic information for similarity calculation.
Step8.5, news example test analysis experiment: this experiment verifies the improvement of the method of the invention in case and news correlation analysis accuracy. The news and cases shown in Tables 7 and 8 below are selected to construct triples of case-news similarity relations.
Table 7. News text examples

(Table content is not reproduced in this text record.)
Table 8. Case description text example

(Table content is not reproduced in this text record.)
The triplet form constructed in tables 7 and 8 is as follows:
(case description, News 1, similar)
(case description, news 2, similar)
(case description, news 3, similar)
The invention selects the representative Siamese-LSTM, BiMPM and BERT baseline models for experiments; the results are as follows, where 0 denotes dissimilar and 1 denotes similar.
Table 9. News example test results

(Table content is not reproduced in this text record.)
As shown in Table 9 above, the method of the invention accurately judges the similarity relation between the three news items and the corresponding case, whereas Siamese-LSTM, BiMPM and BERT cannot judge all three examples correctly at the same time. This proves that the invention makes good use of similarity relations from multiple perspectives, effectively solves the unbalanced-text problem between cases and news, and improves the accuracy of case and news similarity calculation.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (5)

1. The case and news correlation analysis method for multi-view ensemble learning is characterized by comprising the following steps of: the method comprises the following specific steps:
step1, constructing a local information learner by using a CNN network, and acquiring local information similarity between cases and news;
step2, constructing a structural information learner by using a Transformer network, and acquiring structural information similarity between cases and news;
step3, constructing a topic information learner by using a pre-trained topic model, and acquiring topic information similarity between cases and news;
step4, constructing a weight learner by using a multi-head attention mechanism, and jointly judging the similarity degree from multiple angles.
2. The case and news relevance analysis method for multi-view ensemble learning according to claim 1, wherein: the specific steps of Step1 are as follows:
step1.1, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and obtaining weighted case and news feature vectors;
step1.2, extracting local information of the feature vectors of cases and news by using a CNN network, and after pooling operation, performing weight learning on an output channel of the CNN by using a self-attention mechanism for improving the weight of important local information;
step1.3, performing Manhattan distance calculation on the extracted local information of the case and the news to obtain a final similarity relation.
3. The case and news relevance analysis method for multi-view ensemble learning according to claim 1, wherein: the specific Step of Step2 is as follows:
step2.1, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and adding absolute position coding information of case and news texts to obtain weighted case and news feature vectors;
step2.2, extracting structure information of case and news characteristic vectors containing position coding information by using a Transformer network layer;
and Step2.3, performing Manhattan distance calculation on the extracted structural information coding vectors of the cases and the news to obtain a final similarity relation.
4. The case and news relevance analysis method for multi-view ensemble learning according to claim 1, wherein: the specific steps of Step3 are as follows:
step3.1, carrying out unsupervised pre-training on all case and news data by using a variational auto-encoder (VAE) to obtain an unsupervised topic model;
step3.2, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and adding case and news topic vectors extracted by a topic model to obtain weighted case and news characteristic vectors;
step3.3, extracting topic information from the case and news feature vectors containing the topic information by using a bidirectional LSTM network layer;
Step3.4, performing Manhattan distance calculation on the extracted topic information coding vectors of the cases and the news to obtain a final similarity relation.
5. The case and news relevance analysis method for multi-view ensemble learning according to claim 1, wherein: the Step4 includes:
and (3) learning a single visual angle by using an individual learner, optimizing the pre-training effect, performing weight learning by combining three representations obtained by Step1, Step2 and Step3, acquiring weight information at different angles by using a multi-head self-attention mechanism, then performing distance calculation by using a feedforward neural network, and finally obtaining the final similarity.
CN202111078776.1A, priority date 2021-09-15, filing date 2021-09-15: Case and news correlation analysis method for multi-view integrated learning. Status: Pending. Published as CN113901990A.

Priority Applications (1)

CN202111078776.1A, priority date 2021-09-15, filing date 2021-09-15: Case and news correlation analysis method for multi-view integrated learning


Publications (1)

CN113901990A, published 2022-01-07

Family

ID=79028500





Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180018757A1 (en) * 2016-07-13 2018-01-18 Kenji Suzuki Transforming projection data in tomography by means of machine learning
CN109885673A (en) * 2019-02-13 2019-06-14 北京航空航天大学 A kind of Method for Automatic Text Summarization based on pre-training language model
CN110717332A (en) * 2019-07-26 2020-01-21 昆明理工大学 News and case similarity calculation method based on asymmetric twin network
CN110766065A (en) * 2019-10-18 2020-02-07 山东浪潮人工智能研究院有限公司 Hash learning method based on deep hyper-information
CN112925877A (en) * 2019-12-06 2021-06-08 中国科学院软件研究所 One-person multi-case association identification method and system based on depth measurement learning
CN111368087A (en) * 2020-03-23 2020-07-03 中南大学 Chinese text classification method based on multi-input attention network
CN112287687A (en) * 2020-09-17 2021-01-29 昆明理工大学 Case tendency extraction type summarization method based on case attribute perception
CN112231472A (en) * 2020-09-18 2021-01-15 昆明理工大学 Judicial public opinion sensitive information identification method integrated with domain term dictionary
CN112732916A (en) * 2021-01-11 2021-04-30 河北工业大学 BERT-based multi-feature fusion fuzzy text classification model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
赵承鼎; 郭军军; 余正涛; 黄于欣; 刘权; 宋燃: "News and case correlation analysis based on an asymmetric twin network", Journal of Chinese Information Processing, no. 03, 15 March 2020 (2020-03-15) *
陈佳伟; 韩芳; 王直杰: "Target-specific sentiment analysis based on a self-attention gated graph convolutional network", Journal of Computer Applications, no. 08, 10 August 2020 (2020-08-10) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817501A (en) * 2022-04-27 2022-07-29 马上消费金融股份有限公司 Data processing method, data processing device, electronic equipment and storage medium
CN114926206A (en) * 2022-05-18 2022-08-19 阿里巴巴(中国)有限公司 Prediction model training method, and article sales information prediction method and apparatus
CN117056874A (en) * 2023-08-17 2023-11-14 国网四川省电力公司营销服务中心 Unsupervised electricity larceny detection method based on deep twin autoregressive network
CN117236323A (en) * 2023-10-09 2023-12-15 青岛中企英才集团商业管理有限公司 Information processing method and system based on big data
CN117236323B (en) * 2023-10-09 2024-03-29 京闽数科(北京)有限公司 Information processing method and system based on big data

Similar Documents

Publication Publication Date Title
CN113901990A (en) Case and news correlation analysis method for multi-view integrated learning
CN111274398B (en) Method and system for analyzing comment emotion of aspect-level user product
CN111259127B (en) Long text answer selection method based on transfer learning sentence vector
CN111414461B (en) Intelligent question-answering method and system fusing knowledge base and user modeling
CN110717334A (en) Text emotion analysis method based on BERT model and double-channel attention
CN111930887B (en) Multi-document multi-answer machine reading and understanding system based on joint training mode
CN112000772B (en) Sentence-to-semantic matching method based on semantic feature cube and oriented to intelligent question and answer
CN112667818A (en) GCN and multi-granularity attention fused user comment sentiment analysis method and system
CN113255366B (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN113254604B (en) Reference specification-based professional text generation method and device
CN111651558A (en) Hyperspherical surface cooperative measurement recommendation device and method based on pre-training semantic model
CN112527993B (en) Cross-media hierarchical deep video question-answer reasoning framework
CN114398976A (en) Machine reading understanding method based on BERT and gate control type attention enhancement network
CN113901847A (en) Neural machine translation method based on source language syntax enhanced decoding
Lin et al. PS-mixer: A polar-vector and strength-vector mixer model for multimodal sentiment analysis
CN113157919A (en) Sentence text aspect level emotion classification method and system
CN115810351A (en) Controller voice recognition method and device based on audio-visual fusion
Wang et al. EfficientTDNN: Efficient architecture search for speaker recognition
Zhang et al. TS-GCN: Aspect-level sentiment classification model for consumer reviews
CN117539999A (en) Cross-modal joint coding-based multi-modal emotion analysis method
CN117972434A (en) Training method, training device, training equipment, training medium and training program product for text processing model
CN117648469A (en) Cross double-tower structure answer selection method based on contrast learning
Fajcik et al. Pruning the index contents for memory efficient open-domain qa
CN116663523A (en) Semantic text similarity calculation method for multi-angle enhanced network
Zhao et al. Improving stability and performance of spiking neural networks through enhancing temporal consistency

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination