CN113901990A - Case and news correlation analysis method based on multi-view ensemble learning - Google Patents
- Publication number
- CN113901990A (application CN202111078776.1A)
- Authority
- CN
- China
- Prior art keywords
- news
- case
- information
- learner
- cases
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/22—Matching criteria, e.g. proximity measures (pattern recognition, analysing)
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking (handling natural language data)
- G06N20/20—Ensemble learning (machine learning)
- G06N3/044—Recurrent networks, e.g. Hopfield networks (neural network architecture)
- G06N3/045—Combinations of networks (neural network architecture)
- G06N3/088—Non-supervised learning, e.g. competitive learning (neural network learning methods)
Abstract
The invention relates to a case and news correlation analysis method based on multi-view ensemble learning, belonging to the technical field of natural language processing. The method comprises the following steps: taking a Siamese network as the basic framework, a CNN network, a Transformer encoding network, and a topic model are combined to realize feature extraction of local information, structural information, and topic information. Considering that case elements contain the key semantic information of a case, the case elements are used to guide three pre-trained base learners to obtain view-specific information; a weight learner constructed through a multi-head self-attention mechanism then combines the three kinds of information and calculates a Manhattan distance, finally obtaining a more balanced and reasonable similarity relation. Experimental results show that, compared with methods based on semantic similarity alone, the multi-view ensemble method disclosed by the invention improves the F1 value by 2.5%.
Description
Technical Field
The invention relates to a case and news correlation analysis method based on multi-view ensemble learning, belonging to the technical field of natural language processing.
Background
In recent years, the application of deep learning to text similarity calculation has attracted wide attention. Common methods can be roughly divided into three categories: representation-based networks, interaction-based networks, and pre-trained language models. Representation-based networks include the Siamese network framework, a neural network built from a pair of sub-networks that share the same parameters. Interaction-based networks, represented by ESIM, are mainly characterized by frameworks that capture richer interactive features between two sentences; however, the matching operations used to capture this interaction information add computational cost. In addition, the pre-trained language models that have emerged in recent years, represented by BERT, RoBERTa, and the like, achieve excellent results. BERT is one of the key innovations in the recent evolution of contextual representation learning; it adopts a fine-tuning approach that requires little or no task-specific architecture and achieves state-of-the-art performance on many NLP tasks.
The above analysis shows that, because the text content of cases and that of news differs greatly, multiple perspectives are needed to jointly model their similarity. Ensemble learning can combine several different individual learners to obtain better results, and heterogeneous individual learners can represent multiple angles of similarity, making the approach well suited to case and news correlation analysis. Therefore, drawing on previous work, the invention follows the idea of ensemble learning and selects three individual learners to represent three different views, so as to explore the similarity problem of cases and news in depth.
Disclosure of Invention
The invention provides a case and news correlation analysis method based on multi-view ensemble learning, which is used for improving the accuracy of case and news correlation analysis. The invention uses a Siamese network framework as the base and, following the idea of ensemble learning, selects three network structures with different characteristics to represent three views, constructing a local information learner, a structural information learner, and a topic information learner, so that each obtains semantic features while keeping a different emphasis. The three individual learners are pre-trained separately so that each learner reaches its best result. Finally, a weight learner constructed with a multi-head attention mechanism combines the three kinds of information to obtain the final similarity measure.
The technical scheme of the invention is as follows: the case and news correlation analysis method based on multi-view ensemble learning comprises the following specific steps:
step1, constructing a local information learner by using a CNN network, and acquiring local information similarity between cases and news;
step2, constructing a structural information learner by using a Transformer network, and acquiring the structural information similarity between cases and news;
step3, constructing a topic information learner by using a pre-trained topic model, and acquiring the topic information similarity between cases and news;
step4, constructing a weight learner by using a multi-head attention mechanism, and jointly judging the similarity degree from multiple angles.
As a preferred embodiment of the present invention, the Step1 specifically comprises the following steps:
step1.1, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and obtaining weighted case and news feature vectors;
step1.2, extracting the local information of the case and news feature vectors by using a CNN network; after a pooling operation, weight learning is performed on the output channels of the CNN through a self-attention mechanism to increase the weight of important local information.
Step1.3, performing a Manhattan distance calculation on the extracted local information encoding vectors of the case and the news to obtain the final similarity relation.
As a preferable scheme of the invention, the Step2 comprises the following specific steps:
step2.1, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and adding absolute position coding information of case and news texts to obtain weighted case and news feature vectors.
Step2.2, extracting the structural information of the case and news feature vectors containing position coding information by using a Transformer network layer.
Step2.3, performing a Manhattan distance calculation on the extracted structural information encoding vectors of the case and the news to obtain the final similarity relation.
As a preferred embodiment of the present invention, the Step3 specifically comprises the following steps:
step3.1, using a variational self-encoder (VAE) to perform unsupervised pre-training on all data of cases and news to obtain an unsupervised topic model.
Step3.2, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and adding case and news topic vectors extracted by a topic model to obtain weighted case and news feature vectors.
Step3.3, extracting the topic information of the case and news feature vectors containing the topic vectors by using a bidirectional LSTM network layer.
Step3.4, performing a Manhattan distance calculation on the extracted topic information encoding vectors of the case and the news to obtain the final similarity relation.
As a preferred embodiment of the present invention, the Step4 specifically comprises the following steps:
after the individual learners have each learned a single view and their pre-training effect has been optimized, in order to better balance the importance of the three representations, the representations obtained in Step1, Step2, and Step3 are combined for weight learning: a multi-head self-attention mechanism acquires the weight information under different angles, a feed-forward neural network is applied, and a distance calculation finally yields the similarity y.
output_c = f_n(MultiHead([Part_c; Composition_c; Topic_c])) (1)
output_x = f_n(MultiHead([Part_x; Composition_x; Topic_x])) (2)
y = 1 - Sigmoid(Manhattan(output_c, output_x)) (3)
where output_c and output_x denote the outputs of the feed-forward neural network, i.e., the final representations of the case and the news.
The invention has the beneficial effects that:
the invention discloses a case and news correlation analysis method for multi-view integrated learning, and aims to improve the accuracy of case and news correlation analysis. Aiming at the problems of unbalanced distribution and overlarge content difference between case description texts and news texts, the invention provides the judgment of the similarity between cases and news from multiple perspectives, and the validity of the invention is verified through experiments.
Drawings
FIG. 1 is a block diagram of a specific process of the present invention;
FIG. 2 is a diagram of a local information learner in accordance with the present invention;
FIG. 3 is a diagram of a structural information learner in accordance with the present invention;
FIG. 4 is a schematic diagram of a VAE topic learner in accordance with the present invention;
FIG. 5 is a diagram of a weight learner model constructed based on a multi-head self-attention mechanism in accordance with the present invention.
Detailed Description
Example 1: as shown in figs. 1 to 5, the case and news correlation analysis method based on multi-view ensemble learning of the present invention comprises:
step1, constructing a local information learner by using a CNN network, and acquiring local information similarity between cases and news;
as shown in the local information learner in fig. 2, case elements are used as external guidance, and the output channels of the CNN network are weighted by using a self-attention mechanism, so that the capturing capability of the network on local information is improved. The local information learner is pre-trained and uses cross entropy loss as a loss function and the Adam algorithm as an optimizer.
Step2, constructing a structural information learner by using a Transformer network, and acquiring the structural information similarity between cases and news;
as shown in the structural information learner in FIG. 3, the encoding layer of the Transformer is a self-attention-based network structure with a residual connection after each sub-layer, which helps the network acquire the global information of cases and news. Here, case element external guidance is again used, and position coding information is added in the Embedding layer to enhance the network's ability to capture global structural information; cross-entropy loss is used as the loss function and the Adam algorithm as the optimizer for pre-training.
Step3, constructing a topic information learner by using a pre-trained topic model, and acquiring the topic information similarity between cases and news;
as shown in the VAE topic learner of fig. 4, the topic model based on the variational auto-encoder (VAE) is an unsupervised document generation model that aims to extract latent topic features from the word vector space of a document and generate the corresponding document. Suchi et al. used a VAE to extract topic information to assist text classification tasks; following this prior work, the present invention uses a pre-trained VAE to obtain topic information and uses it to help build the topic information learner.
Step4, constructing a weight learner by using a multi-head attention mechanism, and jointly judging the similarity degree from multiple angles. After the individual learners have each learned a single view and their pre-training effect has been optimized, in order to better balance the importance of the three representations, the representations obtained in Step1, Step2, and Step3 are combined for weight learning: a multi-head self-attention mechanism acquires the weight information under different angles, a feed-forward neural network is applied, and a distance calculation finally yields the similarity.
As shown in fig. 5, which is a structure diagram of the weight learner constructed on the multi-head self-attention mechanism, after the three individual learners are pre-trained, the weight learner is trained using their output results as a training set. Different from traditional ensemble learning combination strategies, the method selects the final output representation of each individual learner as the training set, so that the weight learner can still learn information from the original text.
Step5, crawling original news corpora from news websites such as Weibo (microblog) through XPath, and performing data preprocessing, data set partitioning, and similar operations on the original corpora.
Step6, taking accuracy (Acc.), precision (P), recall (R), the F1 value, and the Q statistic as evaluation indexes to measure the experimental effectiveness of the invention.
Step7, the invention mainly adopts six classic text similarity calculation models as baseline models to carry out comparison experiments, and the baseline models comprise a twin network model, an aggregation-matching model and a pre-training model.
Step8, in order to verify the effectiveness of the method of the invention on the case and news correlation analysis task, 6 baseline models are adopted to carry out comparison experiments, and the experimental results are analyzed.
As a preferred embodiment of the present invention, the Step1 specifically comprises the following steps:
step1.1, firstly, the case descriptions are manually annotated to obtain the case elements, and the news texts are compressed according to their titles to remove redundant information. After encoding through the shared Embedding layer, the case description is C ∈ R^{m×v}, the news text is X ∈ R^{n×v}, and the case elements are E ∈ R^{k×v}, where m denotes the length of the case description text, n the news text length, k the number of case elements, and v the embedding dimension. After the encoding matrices of the three kinds of information are obtained, the case element encoding matrix E is used to guide and weight the news encoding matrix X, with the specific operations:
H = XWE ∈ R^{n×v}, W ∈ R^{v×k} (1)
X' = tanh(H) ∈ R^{n×v} (2)
X' denotes the news encoding matrix guided by the case elements. The operations in equations (1) and (2) can be understood as a simplified attention mechanism: the trainable weight matrix W acts as a weighting function between the case element matrix E and the news encoding matrix X, so the model can autonomously learn how E weights each element of X; the weighted matrix elements are then mapped into the (-1, 1) interval by the tanh activation function, which improves the convergence speed.
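The guided weighting of equations (1)-(2) can be sketched in a few lines of pure Python; the function name and explicit matrix arguments are illustrative, and in the patent W would be a trained parameter rather than an input:

```python
import math

def guide_news_by_case_elements(X, W, E):
    """Sketch of equations (1)-(2): H = XWE, X' = tanh(H).

    X: news encoding matrix (n x v), W: trainable weights (v x k),
    E: case element matrix (k x v); returns the guided matrix X' (n x v).
    """
    def matmul(A, B):
        inner, cols = len(B), len(B[0])
        return [[sum(row[t] * B[t][j] for t in range(inner))
                 for j in range(cols)] for row in A]

    H = matmul(matmul(X, W), E)  # (n x k) times (k x v) -> (n x v)
    return [[math.tanh(h) for h in row] for row in H]
```

Note that tanh squashes every entry of H into (-1, 1), which matches the convergence remark above.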
Step1.2, the case description C and the news matrix X' generated under the guidance of the case elements are respectively input into a CNN layer with shared parameters for convolution, obtaining windowed local information. A pooling operation then reduces the spatial size of the data to alleviate overfitting, with the specific operations:
hidden_x = MaxPooling(CNN(X')) (3)
hidden_c = MaxPooling(CNN(C)) (4)
where MaxPooling denotes the pooling operation, and hidden_x and hidden_c denote the local information extracted by the CNN network.
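As a rough sketch of the pooling step in equations (3)-(4), the following pure-Python helper (an illustrative stand-in for MaxPooling, using a hypothetical non-overlapping window) shows how pooling shrinks a 1-D convolution output:

```python
def max_pooling_1d(feature_map, window=2):
    """Max-pool a 1-D feature map with a non-overlapping window,
    keeping the strongest activation in each window (cf. eqs. (3)-(4))."""
    return [max(feature_map[i:i + window])
            for i in range(0, len(feature_map) - window + 1, window)]
```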
hidden_x and hidden_c are respectively weighted through a self-attention mechanism and then passed through a feed-forward neural network layer to obtain the local information representation vectors of the news text and the case text, Part_x and Part_c, as shown in equations (5) and (6).
Part_x = f_n(Attention(hidden_x)) (5)
Part_c = f_n(Attention(hidden_c)) (6)
It should be noted that the self-attention mechanism here weights the output channels of the CNN network: each channel is the local information obtained by the convolution operation of one convolution kernel, so applying self-attention over the output channels lets the network autonomously learn which convolution kernels produce the more important local information. The self-attention mechanism is implemented as follows.
Q, K, and V are all vectors, and here the three are equal (self-attention). The dot product of Q and K is calculated and divided by √d_k, so that the inner product of Q and K does not become too large:
Attention(Q, K, V) = softmax(QK^T / √d_k)V (7)
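This scaled dot-product attention can be sketched in pure Python (list-of-lists matrices, no batching or multiple heads), with Q = K = V in the self-attention case described above:

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, the standard scaled dot-product attention."""
    d_k = len(Q[0])
    # Q K^T scaled by sqrt(d_k) to keep the inner products small
    scores = [[sum(q[t] * k[t] for t in range(d_k)) / math.sqrt(d_k)
               for k in K] for q in Q]

    def softmax(row):
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        return [e / z for e in exps]

    weights = [softmax(row) for row in scores]
    # each output row is a convex combination of the rows of V
    return [[sum(w[j] * V[j][t] for j in range(len(V)))
             for t in range(len(V[0]))] for w in weights]
```

With a single position the attention weight is 1, so the input vector is returned unchanged.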
Step1.3, the local information learner of the invention selects the Manhattan distance to calculate the distance between the news text and the case text, so as to measure the similarity difference between the two.
y_P = 1 - Sigmoid(Manhattan(Part_x, Part_c)) (8)
Manhattan denotes the Manhattan distance calculation; this function gives the Manhattan distance between the two local information representation vectors of the case and the news, and the distance value is then mapped by the Sigmoid function. Since a closer distance means a higher similarity, i.e., distance and similarity are inversely related, one minus the Sigmoid output is used as the final similarity y_P.
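A minimal sketch of the similarity in equation (8); note that because the Manhattan distance is non-negative, 1 - Sigmoid(d) lies in (0, 0.5], reaching 0.5 exactly for identical vectors:

```python
import math

def manhattan_similarity(u, v):
    """Equation (8): y = 1 - Sigmoid(Manhattan(u, v))."""
    d = sum(abs(a - b) for a, b in zip(u, v))  # Manhattan (L1) distance
    return 1.0 - 1.0 / (1.0 + math.exp(-d))   # invert: closer -> higher similarity
```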
As a preferable scheme of the invention, the Step2 comprises the following specific steps:
step2.1, the structural information learner has a structure similar to that of the local information learner, except that it adds position information in the Embedding layer and replaces the middle shared network layer with a Transformer encoding layer. This is because the first half of the whole network performs encoding and weighting, which can be seen as the collection of information, while the middle shared network layer is the part that performs feature extraction. The position information used here is the absolute position information of the native Transformer, i.e., the absolute position of each word is calculated, resulting in a fixed position matrix:
PE(pos, 2i) = sin(pos / 10000^{2i/d_model}) (9)
PE(pos, 2i+1) = cos(pos / 10000^{2i/d_model}) (10)
where pos is the position index of each word, d_model is the encoding dimension, and i denotes the dimension index, with 2i < d_model.
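The fixed position matrix of equations (9)-(10) can be generated directly; the sketch below builds it for a sequence of seq_len words and encoding dimension d_model:

```python
import math

def positional_encoding(seq_len, d_model):
    """Absolute sinusoidal position matrix of equations (9)-(10)."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)          # even dimensions: sine
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd dimensions: cosine
    return pe
```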
Step2.2, the case and the news pass through the same Embedding process as the local information learner in Step1 to obtain the word vectors C and X', which are combined with the position codes C_pos and X_pos to form the complete word vectors C_whole and X_whole; text structure information is then obtained through the Transformer encoding layer. The specific process is shown in equations (11) to (14):
X_attn = Norm([MultiHead(X_whole); X_whole]) (11)
C_attn = Norm([MultiHead(C_whole); C_whole]) (12)
Composition_x = Norm([f_n(X_attn); X_attn]) (13)
Composition_c = Norm([f_n(C_attn); C_attn]) (14)
where X_attn and C_attn denote the normalized outputs of the multi-head self-attention mechanism, and Composition_x and Composition_c denote the structural information of the case and the news.
Step2.3, finally, the final similarity y_c is calculated through the Manhattan distance.
y_c = 1 - Sigmoid(Manhattan(Composition_x, Composition_c)) (15)
As a preferred embodiment of the present invention, the Step3 specifically comprises the following steps:
the step3.1, VAE architecture is an encoder-decoder architecture. In the encoder, the input is compressed to the underlying subject Z, and the decoder reconstructs the input signal D from the distribution of Z in the underlying space of data by sampling.
Where Z represents a potential topic and P (D | Z) describes the probability of generating D from Z.
Typically, the VAE model assumes that the posterior probabilities of the underlying subject Z of the input data D approximately satisfy a gaussian distribution, i.e.:
logP(Z|d(i))=logN(z;μ(i),δ2(i)I) (17)
wherein d is(i)Representing a real sample in D, each of μ and δ2Are all formed by(i)Generated by a neural network.
Passing through mu(i)And delta2(i)Further obtain each d(i)Corresponding distribution P (Z)(i)|d(i)) Then through a decoding networkIs reconstructed to obtain
μ(i)=f1(d(i)) (18)
logδ2(i)=f2(d(i)) (19)
In order to make the reconstructed data as close to the original data as possible, the final optimization goal of the VAE is to maximize d(i)Generation probability P (d)(i)) At the same time, the posterior probability P (Z) obtained from the data is used by utilizing KL divergence(i)|d(i)) As close as possible to its theoretical variational probability, i.e., N (0, I). The expression of this optimization objective is shown in equation (20).
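The KL term of the objective above has a closed form for a diagonal Gaussian posterior against the N(0, I) prior; a small sketch, operating on the μ^(i) and log δ^{2(i)} vectors produced by f_1 and f_2, is:

```python
import math

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I))
    = 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1),
    the regularization term of the VAE objective; it is zero exactly
    when the posterior already equals the standard normal prior."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))
```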
Step3.2, as shown in FIG. 4, D denotes the input case text and news text, and D requires two different processes. First, the case and news texts in D are encoded in the same way as in the local learner of Step1, with the case elements used for guidance, obtaining the case representation C and the element-guided news representation X'. In addition, the case text and the news text are respectively input into the pre-trained VAE topic model to obtain the latent topic vectors Z_C and Z_X of the case and the news, as shown in equations (21) and (22).
Z_C = PreTrainedVAE(C) ∈ R^{topic_size} (21)
Z_X = PreTrainedVAE(X) ∈ R^{topic_size} (22)
where topic_size denotes the preset number of latent topics. The topic vectors are concatenated with the case vector and the news vector, the topic information and the text information then interact through a bidirectional LSTM, and the topic information representations of the case and the news are finally obtained through a fully connected network. The specific operations are shown in equations (23) to (25):
Topic_x = MLP(BiLSTM([Z_X; X'])) (23)
Topic_c = MLP(BiLSTM([Z_C; C])) (24)
y_T = 1 - Sigmoid(Manhattan(Topic_x, Topic_c)) (25)
where Topic_x and Topic_c denote the topic information vectors of the news text and the case text, respectively.
Finally, the final similarity y_T is obtained through the Manhattan distance calculation, normalized by the Sigmoid function. The pre-training of this learner uses cross-entropy as the loss function and the Adam algorithm as the optimizer.
As a preferred embodiment of the present invention, the Step4 specifically comprises the following steps:
step4.1, after the individual learners have each learned a single view and their pre-training effect has been optimized, the output representations of the three individual learners are combined for weight learning in order to measure the importance of the three kinds of features: a multi-head self-attention mechanism acquires the weight information under different angles, a feed-forward neural network is applied, and a distance calculation yields the final similarity y.
output_c = f_n(MultiHead([Part_c; Composition_c; Topic_c])) (26)
output_x = f_n(MultiHead([Part_x; Composition_x; Topic_x])) (27)
y = 1 - Sigmoid(Manhattan(output_c, output_x)) (28)
where output_c and output_x denote the outputs of the feed-forward neural network, i.e., the final representations of the case and the news.
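A toy sketch of the combination in equations (26)-(28): the three per-view representations are concatenated and scored with the Manhattan-based similarity. The multi-head attention and feed-forward layers of the weight learner are replaced here by an identity map, so this only illustrates the data flow, not the learned weighting:

```python
import math

def ensemble_similarity(case_views, news_views):
    """Concatenate [Part; Composition; Topic] for case and news and
    apply y = 1 - Sigmoid(Manhattan(output_c, output_x)).

    case_views / news_views: three vectors (local, structural, topic).
    """
    out_c = [x for view in case_views for x in view]  # stand-in for eq. (26)
    out_x = [x for view in news_views for x in view]  # stand-in for eq. (27)
    d = sum(abs(a - b) for a, b in zip(out_c, out_x))
    return 1.0 - 1.0 / (1.0 + math.exp(-d))           # eq. (28)
```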
As a preferred embodiment of the present invention, the Step5 specifically comprises the following steps:
step5.1, by analyzing recent hot cases, the invention selects 15 representative hot cases and crawls 6049 news items related to these cases. According to the association between the crawled cases and news, triples of the case-news similarity relation are established in the form (case, news, similarity relation), yielding 6049 related case-news data pairs; using a data augmentation method, 6000 unrelated case-news data pairs are obtained, giving 12049 triples in total. The specific partitioning of the data set is shown in Table 1.
TABLE 1 case and News data distribution Table
As a preferred embodiment of the present invention, the Step6 specifically comprises the following steps:
step6.1, the evaluation indexes of the invention mainly adopt accuracy (Acc.), precision (P), recall (R), and the F1 value, and the invention selects the Q statistic as the diversity measure of the individual learners. The value range of the Q statistic is [-1, 1], where -1 indicates negative correlation, 1 indicates positive correlation, and 0 indicates no correlation.
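The pairwise Q statistic can be computed from the two learners' correct/incorrect contingency counts; a sketch (with illustrative function and variable names) is:

```python
def q_statistic(preds_a, preds_b, labels):
    """Yule's Q between two learners: with N11/N00 the samples both
    classify correctly/incorrectly and N10/N01 the disagreements,
    Q = (N11*N00 - N01*N10) / (N11*N00 + N01*N10)."""
    n11 = n00 = n10 = n01 = 0
    for a, b, y in zip(preds_a, preds_b, labels):
        right_a, right_b = a == y, b == y
        if right_a and right_b:
            n11 += 1
        elif right_a:
            n10 += 1
        elif right_b:
            n01 += 1
        else:
            n00 += 1
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0
```

Learners that err on the same samples give Q near 1; diverse learners give lower or negative Q.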
As a preferred embodiment of the present invention, the Step7 specifically comprises the following steps:
step7.1, the invention mainly adopts six classical text similarity calculation models as baseline models for comparison, including a Siamese network model, an aggregation-matching model, and a pre-trained model, as follows:
● Siamese-CNN model: shen et al enhanced the ability of the model to capture window features by using CNN. The model mainly comprises a convolution layer and a pooling layer, and similarity calculation is carried out through a full-connection layer.
● Siamese-LSTM model: neculoiu et al use two layers of LSTM for feature extraction and similarity calculation through a fully connected layer.
● Siamese-Transformer model: feature extraction is carried out with a single Transformer encoding layer, followed by similarity calculation through a fully connected layer.
● ESIM model: qian et al uses attention-based LSTM to capture high-order interaction information between two sentences. The model mainly comprises the following components: inputting codes, local reasoning modeling and reasoning combination, and then carrying out similarity calculation through a full connection layer.
● BiMPM model: wang et al propose four matching functions, and obtain matching results after interactive fusion. The model mainly comprises the following components: inputting codes, a matching layer and a feature fusion layer, and carrying out similarity calculation through two layers of feedforward networks.
● BERT model: the pre-trained language model proposed by Google; it is fine-tuned mainly by adding a fully connected layer after BERT to obtain the text similarity.
As a preferred embodiment of the present invention, the Step8 specifically comprises the following steps:
Step8.1, comparative experiment of the invention against the six baseline models: this experimental part verifies the effectiveness of the invention on the case and news correlation analysis task. The six baseline models are used for comparison experiments, and the results are shown in Table 2.
TABLE 2 comparison of the invention with the baseline model test results
Analysis of Table 2 shows that the Acc., P, R and F1 values of the invention all exceed those of the other baseline models, with Acc. improved by 3.2% and F1 by 2.5%. This demonstrates the effectiveness of the ensemble-learning-based multi-view similarity calculation method on the case and news correlation analysis task. Moreover, the proposed method is an ensemble system built on three of the baseline models (Siamese-CNN, Siamese-LSTM and Siamese-Transformer); compared with these three baselines, its F1 value improves by 3.9%, which strongly supports the rationality of the approach. In addition, among the other baselines the BiMPM model achieves the best F1 value, because BiMPM uses a multi-feature matching scheme; combined with the experimental effect of the proposed method, this indicates that similarity matching from multiple perspectives is an effective solution to matching unbalanced text pairs such as cases and news.
Step8.2, individual learner diversity analysis experiment: this experimental part verifies the diversity of the individual learners proposed by the invention. Table 3 is a contingency table of the individual learners' predictions. In the table, the horizontal and vertical axes denote the respective learners: "G1" denotes the local information learner, "G2" the structure information learner, "G3" the topic information learner, "+" the number of samples an individual learner judged relevant, and "-" the number judged irrelevant. The results are shown in Table 3.
TABLE 3 prediction results tabulation
The Q statistics were computed on the results in Table 3, yielding the results shown in Table 4, where "Q12" denotes the diversity measure between individual learners G1 and G2 (and similarly for "Q13" and "Q23"), and "Q" denotes the diversity measure of the entire ensemble system.
TABLE 4 Individual learner diversity metric results
As can be seen from Table 4, Q12, Q13 and Q23 all lie in the range 0-1, so each pair of learners is positively correlated while still exhibiting a certain diversity. The Q value likewise indicates diversity across the whole ensemble system. These results demonstrate that the three individual learners each learn information from a different side, and that the weight learner integrates the three kinds of information well.
Step8.3, ensemble strategy utility analysis experiment: this experimental part verifies the effectiveness of the weight learner. The averaging method, the voting method and the logistic regression algorithm, all common ensemble strategies, are selected as comparison experiments, with the final outputs of the individual learners used as the training set for the comparisons. The results are shown in Table 5 below.
TABLE 5 integration strategy experimental results
Compared with the other ensemble strategies, the method of the invention achieves the best effect, exceeding them by about 1 percentage point in F1 value, which fully demonstrates the superiority of the invention's ensemble strategy on this task. The results of the averaging and voting methods are not improved but slightly reduced, showing that these two strategies fail to provide an ensemble benefit: because the number of individual learners is small and the structure information learner outperforms the others, its result dominates the outcome (result coverage). Logistic regression does exceed the individual learners, so a learned ensemble strategy takes effect; combined with the results of the proposed method, this shows that learning-based ensemble strategies integrate better than the other strategies.
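For contrast with the learned weight strategy, the averaging and voting baselines compared in this experiment can be sketched in a few lines. This is pure illustration of the two strategies, not the patent's experimental code; in the patent the comparison is trained on the individual learners' final outputs.

```python
from collections import Counter

def average_ensemble(scores):
    """Averaging baseline: mean of the individual learners'
    similarity scores for one case-news pair."""
    return sum(scores) / len(scores)

def vote_ensemble(labels):
    """Voting baseline: majority vote over the individual learners'
    0/1 relevance decisions for one case-news pair."""
    return Counter(labels).most_common(1)[0][0]
```

With only three learners, a single dominant learner can control the vote whenever the other two disagree, which is one way the result-coverage effect described above arises.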
Step8.4, utility analysis experiment for the key modules of each individual learner: this experimental part verifies the validity of the key modules of each individual learner in the method. The results are shown in Table 6, where "(-) case" indicates that the individual learners do not use case elements as external guidance, "(-) position" that the structure information learner does not use position information, "(-) self-Attention" that the local information learner does not use the self-attention mechanism, and "(-) topic" that the topic information learner does not use topic information.
Table 6 shows the results of the validation experiment of the characteristics of each part
Analysis of Table 6 shows that under "(-) case" the effects of all three individual learners drop significantly compared with the method of the invention; comparison with Table 2 shows that, thanks to the guidance of case elements, the structurally simple twin networks originally outperform ESIM, a complex matching-aggregation network, which fully demonstrates the effectiveness of using case elements as external guidance. Under "(-) position" the result of the structure information learner drops sharply, indicating that this learner has effectively learned structural information. Under "(-) self-Attention" the drop of the local information learner shows the effectiveness of self-attention over the CNN channels, which represent different local feature information, and indirectly verifies the importance of this learner in discriminating local information. Finally, under "(-) topic" the drop of the topic information learner shows that it can effectively use topic information for similarity calculation.
Step8.5, news example test analysis experiment: this experiment verifies the improvement of the proposed method on the accuracy of case and news correlation analysis. The news items and case shown in Tables 7 and 8 below are selected to construct case-news triples with a "similar" relation.
Table 7 news text examples
Table 8 example of case description text
The triples constructed from Tables 7 and 8 take the following form:
(case description, News 1, similar)
(case description, news 2, similar)
(case description, news 3, similar)
The representative baseline models Siamese-LSTM, BiMPM and BERT are selected for the experiments. The results are as follows, where 0 denotes dissimilar and 1 denotes similar.
TABLE 9 News example test results
As shown in Table 9 above, the method of the invention correctly determines the similarity relation between all three news items and the corresponding case, whereas Siamese-LSTM, BiMPM and BERT each fail to judge all three examples correctly at the same time. This demonstrates that the invention makes good use of similarity relations from multiple perspectives, effectively alleviates the unbalanced-text problem between cases and news, and improves the accuracy of case and news similarity calculation.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, it is not limited to these embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention.
Claims (5)
1. The case and news correlation analysis method for multi-view ensemble learning is characterized by comprising the following steps of: the method comprises the following specific steps:
step1, constructing a local information learner by using a CNN network, and acquiring local information similarity between cases and news;
step2, constructing a structure information learner by using a Transformer network, and acquiring structural information similarity between cases and news;
step3, constructing a topic information learner by using a pre-training topic model, and acquiring topic information similarity between cases and news;
step4, constructing a weight learner by using a multi-head attention mechanism, and jointly judging the similarity degree from multiple angles.
2. The case and news relevance analysis method for multi-view ensemble learning according to claim 1, wherein: the specific steps of Step1 are as follows:
step1.1, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and obtaining weighted case and news feature vectors;
step1.2, extracting local information of the feature vectors of cases and news by using a CNN network, and, after the pooling operation, performing weight learning on the output channels of the CNN by using a self-attention mechanism to increase the weight of important local information;
step1.3, performing Manhattan distance calculation on the extracted local information of the case and the news to obtain a final similarity relation.
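A minimal PyTorch sketch of a local-information learner of the kind claim 2 describes is given below. The layer sizes, the channel-wise self-attention, and the exp(-Manhattan) similarity head are illustrative assumptions, not the patented implementation; case-element weighting of the embeddings is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalInfoLearner(nn.Module):
    """Sketch: Conv1d over word embeddings captures local (window)
    features, max-pooling removes sequence length, self-attention
    re-weights the output channels, and an exp(-Manhattan-distance)
    head maps the pair to a similarity in (0, 1]."""

    def __init__(self, emb_dim=128, channels=64, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, channels, kernel, padding=1)
        self.attn = nn.Linear(channels, channels)  # channel-wise attention weights

    def encode(self, x):                            # x: (batch, seq_len, emb_dim)
        h = F.relu(self.conv(x.transpose(1, 2)))    # (batch, channels, seq_len)
        h = h.max(dim=2).values                     # max-pool over time
        w = torch.softmax(self.attn(h), dim=-1)     # weight each local-feature channel
        return h * w

    def forward(self, case_x, news_x):
        c, n = self.encode(case_x), self.encode(news_x)
        # Manhattan-distance similarity, as used in Siamese text matching
        return torch.exp(-torch.sum(torch.abs(c - n), dim=1))
```

Because pooling removes the time dimension, case and news inputs may have different lengths, which fits the unbalanced case/news text setting.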
3. The case and news relevance analysis method for multi-view ensemble learning according to claim 1, wherein: the specific steps of Step2 are as follows:
step2.1, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and adding absolute position coding information of case and news texts to obtain weighted case and news feature vectors;
step2.2, extracting structure information of case and news characteristic vectors containing position coding information by using a Transformer network layer;
and Step2.3, performing Manhattan distance calculation on the extracted structural information coding vectors of the cases and the news to obtain a final similarity relation.
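A minimal PyTorch sketch of a structure-information learner of the kind claim 3 describes: absolute sinusoidal position encodings are added to the embeddings, one Transformer encoder layer extracts structural information, and an exp(-Manhattan) head yields the similarity. All sizes and the pooling choice are illustrative assumptions; case-element guidance is omitted.

```python
import torch
import torch.nn as nn

def sinusoidal_positions(seq_len, dim):
    """Absolute sinusoidal position encodings (even dim assumed)."""
    pos = torch.arange(seq_len).unsqueeze(1).float()
    i = torch.arange(0, dim, 2).float()
    angles = pos / torch.pow(torch.tensor(10000.0), i / dim)
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

class StructureInfoLearner(nn.Module):
    """Sketch: position-enriched embeddings pass through a single
    Transformer encoder layer, are mean-pooled, and the two encodings
    are compared with an exp(-Manhattan-distance) similarity head."""

    def __init__(self, emb_dim=128, heads=4):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(emb_dim, heads, batch_first=True)

    def encode(self, x):                            # x: (batch, seq_len, emb_dim)
        x = x + sinusoidal_positions(x.size(1), x.size(2))
        return self.encoder(x).mean(dim=1)          # (batch, emb_dim)

    def forward(self, case_x, news_x):
        c, n = self.encode(case_x), self.encode(news_x)
        return torch.exp(-torch.sum(torch.abs(c - n), dim=1))
```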
4. The case and news relevance analysis method for multi-view ensemble learning according to claim 1, wherein: the specific steps of Step3 are as follows:
step3.1, carrying out unsupervised pre-training on all data of cases and news by using a variational autoencoder (VAE) to obtain an unsupervised topic model;
step3.2, using Chinese microblog word vectors to obtain embedded representation of each word in the title, introducing case elements as external guidance of news, and adding case and news topic vectors extracted by a topic model to obtain weighted case and news characteristic vectors;
step3.3, extracting the theme information of case and news characteristic vectors containing the theme information by using a bidirectional LSTM network layer;
and Step3.4, performing Manhattan distance calculation on the extracted subject information coding vectors of the cases and the news to obtain a final similarity relation.
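A minimal PyTorch sketch of a topic-information learner of the kind claim 4 describes: a document-level topic vector (produced by the separately pre-trained VAE topic model, which is not shown) is concatenated to every word embedding, a bidirectional LSTM encodes the result, and an exp(-Manhattan) head yields the similarity. All dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TopicInfoLearner(nn.Module):
    """Sketch: word embeddings enriched with a pre-trained topic
    vector, encoded by a bidirectional LSTM, compared with an
    exp(-Manhattan-distance) similarity head."""

    def __init__(self, emb_dim=128, topic_dim=32, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim + topic_dim, hidden,
                            batch_first=True, bidirectional=True)

    def encode(self, x, topic_vec):                 # x: (batch, seq_len, emb_dim)
        # broadcast the document-level topic vector over every time step
        t = topic_vec.unsqueeze(1).expand(-1, x.size(1), -1)
        out, _ = self.lstm(torch.cat([x, t], dim=-1))
        return out.mean(dim=1)                      # (batch, 2 * hidden)

    def forward(self, case_x, case_topic, news_x, news_topic):
        c = self.encode(case_x, case_topic)
        n = self.encode(news_x, news_topic)
        return torch.exp(-torch.sum(torch.abs(c - n), dim=1))
```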
5. The case and news relevance analysis method for multi-view ensemble learning according to claim 1, wherein: the Step4 includes:
A single view is learned by each individual learner and the pre-training effect is optimized; weight learning is then performed by combining the three representations obtained in Step1, Step2 and Step3: a multi-head self-attention mechanism acquires weight information from different angles, a feedforward neural network performs the distance calculation, and the final similarity is obtained.
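A minimal PyTorch sketch of a weight learner of the kind claim 5 describes: the three individual learners' representations are stacked as a length-3 sequence, re-weighted by multi-head self-attention, and reduced to a similarity score by a small feedforward network. The head sizes and the sigmoid output are illustrative assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn

class WeightLearner(nn.Module):
    """Sketch: multi-head self-attention over the three view
    representations (local, structure, topic), followed by a
    feedforward network producing a similarity in (0, 1)."""

    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(),
                                 nn.Linear(dim // 2, 1), nn.Sigmoid())

    def forward(self, local_r, struct_r, topic_r):  # each: (batch, dim)
        views = torch.stack([local_r, struct_r, topic_r], dim=1)  # (batch, 3, dim)
        fused, _ = self.attn(views, views, views)   # re-weight the three views
        return self.ffn(fused.mean(dim=1)).squeeze(-1)
```

Because the attention weights are learned jointly with the similarity head, this strategy can avoid the result-coverage problem that plain averaging or voting exhibits with few learners.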
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111078776.1A CN113901990A (en) | 2021-09-15 | 2021-09-15 | Case and news correlation analysis method for multi-view integrated learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113901990A true CN113901990A (en) | 2022-01-07 |
Family
ID=79028500
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111078776.1A Pending CN113901990A (en) | 2021-09-15 | 2021-09-15 | Case and news correlation analysis method for multi-view integrated learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113901990A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180018757A1 (en) * | 2016-07-13 | 2018-01-18 | Kenji Suzuki | Transforming projection data in tomography by means of machine learning |
CN109885673A (en) * | 2019-02-13 | 2019-06-14 | 北京航空航天大学 | A kind of Method for Automatic Text Summarization based on pre-training language model |
CN110717332A (en) * | 2019-07-26 | 2020-01-21 | 昆明理工大学 | News and case similarity calculation method based on asymmetric twin network |
CN110766065A (en) * | 2019-10-18 | 2020-02-07 | 山东浪潮人工智能研究院有限公司 | Hash learning method based on deep hyper-information |
CN111368087A (en) * | 2020-03-23 | 2020-07-03 | 中南大学 | Chinese text classification method based on multi-input attention network |
CN112231472A (en) * | 2020-09-18 | 2021-01-15 | 昆明理工大学 | Judicial public opinion sensitive information identification method integrated with domain term dictionary |
CN112287687A (en) * | 2020-09-17 | 2021-01-29 | 昆明理工大学 | Case tendency extraction type summarization method based on case attribute perception |
CN112732916A (en) * | 2021-01-11 | 2021-04-30 | 河北工业大学 | BERT-based multi-feature fusion fuzzy text classification model |
CN112925877A (en) * | 2019-12-06 | 2021-06-08 | 中国科学院软件研究所 | One-person multi-case association identification method and system based on depth measurement learning |
2021-09-15 CN CN202111078776.1A patent/CN113901990A/en active Pending
Non-Patent Citations (2)
Title |
---|
赵承鼎; 郭军军; 余正涛; 黄于欣; 刘权; 宋燃: "Correlation Analysis between News and Cases Based on an Asymmetric Siamese Network", Journal of Chinese Information Processing (中文信息学报), no. 03, 15 March 2020 (2020-03-15) *
陈佳伟; 韩芳; 王直杰: "Aspect-Based Sentiment Analysis Based on a Self-Attention Gated Graph Convolutional Network", Journal of Computer Applications (计算机应用), no. 08, 10 August 2020 (2020-08-10) *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114817501A (en) * | 2022-04-27 | 2022-07-29 | 马上消费金融股份有限公司 | Data processing method, data processing device, electronic equipment and storage medium |
CN114926206A (en) * | 2022-05-18 | 2022-08-19 | 阿里巴巴(中国)有限公司 | Prediction model training method, and article sales information prediction method and apparatus |
CN117056874A (en) * | 2023-08-17 | 2023-11-14 | 国网四川省电力公司营销服务中心 | Unsupervised electricity larceny detection method based on deep twin autoregressive network |
CN117236323A (en) * | 2023-10-09 | 2023-12-15 | 青岛中企英才集团商业管理有限公司 | Information processing method and system based on big data |
CN117236323B (en) * | 2023-10-09 | 2024-03-29 | 京闽数科(北京)有限公司 | Information processing method and system based on big data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113901990A (en) | Case and news correlation analysis method for multi-view integrated learning | |
CN111274398B (en) | Method and system for analyzing comment emotion of aspect-level user product | |
CN111259127B (en) | Long text answer selection method based on transfer learning sentence vector | |
CN111414461B (en) | Intelligent question-answering method and system fusing knowledge base and user modeling | |
CN110717334A (en) | Text emotion analysis method based on BERT model and double-channel attention | |
CN111930887B (en) | Multi-document multi-answer machine reading and understanding system based on joint training mode | |
CN112000772B (en) | Sentence-to-semantic matching method based on semantic feature cube and oriented to intelligent question and answer | |
CN112667818A (en) | GCN and multi-granularity attention fused user comment sentiment analysis method and system | |
CN113255366B (en) | Aspect-level text emotion analysis method based on heterogeneous graph neural network | |
CN113254604B (en) | Reference specification-based professional text generation method and device | |
CN111651558A (en) | Hyperspherical surface cooperative measurement recommendation device and method based on pre-training semantic model | |
CN112527993B (en) | Cross-media hierarchical deep video question-answer reasoning framework | |
CN114398976A (en) | Machine reading understanding method based on BERT and gate control type attention enhancement network | |
CN113901847A (en) | Neural machine translation method based on source language syntax enhanced decoding | |
Lin et al. | PS-mixer: A polar-vector and strength-vector mixer model for multimodal sentiment analysis | |
CN113157919A (en) | Sentence text aspect level emotion classification method and system | |
CN115810351A (en) | Controller voice recognition method and device based on audio-visual fusion | |
Wang et al. | EfficientTDNN: Efficient architecture search for speaker recognition | |
Zhang et al. | TS-GCN: Aspect-level sentiment classification model for consumer reviews | |
CN117539999A (en) | Cross-modal joint coding-based multi-modal emotion analysis method | |
CN117972434A (en) | Training method, training device, training equipment, training medium and training program product for text processing model | |
CN117648469A (en) | Cross double-tower structure answer selection method based on contrast learning | |
Fajcik et al. | Pruning the index contents for memory efficient open-domain qa | |
CN116663523A (en) | Semantic text similarity calculation method for multi-angle enhanced network | |
Zhao et al. | Improving stability and performance of spiking neural networks through enhancing temporal consistency |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||