CN111027313A - BiGRU judgment result tendency analysis method based on attention mechanism - Google Patents
BiGRU judgment result tendency analysis method based on attention mechanism
- Publication number
- CN111027313A CN201811166731.8A CN201811166731A
- Authority
- CN
- China
- Prior art keywords
- word
- vector
- bigru
- attention mechanism
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to a BiGRU judgment result tendency analysis method based on an attention mechanism. The method extracts keyword information about the parties and about the judgment result from the judgment document to be analyzed; segments the judgment result into single sentences and applies word segmentation and stop-word removal to each sentence to obtain a word sequence; constructs a word vector table and uses it to express the word sequence as the corresponding word vector matrix; performs BiGRU computation on the word vector matrix to obtain its feature vectors, and then applies attention computation to those feature vectors to obtain the output vector of the attention mechanism; finally, the output vector of the attention mechanism is classified using softmax. By strengthening the connection between a text and its context with a bidirectional network and emphasising key information with an attention mechanism, the method compensates for the neglect of context in traditional unidirectional neural networks, improves the accuracy of the algorithm, and yields more accurate sentiment classification results.
Description
Technical Field
The invention relates to the field of deep learning and natural language processing, in particular to a BiGRU judgment result tendency analysis method based on an attention mechanism.
Background
With the development of modern information technology, the advance of legal informatisation and the public release of large numbers of judgment documents, mining knowledge from these documents has become increasingly meaningful. Text sentiment analysis, also called opinion mining or tendency analysis, is, in short, the process of analysing, processing, summarising and reasoning over subjective text. Among the mainstream approaches in this field, deep learning is currently the most popular.
With the development of text sentiment analysis, applying it to judgment result tendency analysis is a natural trend. When current sentiment analysis algorithms are applied to judgment results, information loss reduces the accuracy of the analysis, and the computation is complex, which limits efficiency to a certain extent.
Aiming at these problems in judgment result tendency analysis, the invention provides a BiGRU judgment result tendency analysis method based on an attention mechanism, so as to improve both the accuracy and the efficiency of judgment result tendency analysis.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a BiGRU judgment result tendency analysis method based on an attention mechanism, which addresses the reduced classification accuracy caused by information loss in judgment result tendency analysis, and improves the efficiency of the tendency analysis computation.
The technical scheme adopted by the invention for realizing the purpose is as follows:
a BiGRU decision result tendency analysis method based on attention mechanism,
extracting key word information of parties of a referee document to be analyzed and key word information of a judgment result;
segmenting the judgment result into a plurality of single sentences, and performing word segmentation and stop word removal processing on each single sentence to obtain a word sequence;
constructing a word vector table, and expressing the word sequence as a corresponding word vector matrix by using the word vector table;
performing BiGRU calculation on the word vector matrix to obtain a feature vector of the word vector matrix, and performing attention calculation on the feature vector of the word vector matrix to obtain an output vector of an attention mechanism;
the output vector of the attention mechanism is classified using softmax.
The keyword information of the parties comprises plaintiff, defendant, appellant, appellee, applicant and respondent.
The judgment result keyword information includes phrases such as "the judgment is as follows:", "the ruling is as follows:", "the adjudication is as follows:" and "the decision is as follows:".
If the judgment result does not have the standard legal title of the party, the non-standard legal title of the party is replaced by the standard legal title of the corresponding party, and the method specifically comprises the following steps:
the constructing of the word vector table includes the following processes:
Remove stop words from and perform word segmentation on the judgment document training set to generate the corpus required for constructing word vectors; generate a first vocabulary list from the corpus, count and sort the word frequency of each word, and take the V words with the highest frequency to form a second vocabulary list.
Represent each word in the second vocabulary list by a corresponding one-hot vector of dimension V, generating a one-hot vector table.
Perform dimension reduction on the one-hot vector table using a Skip-gram model to generate the word vector table.
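The vocabulary-building steps above (frequency counting, top-V selection, one-hot assignment) can be sketched as follows; this is an illustrative reconstruction, not the patent's code, and in practice the one-hot table would then be reduced with a Skip-gram model (e.g. word2vec):

```python
from collections import Counter

def build_onehot_table(corpus_sentences, V):
    """Count word frequencies over the segmented corpus, keep the V most
    frequent words as the second vocabulary list, and assign each word a
    V-dimensional one-hot vector (the step preceding Skip-gram
    dimensionality reduction)."""
    freq = Counter(w for sent in corpus_sentences for w in sent)
    vocab = [w for w, _ in freq.most_common(V)]   # top-V words by frequency
    table = {}
    for i, w in enumerate(vocab):
        vec = [0] * V
        vec[i] = 1                                # one-hot position for word i
        table[w] = vec
    return vocab, table

# Toy segmented corpus (hypothetical example sentences):
corpus = [["judgment", "as", "follows"], ["the", "plaintiff", "judgment"]]
vocab, onehot = build_onehot_table(corpus, V=4)
```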
The word vector matrix is represented as:
(w1, w2, w3, …, wn) → S = (s1, s2, s3, …, sn)
where si is the word vector of the i-th keyword wi, and S is the word vector matrix.
The eigenvectors of the word vector matrix are calculated as follows:
zt = σ(Wz·[ht-1, xt])
rt = σ(Wr·[ht-1, xt])
h̃t = tanh(W·[rt ∗ ht-1, xt])
ht = (1 − zt) ∗ ht-1 + zt ∗ h̃t
where xt is the input data, ht is the output of the current GRU unit and ht-1 is the output of the previous unit; zt is the update gate and rt is the reset gate, which together control the computation from hidden state ht-1 to hidden state ht. The update gate weighs the current input against the previous memory ht-1 and outputs a value zt between 0 and 1 that determines how much of ht-1 is passed on to the next state. h̃t is the candidate hidden state, and the reset gate controls how much of the previous hidden state, which carries past time information, flows into it. σ is the sigmoid function, tanh is the hyperbolic tangent activation function, and Wz, Wr and W are the weight matrices of the update gate, the reset gate and the candidate hidden state, respectively.
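A minimal NumPy sketch of one GRU step implementing the update-gate/reset-gate computation described above (biases omitted for brevity; the weight shapes assume the gates act on the concatenation of ht-1 and xt — an illustrative convention, not the patent's implementation):

```python
import numpy as np

def gru_step(x_t, h_prev, Wz, Wr, W):
    """One GRU step: update gate z_t, reset gate r_t, candidate state
    h_tilde, and the interpolated new hidden state h_t."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(Wz @ concat)                                  # update gate
    r_t = sigmoid(Wr @ concat)                                  # reset gate
    h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))  # candidate state
    h_t = (1.0 - z_t) * h_prev + z_t * h_tilde                  # new hidden state
    return h_t

rng = np.random.default_rng(0)
d, h = 4, 3                                   # input and hidden dimensions (assumed)
x = rng.standard_normal(d)
h0 = np.zeros(h)
Wz, Wr, W = (rng.standard_normal((h, h + d)) for _ in range(3))
h1 = gru_step(x, h0, Wz, Wr, W)
```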
The output vector calculation process of the attention mechanism is as follows:
1) Perform a similarity calculation between the word vector of each word in the word vector matrix and all the word vectors in the matrix to obtain a weight, specifically:
M = tanh(ht)
where ht is the output vector of the t time steps computed by the BiGRU layer, tanh is the activation function, and M is a temporary weight matrix.
2) Normalise the temporary weight matrix using the softmax function, specifically:
α = softmax(wT·M)
where wT is a randomly initialised weight matrix learned during training, and α is the attention weight matrix obtained by the softmax computation.
3) Take the weighted sum of the weights and the corresponding feature vectors to obtain the output vector of the attention layer, specifically:
γ = ht·αT
where γ is the output vector of the attention layer.
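The three attention formulas above can be sketched in NumPy as follows (a minimal illustration: H holds one BiGRU output column per time step, and the attention vector w is supplied by the caller rather than learned):

```python
import numpy as np

def attention_output(H, w):
    """Attention over the BiGRU feature vectors: M = tanh(H),
    alpha = softmax(w^T M), gamma = H alpha^T."""
    M = np.tanh(H)                      # temporary weight matrix
    scores = w @ M                      # one similarity score per time step
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()         # softmax normalisation
    gamma = H @ alpha                   # weighted sum of feature vectors
    return alpha, gamma

rng = np.random.default_rng(1)
H = rng.standard_normal((3, 5))         # hidden_dim x T (assumed shapes)
w = rng.standard_normal(3)
alpha, gamma = attention_output(H, w)
```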
The invention has the following beneficial effects and advantages:
1. The method strengthens the connection between a text and its context by using a bidirectional neural network, which improves the accuracy of the algorithm and makes the classification result more accurate; it compensates for the neglect of textual context in traditional unidirectional neural networks and improves the accuracy of the sentiment classification result.
2. According to the method, the loss of the text detail information is reduced by adding the attention mechanism, a good effect can be achieved on strengthening the key information in the text, and the accuracy of the algorithm is improved to a certain extent.
3. The invention can carry out emotion classification calculation in specific fields according to texts in different fields, and has certain personalized expandability.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram of a model for feature vector computation of the word vector matrix according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, embodiments accompanying the drawings are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, but rather should be construed as modified in the spirit and scope of the present invention as set forth in the appended claims.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
As shown in fig. 1, a BiGRU decision result tendency analysis and calculation method based on attention mechanism includes the following steps:
step 1: the user inputs a referee document needing to analyze the tendency of the judgment result;
step 2: extracting required original reported information and a judgment result from the referee document according to the keywords;
Step 3: replace the titles of the parties in the judgment result extracted in step 2 with the original standard legal titles;
Step 4: split the judgment result processed in step 3 into individual items to refine the judgment result;
Step 5: remove stop words from each item and perform word segmentation to form word sequences;
Step 6: express the segmented judgment result as the corresponding judgment result vector S using the trained word vector table;
Step 7: perform BiGRU computation on the word vector sequence to obtain the feature vectors of the word vectors;
Step 8: perform attention computation on the result of step 7 to obtain the output vector of the attention mechanism;
Step 9: finally classify the output vector of the attention mechanism from step 8 using softmax.
In step 2, key information is extracted according to the keywords: the party information and the text content of the judgment result are extracted from the judgment document.
Step 3:
If the judgment does not contain the standard legal titles of the parties, the legal titles need to be unified: the personal names in the judgment result are replaced with the standard legal titles (plaintiff, defendant, etc.) by longest-common-subsequence matching. The longest common subsequence is computed and the matched position in the judgment result is replaced with the corresponding legal title.
The recurrence formula is as follows:
C[i, j] = 0, if i = 0 or j = 0
C[i, j] = C[i−1, j−1] + 1, if i, j > 0 and xi = yj
C[i, j] = max(C[i, j−1], C[i−1, j]), if i, j > 0 and xi ≠ yj
where C[i, j] represents the length of the longest common subsequence of the first i characters of one string and the first j characters of the other; the largest C[i, j] is selected and the match is replaced with the corresponding legal title. Example:
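The longest-common-subsequence length used for title matching can be sketched with the standard dynamic-programming table (an illustrative implementation, not the patent's code):

```python
def lcs_length(x, y):
    """Fill the DP table C where C[i][j] is the LCS length of x[:i] and
    y[:j]; the largest value over candidate titles picks the
    best-matching standard legal title for replacement."""
    C = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                C[i][j] = C[i - 1][j - 1] + 1          # characters match
            else:
                C[i][j] = max(C[i - 1][j], C[i][j - 1])  # drop one character
    return C[len(x)][len(y)]
```

For example, `lcs_length("ABCBDAB", "BDCABA")` returns 4 (subsequence "BCBA").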
Original party information:
The appellant (plaintiff in the original trial): Xu X, male ….
The appellee (defendant in the original trial): Wang X, female ….
Judgment provision:
The second-instance case acceptance fee of 6,150 yuan is borne by Xu X (already paid).
After replacement with the standard title:
The second-instance case acceptance fee of 6,150 yuan is borne by the appellant (already paid).
In step 6, the relevant concepts are first defined as follows:
Judgment result vector S: after the judgment result is split into items, natural language processing techniques are used for word segmentation, stop-word removal and similar operations to obtain a word sequence, which is expressed through the word vector table as S = (s1, s2, …, sn), where si is the word vector of the i-th keyword:
(w1, w2, w3, …, wn) → S = (s1, s2, s3, …, sn)
Example:
Judgment text: the plaintiff X is permitted to divorce the defendant Cao X;
Judgment word sequence: (permit, plaintiff, X, and, defendant, Cao X, divorce).
Word vector matrix: (w1, w2, w3, …, wn) → S = (s1, s2, s3, …, sn).
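Turning the word sequence into the matrix S is a table lookup; a minimal sketch (the zero-vector fallback for out-of-vocabulary words is a simplifying assumption, not part of the patent):

```python
import numpy as np

def to_vector_matrix(words, vector_table, dim):
    """Look up each word of the segmented judgment sentence in the word
    vector table to build the matrix S, one row per word; unknown words
    fall back to a zero vector."""
    return np.stack([np.asarray(vector_table.get(w, np.zeros(dim)))
                     for w in words])

# Hypothetical 2-dimensional word vectors for illustration:
table = {"grant": [1.0, 0.0], "divorce": [0.0, 1.0]}
S = to_vector_matrix(["grant", "unknown", "divorce"], table, dim=2)
```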
Step 7: feature vectors of the word vectors. The word vectors are input into the BiGRU network for computation to obtain the feature vectors. As shown in fig. 2, the word vector sequence S = (s1, s2, …, sn−1, sn) obtained in step 6 is input into the BiGRU network, i.e. as xt in the formulas, and the feature vector corresponding to each word vector is finally obtained through the computation of each unit of the network layer.
zt = σ(Wz·[ht-1, xt])
rt = σ(Wr·[ht-1, xt])
h̃t = tanh(W·[rt ∗ ht-1, xt])
ht = (1 − zt) ∗ ht-1 + zt ∗ h̃t
The specific description is as follows: xt is the input data, ht is the output of the current GRU unit and ht-1 is the output of the previous unit; zt is the update gate and rt is the reset gate, which together control the computation from hidden state ht-1 to hidden state ht. The update gate weighs the current input against the previous memory ht-1 and outputs a value zt between 0 and 1 that determines how much of ht-1 is passed on to the next state. h̃t is the candidate hidden state, and the reset gate controls how much of the previous hidden state, which carries past time information, flows into it. σ is the sigmoid function, tanh is the hyperbolic tangent activation function, and Wz, Wr and W are the weight matrices of the update gate, the reset gate and the candidate hidden state, respectively.
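The "Bi" in BiGRU means running one GRU left-to-right and another right-to-left over the word vectors and concatenating the two hidden states at each time step; a minimal NumPy sketch (the random weight triples and shapes are illustrative assumptions):

```python
import numpy as np

def gru_step(x_t, h_prev, Wz, Wr, W):
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    c = np.concatenate([h_prev, x_t])
    z = sig(Wz @ c)                                       # update gate
    r = sig(Wr @ c)                                       # reset gate
    h_tilde = np.tanh(W @ np.concatenate([r * h_prev, x_t]))
    return (1.0 - z) * h_prev + z * h_tilde

def bigru(S, fwd, bwd, h_dim):
    """Run a forward GRU and a backward GRU over the word vector rows of
    S and concatenate their hidden states per time step; fwd/bwd are
    (Wz, Wr, W) weight triples."""
    T = len(S)
    hf, hb = np.zeros(h_dim), np.zeros(h_dim)
    H_f, H_b = [None] * T, [None] * T
    for t in range(T):
        hf = gru_step(S[t], hf, *fwd)
        H_f[t] = hf
        hb = gru_step(S[T - 1 - t], hb, *bwd)
        H_b[T - 1 - t] = hb
    return np.stack([np.concatenate([H_f[t], H_b[t]]) for t in range(T)])

rng = np.random.default_rng(2)
S = rng.standard_normal((5, 4))                            # 5 words, dim 4
mk = lambda: tuple(rng.standard_normal((3, 3 + 4)) for _ in range(3))
H = bigru(S, mk(), mk(), h_dim=3)                          # shape (5, 6)
```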
Step 8: the feature vectors of the word vectors are input into the attention mechanism to compute the output vector of the attention mechanism.
The specific calculation is as follows:
1) Perform a similarity calculation between the word vector of each word in the word vector matrix and all the word vectors in the matrix to obtain a weight, specifically:
M = tanh(ht)
where ht is the output vector of the t time steps computed by the BiGRU layer, tanh is the activation function, and M is a temporary weight matrix.
2) Normalise the temporary weight matrix using the softmax function, specifically:
α = softmax(wT·M)
where wT is a randomly initialised weight matrix learned during training, and α is the attention weight matrix obtained by the softmax computation.
3) Take the weighted sum of the weights and the corresponding feature vectors to obtain the output vector of the attention layer, specifically:
γ = ht·αT
where γ is the output vector of the attention layer.
Finally, the result is mapped through the activation function to obtain the classification result of the text.
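The final mapping can be sketched as a linear layer followed by softmax over the tendency classes; the weight matrix Wc and the label names here are illustrative assumptions, since the patent only specifies softmax classification:

```python
import numpy as np

def classify(gamma, Wc, labels=("favourable", "unfavourable")):
    """Map the attention-layer output gamma to a tendency label via a
    linear projection plus a numerically stable softmax."""
    logits = Wc @ gamma
    p = np.exp(logits - logits.max())
    p = p / p.sum()                       # class probabilities
    return labels[int(np.argmax(p))], p

rng = np.random.default_rng(3)
gamma = rng.standard_normal(6)            # attention output (assumed dim)
Wc = rng.standard_normal((2, 6))          # hypothetical classifier weights
label, p = classify(gamma, Wc)
```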
Claims (8)
1. A BiGRU judgment result tendency analysis method based on an attention mechanism is characterized in that:
extracting key word information of parties of a referee document to be analyzed and key word information of a judgment result;
segmenting the judgment result into a plurality of single sentences, and performing word segmentation and stop word removal processing on each single sentence to obtain a word sequence;
constructing a word vector table, and expressing the word sequence as a corresponding word vector matrix by using the word vector table;
performing BiGRU calculation on the word vector matrix to obtain a feature vector of the word vector matrix, and performing attention calculation on the feature vector of the word vector matrix to obtain an output vector of an attention mechanism;
the output vector of the attention mechanism is classified using softmax.
2. The attention mechanism-based BiGRU decision-tendency analysis method of claim 1, wherein: the keyword information of the parties comprises plaintiff, defendant, appellant, appellee, applicant and respondent.
3. The attention mechanism-based BiGRU decision-tendency analysis method of claim 1, wherein: the judgment result keyword information includes phrases such as "the judgment is as follows:", "the ruling is as follows:", "the adjudication is as follows:" and "the decision is as follows:".
4. The attention mechanism-based BiGRU decision-tendency analysis method of claim 1, wherein: if the judgment result does not have the standard legal title of the party, the non-standard legal title of the party is replaced by the standard legal title of the corresponding party, and the method specifically comprises the following steps:
the longest-common-subsequence length is computed by the recurrence
C[i, j] = 0, if i = 0 or j = 0;
C[i, j] = C[i−1, j−1] + 1, if i, j > 0 and xi = yj;
C[i, j] = max(C[i, j−1], C[i−1, j]), if i, j > 0 and xi ≠ yj;
where C[i, j] represents the length of the longest common subsequence, and the largest C[i, j] is selected and the match is replaced with the corresponding legal title.
5. The attention mechanism-based BiGRU decision-tendency analysis method of claim 1, wherein: the constructing of the word vector table includes the following processes:
removing stop words and performing word segmentation processing on the referee document training set to generate a corpus required for constructing word vectors; generating a first vocabulary list for the materials, counting and sequencing word frequency of each word, and taking V words with the maximum word frequency to form a second vocabulary list;
each word in the second vocabulary list is represented by a corresponding one-hot vector, and the dimension of the one-hot vector corresponding to each word is V, so that a one-hot vector list is generated;
and (5) performing dimension reduction on the one-hot vector table by using a Skip-gram model to generate a word vector table.
6. The attention mechanism-based BiGRU decision-tendency analysis method of claim 1, wherein: the word vector matrix is represented as:
(w1,w2,w3,…,wn)→S=(s1,s2,s3,…,sn)
where si is the word vector of the i-th keyword, and S is the word vector matrix.
7. The attention mechanism-based BiGRU decision-tendency analysis method of claim 1, wherein: the eigenvectors of the word vector matrix are calculated as follows:
zt = σ(Wz·[ht-1, xt])
rt = σ(Wr·[ht-1, xt])
h̃t = tanh(W·[rt ∗ ht-1, xt])
ht = (1 − zt) ∗ ht-1 + zt ∗ h̃t
where xt is the input data, ht is the output of the current GRU unit and ht-1 is the output of the previous unit; zt is the update gate and rt is the reset gate, which together control the computation from hidden state ht-1 to hidden state ht; the update gate weighs the current input against the previous memory ht-1 and outputs a value zt between 0 and 1 that determines how much of ht-1 is passed on to the next state; h̃t is the candidate hidden state, and the reset gate controls how much of the previous hidden state, which carries past time information, flows into it; σ is the sigmoid function, tanh is the hyperbolic tangent activation function, and Wz, Wr and W are the weight matrices of the update gate, the reset gate and the candidate hidden state, respectively.
8. The attention mechanism-based BiGRU decision-tendency analysis method of claim 1, wherein: the output vector calculation process of the attention mechanism is as follows:
1) performing similarity calculation on the word vector of each word in the word vector matrix and all the word vectors in the matrix to obtain a weight, specifically comprising:
M=tanh(ht)
where ht is the output vector of the t time steps computed by the BiGRU layer, tanh is the activation function, and M is a temporary weight matrix;
2) normalizing the temporary weight matrix by utilizing a softmax function, which specifically comprises the following steps:
α=softmax(wTM)
where wT is a randomly initialised weight matrix learned in training, and the attention weight matrix α is obtained by the softmax computation;
3) and performing weighted summation on the weights and the corresponding feature vectors to obtain an output vector of the attention layer, wherein the weighted summation specifically comprises the following steps:
γ=htαT
where γ is the output vector of the attention layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811166731.8A CN111027313A (en) | 2018-10-08 | 2018-10-08 | BiGRU judgment result tendency analysis method based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111027313A true CN111027313A (en) | 2020-04-17 |
Family
ID=70190357
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811166731.8A Pending CN111027313A (en) | 2018-10-08 | 2018-10-08 | BiGRU judgment result tendency analysis method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111027313A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105938495A (en) * | 2016-04-29 | 2016-09-14 | 乐视控股(北京)有限公司 | Entity relationship recognition method and apparatus |
CN107247702A (en) * | 2017-05-05 | 2017-10-13 | 桂林电子科技大学 | A kind of text emotion analysis and processing method and system |
CN108304365A (en) * | 2017-02-23 | 2018-07-20 | 腾讯科技(深圳)有限公司 | keyword extracting method and device |
CN108320051A (en) * | 2018-01-17 | 2018-07-24 | 哈尔滨工程大学 | A kind of mobile robot dynamic collision-free planning method based on GRU network models |
CN108595601A (en) * | 2018-04-20 | 2018-09-28 | 福州大学 | A kind of long text sentiment analysis method incorporating Attention mechanism |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xie et al. | Speech emotion classification using attention-based LSTM | |
Almuzaini et al. | Impact of stemming and word embedding on deep learning-based Arabic text categorization | |
CN109753566B (en) | Model training method for cross-domain emotion analysis based on convolutional neural network | |
CN108763326B (en) | Emotion analysis model construction method of convolutional neural network based on feature diversification | |
CN109657239B (en) | Chinese named entity recognition method based on attention mechanism and language model learning | |
CN108984526B (en) | Document theme vector extraction method based on deep learning | |
CN108446271B (en) | Text emotion analysis method of convolutional neural network based on Chinese character component characteristics | |
CN110807320B (en) | Short text emotion analysis method based on CNN bidirectional GRU attention mechanism | |
CN108009148B (en) | Text emotion classification representation method based on deep learning | |
EP2486470B1 (en) | System and method for inputting text into electronic devices | |
CN111177374A (en) | Active learning-based question and answer corpus emotion classification method and system | |
CN108536754A (en) | Electronic health record entity relation extraction method based on BLSTM and attention mechanism | |
CN109086269B (en) | Semantic bilingual recognition method based on semantic resource word representation and collocation relationship | |
CN112232087B (en) | Specific aspect emotion analysis method of multi-granularity attention model based on Transformer | |
CN113591483A (en) | Document-level event argument extraction method based on sequence labeling | |
CN110472245B (en) | Multi-label emotion intensity prediction method based on hierarchical convolutional neural network | |
CN109919175B (en) | Entity multi-classification method combined with attribute information | |
CN112487237B (en) | Music classification method based on self-adaptive CNN and semi-supervised self-training model | |
CN114417851A (en) | Emotion analysis method based on keyword weighted information | |
CN111400494A (en) | Sentiment analysis method based on GCN-Attention | |
Sun et al. | VCWE: visual character-enhanced word embeddings | |
CN113987187A (en) | Multi-label embedding-based public opinion text classification method, system, terminal and medium | |
CN113094502A (en) | Multi-granularity takeaway user comment sentiment analysis method | |
CN111241820A (en) | Bad phrase recognition method, device, electronic device, and storage medium | |
CN112818698B (en) | Fine-grained user comment sentiment analysis method based on dual-channel model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||