CN112765315A - Intelligent classification system and method for legal scenes - Google Patents
- Publication number: CN112765315A (application CN202110061753.3A)
- Authority: CN (China)
- Legal status: Granted (the status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06F16/3329 — Natural language query formulation or dialogue systems
- G06F16/3344 — Query execution using natural language analysis
- G06F16/3346 — Query execution using probabilistic model
- G06F16/35 — Clustering; Classification
- G06F40/216 — Parsing using statistical methods
- G06F40/289 — Phrasal analysis, e.g. finite state techniques or chunking
- G06Q50/18 — Legal services
Abstract
The invention discloses an intelligent classification system and method for legal scenes. The system comprises three modules: classification, self-learning, and self-adaptation. First, a sample to be classified is input into a classification module that fuses a hybrid-attention prototype network with a word-vector similarity method, and the class of the sample is predicted. A self-learning module assigns a confidence to each prediction result and adds high-confidence predicted samples to the training set, thereby expanding the corpus and improving model performance. In addition, the model is self-adaptive: it automatically accommodates changes such as the addition, removal, or modification of categories. Compared with traditional deep learning, this method achieves efficient legal-scene classification with only a small number of initial training samples.
Description
Technical Field
The invention relates to the technical field of legal natural language processing, and in particular to an intelligent legal-scene classification system and method based on the fusion of a hybrid-attention prototype network and word-vector similarity.
Background
Many existing text classification models based on deep learning achieve good results, but they struggle when samples are scarce. In intelligent question-answering systems for the legal field in particular, accurate classification of legal scenes is a prerequisite for intelligent question answering; however, user consultation questions are often highly colloquial, labeling is expensive, and traditional supervised learning performs poorly. To address this, a method fusing hybrid-attention prototype-network classification with word-vector similarity is designed, solving the difficulty of classifying legal scenes under sparse labeling. The model has self-learning and self-adaptive capabilities: self-learning means the model improves its performance by automatically expanding its corpus; self-adaptation means it automatically accommodates the addition, removal, or modification of categories without retraining the model from scratch.
Disclosure of Invention
To address the defects of the prior art, the invention provides an intelligent legal-scene classification system and method.
To achieve this purpose, the technical scheme adopted by the invention is as follows:
An intelligent legal-scene classification system, comprising: a classification module, a self-learning module, and a self-adaptation module;
the classification module comprises a hybrid-attention prototype network module and a word-vector similarity module, and is used to classify user consultation questions. The prototype network module computes the distance between a user consultation question and each class prototype to determine the class to which the question belongs; the word-vector similarity module computes the similarity between the word vectors of the question and the label vectors of each class, assisting the prototype network and improving classification;
a self-learning module: used to automatically expand the data of categories whose corpus in the training set is insufficient, improving the accuracy of the system's classification predictions;
a self-adaptation module: automatically accommodates changes when categories to be classified are added, removed, or modified.
The invention also discloses a classification method of the intelligent classification system for legal scenes, which comprises the following steps:
Step 1: the user consultation question is first input into the hybrid-attention prototype network method, yielding a score vector P1 over the classes; it is then input into the word-vector similarity method, yielding a score vector P2. Finally, P1 and P2 are combined by an attention-weighted sum into the final score vector P = αP1 + βP2, where α and β are attention coefficients, and the class with the highest score is output.
Step 2: first determine whether the training data for the predicted class is sufficient. If it is not, compute the confidence of the sample's prediction; if the confidence exceeds a system-defined threshold, store the sample in a temporary corpus. When the temporary corpus reaches 200 samples, merge it into the training set and retrain the model, improving its accuracy.
Step 3: when categories to be classified are added, removed, or modified, the model is fine-tuned and then evaluated on the test set; if accuracy on the test set falls below a system-defined threshold, retraining is triggered. In addition, when a category is added, it should contain no fewer than 20 samples.
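The fusion in step 1 can be sketched in a few lines. This is a minimal illustration of the weighted sum P = αP1 + βP2, not the patented implementation; the function names and the example α/β values are assumptions.

```python
def fuse_scores(p1, p2, alpha=0.7, beta=0.3):
    """Attention-weighted fusion of the prototype-network scores (P1)
    and the word-vector-similarity scores (P2): P = alpha*P1 + beta*P2."""
    return [alpha * a + beta * b for a, b in zip(p1, p2)]

def predict_class(p1, p2, alpha=0.7, beta=0.3):
    """Return the index of the class with the highest fused score."""
    scores = fuse_scores(p1, p2, alpha, beta)
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example with three candidate legal-scene classes
p1 = [0.2, 0.5, 0.3]  # scores from the hybrid-attention prototype network
p2 = [0.1, 0.3, 0.6]  # scores from the word-vector similarity method
print(predict_class(p1, p2))  # the fused scores favor class index 1
```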
Further, the hybrid-attention prototype network method works as follows:
1. Training follows the N-way K-shot paradigm of few-shot learning. From the training data input to the module, N classes are sampled; K samples per class form the support set S, and Q samples per class are drawn from the remaining samples of the N classes to form the query set. The support-set samples are then input to the encoding layer, which converts the natural-language text into a vector form E recognizable by a computer; a query-set sample q is likewise encoded into a feature vector x_q. Here s_i^j = (w_1, w_2, …, w_n) denotes the j-th support-set sample of the i-th class, w_t denotes the t-th word in the sample, n denotes the maximum sample length, and e_i^j denotes the feature vector of the j-th sample of the i-th class;
2. Because features in different dimensions contribute differently to the class a feature vector belongs to, the feature-level attention module extracts, from the support-set samples of each class, their features in each dimension, yielding a feature vector Z = (z_1, z_2, …, z_N) that better represents class characteristics, where z_i denotes the weight vector of the i-th class over the feature dimensions;
3. In few-shot classification, the training data input to the model are very scarce, so noise contained in the samples strongly affects model performance. Each support-set sample is therefore weighted by the instance-level attention module, so that samples more similar to the query-set sample are given higher weight, producing weighted feature vectors E′ and reducing the influence of noisy data. Here e′_i^j denotes the weighted feature vector of the j-th sample of the i-th class in the support set;
4. The weighted feature vector of each sample is obtained through the instance-level attention module, so the class prototype c_i of each class is obtained by summing the weighted feature vectors of that class's samples;
5. The feature-level attention module yields the weight vectors z_i of each class over the feature dimensions. From these a distance function d_i = z_i (c_i − x_q)² is generated, the distance d_i between query-set sample q and each class prototype is computed, and the class of sample q is derived from the distances d_i;
6. For a user consultation question input to the module, the distance d_i to each class prototype is computed from the distance function, and from the distances d_i the probability that the question belongs to each class is obtained as P1_i = softmax(−d_i), where P1_i denotes the probability that the question belongs to the i-th class under the hybrid-attention prototype network method.
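Steps 5 and 6 can be sketched as follows. The softmax over negative weighted distances is the standard prototype-network formulation and is assumed here, since the patent gives only the distance function d_i = z_i(c_i − x_q)² explicitly; all names are illustrative.

```python
import math

def weighted_distance(z, c, x):
    """d_i = z_i * (c_i - x_q)^2, summed over feature dimensions."""
    return sum(zi * (ci - xi) ** 2 for zi, ci, xi in zip(z, c, x))

def class_probabilities(prototypes, weights, query):
    """Probability of each class for a query vector: softmax over
    negative distances, so a nearer prototype gets a higher probability."""
    dists = [weighted_distance(z, c, query) for c, z in zip(prototypes, weights)]
    exps = [math.exp(-d) for d in dists]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 2-class, 2-dimensional example
prototypes = [[0.0, 0.0], [1.0, 1.0]]  # class prototypes c_i
weights = [[1.0, 1.0], [1.0, 1.0]]     # feature-level weight vectors z_i
print(class_probabilities(prototypes, weights, [0.1, 0.1]))  # class 0 is nearest
```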
Further, the working steps of the word vector similarity method are as follows:
1. First, the class name of each class in the training set is taken as a keyword and input into word2vec, yielding a keyword vector K_i for each class. The samples in each class are then segmented into words and converted into word vectors, and the word vectors of each class's samples are added and averaged to form the class's label vector V_i;
2. The similarity of the user consultation question to each keyword vector K_i and to each label vector V_i is computed, yielding similarity scores M1 and M2. M1 and M2 are then summed with weights to obtain the probability P2_i that the question belongs to each class, where P2_i denotes the probability that the question belongs to the i-th class under the word-vector similarity method.
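A minimal sketch of this scoring, assuming cosine similarity and equal weights for M1 and M2 (the patent fixes neither choice). In practice K_i and V_i would come from a trained word2vec model; toy vectors stand in for them here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_vector(vectors):
    """Average the word vectors of a class's samples to form its label vector V_i."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def class_score(question_vec, keyword_vec, label_vec, w1=0.5, w2=0.5):
    """Weighted sum of M1 = sim(question, K_i) and M2 = sim(question, V_i)."""
    return w1 * cosine(question_vec, keyword_vec) + w2 * cosine(question_vec, label_vec)
```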
Further, the corpus expansion in step 2 proceeds as follows: after the class of the user consultation question is predicted, the system checks whether the corpus for that class is sufficient. If it is not, the confidence of the prediction must be assessed, and a confidence-selection strategy based on data uncertainty decides whether the question can be used to expand the corpus.
Further, the confidence is computed as confidence = true_num / all_num, where true_num is the number of the question's words that appear in the samples of the predicted class, and all_num is the total number of words in the question. If all of the question's words occur in the class's samples, the prediction is considered reliable and the question can be used to expand the corpus. If some words do not occur in the class's samples and the confidence falls below the system-defined threshold, the question cannot be used to expand the corpus directly; a lawyer must determine its class before it is placed in the temporary corpus. When the temporary corpus reaches 200 samples, the model is retrained, achieving a better classification effect.
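The confidence formula is simple enough to state directly in code; the helper below is a sketch, with `class_vocabulary` standing in for the set of words occurring in the predicted class's samples.

```python
def confidence(question_words, class_vocabulary):
    """confidence = true_num / all_num: the fraction of the question's
    (segmented) words that appear in the predicted class's samples."""
    if not question_words:
        return 0.0
    true_num = sum(1 for w in question_words if w in class_vocabulary)
    return true_num / len(question_words)

# 3 of 4 words occur in the class's samples -> confidence 0.75
print(confidence(["contract", "breach", "penalty", "xyz"],
                 {"contract", "breach", "penalty", "damages"}))
```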
Compared with the prior art, the invention has the advantages that:
the method not only can well realize the classification of legal scenes under the condition of scarce samples, but also has the self-learning and self-adaptive capabilities, and improves the flexibility of the model.
Drawings
FIG. 1 is a general framework diagram of an intelligent classification method for legal scenarios according to an embodiment of the present invention;
FIG. 2 is a block diagram of a hybrid attention based prototype network module according to an embodiment of the present invention;
FIG. 3 is a block diagram of a legal scene classification module based on word vector similarity according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings by way of examples.
As shown in fig. 1, the intelligent classification system for legal scenes includes: the system comprises a classification module, a self-learning module and a self-adaption module;
For the classification module, the user consultation question is first input into the hybrid-attention prototype network method, yielding a score vector P1 over the classes; it is then input into the word-vector similarity method, yielding a score vector P2. Finally, P1 and P2 are combined by an attention-weighted sum into the final score vector P, and the class with the highest score is output.
For the self-learning module, the invention designs a mechanism that screens input corpora by confidence to expand the model's corpus. First determine whether the training data for the predicted class needs expansion; if so, compute the confidence of the sample's prediction. If the confidence exceeds the system-defined threshold, store the sample in a temporary corpus; when the temporary corpus reaches 200 samples, merge it into the training set and automatically retrain the model, improving its performance.
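The buffering logic of the self-learning module can be sketched as follows. The 200-sample retrain trigger comes from the text; the 0.8 confidence threshold is an assumed placeholder, since the patent leaves the threshold system-defined, and the class name is illustrative.

```python
class SelfLearningBuffer:
    """Accumulates high-confidence predicted samples in a temporary corpus
    and signals when enough have been collected to retrain the model."""

    def __init__(self, confidence_threshold=0.8, retrain_size=200):
        self.threshold = confidence_threshold  # assumed placeholder value
        self.retrain_size = retrain_size       # per the text: retrain at 200
        self.buffer = []

    def add(self, sample, label, conf):
        """Store the sample if its confidence clears the threshold; return
        True when the buffer is full and the model should be retrained."""
        if conf >= self.threshold:
            self.buffer.append((sample, label))
        if len(self.buffer) >= self.retrain_size:
            self.buffer.clear()  # merged into the training set; model retrains
            return True
        return False
```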
For the self-adaptation module, when categories to be classified are added, removed, or modified, the model is fine-tuned and then evaluated on data from the test set; if accuracy on the test set falls below the system-defined threshold, retraining is triggered. In addition, when a category is added, it should contain no fewer than 20 samples.
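The adaptive module's retrain trigger reduces to an accuracy check on the test set. A sketch, with 0.9 as an assumed placeholder for the system-defined threshold:

```python
def needs_retraining(predictions, gold_labels, accuracy_threshold=0.9):
    """After fine-tuning for added/removed/modified categories, evaluate on
    the test set and trigger full retraining when accuracy drops too low."""
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels) < accuracy_threshold
```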
As shown in fig. 2, the hybrid-attention prototype network module mainly comprises the following points:
1. Training follows the N-way K-shot paradigm of few-shot learning. From the training data input to the module, N classes are sampled; K samples per class form the support set S, and Q samples per class are drawn from the remaining samples of the N classes to form the query set. The support-set samples are then input to the encoding layer, which converts the natural-language text into a vector form E recognizable by a computer; a query-set sample q is likewise encoded into a feature vector x_q. Here s_i^j = (w_1, w_2, …, w_n) denotes the j-th support-set sample of the i-th class, w_t denotes the t-th word in the sample, n denotes the maximum sample length, and e_i^j denotes the feature vector of the j-th sample of the i-th class.
2. Because features in different dimensions contribute differently to the class a feature vector belongs to, the feature-level attention module extracts, from the support-set samples of each class, their features in each dimension, yielding a feature vector Z = (z_1, z_2, …, z_N) that better represents class characteristics, where z_i denotes the weight vector of the i-th class over the feature dimensions.
3. In few-shot classification, the training data input to the model are very scarce, so noise contained in the samples strongly affects model performance. Each support-set sample is therefore weighted by the instance-level attention module, so that samples more similar to the query-set sample are given higher weight, producing weighted feature vectors E′ and reducing the influence of noisy data. Here e′_i^j denotes the weighted feature vector of the j-th sample of the i-th class.
4. The weighted feature vector of each sample is obtained through the instance-level attention module, so the class prototype c_i of each class is obtained by adding the weighted feature vectors of that class's samples.
5. The feature-level attention module yields the weight vectors z_i of each class over the feature dimensions. From these a distance function d_i = z_i (c_i − x_q)² is generated, the distance d_i between query-set sample q and each class prototype is computed, and the class of sample q is derived from the distances d_i.
6. For a user consultation question input to the module, the distance d_i to each class prototype is computed from the distance function, and from the distances d_i the probability that the question belongs to each class is obtained as P1_i = softmax(−d_i), where P1_i denotes the probability that the question belongs to the i-th class under the hybrid-attention prototype network method.
As shown in fig. 3, the legal-scene classification module based on word-vector similarity comprises the following points:
1. First, the class name of each class in the training set is taken as a keyword and input into word2vec, yielding a keyword vector K_i for each class. The samples in each class are then segmented into words and converted into word vectors, and the word vectors of each class's samples are added and averaged to form the class's label vector V_i.
2. The similarity of the user consultation question to each keyword vector K_i and to each label vector V_i is computed, yielding similarity scores M1 and M2. M1 and M2 are then summed with weights to obtain the probability P2_i that the question belongs to each class, where P2_i denotes the probability that the question belongs to the i-th class under the word-vector similarity method.
Although the word-vector similarity method alone performs far worse than the prototype-network classification method, adding it creates a complementary pairing: when the prototype network's prediction is weak, fusing in the word-vector similarity result supplements it.
Because samples of some classes in the corpus are scarce and model performance suffers as a result, the self-learning method is used to expand the corpus and improve the model. After prediction yields the class of the user consultation question, the system checks whether the corpus is sufficient; if it is not, the confidence of the prediction result must be assessed, and a confidence-selection strategy based on data uncertainty decides whether the question can be used to expand the corpus.
The confidence of a question is computed as confidence = true_num / all_num, where true_num is the number of the question's words appearing in the predicted class's samples and all_num is the total number of words in the question. If all of the question's words occur in the class's samples, the prediction is considered reliable and can be used to expand the corpus; if some words do not occur and the confidence falls below the system threshold, the question cannot be used directly, and a lawyer must determine its class before it is placed in the temporary corpus. When the temporary corpus reaches 200 samples, the model is retrained, achieving a better classification effect.
It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the manner in which the invention is practiced, and it is to be understood that the scope of the invention is not limited to such specifically recited statements and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.
Claims (6)
1. An intelligent legal-scene classification system, comprising: a classification module, a self-learning module, and a self-adaptation module;
the classification module comprises a hybrid-attention prototype network module and a word-vector similarity module, and is used to classify user consultation questions; the prototype network module computes the distance between a user consultation question and each class prototype to determine the class to which the question belongs; the word-vector similarity module computes the similarity between the word vectors of the question and the label word vectors of each class, assisting the prototype network and improving classification;
the self-learning module is used to automatically expand the data of categories whose corpus in the training set is insufficient, improving the accuracy of the system's classification predictions;
the self-adaptation module automatically accommodates changes when categories to be classified are added, removed, or modified.
2. The classification method of the intelligent classification system of legal scenes according to claim 1, characterized by comprising the following steps:
step 1, inputting the user consultation question into a hybrid-attention prototype network method to obtain a score vector P1 over the classes, then inputting the question into a word-vector similarity method to obtain a score vector P2, and combining P1 and P2 by an attention-weighted sum into a final score vector P = αP1 + βP2, where α and β are attention coefficients, so as to output the class with the highest score;
step 2, first judging whether the training data of the predicted class is sufficient; when it is insufficient, computing the confidence of the sample's prediction; if the confidence exceeds a system-defined threshold, storing the sample in a temporary corpus; and when the temporary corpus reaches 200 samples, merging it into the training set and retraining the model, thereby improving the model's accuracy;
step 3, when categories to be classified are added, removed, or modified, fine-tuning the model and testing it on the test set; if accuracy on the test set falls below a system-defined threshold, triggering retraining of the model; in addition, when a category is added, it contains no fewer than 20 samples.
3. The classification method according to claim 2, characterized in that: the prototype network method based on mixed attention comprises the following working steps:
(1) the method adopts an N-way K-shot mode of small sample learning to train, firstly extracts N types of training set data input into the module, wherein each type comprises K samples to form a support set S, and extracts Q samples from each type of the rest samples of the N types to form a query set. Then inputting the samples in the support set into the coding layer, converting the natural language text into a vector form E which can be identified by a computer, and simultaneously inputting the samples q in the query set into the coding layer to convert the samples q into a feature vector xq. Wherein, the supporting set J-th sample, w, representing the i-th classtDenotes the t-th word in the sample, n denotes the maximum length of the sample, a feature vector representing the jth sample of the ith class;
(2) because the feature vectors have different contribution degrees on the features of different dimensions to the categories of the feature vectors, for various samples in the support set of the input feature level attention module, the features of the various samples on each dimension are extracted to obtain the features capable of better representing the categoriesIs (Z) is the feature vector Z ═ Z1,z2,…zN) (ii) a Wherein z isiRepresenting a weight vector of the ith class in a feature dimension;
(3) in the classification of small samples, because the training data input into the model are few, the noise contained in the sample can have great influence on the performance of the model; therefore, each sample in the support set is weighted by the example level attention module, so that the sample with higher similarity to the sample in the query set is endowed with higher weight to obtain a weighted feature vector E', and the influence brought by noise data is reduced; wherein, a weighted feature vector representing the jth sample of the ith class in the support set;
(4) the weighted feature vector of each sample can be obtained through the example level attention module, so that the class prototype c of each class can be obtained through adding the weighted feature vectors of the samples of each classi;
(5) The weight vectors of various samples on different dimensions can be obtained through the feature level attention module, and therefore the weight vectors z of various samples can be obtained according to various weight vectorsiGenerating a distance function, and calculating the distance d between the query set sample q and various prototypes according to the distance functioniFinally according to the distance diThe class to which the sample q belongs can be derived. Wherein the distance function is represented as di=zi(ci-xq)2;
(6) For a user consultation question input to the module, the distances d_i to the class prototypes are computed with the distance function, and from these distances the probability P1_i that the question belongs to each class is obtained, where P1_i denotes the probability that the user consultation question belongs to the i-th class under the hybrid-attention prototypical-network method.
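The exact probability formula of step (6) did not survive extraction; a standard choice for prototypical networks, assumed here, is a softmax over the negative distances:

```python
import numpy as np

def class_probabilities(distances):
    """Map distances d_i to class probabilities P1_i via a softmax over
    negative distances (an assumption; the patent's formula is illegible
    in the source). Smaller distance -> higher probability."""
    d = np.asarray(distances, dtype=float)
    logits = -d - (-d).max()               # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()
```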
4. The classification method according to claim 2, characterized in that: the working steps of the word vector similarity method are as follows:
(1) First, the class name of each class in the training set is taken as a keyword and input into word2vec to obtain that keyword's word vector K_i; the samples in each class are then segmented into words and converted into the corresponding word vectors, and the sample word vectors within each class are summed and averaged to serve as the mark vector V_i of that class;
(2) The user consultation question is compared for similarity with each class's keyword vector K_i and mark vector V_i, yielding the similarity scores M_1 and M_2; M_1 and M_2 are then combined by weighted summation to obtain the probability P2_i that the question belongs to each class, where P2_i denotes the probability that sample q belongs to the i-th class under the word-vector similarity method.
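A sketch of the word-vector similarity scoring in steps (1) and (2); cosine similarity and the equal weights w1 = w2 = 0.5 are assumptions, since the claim only says "similarity calculation" and "weighted summation":

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two word/sentence vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def word_vector_scores(q_vec, keyword_vecs, mark_vecs, w1=0.5, w2=0.5):
    """For each class i: M1 = sim(q, K_i), M2 = sim(q, V_i);
    P2_i is their weighted sum, renormalized across classes so the
    scores behave as probabilities."""
    scores = np.array([w1 * cosine(q_vec, k) + w2 * cosine(q_vec, v)
                       for k, v in zip(keyword_vecs, mark_vecs)])
    return scores / scores.sum()
```

In practice q_vec would come from averaging the word2vec vectors of the question's words, mirroring how the mark vectors V_i are built.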
5. The classification method according to claim 2, characterized in that: the corpus expansion in step 2 specifically comprises: after sample prediction, the category of the user consultation question is obtained and it is judged whether the corpus is sufficient; when the corpus is insufficient, the confidence of the question must be judged, so a confidence selection strategy based on data uncertainty is adopted to decide whether the user consultation question can be used to expand the corpus.
6. The classification method according to claim 5, characterized in that: the confidence is calculated as confidence = true_num / all_num, where true_num denotes the number of words in the question that appear in samples of the predicted class, and all_num denotes the total number of words in the question. If every word in the question exists in samples of that class, the result for the question is considered reliable and the question can be used to expand the corpus; if some words do not exist in samples of that class and the confidence obtained is below the threshold set by the system, the question cannot be used to expand the corpus directly: the class to which it belongs must be judged by a lawyer, after which it is placed into a temporary corpus. When the amount of data in the temporary corpus reaches 200, the model is retrained so that it achieves a better classification effect.
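The confidence test of claim 6 is simple enough to state directly; the 0.8 threshold below is illustrative, as the claim leaves the system threshold unspecified:

```python
def confidence(question_words, class_vocabulary):
    """confidence = true_num / all_num: the fraction of the question's
    words that appear in samples of the predicted class."""
    true_num = sum(1 for w in question_words if w in class_vocabulary)
    return true_num / len(question_words)

def accept_for_corpus(question_words, class_vocabulary, threshold=0.8):
    """Accept the question for corpus expansion only when its confidence
    reaches the system threshold; otherwise it goes to a lawyer and the
    temporary corpus, per claim 6."""
    return confidence(question_words, class_vocabulary) >= threshold
```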
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110061753.3A CN112765315B (en) | 2021-01-18 | 2021-01-18 | Intelligent classification system and method for legal scenes |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112765315A true CN112765315A (en) | 2021-05-07 |
CN112765315B CN112765315B (en) | 2022-09-30 |
Family
ID=75702802
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110061753.3A Active CN112765315B (en) | 2021-01-18 | 2021-01-18 | Intelligent classification system and method for legal scenes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112765315B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180204111A1 (en) * | 2013-02-28 | 2018-07-19 | Z Advanced Computing, Inc. | System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform |
CN111767400A (en) * | 2020-06-30 | 2020-10-13 | 平安国际智慧城市科技股份有限公司 | Training method and device of text classification model, computer equipment and storage medium |
CN111985581A (en) * | 2020-09-09 | 2020-11-24 | 福州大学 | Sample-level attention network-based few-sample learning method |
CN112015902A (en) * | 2020-09-14 | 2020-12-01 | 中国人民解放军国防科技大学 | Least-order text classification method under metric-based meta-learning framework |
Non-Patent Citations (1)
Title |
---|
CHEN, Xiao et al.: "Semi-supervised sentiment classification method for question-and-answer texts based on network representation", Journal of Zhengzhou University (Natural Science Edition) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113220853A (en) * | 2021-05-12 | 2021-08-06 | 燕山大学 | Automatic generation method and system for legal questions |
CN113591469A (en) * | 2021-06-15 | 2021-11-02 | 杭州费尔斯通科技有限公司 | Text enhancement method and system based on word interpretation |
CN114612708A (en) * | 2022-02-23 | 2022-06-10 | 广州市玄武无线科技股份有限公司 | Commodity identification method and device, terminal equipment and computer readable medium |
CN114612708B (en) * | 2022-02-23 | 2022-12-09 | 广州市玄武无线科技股份有限公司 | Commodity identification method and device, terminal equipment and computer readable medium |
CN115033689A (en) * | 2022-05-27 | 2022-09-09 | 重庆邮电大学 | Original network Euclidean distance calculation method based on small sample text classification |
CN115033689B (en) * | 2022-05-27 | 2023-04-18 | 重庆邮电大学 | Original network Euclidean distance calculation method based on small sample text classification |
Also Published As
Publication number | Publication date |
---|---|
CN112765315B (en) | 2022-09-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112765315B (en) | Intelligent classification system and method for legal scenes | |
CN108875807B (en) | Image description method based on multiple attention and multiple scales | |
CN108984745B (en) | Neural network text classification method fusing multiple knowledge maps | |
CN108595632B (en) | Hybrid neural network text classification method fusing abstract and main body characteristics | |
CN111931513B (en) | Text intention recognition method and device | |
US20220147836A1 (en) | Method and device for text-enhanced knowledge graph joint representation learning | |
CN110826337A (en) | Short text semantic training model obtaining method and similarity matching algorithm | |
CN106845411B (en) | Video description generation method based on deep learning and probability map model | |
CN110619051B (en) | Question sentence classification method, device, electronic equipment and storage medium | |
CN113626589B (en) | Multi-label text classification method based on mixed attention mechanism | |
CN111506732B (en) | Text multi-level label classification method | |
CN111368142B (en) | Video intensive event description method based on generation countermeasure network | |
CN112685504B (en) | Production process-oriented distributed migration chart learning method | |
CN111506728B (en) | Hierarchical structure text automatic classification method based on HD-MSCNN | |
CN110874411A (en) | Cross-domain emotion classification system based on attention mechanism fusion | |
CN111177402A (en) | Evaluation method and device based on word segmentation processing, computer equipment and storage medium | |
CN111144566A (en) | Neural network weight parameter training method, characteristic classification method and corresponding device | |
CN113673535A (en) | Image description generation method of multi-modal feature fusion network | |
CN115064154A (en) | Method and device for generating mixed language voice recognition model | |
JPH0934863A (en) | Information integral processing method by neural network | |
CN112989843B (en) | Intention recognition method, device, computing equipment and storage medium | |
CN118227790A (en) | Text classification method, system, equipment and medium based on multi-label association | |
CN115577111A (en) | Text classification method based on self-attention mechanism | |
CN115170888A (en) | Electronic component zero sample identification model and method based on visual information and semantic attributes | |
CN115168576A (en) | Method, model and medium for analyzing aspect emotion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20221215
Address after: No. J24, Floor 17, No. 1, Zhongguancun Street, Haidian District, Beijing 100084
Patentee after: Shengming Jizhi (Beijing) Technology Co.,Ltd.
Address before: 066004 No. 438, Hebei Avenue, Qinhuangdao, Hebei
Patentee before: Yanshan University