CN110909547A - Judicial entity identification method based on improved deep learning - Google Patents

Judicial entity identification method based on improved deep learning

Info

Publication number
CN110909547A
CN110909547A
Authority
CN
China
Prior art keywords
sequence
short term
term memory
judicial
module
Prior art date
Legal status
Pending
Application number
CN201911156444.3A
Other languages
Chinese (zh)
Inventor
王艳
杨品莉
林锋
邹奕
周激流
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201911156444.3A priority Critical patent/CN110909547A/en
Publication of CN110909547A publication Critical patent/CN110909547A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/18Legal services; Handling legal documents

Abstract

The invention discloses a judicial entity recognition method based on improved deep learning, which comprises: obtaining judicial texts, normalizing their format and labeling them to obtain a data set comprising training samples and test samples; inputting the training samples into a judicial entity recognition model for training; and inputting test samples of the text to be recognized into the trained judicial entity recognition model to obtain recognition results. The invention can obtain long-distance context features and more information, improving recognition precision and coverage; it solves the problem of invalid predicted tag sequences that deep learning methods suffer from in judicial recognition, ensuring the validity and reliability of recognition.

Description

Judicial entity identification method based on improved deep learning
Technical Field
The invention belongs to the technical field of judicial entity identification, and particularly relates to a judicial entity identification method based on improved deep learning.
Background
In the judicial field, judicial documents involve large data volumes and many document types, so information automation is a necessary trend in the field's development. Information automation in the judicial field can reduce the workload of judicial personnel, help improve the work efficiency of the judicial industry, and facilitate information sharing across the field.
In recent years, with the continual emergence of new natural language processing techniques and the urgent need for judicial information automation, more and more natural language processing techniques, such as entity recognition and relation extraction, have been applied to the judicial field. Legal case texts contain a large number of judicial-domain entities, and recognizing them is the basis for judicial information automation and the prerequisite for subsequent technologies such as judicial information extraction and judicial-domain knowledge graph construction. The research of judicial entity recognition is therefore very important for the development of the judicial field.
At present, named entity recognition, as a fundamental research topic of natural language processing, has achieved considerable success in many fields. However, Chinese differs from English: individual Chinese characters are often ambiguous and words are written without explicit boundaries, so entity-recognition results for Chinese are still relatively few. The earliest named entity approaches were dictionary-based and rule-based methods, which required experts to manually create rule templates and identified named entities by pattern and string matching. Both methods place high demands on the corpus and have poor portability. With the wider application of deep learning in natural language processing and the introduction of distributed word representations, deep-learning-based named entity recognition has achieved some success. However, deep-learning-based methods predict each character independently from a given set of features without considering the tags already predicted before it, which may render the predicted tag sequence invalid. The recurrent neural network (RNN) is a typical deep learning model for processing serialized sentences, but practice shows that if a sequence is too long the gradient vanishes and optimization cannot continue; RNNs therefore have a length-dependency problem and cannot acquire context feature information of arbitrary length.
Disclosure of Invention
In order to solve these problems, the invention provides a judicial entity identification method based on improved deep learning, which can obtain long-distance context features and more information, improving recognition precision and coverage; it solves the problem of invalid predicted tag sequences that deep learning methods suffer from in judicial recognition, ensuring the validity and reliability of recognition.
In order to achieve this purpose, the invention adopts the following technical scheme: a judicial entity recognition method based on improved deep learning, comprising the following steps:
acquiring a judicial text, carrying out standard processing on the format of the text and marking the text, and acquiring a data set comprising a training sample and a test sample;
inputting the training sample into a judicial entity recognition model for training;
and inputting the test sample of the text to be recognized into the trained judicial entity recognition model to obtain a recognition result.
Further, in the process of carrying out standard processing and marking on the text format, space removal processing is carried out on the text first, and then the text is marked to obtain a text sequence.
Further, the judicial entity recognition model is a bidirectional long and short term memory model with a conditional random field, the bidirectional long and short term memory model with the conditional random field comprises a sequence input module, a forward long and short term memory model module, a backward long and short term memory model module and a conditional random field module, and the sequence input module, the forward long and short term memory model module, the backward long and short term memory model module and the conditional random field module are sequentially connected.
Further, the forward long-short term memory model module extracts past features and the backward long-short term memory model module extracts future features: long-short term memory feature extraction is performed on the same sequence from left to right and from right to left, yielding a tag sequence carrying bidirectional semantic information. This solves the length-dependency problem of traditional deep learning methods and can obtain context feature information of arbitrary length; the information written to the cell state through the gate mechanism maintains the persistence of information transmission, so long-distance context features can be learned; information can be encoded both from front to back and from back to front, obtaining bidirectional semantic information and improving recognition validity.
And the conditional random field module is connected to the hidden layer output of the backward long-short term memory model module, and jointly decodes the tag sequence output by the backward long-short term memory model module to perform sentence-level sequence marking. In order to solve the problem that the label sequence output from the bidirectional long and short term memory model is possibly invalid, a conditional random field module is connected to the hidden layer output of the bidirectional long and short term memory model, the label sequence output by the bidirectional long and short term memory model is jointly decoded, and sentence-level sequence labeling is carried out instead of decoding each label independently.
Further, the processing in the judicial entity recognition model comprises the steps of:
searching a character vector corresponding to each character in an input text sequence by a sequence input module, and inputting the searched character vector sequence into a forward long and short term memory model module and a backward long and short term memory model module;
respectively obtaining hidden layer coding representation of the character vector through a forward long-short term memory model module and a backward long-short term memory model module;
distributing marks for each character through a conditional random field module, and calculating two types of scores;
the output marker sequence is the highest-scoring sequence.
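The steps above can be sketched as a minimal sequence-input and bidirectional-encoding pass. The character vocabulary, the dimensions and the toy recurrence used here are illustrative assumptions, not the patent's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical character vocabulary and embedding table (dimensions chosen for illustration).
vocab = {"判": 0, "决": 1, "书": 2}
emb_dim, hidden_dim = 4, 3
embeddings = rng.standard_normal((len(vocab), emb_dim))

def lookup(text):
    """Sequence input module: map each character to its character vector."""
    return np.stack([embeddings[vocab[ch]] for ch in text])

def run_lstm(xs, reverse=False):
    """Stand-in for one LSTM direction: returns one hidden state per character."""
    order = reversed(range(len(xs))) if reverse else range(len(xs))
    h = np.zeros(hidden_dim)
    out = {}
    for t in order:
        h = np.tanh(xs[t][:hidden_dim] + h)  # toy recurrence, not a full LSTM cell
        out[t] = h
    return [out[t] for t in range(len(xs))]

x = lookup("判决书")
fwd = run_lstm(x)                 # forward module: past features
bwd = run_lstm(x, reverse=True)   # backward module: future features
bi = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]  # bidirectional encoding per character
```

The concatenated forward/backward states are what the conditional random field module would then score.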
Further, the forward long-short term memory model module and the backward long-short term memory model module have the same structure, comprising a cell state unit and three gate structures that use sigmoid as the activation function; the three gates are an input gate, a forgetting gate and an output gate. The working process comprises the following steps:
f_t = σ(W_f[h_{t-1}, x_t] + b_f);
i_t = σ(W_i[h_{t-1}, x_t] + b_i);
O_t = σ(W_o[h_{t-1}, x_t] + b_o);
C̃_t = tanh(W_C[h_{t-1}, x_t] + b_C);
C_t = f_t * C_{t-1} + i_t * C̃_t;
h_t = O_t * tanh(C_t);
wherein x_t is the input at the current time; h_{t-1} is the hidden-layer state at the previous time; h_t is the hidden-layer state at the current time; C̃_t is the temporary cell state; C_t is the cell state at the current time; and C_{t-1} is the cell state at the previous time.
The forgetting gate selects the information to be forgotten: its inputs are h_{t-1} and x_t, and its output is the forgetting-gate value f_t. The cell state at the current time is then computed: the inputs are i_t, f_t, C̃_t and C_{t-1}, and the output is the current cell state C_t. Finally the output gate and the hidden-layer state at the current time are computed: the inputs are h_{t-1}, x_t and C_t, and the outputs are the output-gate value O_t and the hidden-layer state h_t.
Finally, a hidden-layer state sequence {h_0, h_1, …, h_{t-1}} with the same length as the sentence is obtained.
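The six gate equations above can be implemented directly in NumPy as a single time step; the dimensions and random parameters below are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of the LSTM cell following the gate equations above.
    W and b hold parameters for the forget (f), input (i), output (o)
    and candidate (c) transforms; each W_* acts on [h_prev, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # forgetting gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # temporary cell state
    c_t = f_t * c_prev + i_t * c_tilde       # cell state at the current time
    h_t = o_t * np.tanh(c_t)                 # hidden-layer state at the current time
    return h_t, c_t

rng = np.random.default_rng(1)
H, D = 3, 4  # hidden size and input size, illustrative
W = {k: rng.standard_normal((H, H + D)) for k in "fioc"}
b = {k: np.zeros(H) for k in "fioc"}
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.standard_normal(D), h, c, W, b)
```

Because h_t = O_t · tanh(C_t) with O_t in (0, 1), every hidden-state component stays strictly inside (-1, 1).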
Further, the conditional random field module is configured to calculate a joint probability for the entire sequence; the parameterized form of the conditional random field module is defined as follows:
P(y|x) = (1/Z(x)) exp(Σ_{i,k} λ_k t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l δ_l(y_i, x, i));
in the formula, t_k and δ_l are feature functions, λ_k and μ_l are the corresponding weights, and Z(x) is a normalization factor;
wherein: Z(x) = Σ_y exp(Σ_{i,k} λ_k t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l δ_l(y_i, x, i));
This formula gives the conditional probability of the output sequence y given the input sequence x. t_k is a feature function defined on the edges, called a transition feature; whether it fires depends on the current word and the previous word, so it is determined by the current position and the previous position. δ_l is a feature function defined on the nodes, called a state feature, determined by the current position. Typically a feature function takes the value 1 or 0: 1 when its condition is met and 0 otherwise. The output of the conditional random field module is completely determined by the feature functions t_k, δ_l and the weights λ_k, μ_l.
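A brute-force sketch of this conditional probability, assuming a toy two-tag, length-three sequence with the feature-function weights folded into transition and emission score tables (all values illustrative):

```python
import itertools
import numpy as np

# Toy linear-chain CRF: 2 tags, sequence of length 3.
# trans[i, j] plays the role of the weighted transition features (lambda_k * t_k),
# emit[t, j] the weighted state features (mu_l * delta_l).
n_tags, T = 2, 3
rng = np.random.default_rng(2)
trans = rng.standard_normal((n_tags, n_tags))
emit = rng.standard_normal((T, n_tags))

def score(y):
    """Unnormalized score of tag sequence y: state features plus transition features."""
    s = sum(emit[t, y[t]] for t in range(T))
    s += sum(trans[y[t - 1], y[t]] for t in range(1, T))
    return s

# Z(x): normalization over every possible tag sequence.
Z = sum(np.exp(score(y)) for y in itertools.product(range(n_tags), repeat=T))

def prob(y):
    """Conditional probability P(y | x) = exp(score(y)) / Z(x)."""
    return np.exp(score(y)) / Z

# The probabilities over all tag sequences must sum to one.
total = sum(prob(y) for y in itertools.product(range(n_tags), repeat=T))
```

The exhaustive sum over tag sequences is only feasible for toy sizes; real implementations compute Z(x) with the forward algorithm.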
Further, the conditional random field module solves the problem that a tag sequence predicted by a neural-network method may be invalid: it learns constraints from the training samples that ensure the finally predicted entity tag sequence is valid. In the loss function of the conditional random field module, the sequence with the largest output score is the predicted tag sequence. Given a sequence X whose labeling result is y, the score is defined as:
s(X, y) = Σ_{i=0}^{n} A_{y_i, y_{i+1}} + Σ_{i=1}^{n} P_{i, y_i};
wherein P is the initial score matrix obtained by a linear transform of the hidden-layer output of the bidirectional long-short term memory model and A is the transition score matrix; A_{i,j} is the score of tag i being followed by tag j, and P_{i,j} is the score of the word W_i being mapped to tag j.
The score of the output tag sequence y corresponding to the input sequence X is computed, and the final predicted tag sequence is the sequence with the highest score.
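In practice the highest-scoring sequence is found with Viterbi dynamic programming rather than by enumerating all tag sequences; the sketch below assumes small illustrative P (emission) and A (transition) matrices in the notation above:

```python
import numpy as np

def viterbi(P, A):
    """Find the tag sequence y maximizing sum(A[y_i, y_{i+1}]) + sum(P[i, y_i]).
    P: (T, n_tags) emission scores; A: (n_tags, n_tags) transition scores."""
    T, n = P.shape
    dp = P[0].copy()                    # best score ending in each tag at step 0
    back = np.zeros((T, n), dtype=int)  # backpointers
    for t in range(1, T):
        cand = dp[:, None] + A + P[t][None, :]  # previous tag x next tag
        back[t] = cand.argmax(axis=0)
        dp = cand.max(axis=0)
    y = [int(dp.argmax())]
    for t in range(T - 1, 0, -1):       # follow backpointers
        y.append(int(back[t, y[-1]]))
    return y[::-1], float(dp.max())

# Illustrative scores: 3 positions, 2 tags.
P = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
A = np.array([[0.5, -1.0], [-1.0, 0.5]])
path, best = viterbi(P, A)  # → path [0, 0, 0] with score 5.0
```

Here the transition bonus for staying on the same tag outweighs position 1's preference for tag 1, so the decoder picks the all-zero path, exactly the kind of sequence-level trade-off the conditional random field module adds over per-character decoding.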
Further, the data set includes training samples, validation samples, and test samples; verifying the accuracy of the model by the verification sample;
The format of the judicial text is standardized and spaces are removed; the text is then labeled into BIO word-tag form with a corpus labeling tool and used as the model input. The tag form comprises 3 entity categories and 7 word tags, the 3 entity categories being criminal name, place and judicial unit.
Further, the judicial entity recognition model is optimized through an optimizer that uses adaptive moment estimation. Adaptive moment estimation computes adaptive learning rates for different parameters, has low memory requirements and high computational efficiency, and is suitable for larger data sets.
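For reference, a single adaptive-moment-estimation (Adam) parameter update can be sketched as follows; the learning rate and decay constants are the algorithm's common defaults, not values stated in the patent:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One adaptive-moment-estimation (Adam) update, per the standard algorithm."""
    m = b1 * m + (1 - b1) * grad       # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2  # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)          # bias correction for the running means
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, -1.0])
m = v = np.zeros(2)
theta, m, v = adam_step(theta, grad=np.array([0.5, -0.5]), m=m, v=v, t=1)
```

Because the update is normalized by the per-parameter second moment, each parameter effectively gets its own learning rate, which is the property the passage above refers to.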
The beneficial effects of the technical scheme are as follows:
the recognition model established in the judicial entity recognition method provided by the invention consists of a bidirectional long-short term memory model and a conditional random field module, and the characteristics of the deep learning-based method can be reserved based on the recognition network model. Long-distance context characteristics can be obtained, more information can be obtained, and the identification precision and range are improved; the problem that the prediction label sequence is invalid in judicial identification by the deep learning method is solved, and the effectiveness and reliability of identification are ensured.
Drawings
FIG. 1 is a schematic flow chart of a judicial entity recognition method based on improved deep learning according to the present invention;
FIG. 2 is a topology structure diagram of a judicial entity recognition model in an embodiment of the present invention;
FIG. 3 is a diagram of a topology structure of a long-term and short-term memory model module according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described with reference to the accompanying drawings.
In this embodiment, referring to fig. 1, the present invention provides a judicial entity recognition method based on improved deep learning, including:
acquiring a judicial text, carrying out standard processing on the format of the text and marking the text, and acquiring a data set comprising a training sample and a test sample;
inputting the training sample into a judicial entity recognition model for training;
and inputting the test sample of the text to be recognized into the trained judicial entity recognition model to obtain a recognition result.
As an optimization scheme of the above embodiment, in the process of performing specification processing and marking on a text format, a space removal processing is performed on a text first, and then the text is marked to obtain a text sequence.
As an optimization scheme of the above embodiment, as shown in fig. 2, the judicial entity recognition model is a bidirectional long-short term memory model with a conditional random field, and the bidirectional long-short term memory model with the conditional random field includes a sequence input module, a forward long-short term memory model module, a backward long-short term memory model module and a conditional random field module, and the sequence input module, the forward long-short term memory model module, the backward long-short term memory model module and the conditional random field module are connected in sequence.
The forward long-short term memory model module extracts past features and the backward long-short term memory model module extracts future features: long-short term memory feature extraction is performed on the same sequence from left to right and from right to left, yielding a tag sequence carrying bidirectional semantic information. This solves the length-dependency problem of traditional deep learning methods and can obtain context feature information of arbitrary length; the information written to the cell state through the gate mechanism maintains the persistence of information transmission, so long-distance context features can be learned; information can be encoded both from front to back and from back to front, obtaining bidirectional semantic information and improving recognition validity.
And the conditional random field module is connected to the hidden layer output of the backward long-short term memory model module, and jointly decodes the tag sequence output by the backward long-short term memory model module to perform sentence-level sequence marking. In order to solve the problem that the label sequence output from the bidirectional long and short term memory model is possibly invalid, a conditional random field module is connected to the hidden layer output of the bidirectional long and short term memory model, the label sequence output by the bidirectional long and short term memory model is jointly decoded, and sentence-level sequence labeling is carried out instead of decoding each label independently.
Wherein the processing in the judicial entity recognition model comprises the steps of:
searching a character vector corresponding to each character in an input text sequence by a sequence input module, and inputting the searched character vector sequence into a forward long and short term memory model module and a backward long and short term memory model module;
respectively obtaining hidden layer coding representation of the character vector through a forward long-short term memory model module and a backward long-short term memory model module;
distributing marks for each character through a conditional random field module, and calculating two types of scores;
the output marker sequence is the highest-scoring sequence.
As shown in fig. 3, the forward long-short term memory model module and the backward long-short term memory model module have the same structure, comprising a cell state unit and three gate structures that use sigmoid as the activation function; the three gates are an input gate, a forgetting gate and an output gate. The working process comprises the following steps:
f_t = σ(W_f[h_{t-1}, x_t] + b_f);
i_t = σ(W_i[h_{t-1}, x_t] + b_i);
O_t = σ(W_o[h_{t-1}, x_t] + b_o);
C̃_t = tanh(W_C[h_{t-1}, x_t] + b_C);
C_t = f_t * C_{t-1} + i_t * C̃_t;
h_t = O_t * tanh(C_t);
wherein x_t is the input at the current time; h_{t-1} is the hidden-layer state at the previous time; h_t is the hidden-layer state at the current time; C̃_t is the temporary cell state; C_t is the cell state at the current time; and C_{t-1} is the cell state at the previous time.
The forgetting gate selects the information to be forgotten: its inputs are h_{t-1} and x_t, and its output is the forgetting-gate value f_t. The cell state at the current time is then computed: the inputs are i_t, f_t, C̃_t and C_{t-1}, and the output is the current cell state C_t. Finally the output gate and the hidden-layer state at the current time are computed: the inputs are h_{t-1}, x_t and C_t, and the outputs are the output-gate value O_t and the hidden-layer state h_t.
Finally, a hidden-layer state sequence {h_0, h_1, …, h_{t-1}} with the same length as the sentence is obtained.
The conditional random field module is used for calculating the joint probability of the entire sequence; the parameterized form of the conditional random field module is defined as follows:
P(y|x) = (1/Z(x)) exp(Σ_{i,k} λ_k t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l δ_l(y_i, x, i));
in the formula, t_k and δ_l are feature functions, λ_k and μ_l are the corresponding weights, and Z(x) is a normalization factor;
wherein: Z(x) = Σ_y exp(Σ_{i,k} λ_k t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l δ_l(y_i, x, i));
This formula gives the conditional probability of the output sequence y given the input sequence x. t_k is a feature function defined on the edges, called a transition feature; whether it fires depends on the current word and the previous word, so it is determined by the current position and the previous position. δ_l is a feature function defined on the nodes, called a state feature, determined by the current position. Typically a feature function takes the value 1 or 0: 1 when its condition is met and 0 otherwise. The output of the conditional random field module is completely determined by the feature functions t_k, δ_l and the weights λ_k, μ_l.
The conditional random field module solves the problem that a tag sequence predicted by a neural-network method may be invalid: it learns constraints from the training samples that ensure the finally predicted entity tag sequence is valid. In the loss function of the conditional random field module, the sequence with the largest output score is the predicted tag sequence. Given a sequence X whose labeling result is y, the score is defined as:
s(X, y) = Σ_{i=0}^{n} A_{y_i, y_{i+1}} + Σ_{i=1}^{n} P_{i, y_i};
wherein P is the initial score matrix obtained by a linear transform of the hidden-layer output of the bidirectional long-short term memory model and A is the transition score matrix; A_{i,j} is the score of tag i being followed by tag j, and P_{i,j} is the score of the word W_i being mapped to tag j.
The score of the output tag sequence y corresponding to the input sequence X is computed, and the final predicted tag sequence is the sequence with the highest score.
As an optimization scheme of the above embodiment, the data set includes a training sample, a verification sample and a test sample, and the accuracy of the model is verified by the verification sample;
The format of the judicial text is standardized and spaces are removed; the text is then labeled into BIO word-tag form with a corpus labeling tool and used as the model input. The tag form comprises 3 entity categories and 7 word tags, the 3 entity categories being criminal name, place and judicial unit.
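The BIO character-tag form described above can be sketched as follows; the tag strings NAME, LOC and ORG for the three entity categories are assumed for illustration, since the exact label strings are not given here:

```python
def bio_tags(chars, spans):
    """Assign BIO tags to a character sequence.
    chars: list of characters; spans: list of (start, end, category) with end exclusive."""
    tags = ["O"] * len(chars)
    for start, end, cat in spans:
        tags[start] = "B-" + cat            # beginning of an entity
        for i in range(start + 1, end):
            tags[i] = "I-" + cat            # inside of an entity
    return tags

# Illustrative sentence with a person-name span and a place span.
chars = list("王某在成都市被捕")
tags = bio_tags(chars, [(0, 2, "NAME"), (3, 6, "LOC")])
# B/I for each of the 3 categories (NAME, LOC, ORG) plus "O" gives the 7 word-tag classes.
```

Each character gets exactly one tag, which is why a character-level model with a CRF layer can be trained directly on these sequences.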
As an optimization scheme of the above embodiment, the judicial entity recognition model is optimized by an optimizer that uses adaptive moment estimation; it computes adaptive learning rates for different parameters, has low memory requirements and high computational efficiency, and is suitable for larger data sets.
In the specific implementation process, the following embodiments are used for illustration:
the experimental data set of the invention is from 500 referee documents downloaded from a referee document network, mainly comprising referee documents of three cases of a prisoner-reducing case, a parole case and a temporary supervision case, wherein 300 referee documents are taken as training samples, 100 are taken as verification samples and 100 are taken as test samples. Firstly, 500 referee documents are normalized in format, blank spaces are removed, and then the referee documents are marked into a BIO character label form by using a corpus marking tool YDEEA with the help of a legal expert as the input of a model so as to reduce manual participation. Herein, 3 types of entity categories (criminal name, place, judicial unit) and 7 types of word labels are defined, as shown in table 1.
TABLE 1 BIO word tag categories
(Table 1 is presented as an image in the original document.)
For the purpose of evaluating the model, precision, recall and the F1 value (F-measure) are used as evaluation indexes. Denoting true positives, false positives and false negatives by TP, FP and FN, the standard calculation formulas are:
precision = TP / (TP + FP);
recall = TP / (TP + FN);
F1 = 2 × precision × recall / (precision + recall).
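The three evaluation indexes can be computed from entity-level counts of true positives (TP), false positives (FP) and false negatives (FN); the counts below are purely illustrative:

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 value (F-measure) from entity-level counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1

# Illustrative counts, not the patent's data: 8 correct entities,
# 2 spurious predictions, 2 missed entities.
p, r, f1 = prf(tp=8, fp=2, fn=2)
```

With equal precision and recall, the F1 value coincides with both, as in this example.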
in the model provided by the invention, a data set is trained, and a plurality of evaluation indexes obtain better results, wherein the accuracy rate is 0.863, the recall rate is 0.837, and the F1 value is 0.848.
As shown in Table 2, the recognition results obtained when the optimizer uses adaptive moment estimation are better than those of the other optimizers: the precision, recall and F1 value are all clearly higher.
TABLE 2 comparison of evaluation indices of different optimizers under a dataset
(Table 2 is presented as an image in the original document.)
The foregoing shows and describes the basic principles and main features of the present invention and its advantages. Those skilled in the art will understand that the invention is not limited to the embodiments described above, which, together with the description, only illustrate the principle of the invention; various changes and modifications may be made without departing from the spirit and scope of the invention, and all such changes fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A judicial entity recognition method based on improved deep learning, characterized by comprising the following steps:
acquiring a judicial text, carrying out standard processing on the format of the text and marking the text, and acquiring a data set comprising a training sample and a test sample;
inputting the training sample into a judicial entity recognition model for training;
and inputting the test sample of the text to be recognized into the trained judicial entity recognition model to obtain a recognition result.
2. The judicial entity recognition method based on improved deep learning of claim 1, wherein in the process of carrying out standard processing and marking on the text format, the text is firstly subjected to de-space processing, and then is marked to obtain a text sequence.
3. The judicial entity recognition method based on the improved deep learning of claim 2, wherein the judicial entity recognition model is a bidirectional long-short term memory model with conditional random fields, and the bidirectional long-short term memory model with conditional random fields comprises a sequence input module, a forward long-short term memory model module, a backward long-short term memory model module and a conditional random field module, and the sequence input module, the forward long-short term memory model module, the backward long-short term memory model module and the conditional random field module are connected in sequence.
4. The judicial entity recognition method based on improved deep learning of claim 3, wherein the forward long-short term memory model module extracts past features and the backward long-short term memory model module extracts future features; performing long-short term memory feature extraction on the same sequence from left to right, and performing long-short term memory feature extraction from right to left to obtain a tag sequence of the bidirectional semantic information;
and the conditional random field module is connected to the hidden layer output of the backward long-short term memory model module, and jointly decodes the tag sequence output by the backward long-short term memory model module to perform sentence-level sequence marking.
5. The judicial entity recognition method based on improved deep learning of claim 4, wherein the processing procedure in the judicial entity recognition model comprises the steps of:
searching a character vector corresponding to each character in an input text sequence by a sequence input module, and inputting the searched character vector sequence into a forward long and short term memory model module and a backward long and short term memory model module;
respectively obtaining hidden layer coding representation of the character vector through a forward long-short term memory model module and a backward long-short term memory model module;
distributing marks for each character through a conditional random field module, and calculating two types of scores;
the output marker sequence is the highest-scoring sequence.
6. The judicial entity recognition method based on improved deep learning of any one of claims 2-5, wherein the forward long-short term memory model module and the backward long-short term memory model module have the same structure, comprising a cell state unit and three gate structures that use sigmoid as the activation function, the three gates being an input gate, a forgetting gate and an output gate; the working process comprises the following steps:
f_t = σ(W_f[h_{t-1}, x_t] + b_f);
i_t = σ(W_i[h_{t-1}, x_t] + b_i);
O_t = σ(W_O[h_{t-1}, x_t] + b_O);
C̃_t = tanh(W_C[h_{t-1}, x_t] + b_C);
C_t = f_t * C_{t-1} + i_t * C̃_t;
h_t = O_t * tanh(C_t);
wherein x_t is the input at the current time; h_{t-1} is the hidden layer state at the previous time; h_t is the hidden layer state at the current time; C̃_t is the temporary cell state; C_t is the cell state at the current time; C_{t-1} is the cell state at the previous time;
the forgetting gate selects the information to be forgotten: its inputs are h_{t-1} and x_t, and its output is the forgetting gate value f_t; the cell state at the current time is then calculated, with inputs i_t, f_t, C̃_t and C_{t-1} and output the current cell state C_t; the output gate and the hidden layer state at the current time are calculated, with inputs h_{t-1}, x_t and C_t and outputs the output gate value O_t and the hidden layer state h_t;
finally, a hidden layer state sequence {h_0, h_1, …, h_{t-1}} of the same length as the sentence is obtained.
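The gate equations above can be sketched as a single recurrence step (an illustrative sketch only, not part of the claims; the weight and bias names follow the equations, and each W acts on the concatenation [h_{t-1}, x_t]):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_o, b_o, W_c, b_c):
    """One long-short term memory step following the claimed equations."""
    z = np.concatenate([h_prev, x_t])     # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)          # forgetting gate
    i_t = sigmoid(W_i @ z + b_i)          # input gate
    o_t = sigmoid(W_o @ z + b_o)          # output gate
    C_tilde = np.tanh(W_c @ z + b_c)      # temporary cell state
    C_t = f_t * C_prev + i_t * C_tilde    # current cell state
    h_t = o_t * np.tanh(C_t)              # current hidden layer state
    return h_t, C_t
```

Iterating this step over the sentence, in both directions, yields the hidden layer state sequences consumed by the conditional random field module.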
7. The method of claim 5, wherein the conditional random field module is configured to calculate the joint probability of the entire sequence, and the parameterized form of the conditional random field module is defined as follows:
P(y|x) = (1/Z(x)) exp(Σ_{i,k} λ_k t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l δ_l(y_i, x, i));
in the formula, t_k and δ_l are feature functions, λ_k and μ_l are the corresponding weights, and Z(x) is a normalization factor, wherein: Z(x) = Σ_y exp(Σ_{i,k} λ_k t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l δ_l(y_i, x, i));
through the above formula, the conditional probability of an output sequence y given an input sequence x is obtained; t_k is a feature function defined on the edges, called a transition feature, whose matching depends on the current word and the previous word and which is determined by the current position and the previous position; δ_l is a feature function defined on the nodes, called a state feature, which is determined by the current position; typically, a feature function takes the value 1 or 0: 1 when its condition is met and 0 otherwise; the output of the conditional random field module is entirely determined by the feature functions t_k, δ_l and the weights λ_k, μ_l.
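The parameterized form and the normalization factor Z(x) can be sketched by brute-force enumeration (an illustrative sketch only, not part of the claims; the 0/1 feature functions and weights used in testing are toy assumptions):

```python
import math
from itertools import product

def crf_prob(y, x, trans_feats, state_feats, lam, mu, labels):
    """P(y|x) = exp(Σ λ_k t_k + Σ μ_l δ_l) / Z(x), with Z(x) computed by
    summing the exponentiated score over every possible label sequence."""
    def score(seq):
        s = 0.0
        for i in range(len(x)):
            if i > 0:  # transition features t_k live on edges (y_{i-1}, y_i)
                s += sum(l * t(seq[i - 1], seq[i], x, i)
                         for l, t in zip(lam, trans_feats))
            # state features δ_l live on nodes y_i
            s += sum(m * d(seq[i], x, i) for m, d in zip(mu, state_feats))
        return s
    Z = sum(math.exp(score(seq)) for seq in product(labels, repeat=len(x)))
    return math.exp(score(y)) / Z
```

The brute-force Z(x) is exponential in sentence length; a practical implementation would use the forward algorithm instead.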
8. The method as claimed in claim 6, wherein in the loss function of the conditional random field module, the sequence with the largest output score is the predicted tag sequence; assuming that the input sequence is X and the sequence labeling result is y, the score is defined as:
s(X, y) = Σ_{i=0}^{n} A_{y_i, y_{i+1}} + Σ_{i=1}^{n} P_{i, y_i};
wherein P is the initial score matrix obtained by a linear transformation of the hidden layer output of the bidirectional long-short term memory model, and A is the transition score matrix; A_{i,j} is the probability that tag j follows tag i, and P_{i,j} is the probability that the word W_i maps to tag j;
the score of each output tag sequence y corresponding to the input sequence X is calculated, and the final predicted tag sequence is the sequence with the highest score.
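The score s(X, y) can be sketched as follows (an illustrative sketch only, not part of the claims; the start/stop transitions are omitted for brevity, and the brute-force search stands in for the Viterbi decoding a practical implementation would use):

```python
import numpy as np
from itertools import product

def sequence_score(A, P, y):
    """s(X, y) = sum of transition scores A[y_{i-1}, y_i]
               + sum of emission scores P[i, y_i]."""
    trans = sum(A[y[i - 1], y[i]] for i in range(1, len(y)))
    emit = sum(P[i, y[i]] for i in range(len(y)))
    return trans + emit

def best_sequence(A, P):
    """The predicted labeling is the highest-scoring tag sequence."""
    n, k = P.shape
    return max(product(range(k), repeat=n),
               key=lambda y: sequence_score(A, P, y))
```

With an emission matrix P of shape (sentence length, number of tags) and a transition matrix A of shape (number of tags, number of tags), `best_sequence` returns the argmax labeling.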
9. The judicial entity recognition method based on improved deep learning of claim 2, wherein the data set comprises training samples, validation samples and test samples;
the format of the judicial text is standardized and spaces are removed; the text is then annotated with a corpus labeling tool into BIO word-tag form as the input of the model, comprising 3 entity categories and 7 word tags, the 3 entity categories being crime names, places and judicial units.
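The BIO scheme of claim 9 (3 entity categories yield 2 × 3 + 1 = 7 word tags) can be sketched as follows (an illustrative sketch only, not part of the claims; the label names CRIME, LOC and ORG are hypothetical stand-ins for crime name, place and judicial unit):

```python
# Hypothetical tag inventory: B-/I- for each of the 3 entity classes, plus O.
TAGS = ["B-CRIME", "I-CRIME", "B-LOC", "I-LOC", "B-ORG", "I-ORG", "O"]

def spans_to_bio(chars, spans):
    """Convert character-level entity spans [(start, end, label), ...]
    (end exclusive) into one BIO tag per character."""
    tags = ["O"] * len(chars)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags
```

Each (character, tag) pair produced this way is one line of the model's training input.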
10. The judicial entity recognition method based on improved deep learning of claim 1, wherein the judicial entity recognition model is optimized by an optimizer, and the optimizer uses adaptive moment estimation (Adam) to optimize the recognition result.
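The adaptive moment estimation (Adam) update of claim 10 can be sketched as follows (an illustrative sketch only, not part of the claims; the default hyperparameters follow common practice rather than the patent):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (first
    moment) and squared gradient (second moment), with bias correction."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Applied to every trainable parameter of the bidirectional long-short term memory and conditional random field modules, this drives the model toward the minimum of the sequence-labeling loss.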
CN201911156444.3A 2019-11-22 2019-11-22 Judicial entity identification method based on improved deep learning Pending CN110909547A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911156444.3A CN110909547A (en) 2019-11-22 2019-11-22 Judicial entity identification method based on improved deep learning

Publications (1)

Publication Number Publication Date
CN110909547A true CN110909547A (en) 2020-03-24

Family

ID=69818786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911156444.3A Pending CN110909547A (en) 2019-11-22 2019-11-22 Judicial entity identification method based on improved deep learning

Country Status (1)

Country Link
CN (1) CN110909547A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980608A (en) * 2017-03-16 2017-07-25 四川大学 A kind of Chinese electronic health record participle and name entity recognition method and system
CN109241285A (en) * 2018-08-29 2019-01-18 东南大学 A kind of device of the judicial decision in a case of auxiliary based on machine learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
(India) Santanu Pattanayak: "Python Artificial Intelligence Projects" (Chinese edition), 31 October 2019, China Machine Press *
He Yunqi et al.: "Disease name recognition based on syntactic and semantic features", SCIENTIA SINICA Informationis *
Xie Yun: "Research on named entity recognition for Chinese legal texts", China Doctoral and Master's Theses Full-text Database (Master), Information Science and Technology *
Gu Sunyan: "Research on Chinese named entity recognition based on deep neural networks", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347780A (en) * 2020-11-27 2021-02-09 浙江大学 Judicial fact finding generation method, device and medium based on deep neural network
CN112347780B (en) * 2020-11-27 2023-09-12 浙江大学 Judicial fact finding generation method, device and medium based on deep neural network

Similar Documents

Publication Publication Date Title
CN110083831B (en) Chinese named entity identification method based on BERT-BiGRU-CRF
CN108984526B (en) Document theme vector extraction method based on deep learning
CN111444726B (en) Chinese semantic information extraction method and device based on long-short-term memory network of bidirectional lattice structure
CN111639171B (en) Knowledge graph question-answering method and device
CN111310471B (en) Travel named entity identification method based on BBLC model
CN110083682A (en) It is a kind of to understand answer acquisition methods based on the machine readings for taking turns attention mechanism more
CN110427623A (en) Semi-structured document Knowledge Extraction Method, device, electronic equipment and storage medium
CN111738007B (en) Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network
CN113642330A (en) Rail transit standard entity identification method based on catalog topic classification
CN112115238A (en) Question-answering method and system based on BERT and knowledge base
CN112487820B (en) Chinese medical named entity recognition method
CN112836046A (en) Four-risk one-gold-field policy and regulation text entity identification method
Ahmad et al. Bengali word embeddings and it's application in solving document classification problem
CN116151256A (en) Small sample named entity recognition method based on multitasking and prompt learning
CN114911892A (en) Interaction layer neural network for search, retrieval and ranking
CN113191148A (en) Rail transit entity identification method based on semi-supervised learning and clustering
CN117076653B (en) Knowledge base question-answering method based on thinking chain and visual lifting context learning
CN112800184B (en) Short text comment emotion analysis method based on Target-Aspect-Opinion joint extraction
CN111274829A (en) Sequence labeling method using cross-language information
CN111881256B (en) Text entity relation extraction method and device and computer readable storage medium equipment
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN113360667B (en) Biomedical trigger word detection and named entity identification method based on multi-task learning
CN116522165B (en) Public opinion text matching system and method based on twin structure
CN110909547A (en) Judicial entity identification method based on improved deep learning
CN115169349A (en) Chinese electronic resume named entity recognition method based on ALBERT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200324