CN114091450A - Judicial domain relation extraction method and system based on graph convolution network - Google Patents


Publication number: CN114091450A (application CN202111374051.7A)
Authority
CN
China
Prior art keywords: judicial, graph convolution, text, relation, sentence
Prior art date
Legal status
Granted
Application number
CN202111374051.7A
Other languages
Chinese (zh)
Other versions
CN114091450B (en)
Inventor
刘奇
施健伟
潘付军
Current Assignee
Nanjing Tongdahai Technology Co ltd
Original Assignee
Nanjing Tongdahai Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Tongdahai Technology Co ltd filed Critical Nanjing Tongdahai Technology Co ltd
Priority to CN202111374051.7A (granted as CN114091450B)
Publication of CN114091450A
Application granted
Publication of CN114091450B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/084: Backpropagation, e.g. using gradient descent


Abstract

The invention discloses a judicial domain relation extraction method and system based on a graph convolution network, mainly addressing the difficulty of extracting information from the lengthy, structurally complex texts of the judicial domain. The method introduces the dependency syntax information of the text, models it with a graph convolution network, and finally obtains the feature vectors for relation classification with an attention mechanism. First, a large number of unstructured text corpora in the judicial field are acquired and preprocessed to obtain the fact-description parts closely related to the cases; word segmentation, part-of-speech analysis, dependency syntax analysis, and labeling are then performed on the texts to construct a judicial-case-specific data set. Second, the text information and the corresponding dependency syntax trees are encoded and input into a graph convolution network for feature extraction. Then, the parameters of the model are iteratively updated with a back propagation algorithm so that the extraction model achieves its best performance. Finally, the trained extraction model is used for relation extraction on unstructured case texts in the judicial field, automatically completing the extraction of entity triples.

Description

Judicial domain relation extraction method and system based on graph convolution network
Technical Field
The invention relates to a judicial domain relation extraction method and system based on a graph convolution network, and belongs to the technical field of text information extraction.
Background
In recent years, artificial intelligence technology has received great attention and developed rapidly; with the advent of AlphaGo, artificial intelligence was pushed into the view of the general public. The goal of developing artificial intelligence technology is to put it to practical use in human production and daily life, thus benefiting the general population. Information Extraction (IE) is a technique that frees up human labor: it aims to automatically and efficiently extract specific, valuable information from semi-structured and unstructured text as well as structured data, and to store that information in a reasonable structure on a storage medium. Information extraction includes Named Entity Recognition (NER), Relation Extraction (RE), and Event Extraction (EE).
The relation extraction is used as a subtask of information extraction, aims to extract semantic relation between two entities from unstructured text, and is an important upstream subtask of knowledge graph construction and a knowledge question-answering system. Currently, the development directions of relationship extraction mainly include two types, namely open-domain oriented and specific-domain oriented, and the relationship extraction oriented to the specific domain is the current application and development hot spot and comprises the fields of medical treatment, finance, judicial law and the like.
The number of electronic files related to judicial cases generated in China every year is huge, and their accumulation over time reaches astronomical figures. Most of these e-files are stored as semi-structured and unstructured text of many different types, so selecting and extracting the required information from them is time-consuming and laborious even for skilled practitioners.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems and defects in the prior art, the invention provides a judicial domain relation extraction method and system based on a graph convolution network. Applying relation extraction to judicial-domain texts enables automatic information extraction from judicial text data, improves the working efficiency of judicial practitioners, and lays a solid foundation for the subsequent construction and application of knowledge graphs in the judicial field.
The technical scheme is as follows: a judicial domain relation extraction method based on a graph convolution network comprises the following steps:
step 1) obtaining unstructured text corpora in the judicial field, preprocessing the text to obtain a fact description part related to a case, and then performing word segmentation, part of speech analysis, dependency syntax analysis and label tagging on the text to construct a judicial case special data set.
And 2) coding the sentence samples in the data set constructed in the last step and the dependency syntax trees corresponding to the sentence samples, and inputting the coded sentence samples into a relation extraction model based on a graph convolution network for feature extraction.
And 3) iteratively updating the parameters of the graph-convolution-based relation extraction model with a back propagation algorithm until the relation extraction model reaches its best performance. Best performance means that the evaluation metric no longer improves as the number of training epochs increases.
And 4) applying the trained relation extraction model based on the graph convolution network to relation extraction of unstructured case texts in the judicial field, and automatically completing extraction of entity triples.
Wherein, the construction of the judicial case-specific data set of the step 1) comprises the following processes:
1-1) A large amount of unstructured text data is crawled from the China Judgements Online website, mainly first-instance judgment documents of the three litigation types: civil, criminal, and administrative. The facts that influence a case's judgment are usually concentrated in a few paragraphs of the judgment document; those paragraphs accurately describe the full picture of the case and are an important source for information extraction. The case-fact-finding parts are extracted from the text data by a rule-based method, i.e., matching keywords and text structure at paragraph granularity;
1-2) carrying out sentence segmentation and word segmentation on the case identification fact part text data obtained in the step 1-1) by means of the existing tool, and then carrying out entity labeling and relation labeling on the text data after word segmentation by taking sentences as units;
1-3) filtering the sentences subjected to entity labeling and relationship labeling in the step 1-2), only reserving the sentences containing entity pairs, and then performing part-of-speech analysis and dependency syntax analysis on the sentence texts to finally form the special data set for the judicial case.
Before training, the text sentences in the data set and the dependency syntax trees corresponding to those sentences are encoded and input into the graph-convolution-based relation extraction model for feature extraction. This refers to:
Text sentences are taken out of the judicial-case-specific data set and converted into sequences of real-valued vectors; the sentences and their dependency syntax trees are encoded with graph convolution operations; finally, an attention mechanism extracts the outputs of the multi-layer GCN to obtain the feature vectors for relation classification. The specific steps are:
2-1) converting words in the text sentences into real-value vectors by using a static word embedding matrix, coding the part of speech of the words by using a randomly initialized embedding matrix, splicing the word vectors of the words and the corresponding part of speech vectors to obtain initial characteristic vectors of the words, wherein one sentence corresponds to one vector sequence.
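As an illustrative sketch of this step, the initial characterization of a word is simply the concatenation of its word vector and its part-of-speech vector; the dimensions and the tiny vocabularies below are toy assumptions, not the patent's actual values:

```python
import random

random.seed(0)

WORD_DIM, POS_DIM = 300, 10                      # assumed embedding sizes
word_emb = {"court": [0.0] * WORD_DIM}           # stands in for a static word-embedding matrix
pos_emb = {"n": [random.uniform(-1, 1) for _ in range(POS_DIM)]}  # randomly initialized POS embedding

def initial_vector(word, pos):
    """Concatenate the word vector and the POS vector into one feature vector."""
    return word_emb[word] + pos_emb[pos]

v = initial_vector("court", "n")
assert len(v) == WORD_DIM + POS_DIM              # one sentence becomes one sequence of such vectors
```

A sentence of n words thus becomes an n x (WORD_DIM + POS_DIM) matrix.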
2-2) the initial characterization vectors of the words are independently coded and lack context information, and Bi-LSTM (bidirectional long-short term memory network) is adopted to model the sentences with the relation to be extracted. Specifically, an initial characterization vector sequence of a sentence is input into a two-layer LSTM network in a positive sequence and a reverse sequence, and then feature vectors extracted from two layers of LSTMs in different directions are spliced, so that the characterization vectors of words can contain context information of the sentence.
2-3) Modeling with the graph convolution network: the output of the Bi-LSTM serves as the initial input H^(0) of the graph convolution network, and word vector representations are then extracted by graph convolution operations. Notably, the graph convolution operation also draws on the dependency syntax information of the sentence: the adjacency matrix A corresponding to the dependency syntax tree acts as auxiliary information that helps the graph convolution network encode the text, which improves the model's ability to extract complex textual information. If there is a dependency relation between node i and node j, the corresponding adjacency-matrix elements are A_{i,j} = 1 and A_{j,i} = 1; otherwise A_{i,j} = 0 and A_{j,i} = 0. So that each node's own features are passed into the vectors of the next hidden layer, a self-loop edge is added to every node, i.e., A_{k,k} = 1 for any word node k.
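The adjacency-matrix construction described in this step can be sketched as follows; the head list and its 1-indexed convention (0 = root) are illustrative assumptions:

```python
# Build a symmetric adjacency matrix from dependency heads, with self-loops
# A[k][k] = 1 so each node's own features survive the graph convolution.
def build_adjacency(heads):
    n = len(heads)
    A = [[0] * n for _ in range(n)]
    for i, h in enumerate(heads):        # i is the dependent (0-indexed)
        if h > 0:                        # 0 marks the root, which has no head
            j = h - 1
            A[i][j] = A[j][i] = 1        # undirected dependency edge
    for k in range(n):
        A[k][k] = 1                      # self-loop
    return A

A = build_adjacency([2, 0, 2])           # word 1 -> word 2 (root) <- word 3
assert A == [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
```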
The process of a graph convolution operation can be expressed as the formula:
H^(l) = f(H^(l-1), A)
where H^(l) denotes the output of the l-th layer of the GCN, H^(l-1) denotes the input of the l-th layer (i.e., the output of layer l-1), and A is the adjacency matrix corresponding to the dependency syntax tree; in particular, H^(0) denotes the initial input of the GCN;
In more detail, the computation formula of the t-th hidden vector of the l-th layer in the GCN is:
h_t^(l) = σ( (1/c_t) · Σ_{i=1}^{n} A_{t,i} · W^(l) · h_i^(l-1) + b^(l) )
where A_{t,i} denotes the element in row t, column i of the adjacency matrix A, W^(l) denotes the weight matrix of the l-th layer, c_t is the number of words that have a dependency relation with the t-th word, b^(l) is a bias term, and σ(·) is a nonlinear activation function.
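A minimal pure-Python sketch of this layer computation; ReLU is an assumed choice for the activation σ, and the tiny matrices are toys:

```python
# One GCN layer: each word's new hidden vector averages the linearly
# transformed vectors of its dependency neighbours (self-loop included),
# adds a bias, and applies a nonlinearity.
def gcn_layer(H, A, W, b):
    n, d_in, d_out = len(H), len(W), len(W[0])
    out = []
    for t in range(n):
        c_t = sum(A[t])                              # neighbour count, incl. self-loop
        acc = [0.0] * d_out
        for i in range(n):
            if A[t][i]:
                for o in range(d_out):               # acc += W^T h_i
                    acc[o] += sum(W[k][o] * H[i][k] for k in range(d_in))
        out.append([max(0.0, a / c_t + b[o])          # sigma((1/c_t)*sum + b), ReLU
                    for o, a in enumerate(acc)])
    return out

H = [[1.0, 0.0], [0.0, 1.0]]                          # two words, 2-dim features
A = [[1, 1], [1, 1]]                                  # fully connected toy graph
W = [[1.0], [1.0]]                                    # 2 -> 1 projection
out = gcn_layer(H, A, W, [0.0])
assert out == [[1.0], [1.0]]
```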
The complete dependency syntax tree contains rich structural information; some dependency edges are closely related to relation classification, while others are irrelevant to the task. Pruning the original dependency syntax tree keeps the classification-relevant dependency edges while cutting away the irrelevant ones, which strengthens the GCN's resistance to interference. First, according to the positions of the two entities in the dependency syntax tree, their lowest common ancestor node is located, and the subtree rooted at that node is taken as the preliminary pruning result; this subtree is called the LCA (Lowest Common Ancestor) tree. Then the Shortest Dependency Path (SDP) between the two entities is determined within the LCA tree; extending outward from the SDP, the nodes of the LCA tree at a distance of no more than D hops from the SDP form the final pruning result, where D is a tunable parameter.
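The pruning procedure can be sketched as follows, assuming heads are stored 1-indexed with 0 marking the root; the example tree is made up:

```python
# Find the entities' lowest common ancestor, take the path between them
# through the LCA as the shortest dependency path (SDP), then keep every
# node within D hops of that path.
def path_to_root(heads, node):
    path = [node]
    while heads[node] > 0:
        node = heads[node] - 1
        path.append(node)
    return path

def prune(heads, e1, e2, D):
    p1, p2 = path_to_root(heads, e1), path_to_root(heads, e2)
    lca = next(n for n in p1 if n in p2)             # first shared ancestor
    sdp = set(p1[:p1.index(lca) + 1]) | set(p2[:p2.index(lca) + 1])
    keep = set(sdp)
    for _ in range(D):                               # expand D hops outward
        keep |= {i for i, h in enumerate(heads) if h > 0 and h - 1 in keep}
        keep |= {heads[i] - 1 for i in keep if heads[i] > 0}
    return keep

heads = [2, 3, 0, 3, 4]    # word0 -> word1 -> word2 (root) <- word3 <- word4
assert prune(heads, 1, 3, 0) == {1, 2, 3}            # D = 0 keeps only the SDP
assert prune(heads, 1, 3, 1) == {0, 1, 2, 3, 4}      # D = 1 pulls in the 1-hop fringe
```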
To avoid losing the important information that a rule-based pruning method may discard, graph convolution representation vectors are computed on the dependency syntax trees both before and after pruning, and the memory of the pruned dependencies is then adjusted by weight assignment:
h = β · h_full + (1 − β) · h_pruning
where β denotes the memory coefficient for the pruned dependencies (the weight coefficients of the two intermediate hidden-layer vectors sum to 1), h_full denotes the features extracted by graph convolution on the dependency syntax tree before pruning, and h_pruning denotes the features extracted by graph convolution on the pruned dependency syntax tree.
2-4) The graph convolution layer is typically a stack of multiple GCN layers, and the output of each GCN layer contains information useful for classifying the relation of the entity pair in the text. An attention mechanism extracts the semantic information for relation classification from every GCN layer: first, a max-pooling operation computes a sentence representation vector from each GCN layer's output; then the attention weight distribution is computed; finally, a weighted sum yields the final vector representation:
e_i = w_h^T · γ_i
α_i = exp(e_i) / Σ_{k=1}^{K} exp(e_k)
r = Σ_{i=1}^{K} α_i · γ_i
where γ_i denotes the max-pooled output vector of the i-th GCN layer, e_i denotes the relevance score between the i-th GCN layer's output vector and the graph convolution layer's output vector, α_i denotes the weight of the i-th GCN layer's output vector within the graph convolution layer's output vector, K denotes the total number of GCN layers, w_h denotes the attention query vector, and r denotes the output vector of the graph convolution layer.
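A small sketch of this attention aggregation, with toy layer outputs and a made-up query vector:

```python
import math

# Max-pool each GCN layer over the word axis to one vector per layer,
# score each against the query vector, softmax, and take the weighted sum.
def aggregate_layers(layer_outputs, w_h):
    gammas = [[max(col) for col in zip(*layer)] for layer in layer_outputs]  # max-pool
    scores = [sum(w * g for w, g in zip(w_h, gm)) for gm in gammas]          # e_i
    z = sum(math.exp(e) for e in scores)
    alphas = [math.exp(e) / z for e in scores]                               # softmax
    dim = len(w_h)
    return [sum(a * gm[d] for a, gm in zip(alphas, gammas)) for d in range(dim)]

layers = [[[1.0, 0.0], [0.0, 2.0]],    # layer 1: two words, 2-dim vectors
          [[1.0, 0.0], [0.0, 2.0]]]    # layer 2 identical, so weights are equal
r = aggregate_layers(layers, w_h=[1.0, 1.0])
assert abs(r[0] - 1.0) < 1e-9 and abs(r[1] - 2.0) < 1e-9
```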
2-5) The output of the previous layer is mapped to the output layer through a nonlinear transformation, and a softmax function then computes the probability distribution over the relation types:
o = σ(W_d · r + b_d)
p(r) = exp(o_r) / Σ_{k=1}^{|R|} exp(o_k)
where W_d denotes the discriminant transformation matrix, b_d is a bias vector, |R| denotes the total number of relation classes, o_r denotes the r-th element of the output vector o, and likewise o_k denotes the k-th element.
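The softmax step can be illustrated as follows; the logits are made-up numbers for |R| = 3 relation classes:

```python
import math

def relation_probs(o):
    """Softmax over the output-layer logits o."""
    z = sum(math.exp(v) for v in o)
    return [math.exp(v) / z for v in o]

probs = relation_probs([2.0, 1.0, 0.0])
assert abs(sum(probs) - 1.0) < 1e-9     # a proper probability distribution
assert probs.index(max(probs)) == 0     # the highest logit wins
```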
Further, the parameters of the relation extraction model are optimized with stochastic gradient descent, minimizing the negative log-likelihood over each batch:
J(θ) = −(1/|B|) · Σ_{i=1}^{|B|} log p(r_i | s_i, h_i, t_i; θ)
θ ← θ − α · ∂J(θ)/∂θ
where θ denotes all parameters of the model, B denotes a training batch containing a fixed number of sentence instances, s_i and r_i denote the i-th sentence sample in the batch and its labeled relation, h_i and t_i denote that sample's head and tail entities, and α denotes the learning rate.
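A minimal illustration of the update rule, using a toy one-parameter least-squares model rather than the actual relation extraction network:

```python
# One SGD step: average the gradient of the loss over batch B and move
# the parameter against it, scaled by the learning rate.
def sgd_step(theta, batch, lr):
    # loss per example: (theta * x - y)^2 ; gradient: 2 * x * (theta * x - y)
    grad = sum(2 * x * (theta * x - y) for x, y in batch) / len(batch)
    return theta - lr * grad

theta = 0.0
batch = [(1.0, 2.0), (2.0, 4.0)]          # data consistent with theta* = 2
for _ in range(200):
    theta = sgd_step(theta, batch, lr=0.05)
assert abs(theta - 2.0) < 1e-6
```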
Furthermore, the judicial case corpus still contains a large number of sentences without manual labels. A named entity recognition tool first identifies the entities; part-of-speech analysis and dependency syntax analysis are then performed; finally, the sentences together with the corresponding auxiliary information, such as parts of speech and dependency syntax trees, are input into the trained relation extraction model to predict the relation types of entity pairs, yielding a large number of new fact triples.
A judicial domain relation extraction system based on graph convolution network comprises:
constructing a judicial case special data set module, acquiring unstructured text corpora in the judicial field, preprocessing the text to acquire a fact description part related to the case, and further performing word segmentation, part of speech analysis, dependency syntax analysis and label tagging on the text to construct a judicial case special data set;
the characteristic extraction module is used for coding sentence samples in the data set and dependency syntax trees corresponding to the sentence samples, and inputting the coded sentence samples into a relation extraction model based on a graph convolution network for characteristic extraction;
the relation extraction model training module based on the graph convolution network carries out iterative updating on the parameters of the relation extraction model based on the graph convolution network by utilizing a back propagation algorithm so as to enable the relation extraction model to achieve the best performance;
and the relation extraction module is used for extracting the relation of the non-structured case text in the judicial field by using the trained relation extraction model based on the graph convolution network, and automatically finishing the extraction of the entity triples.
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the graph-convolution-network-based judicial domain relation extraction method described above.
A computer-readable storage medium storing a computer program for executing the graph-convolution network-based judicial domain relationship extraction method as described above.
Compared with the prior art, the invention has the advantages that:
1) the relation extraction method introduces additional information, such as part of speech information and a dependency syntax tree, to enrich feature representation of the text, and performs graph convolution coding on the text according to the dependency syntax tree, so that the capability of feature extraction on the lengthy and complex text can be improved.
2) The relation extraction method uses a Bi-LSTM to preliminarily extract text information, which enriches the contextual semantics of words and mitigates the noise introduced by erroneous dependency parses. Existing dependency parsing tools inevitably produce some incorrect parses, and running graph convolution on top of them propagates those errors; because the Bi-LSTM first gives every word contextual information, the graph convolution network's strong dependence on the dependency syntax tree is relieved.
3) The relation extraction method has stronger robustness, and benefits from a pruning strategy taking SDP as a center and a method for weight assignment. The rule-based pruning strategy can keep most of the core dependency relationships related to the relationship classification, but the hard pruning method always omits some important information. The memory coefficient is added to the pruned information through weight assignment, so that the model can always extract the core structure information from the dependency syntax tree, and other important information cannot be lost, so that the model has stronger robustness.
4) The described relationship extraction method introduces an attention mechanism to further screen the output of the multi-layer graph convolutional network for features used for entity relationship classification. Compared with the method of only using the output of the last layer of graph convolution in the traditional method, the method can solve the problem of selecting the optimal layer number of the graph convolution network. Because the optimal number of layers of the graph convolution network is different for different sentence texts, and the model only has one determined graph convolution layer number, the attention mechanism can calculate the correlation between each layer of output and the entity pair relationship, assign a larger weight to the graph convolution sub-layer output with high correlation, and assign a smaller weight to the graph convolution sub-layer output with low correlation, thereby realizing the dynamic screening of the graph convolution output characteristics.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
FIG. 2 is a block diagram of a relational extraction model according to an embodiment of the invention;
FIG. 3 is a diagram illustrating a convolutional layer structure of a relational extraction model according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an attention extracting layer of a relationship extraction model according to an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.
The first embodiment is as follows: the text corpus used in this embodiment consists mainly of first-instance judgment documents for the civil cause of action 'property insurance contract disputes'.
As shown in fig. 1, a judicial domain relationship extraction method based on graph convolution network includes the following steps:
step 1) acquiring a large number of unstructured text corpora in the judicial field, preprocessing the texts to obtain fact description parts closely related to cases, and then performing word segmentation, part of speech analysis, dependency syntax analysis and label tagging on the texts to construct a judicial case special data set. The specific process is as follows:
1-1) A large amount of unstructured text data is crawled from the China Judgements Online website, mainly first-instance judgment documents with the cause of action 'property insurance contract disputes'. The facts that influence a case's judgment are usually concentrated in a few paragraphs of the judgment document; those paragraphs accurately describe the full picture of the case and are an important source for information extraction. The case-fact-finding parts are extracted from the text data by a rule-based method, i.e., matching keywords and text structure at paragraph granularity;
1-2) segmenting paragraphs of the fact part in 1-1) to obtain independent sentences, then segmenting words of the text by taking the sentences as units by means of the conventional word segmentation tool jieba, and then performing entity labeling and relationship labeling on the segmented text, wherein the entity types and relationship types predefined in the embodiment are as follows:
[Tables of the predefined entity types and relation types appear here as images in the original document.]
1-3) The sentences obtained in 1-2) are filtered: sentences that do not contain entity pairs are discarded and only those containing entity pairs are kept. Part-of-speech analysis of the filtered sentences uses the existing tool jieba, and dependency syntax analysis uses DDParser (Baidu Dependency Parser), developed by the Baidu team. The result is the special-purpose data set used in this embodiment, in which each sample contains the complete sentence text, head and tail entity positions, parts of speech, the dependency syntax tree, and other information.
For example, the sentence "zhangwei driving sue sedan insures motor vehicle loss at the branch of the suzhou city of human insurance. "whose segmentation and part-of-speech analysis results are [ (" zhanwei ", PER), (" drive ", v), (" u ", u), (" threo car ", n), (" on ", p), (" ORG), ("application", v), ("motor vehicle loss insurance", n), ("log", w) ]. After entity labeling, the corresponding labeling sequence is [ "Natural _ person", "O", "Property", "O", "instrument _ company", "O", "instrument", "O" ], wherein the relationship type between the two entities of the sue sedan car and the motor vehicle loss Insurance is "Insurance". The dependency syntax tree structure corresponding to the sentence sequence is [ ('SBV', 2), ('ATT', 4), ('MT', 2), ('SBV', 7), ('MT', 6), ('ADV', 7), ('HED', 0), ('VOB', 7), ('MT', 7) ].
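The dependency tree stored as (label, head) pairs above can be converted into the symmetric edge set, with self-loops, that the graph convolution consumes; this conversion is a sketch, not the patent's exact code:

```python
# Heads are 1-indexed and 0 marks the root (the 'HED' arc), matching the
# dependency-tree representation shown in the example sentence above.
def tree_to_edges(dep_tree):
    edges = {(k, k) for k in range(len(dep_tree))}       # self-loops
    for i, (_label, head) in enumerate(dep_tree):
        if head > 0:
            j = head - 1
            edges.add((i, j))
            edges.add((j, i))                            # symmetric edge
    return edges

tree = [('SBV', 2), ('ATT', 4), ('MT', 2), ('SBV', 7), ('MT', 6),
        ('ADV', 7), ('HED', 0), ('VOB', 7), ('MT', 7)]
edges = tree_to_edges(tree)
assert (0, 1) in edges and (1, 0) in edges               # word 1 attaches to word 2
assert len(edges) == 25                                  # 9 self-loops + 8 arcs x 2
```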
And 2) encoding the text information and the corresponding dependency syntax tree, and inputting the encoded text information and the corresponding dependency syntax tree into a relation extraction model based on a graph convolution network for feature extraction. The simple structure of the relational extraction model is shown in fig. 2, which is a pipelined 5-layer structure in which data is executed in the order of an embedding layer, a contextualization layer, a graph convolution layer, an attention extraction layer, and an output layer. The embedding layer is used for completing conversion from text to vectors, the contextualization layer enables word vectors to contain more context semantics, the graph convolution layer extracts the characteristics of sentences under the assistance of the dependency syntax tree, the attention extraction layer dynamically aggregates sentence characterization vectors from the multi-layer GCN, and finally the judgment of relationship types is completed at the output layer.
The function performed by each layer:
2-1) taking out sentence samples from the judicial case special data set, converting words in the sentences into real-value vectors by using a static word embedding matrix, encoding the parts of speech of the words by using a randomly initialized embedding matrix, splicing the word vectors of the words and the corresponding parts of speech vectors to obtain initial characteristic vectors of the words, wherein one sentence corresponds to one vector sequence.
For example, we use a word embedding matrix with a single word vector of 300 dimensions, and then apply the example sentence "zhangwei driving sue sedan to secure motor vehicle loss insurance at the branch of suzhou peoples. "convert to a 9 x 300 sentence matrix. The part of speech of the word "zhanwei" is "PER", and the corresponding part of speech embedding vector is [0.1156, -0.3487, -0.0861,0.1310, -0.9013,0.5357,0.0125,0.1414, -0.8653, -0.0487 ]. If part-of-speech features are also added to the sentence representation, the sentence is finally represented as a 9 x 310 matrix.
2-2) the initial characterization vectors of the words are independently coded and lack context information, and Bi-LSTM (bidirectional long-short term memory network) is adopted to model the sentences with the relation to be extracted. Specifically, an initial token vector sequence of a sentence is input into a two-layer LSTM network in a positive sequence and a reverse sequence, and then feature vectors extracted by two layers of LSTMs in different directions are spliced, so that the token vector of a word can contain context information of the sentence.
2-3) Modeling with the graph convolution network, whose structure is shown in FIG. 3: the output of the previous layer serves as the initial input H^(0) of the graph convolution network, and word vector representations are extracted by graph convolution operations. Notably, the graph convolution operation also draws on the dependency syntax information of the sentence: the adjacency matrix A corresponding to the dependency syntax tree acts as auxiliary information that helps the graph convolution network encode the text, which improves the model's ability to extract complex textual information. If there is a dependency relation between node i and node j, the corresponding adjacency-matrix elements are A_{i,j} = 1 and A_{j,i} = 1; otherwise A_{i,j} = 0 and A_{j,i} = 0. So that each node's own features are passed into the vectors of the next hidden layer, a self-loop edge is added to every node, i.e., A_{k,k} = 1 for any word node k.
The process of a graph convolution operation can be expressed as the formula:
H^(l) = f(H^(l-1), A)
where H^(l) denotes the output of the l-th layer of the GCN, H^(l-1) denotes the input of the l-th layer (i.e., the output of layer l-1), and A is the adjacency matrix corresponding to the dependency syntax tree; in particular, H^(0) denotes the initial input of the GCN;
in more detail, taking the t-th hidden layer vector of the l-th layer in the GCN as an example, the calculation formula is as follows:
h_t^(l) = σ( (1/c_t) · Σ_{i=1..n} A_{t,i} · W^(l) · h_i^(l-1) + b^(l) )
where A_{t,i} denotes the element in the t-th row and i-th column of the adjacency matrix A, W^(l) denotes the weight matrix of the l-th layer, c_t is the number of words having a dependency relationship with the t-th word (used for normalization), b^(l) is a bias term, and σ(·) is a nonlinear activation function.
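A minimal numpy version of one such graph convolution layer; ReLU is used for the activation σ as an assumption, since the patent does not name the activation function:

```python
import numpy as np

def gcn_layer(H_prev, A, W, b):
    # h_t^(l) = ReLU( (1/c_t) Σ_i A[t,i] · W h_i^(l-1) + b )
    # c_t = number of neighbours of node t (self-loop included).
    c = A.sum(axis=1, keepdims=True)          # (T, 1) degree normalizer
    Z = (A @ (H_prev @ W.T)) / c + b          # aggregate then normalize
    return np.maximum(Z, 0.0)                 # ReLU nonlinearity (assumption)
```

Stacking several calls to `gcn_layer`, each fed the previous layer's output together with the same A, reproduces the multi-layer structure of the formula H^(l) = f(H^(l-1), A).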
The complete dependency syntax tree contains rich structural information; some of its dependency edges are closely related to relation classification, while others are irrelevant to the task. By pruning the original dependency syntax tree — retaining the dependency edges relevant to classification while removing the irrelevant ones — the noise resistance of the GCN can be enhanced. First, according to the positions of the two entities in the dependency syntax tree, their lowest common ancestor node is determined, and the subtree rooted at this node is taken as the preliminary pruning result; this subtree is called the LCA (Lowest Common Ancestor) tree. Then, the shortest dependency path (SDP) between the two entities is determined in the LCA tree, and the result is expanded outwards from the SDP: every node of the LCA tree at a distance of at most D hops from the SDP is kept as the final pruning result, where D is a tunable parameter.
For example, pruning the dependency syntax tree of the above sentence around its SDP with D = 1 yields [('SBV', -), ('ATT', 4), ('MT', -), ('SBV', 7), ('MT', -), ('ADV', 7), ('HED', 0), ('VOB', 7), ('MT', 7)], where "-" indicates a dependency relationship that has been pruned away.
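The LCA/SDP pruning procedure can be sketched in pure Python. The representation of the tree as a list of 1-based head indices (0 = root) and the helper names are conventions of this sketch:

```python
from collections import deque

def prune_tree(heads, e1, e2, D=1):
    # heads[i]: 1-based head of token i (0 = root); e1, e2: 0-based entity
    # token indices. Returns the set of kept token indices: the nodes of the
    # LCA subtree within D hops of the shortest dependency path (SDP).
    def path_to_root(i):
        p = [i]
        while heads[p[-1]] != 0:
            p.append(heads[p[-1]] - 1)
        return p

    p1, p2 = path_to_root(e1), path_to_root(e2)
    lca = next(n for n in p1 if n in p2)            # lowest common ancestor
    sdp = p1[:p1.index(lca)] + [lca] + list(reversed(p2[:p2.index(lca)]))

    subtree = {lca}                                  # collect the LCA subtree
    changed = True
    while changed:
        changed = False
        for i, h in enumerate(heads):
            if h > 0 and h - 1 in subtree and i not in subtree:
                subtree.add(i)
                changed = True

    nbrs = {i: set() for i in subtree}               # undirected edges inside it
    for i in subtree:
        if heads[i] > 0 and heads[i] - 1 in subtree:
            nbrs[i].add(heads[i] - 1)
            nbrs[heads[i] - 1].add(i)

    kept = set(sdp)                                  # BFS outwards, up to D hops
    frontier = deque((n, 0) for n in sdp)
    while frontier:
        n, d = frontier.popleft()
        if d == D:
            continue
        for m in nbrs[n]:
            if m not in kept:
                kept.add(m)
                frontier.append((m, d + 1))
    return kept
```

With D = 0 only the SDP itself survives; larger D keeps progressively more of the LCA subtree, matching the tunable parameter described above.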
To avoid losing important information that the rule-based pruning method may discard, the graph convolution representation vectors are computed on both the dependency syntax tree before pruning and the one after pruning, and the memory of the pruned dependency relations is then adjusted by weight assignment:
h = β·h_full + (1-β)·h_pruning
where β is the memory coefficient for the pruned dependencies (the weight coefficients of the two intermediate hidden-layer vectors sum to 1), h_full denotes the features extracted by graph convolution on the dependency syntax tree before pruning, and h_pruning denotes the features extracted by graph convolution on the pruned dependency syntax tree.
2-4) The graph convolution component typically stacks multiple GCN layers, and the output of each layer contains information useful for classifying the relation of the entity pair in the text. An attention mechanism, whose structure is shown in FIG. 4, is adopted to extract from each GCN layer the semantic information that matters for entity relation classification. First, a max-pooling operation is used to compute a sentence characterization vector for each GCN layer; then the attention weight distribution is calculated; finally, the final vector representation is obtained by weighted averaging:
e_i = w_h^T · γ_i

α_i = exp(e_i) / Σ_{j=1..K} exp(e_j)

h = Σ_{i=1..K} α_i · γ_i
where γ_i denotes the output vector of the i-th GCN layer after the max-pooling operation, K denotes the total number of GCN layers, and w_h denotes the attention query vector, a trainable parameter.
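A minimal numpy sketch of this layer-wise attention: max-pool each GCN layer's output into a sentence vector, score each vector with the query w_h, softmax the scores, and take the weighted average:

```python
import numpy as np

def layer_attention(layer_outputs, w_h):
    # layer_outputs: list of K matrices of shape (T, d), one per GCN layer.
    # w_h: trainable attention query vector of shape (d,).
    gammas = np.stack([H.max(axis=0) for H in layer_outputs])  # (K, d) pooled
    e = gammas @ w_h                                           # e_i = w_h^T γ_i
    alpha = np.exp(e - e.max())                                # stable softmax
    alpha /= alpha.sum()
    return alpha @ gammas                                      # Σ_i α_i γ_i
```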
2-5) mapping the output of the previous layer to an output layer through nonlinear transformation, and then calculating the probability distribution of each relationship type by adopting a softmax function:
o = σ(W_d·r + b_d)

p_r = exp(o_r) / Σ_{k=1..|R|} exp(o_k)
where |R| denotes the total number of relation categories (|R| = 6 in this embodiment), W_d denotes the discriminant transformation matrix, b_d is a bias vector, o_r denotes the r-th element of the output vector o, and similarly o_k denotes its k-th element.
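A minimal numpy sketch of this output layer; tanh is used for the nonlinearity σ as an assumption, since the patent does not specify which activation is applied before the softmax:

```python
import numpy as np

def classify(r, W_d, b_d):
    # o = σ(W_d r + b_d), then softmax over the |R| relation types.
    o = np.tanh(W_d @ r + b_d)        # tanh as σ (assumption, not from patent)
    p = np.exp(o - o.max())           # numerically stable softmax
    return p / p.sum()
```

The returned vector is the probability distribution over relation types; the predicted relation is its argmax, as in the "insures" example below.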
For example, for the sentence to be predicted, "Zhang Wei's Su-brand sedan is insured for motor vehicle loss insurance at the Suzhou branch of PICC.", the final relation classification probability distribution between the two entities "Su-brand sedan" and "motor vehicle loss insurance" is [0.763, 0.027, 0.075, 0.034, 0.089, 0.012], giving the probability values of the above 6 relation types. The probability of the "insures" relation is the largest, so the relation extraction model assigns the "insures" type to this entity pair.
And 3) iteratively updating the parameters of the model with a back-propagation algorithm to obtain the relation extraction model with the best performance.
The parameters of the relation extraction model are optimized by the stochastic gradient descent method:
L(θ) = -(1/|B|) · Σ_{i=1..|B|} log p(r_i | h_i, t_i; θ)

θ ← θ - α·∇_θ L(θ)
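The stochastic gradient descent update θ ← θ − α·∇L(θ) can be illustrated on a toy linear softmax classifier with a cross-entropy (negative log-likelihood) loss; the data, learning rate, and epoch count here are illustrative only:

```python
import numpy as np

def nll_loss_and_grad(W, X, y):
    # Mini-batch negative log-likelihood of a linear softmax classifier
    # and its gradient w.r.t. W — the two quantities each SGD step needs.
    logits = X @ W.T                              # (B, |R|)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    B = X.shape[0]
    loss = -np.log(P[np.arange(B), y]).mean()
    P[np.arange(B), y] -= 1.0                     # dL/dlogits = P - onehot(y)
    return loss, (P.T @ X) / B

def sgd_train(X, y, n_classes, lr=0.5, epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_classes, X.shape[1]))
    loss = None
    for _ in range(epochs):
        loss, g = nll_loss_and_grad(W, X, y)
        W -= lr * g                               # θ ← θ − α ∇L(θ)
    return W, loss
```

On linearly separable data the loss shrinks toward zero, which is the behavior the iterative parameter update in step 3) relies on.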
and 4) using the trained extraction model for relation extraction of unstructured case texts in the judicial field, inputting unlabeled sentences, performing entity identification by using a named entity identification tool, then performing part-of-speech analysis and dependency syntax analysis, and finally inputting the sentences and corresponding auxiliary information such as part-of-speech and dependency syntax trees into the trained model to predict the relation types of entity pairs to obtain new fact triples.
A judicial domain relation extraction system based on graph convolution network comprises:
constructing a judicial case special data set module, acquiring unstructured text corpora in the judicial field, preprocessing the text to acquire a fact description part related to the case, and further performing word segmentation, part of speech analysis, dependency syntax analysis and label tagging on the text to construct a judicial case special data set;
the characteristic extraction module is used for coding sentence samples in the data set and the dependency syntax trees corresponding to the sentence samples and inputting the coded sentence samples into a relation extraction model based on a graph volume network for characteristic extraction;
the relation extraction model training module based on the graph convolution network carries out iterative updating on the parameters of the relation extraction model based on the graph convolution network by utilizing a back propagation algorithm so as to enable the relation extraction model to achieve the best performance;
and the relation extraction module is used for extracting the relation of the non-structured case text in the judicial field by using the trained relation extraction model based on the graph convolution network, and automatically finishing the extraction of the entity triples.
The implementation of each module in the system is the same as that of the corresponding part of the method described above.
It should be apparent to those skilled in the art that the steps of the graph convolution network based judicial domain relation extraction method, or the modules of the graph convolution network based judicial domain relation extraction system, of the embodiment of the present invention described above can be implemented by a general-purpose computing device. They can be centralized on a single computing device or distributed over a network of computing devices, and can alternatively be implemented as program code executable by the computing devices, so that they can be stored in a storage device and executed by the computing devices; in some cases, the steps shown or described can be executed in an order different from that given here, or they can each be made into separate integrated circuit modules, or several of the modules or steps can be combined into a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.

Claims (10)

1. A judicial domain relation extraction method based on a graph convolution network is characterized by comprising the following steps:
step 1) acquiring unstructured text corpora in the judicial field, performing text preprocessing to acquire a fact description part related to a case, and further performing word segmentation, part of speech analysis, dependency syntactic analysis and label tagging on the text to construct a judicial case special data set;
step 2) coding sentence samples in the data set constructed in the last step and dependency syntax trees corresponding to the sentence samples, and inputting the coded sentence samples into a relation extraction model based on a graph convolution network for feature extraction;
step 3) utilizing a back propagation algorithm to iteratively update parameters of the relation extraction model based on the graph convolution network, so that the relation extraction model achieves the best performance;
and 4) applying the trained relation extraction model based on the graph convolution network to relation extraction of unstructured case texts in the judicial field, and automatically completing extraction of entity triples.
2. The extraction method of judicial domain relations based on graph-convolution network as claimed in claim 1, wherein the construction of the judicial case specific data set of step 1) comprises the following processes:
1-1) extracting case identification fact parts from unstructured text data in the judicial field;
1-2) segmenting the text data of the case identification fact part obtained by the method, and then carrying out entity labeling and relation labeling on the segmented text data by taking a sentence as a unit;
1-3) filtering the sentences subjected to entity labeling and relation labeling in the step 1-2), only reserving the sentences containing entity pairs, and then performing part-of-speech analysis and dependency syntactic analysis on the sentence texts to finally form a judicial case special data set.
3. The judicial domain relation extraction method based on a graph convolution network as claimed in claim 1, wherein the relation extraction model based on the graph convolution network in step 2) refers to the model before training, and the text sentences in the data set together with the dependency syntax trees corresponding to the sentences are encoded and input into the relation extraction model based on the graph convolution network for feature extraction.
4. The method of claim 1, wherein text sentences are extracted from the judicial case specific dataset and converted into real-valued vector sequences, the sentences and the dependency syntax trees are encoded using graph convolution operations, and finally the feature vectors for relation classification are obtained by using an attention mechanism to aggregate the outputs of the multi-layer GCN network; specifically comprising the following steps:
2-1) converting words in a text sentence into real-value vectors by using a static word embedding matrix, coding the part of speech of the words by using a randomly initialized embedding matrix, splicing the word vectors of the words and the corresponding part of speech vectors to obtain initial characterization vectors of the words, wherein one sentence corresponds to one vector sequence;
2-2) modeling the sentences of the relation to be extracted by adopting Bi-LSTM; specifically, an initial characterization vector sequence of a sentence is input into a two-layer LSTM network with a positive sequence and a reverse sequence, and then feature vectors extracted by the two layers of LSTMs in different directions are spliced, so that the characterization vectors of words can contain context information of the sentence;
2-3) modeling with the graph convolution network, taking the output of the Bi-LSTM as the initial input H^(0) of the graph convolution network, and extracting word vector representations of the words using graph convolution operations;
2-4) extracting the semantic information relevant to relation classification from each GCN layer by an attention mechanism; first, a max-pooling operation is used to compute a sentence characterization vector for each GCN layer, then the attention weight distribution is calculated, and finally the final vector representation is obtained by weighted averaging;
2-5) mapping the output of the previous layer to an output layer through nonlinear transformation, and then calculating the probability distribution of each relationship type by adopting a softmax function.
5. The judicial domain relationship extraction method based on graph convolution network according to claim 1, characterized in that the parameters of the relationship extraction model are optimized by an optimization method of random gradient descent:
L(θ) = -(1/|B|) · Σ_{i=1..|B|} log p(r_i | h_i, t_i; θ)

θ ← θ - α·∇_θ L(θ)
where θ generally denotes all parameters of the model, B denotes a training batch containing a fixed number of sentence samples, h_i and t_i respectively denote the head and tail entities of the i-th sentence sample in the batch, and α denotes the learning rate.
6. The method of claim 4, wherein the graph convolution operation further depends on the dependency syntax information of the text sentence, and the adjacency matrix A corresponding to the dependency syntax tree is used as auxiliary information to assist the graph convolution network in encoding the text information.
7. The judicial domain relationship extraction method based on graph convolution network of claim 6, wherein the dependency syntax tree is pruned, the dependency edges related to classification are reserved, and the irrelevant dependency edges are pruned; firstly, determining the lowest common ancestor node of two entities according to the positions of the two entities in a dependency syntax tree, and taking a subtree taking the lowest common ancestor node as a root node as a preliminary pruning result, wherein the subtree can be called an LCA tree; then, determining the shortest dependent path of two entities in the LCA tree, abbreviated as SDP, expanding outwards based on the SDP path, and taking the node on the LCA tree which is less than or equal to D hops away from the SDP path as a final pruning result, wherein D is an adjustable parameter;
and respectively calculating graph convolution expression vectors on the dependency syntax trees before pruning and after pruning, and then adjusting the memory of the pruned dependency relationship in a weight assignment mode.
8. A judicial domain relationship extraction system based on graph convolution network is characterized by comprising:
constructing a data set module special for judicial cases, acquiring unstructured text corpora in the judicial field, preprocessing the texts to acquire fact description parts related to the cases, and further performing word segmentation, part of speech analysis, dependency syntactic analysis and label labeling on the texts to construct a data set special for the judicial cases;
the characteristic extraction module is used for coding sentence samples in the data set and dependency syntax trees corresponding to the sentence samples, and inputting the coded sentence samples into a relation extraction model based on a graph convolution network for characteristic extraction;
the relation extraction model training module based on the graph convolution network carries out iterative updating on the parameters of the relation extraction model based on the graph convolution network by utilizing a back propagation algorithm so as to enable the relation extraction model to achieve the best performance;
and the relation extraction module is used for extracting the relation of the non-structured case text in the judicial field by using the trained relation extraction model based on the graph convolution network, and automatically finishing the extraction of the entity triples.
9. A computer device, characterized by: the computer device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the judicial domain relationship extraction method based on graph volume network according to any one of claims 1-7.
10. A computer-readable storage medium characterized by: the computer readable storage medium stores a computer program for executing the graph volume network-based judicial domain relationship extraction method according to any one of claims 1 to 7.
CN202111374051.7A 2021-11-19 2021-11-19 Judicial domain relation extraction method and system based on graph convolution network Active CN114091450B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111374051.7A CN114091450B (en) 2021-11-19 2021-11-19 Judicial domain relation extraction method and system based on graph convolution network


Publications (2)

Publication Number Publication Date
CN114091450A true CN114091450A (en) 2022-02-25
CN114091450B CN114091450B (en) 2022-11-18

Family

ID=80302120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111374051.7A Active CN114091450B (en) 2021-11-19 2021-11-19 Judicial domain relation extraction method and system based on graph convolution network

Country Status (1)

Country Link
CN (1) CN114091450B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926823A (en) * 2022-05-07 2022-08-19 西南交通大学 WGCN-based vehicle driving behavior prediction method
CN116304748A (en) * 2023-05-17 2023-06-23 成都工业学院 Text similarity calculation method, system, equipment and medium
CN117609519A (en) * 2024-01-22 2024-02-27 云南大学 Entity relation extraction method in electric power carbon emission calculation formula
CN117633245A (en) * 2023-11-24 2024-03-01 重庆赛力斯新能源汽车设计院有限公司 Knowledge graph construction method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977228A (en) * 2019-03-21 2019-07-05 浙江大学 The information identification method of grid equipment defect text
CN111382333A (en) * 2020-03-11 2020-07-07 昆明理工大学 Case element extraction method in news text sentence based on case correlation joint learning and graph convolution
CN112001186A (en) * 2020-08-26 2020-11-27 重庆理工大学 Emotion classification method using graph convolution neural network and Chinese syntax
CN112507699A (en) * 2020-09-16 2021-03-16 东南大学 Remote supervision relation extraction method based on graph convolution network
CN113239186A (en) * 2021-02-26 2021-08-10 中国科学院电子学研究所苏州研究院 Graph convolution network relation extraction method based on multi-dependency relation representation mechanism
CN113449084A (en) * 2021-09-01 2021-09-28 中国科学院自动化研究所 Relationship extraction method based on graph convolution
CN113641820A (en) * 2021-08-10 2021-11-12 福州大学 Visual angle level text emotion classification method and system based on graph convolution neural network


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YUHAO ZHANG 等: "Graph Convolution over Pruned Dependency Trees Improves Relation Extraction", 《ARXIV:1809.10185V1》 *
冯兴杰 等: "基于图神经网络与深度学习的商品推荐算法", 《计算机应用研究》 *
王晓霞: "基于注意力与图卷积网络的关系抽取模型", 《计算机应用》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926823A (en) * 2022-05-07 2022-08-19 西南交通大学 WGCN-based vehicle driving behavior prediction method
CN114926823B (en) * 2022-05-07 2023-04-18 西南交通大学 WGCN-based vehicle driving behavior prediction method
CN116304748A (en) * 2023-05-17 2023-06-23 成都工业学院 Text similarity calculation method, system, equipment and medium
CN117633245A (en) * 2023-11-24 2024-03-01 重庆赛力斯新能源汽车设计院有限公司 Knowledge graph construction method and device, electronic equipment and storage medium
CN117609519A (en) * 2024-01-22 2024-02-27 云南大学 Entity relation extraction method in electric power carbon emission calculation formula
CN117609519B (en) * 2024-01-22 2024-04-19 云南大学 Entity relation extraction method in electric power carbon emission calculation formula

Also Published As

Publication number Publication date
CN114091450B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
CN114091450B (en) Judicial domain relation extraction method and system based on graph convolution network
CN109508462B (en) Neural network Mongolian Chinese machine translation method based on encoder-decoder
CN106980683B (en) Blog text abstract generating method based on deep learning
CN106202010B (en) Method and apparatus based on deep neural network building Law Text syntax tree
CN117076653B (en) Knowledge base question-answering method based on thinking chain and visual lifting context learning
CN111831789B (en) Question-answering text matching method based on multi-layer semantic feature extraction structure
CN113642330A (en) Rail transit standard entity identification method based on catalog topic classification
CN112487812B (en) Nested entity identification method and system based on boundary identification
CN111858940B (en) Multi-head attention-based legal case similarity calculation method and system
WO2021082086A1 (en) Machine reading method, system, device, and storage medium
CN114818717B (en) Chinese named entity recognition method and system integrating vocabulary and syntax information
CN111428511B (en) Event detection method and device
CN113360654B (en) Text classification method, apparatus, electronic device and readable storage medium
CN117291265B (en) Knowledge graph construction method based on text big data
CN114077673A (en) Knowledge graph construction method based on BTBC model
CN113920379A (en) Zero sample image classification method based on knowledge assistance
CN114281982B (en) Book propaganda abstract generation method and system adopting multi-mode fusion technology
CN115033706A (en) Method for automatically complementing and updating knowledge graph
CN114638228A (en) Chinese named entity recognition method based on word set self-attention
CN112926323B (en) Chinese named entity recognition method based on multistage residual convolution and attention mechanism
CN114218921A (en) Problem semantic matching method for optimizing BERT
CN117094325B (en) Named entity identification method in rice pest field
CN113076744A (en) Cultural relic knowledge relation extraction method based on convolutional neural network
CN112949293A (en) Similar text generation method, similar text generation device and intelligent equipment
CN114579605B (en) Table question-answer data processing method, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant