CN114091450A - Judicial domain relation extraction method and system based on graph convolution network - Google Patents


Publication number: CN114091450A (application CN202111374051.7A)
Authority
CN
China
Prior art keywords: judicial, graph convolution, text, relation, sentence
Prior art date
Legal status
Granted
Application number
CN202111374051.7A
Other languages
Chinese (zh)
Other versions
CN114091450B (en)
Inventor
刘奇
施健伟
潘付军
Current Assignee
Nanjing Tongdahai Technology Co ltd
Original Assignee
Nanjing Tongdahai Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Tongdahai Technology Co ltd filed Critical Nanjing Tongdahai Technology Co ltd
Priority to CN202111374051.7A (granted as CN114091450B)
Publication of CN114091450A
Application granted
Publication of CN114091450B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/084: Backpropagation, e.g. using gradient descent


Abstract

The invention discloses a judicial domain relation extraction method and system based on a graph convolution network, mainly addressing the difficulty of extracting information from the lengthy, structurally complex texts of the judicial domain. The method introduces the dependency syntax information of the text, models it with a graph convolution network, and finally obtains the feature vectors for relation classification with an attention mechanism. First, a large number of unstructured text corpora in the judicial field are acquired and preprocessed to obtain the fact-description parts closely related to the cases; word segmentation, part-of-speech analysis, dependency syntax analysis, and labeling are then performed on the texts to construct a judicial-case-specific data set. Second, the text information and the corresponding dependency syntax trees are encoded and input into a graph convolution network for feature extraction. Then, the parameters of the model are iteratively updated with a back propagation algorithm so that the extraction model achieves its best performance. Finally, the trained extraction model is used for relation extraction on unstructured case texts in the judicial field, automatically completing the extraction of entity triples.

Description

Judicial domain relation extraction method and system based on graph convolution network
Technical Field
The invention relates to a judicial domain relation extraction method and system based on a graph convolution network, and belongs to the technical field of text information extraction.
Background
In recent years, artificial intelligence technology has received great attention and developed rapidly; with the advent of AlphaGo, artificial intelligence was pushed into the view of the general public. The goal of developing artificial intelligence technology is to put it to practical use in human production and daily life, thus benefiting the general population. Information Extraction (IE) is a technique that frees up human labor: it aims to automatically and efficiently extract specific, valuable information from semi-structured and unstructured text as well as structured data, and to store that information in a reasonable structure on a storage medium. Information extraction includes Named Entity Recognition (NER), Relation Extraction (RE), and Event Extraction (EE).
The relation extraction is used as a subtask of information extraction, aims to extract semantic relation between two entities from unstructured text, and is an important upstream subtask of knowledge graph construction and a knowledge question-answering system. Currently, the development directions of relationship extraction mainly include two types, namely open-domain oriented and specific-domain oriented, and the relationship extraction oriented to the specific domain is the current application and development hot spot and comprises the fields of medical treatment, finance, judicial law and the like.
The number of electronic files related to judicial cases generated in China every year is huge, and their accumulation over time reaches astronomical figures. Most of these e-files are stored as semi-structured and unstructured text of many different types, so selecting and extracting the required information from them is time-consuming and laborious even for skilled practitioners.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems and defects in the prior art, the invention provides a judicial domain relation extraction method and system based on a graph convolution network. Applying relation extraction to judicial-domain texts enables automatic information extraction from judicial text data, improves the working efficiency of judicial practitioners, and lays a solid foundation for the subsequent construction and application of knowledge graphs in the judicial field.
The technical scheme is as follows: a judicial domain relation extraction method based on a graph convolution network comprises the following steps:
step 1) obtaining unstructured text corpora in the judicial field, preprocessing the text to obtain a fact description part related to a case, and then performing word segmentation, part of speech analysis, dependency syntax analysis and label tagging on the text to construct a judicial case special data set.
And 2) coding the sentence samples in the data set constructed in the last step and the dependency syntax trees corresponding to the sentence samples, and inputting the coded sentence samples into a relation extraction model based on a graph convolution network for feature extraction.
And 3) iteratively updating the parameters of the graph-convolution-based relation extraction model with a back propagation algorithm until the relation extraction model reaches its best performance. Best performance means that the evaluation metric no longer improves as the number of training epochs increases.
And 4) applying the trained relation extraction model based on the graph convolution network to relation extraction of unstructured case texts in the judicial field, and automatically completing extraction of entity triples.
Wherein, the construction of the judicial case-specific data set of the step 1) comprises the following processes:
1-1) A large amount of unstructured text data is crawled from the China Judgements Online website, mainly first-instance judgment documents of the three litigation types: civil, criminal, and administrative. The facts that influence a case's judgment are usually concentrated in a few paragraphs of the judgment document; those paragraphs accurately describe the full picture of the case and are an important source for information extraction. The case-fact-finding parts are extracted from the text data by a rule-based method, i.e., matching keywords and text structure at paragraph granularity;
1-2) carrying out sentence segmentation and word segmentation on the case identification fact part text data obtained in the step 1-1) by means of the existing tool, and then carrying out entity labeling and relation labeling on the text data after word segmentation by taking sentences as units;
1-3) filtering the sentences subjected to entity labeling and relationship labeling in the step 1-2), only reserving the sentences containing entity pairs, and then performing part-of-speech analysis and dependency syntax analysis on the sentence texts to finally form the special data set for the judicial case.
Before training, the text sentences in the data set and the dependency syntax trees corresponding to those sentences are encoded and input into the graph-convolution-based relation extraction model for feature extraction. This refers to:
Text sentences are taken out of the judicial-case-specific data set and converted into sequences of real-valued vectors; the sentences and their dependency syntax trees are encoded with graph convolution operations; finally, an attention mechanism extracts the outputs of the multi-layer GCN to obtain the feature vectors for relation classification. The specific steps are:
2-1) converting words in the text sentences into real-value vectors by using a static word embedding matrix, coding the part of speech of the words by using a randomly initialized embedding matrix, splicing the word vectors of the words and the corresponding part of speech vectors to obtain initial characteristic vectors of the words, wherein one sentence corresponds to one vector sequence.
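As an illustrative sketch of this step, the initial characterization of a word is simply the concatenation of its word vector and its part-of-speech vector; the dimensions and the tiny vocabularies below are toy assumptions, not the patent's actual values:

```python
import random

random.seed(0)

WORD_DIM, POS_DIM = 300, 10                      # assumed embedding sizes
word_emb = {"court": [0.0] * WORD_DIM}           # stands in for a static word-embedding matrix
pos_emb = {"n": [random.uniform(-1, 1) for _ in range(POS_DIM)]}  # randomly initialized POS embedding

def initial_vector(word, pos):
    """Concatenate the word vector and the POS vector into one feature vector."""
    return word_emb[word] + pos_emb[pos]

v = initial_vector("court", "n")
assert len(v) == WORD_DIM + POS_DIM              # one sentence becomes one sequence of such vectors
```

A sentence of n words thus becomes an n x (WORD_DIM + POS_DIM) matrix.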
2-2) the initial characterization vectors of the words are independently coded and lack context information, and Bi-LSTM (bidirectional long-short term memory network) is adopted to model the sentences with the relation to be extracted. Specifically, an initial characterization vector sequence of a sentence is input into a two-layer LSTM network in a positive sequence and a reverse sequence, and then feature vectors extracted from two layers of LSTMs in different directions are spliced, so that the characterization vectors of words can contain context information of the sentence.
2-3) Modeling with the graph convolution network: the output of the Bi-LSTM serves as the initial input H^(0) of the graph convolution network, and word vector representations are then extracted by graph convolution operations. Notably, the graph convolution operation also draws on the dependency syntax information of the sentence: the adjacency matrix A corresponding to the dependency syntax tree acts as auxiliary information that helps the graph convolution network encode the text, which improves the model's ability to extract complex textual information. If there is a dependency relation between node i and node j, the corresponding adjacency-matrix elements are A_{i,j} = 1 and A_{j,i} = 1; otherwise A_{i,j} = 0 and A_{j,i} = 0. So that each node's own features are passed into the vectors of the next hidden layer, a self-loop edge is added to every node, i.e., A_{k,k} = 1 for any word node k.
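The adjacency-matrix construction described in this step can be sketched as follows; the head list and its 1-indexed convention (0 = root) are illustrative assumptions:

```python
# Build a symmetric adjacency matrix from dependency heads, with self-loops
# A[k][k] = 1 so each node's own features survive the graph convolution.
def build_adjacency(heads):
    n = len(heads)
    A = [[0] * n for _ in range(n)]
    for i, h in enumerate(heads):        # i is the dependent (0-indexed)
        if h > 0:                        # 0 marks the root, which has no head
            j = h - 1
            A[i][j] = A[j][i] = 1        # undirected dependency edge
    for k in range(n):
        A[k][k] = 1                      # self-loop
    return A

A = build_adjacency([2, 0, 2])           # word 1 -> word 2 (root) <- word 3
assert A == [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
```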
The process of a graph convolution operation can be expressed as the formula:
H^(l) = f(H^(l-1), A)
where H^(l) denotes the output of the l-th layer of the GCN, H^(l-1) denotes the input of the l-th layer (i.e., the output of layer l-1), and A is the adjacency matrix corresponding to the dependency syntax tree; in particular, H^(0) denotes the initial input of the GCN;
In more detail, the computation formula of the t-th hidden vector of the l-th layer in the GCN is:
h_t^(l) = σ( (1/c_t) · Σ_{i=1}^{n} A_{t,i} · W^(l) · h_i^(l-1) + b^(l) )
where A_{t,i} denotes the element in row t, column i of the adjacency matrix A, W^(l) denotes the weight matrix of the l-th layer, c_t is the number of words that have a dependency relation with the t-th word, b^(l) is a bias term, and σ(·) is a nonlinear activation function.
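A minimal pure-Python sketch of this layer computation; ReLU is an assumed choice for the activation σ, and the tiny matrices are toys:

```python
# One GCN layer: each word's new hidden vector averages the linearly
# transformed vectors of its dependency neighbours (self-loop included),
# adds a bias, and applies a nonlinearity.
def gcn_layer(H, A, W, b):
    n, d_in, d_out = len(H), len(W), len(W[0])
    out = []
    for t in range(n):
        c_t = sum(A[t])                              # neighbour count, incl. self-loop
        acc = [0.0] * d_out
        for i in range(n):
            if A[t][i]:
                for o in range(d_out):               # acc += W^T h_i
                    acc[o] += sum(W[k][o] * H[i][k] for k in range(d_in))
        out.append([max(0.0, a / c_t + b[o])          # sigma((1/c_t)*sum + b), ReLU
                    for o, a in enumerate(acc)])
    return out

H = [[1.0, 0.0], [0.0, 1.0]]                          # two words, 2-dim features
A = [[1, 1], [1, 1]]                                  # fully connected toy graph
W = [[1.0], [1.0]]                                    # 2 -> 1 projection
out = gcn_layer(H, A, W, [0.0])
assert out == [[1.0], [1.0]]
```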
The complete dependency syntax tree contains rich structural information; some dependency edges are closely related to relation classification, while others are irrelevant to the task. Pruning the original dependency syntax tree keeps the classification-relevant dependency edges while cutting away the irrelevant ones, which strengthens the GCN's resistance to interference. First, according to the positions of the two entities in the dependency syntax tree, their lowest common ancestor node is located, and the subtree rooted at that node is taken as the preliminary pruning result; this subtree is called the LCA (Lowest Common Ancestor) tree. Then the Shortest Dependency Path (SDP) between the two entities is determined within the LCA tree; extending outward from the SDP, the nodes of the LCA tree at a distance of no more than D hops from the SDP form the final pruning result, where D is a tunable parameter.
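The pruning procedure can be sketched as follows, assuming heads are stored 1-indexed with 0 marking the root; the example tree is made up:

```python
# Find the entities' lowest common ancestor, take the path between them
# through the LCA as the shortest dependency path (SDP), then keep every
# node within D hops of that path.
def path_to_root(heads, node):
    path = [node]
    while heads[node] > 0:
        node = heads[node] - 1
        path.append(node)
    return path

def prune(heads, e1, e2, D):
    p1, p2 = path_to_root(heads, e1), path_to_root(heads, e2)
    lca = next(n for n in p1 if n in p2)             # first shared ancestor
    sdp = set(p1[:p1.index(lca) + 1]) | set(p2[:p2.index(lca) + 1])
    keep = set(sdp)
    for _ in range(D):                               # expand D hops outward
        keep |= {i for i, h in enumerate(heads) if h > 0 and h - 1 in keep}
        keep |= {heads[i] - 1 for i in keep if heads[i] > 0}
    return keep

heads = [2, 3, 0, 3, 4]    # word0 -> word1 -> word2 (root) <- word3 <- word4
assert prune(heads, 1, 3, 0) == {1, 2, 3}            # D = 0 keeps only the SDP
assert prune(heads, 1, 3, 1) == {0, 1, 2, 3, 4}      # D = 1 pulls in the 1-hop fringe
```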
To avoid losing the important information that a rule-based pruning method may discard, graph convolution representation vectors are computed on the dependency syntax trees both before and after pruning, and the memory of the pruned dependencies is then adjusted by weight assignment:
h = β · h_full + (1 − β) · h_pruning
where β denotes the memory coefficient for the pruned dependencies (the weight coefficients of the two intermediate hidden-layer vectors sum to 1), h_full denotes the features extracted by graph convolution on the dependency syntax tree before pruning, and h_pruning denotes the features extracted by graph convolution on the pruned dependency syntax tree.
2-4) The graph convolution layer is typically a stack of multiple GCN layers, and the output of each GCN layer contains information useful for classifying the relation of the entity pair in the text. An attention mechanism extracts the semantic information for relation classification from every GCN layer: first, a max-pooling operation computes a sentence representation vector from each GCN layer's output; then the attention weight distribution is computed; finally, a weighted sum yields the final vector representation:
e_i = w_h^T · γ_i
α_i = exp(e_i) / Σ_{k=1}^{K} exp(e_k)
r = Σ_{i=1}^{K} α_i · γ_i
where γ_i denotes the max-pooled output vector of the i-th GCN layer, e_i denotes the relevance score between the i-th GCN layer's output vector and the graph convolution layer's output vector, α_i denotes the weight of the i-th GCN layer's output vector within the graph convolution layer's output vector, K denotes the total number of GCN layers, w_h denotes the attention query vector, and r denotes the output vector of the graph convolution layer.
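A small sketch of this attention aggregation, with toy layer outputs and a made-up query vector:

```python
import math

# Max-pool each GCN layer over the word axis to one vector per layer,
# score each against the query vector, softmax, and take the weighted sum.
def aggregate_layers(layer_outputs, w_h):
    gammas = [[max(col) for col in zip(*layer)] for layer in layer_outputs]  # max-pool
    scores = [sum(w * g for w, g in zip(w_h, gm)) for gm in gammas]          # e_i
    z = sum(math.exp(e) for e in scores)
    alphas = [math.exp(e) / z for e in scores]                               # softmax
    dim = len(w_h)
    return [sum(a * gm[d] for a, gm in zip(alphas, gammas)) for d in range(dim)]

layers = [[[1.0, 0.0], [0.0, 2.0]],    # layer 1: two words, 2-dim vectors
          [[1.0, 0.0], [0.0, 2.0]]]    # layer 2 identical, so weights are equal
r = aggregate_layers(layers, w_h=[1.0, 1.0])
assert abs(r[0] - 1.0) < 1e-9 and abs(r[1] - 2.0) < 1e-9
```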
2-5) The output of the previous layer is mapped to the output layer through a nonlinear transformation, and a softmax function then computes the probability distribution over the relation types:
o = σ(W_d · r + b_d)
p(r) = exp(o_r) / Σ_{k=1}^{|R|} exp(o_k)
where W_d denotes the discriminant transformation matrix, b_d is a bias vector, |R| denotes the total number of relation classes, o_r denotes the r-th element of the output vector o, and likewise o_k denotes the k-th element.
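The softmax step can be illustrated as follows; the logits are made-up numbers for |R| = 3 relation classes:

```python
import math

def relation_probs(o):
    """Softmax over the output-layer logits o."""
    z = sum(math.exp(v) for v in o)
    return [math.exp(v) / z for v in o]

probs = relation_probs([2.0, 1.0, 0.0])
assert abs(sum(probs) - 1.0) < 1e-9     # a proper probability distribution
assert probs.index(max(probs)) == 0     # the highest logit wins
```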
Further, the parameters of the relation extraction model are optimized with stochastic gradient descent, minimizing the negative log-likelihood over each batch:
J(θ) = −(1/|B|) · Σ_{i=1}^{|B|} log p(r_i | s_i, h_i, t_i; θ)
θ ← θ − α · ∂J(θ)/∂θ
where θ denotes all parameters of the model, B denotes a training batch containing a fixed number of sentence instances, s_i and r_i denote the i-th sentence sample in the batch and its labeled relation, h_i and t_i denote that sample's head and tail entities, and α denotes the learning rate.
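A minimal illustration of the update rule, using a toy one-parameter least-squares model rather than the actual relation extraction network:

```python
# One SGD step: average the gradient of the loss over batch B and move
# the parameter against it, scaled by the learning rate.
def sgd_step(theta, batch, lr):
    # loss per example: (theta * x - y)^2 ; gradient: 2 * x * (theta * x - y)
    grad = sum(2 * x * (theta * x - y) for x, y in batch) / len(batch)
    return theta - lr * grad

theta = 0.0
batch = [(1.0, 2.0), (2.0, 4.0)]          # data consistent with theta* = 2
for _ in range(200):
    theta = sgd_step(theta, batch, lr=0.05)
assert abs(theta - 2.0) < 1e-6
```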
Furthermore, the judicial case corpus still contains a large number of sentences without manual labels. A named entity recognition tool first identifies the entities; part-of-speech analysis and dependency syntax analysis are then performed; finally, the sentences together with the corresponding auxiliary information, such as parts of speech and dependency syntax trees, are input into the trained relation extraction model to predict the relation types of entity pairs, yielding a large number of new fact triples.
A judicial domain relation extraction system based on graph convolution network comprises:
constructing a judicial case special data set module, acquiring unstructured text corpora in the judicial field, preprocessing the text to acquire a fact description part related to the case, and further performing word segmentation, part of speech analysis, dependency syntax analysis and label tagging on the text to construct a judicial case special data set;
the characteristic extraction module is used for coding sentence samples in the data set and dependency syntax trees corresponding to the sentence samples, and inputting the coded sentence samples into a relation extraction model based on a graph convolution network for characteristic extraction;
the relation extraction model training module based on the graph convolution network carries out iterative updating on the parameters of the relation extraction model based on the graph convolution network by utilizing a back propagation algorithm so as to enable the relation extraction model to achieve the best performance;
and the relation extraction module is used for extracting the relation of the non-structured case text in the judicial field by using the trained relation extraction model based on the graph convolution network, and automatically finishing the extraction of the entity triples.
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the graph-convolution-network-based judicial domain relation extraction method described above.
A computer-readable storage medium storing a computer program for executing the graph-convolution network-based judicial domain relationship extraction method as described above.
Compared with the prior art, the invention has the advantages that:
1) the relation extraction method introduces additional information, such as part of speech information and a dependency syntax tree, to enrich feature representation of the text, and performs graph convolution coding on the text according to the dependency syntax tree, so that the capability of feature extraction on the lengthy and complex text can be improved.
2) The relation extraction method uses a Bi-LSTM to preliminarily extract text information, which enriches the contextual semantics of words and mitigates the noise introduced by erroneous dependency parses. Existing dependency parsing tools inevitably produce some incorrect parses, and running graph convolution on top of them propagates those errors; because the Bi-LSTM first gives every word contextual information, the graph convolution network's strong dependence on the dependency syntax tree is relieved.
3) The relation extraction method has stronger robustness, and benefits from a pruning strategy taking SDP as a center and a method for weight assignment. The rule-based pruning strategy can keep most of the core dependency relationships related to the relationship classification, but the hard pruning method always omits some important information. The memory coefficient is added to the pruned information through weight assignment, so that the model can always extract the core structure information from the dependency syntax tree, and other important information cannot be lost, so that the model has stronger robustness.
4) The described relationship extraction method introduces an attention mechanism to further screen the output of the multi-layer graph convolutional network for features used for entity relationship classification. Compared with the method of only using the output of the last layer of graph convolution in the traditional method, the method can solve the problem of selecting the optimal layer number of the graph convolution network. Because the optimal number of layers of the graph convolution network is different for different sentence texts, and the model only has one determined graph convolution layer number, the attention mechanism can calculate the correlation between each layer of output and the entity pair relationship, assign a larger weight to the graph convolution sub-layer output with high correlation, and assign a smaller weight to the graph convolution sub-layer output with low correlation, thereby realizing the dynamic screening of the graph convolution output characteristics.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
FIG. 2 is a block diagram of a relational extraction model according to an embodiment of the invention;
FIG. 3 is a diagram illustrating a convolutional layer structure of a relational extraction model according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an attention extracting layer of a relationship extraction model according to an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.
The first embodiment is as follows: the text corpus used in this embodiment consists mainly of first-instance judgment documents for the civil cause of action 'property insurance contract disputes'.
As shown in fig. 1, a judicial domain relationship extraction method based on graph convolution network includes the following steps:
step 1) acquiring a large number of unstructured text corpora in the judicial field, preprocessing the texts to obtain fact description parts closely related to cases, and then performing word segmentation, part of speech analysis, dependency syntax analysis and label tagging on the texts to construct a judicial case special data set. The specific process is as follows:
1-1) A large amount of unstructured text data is crawled from the China Judgements Online website, mainly first-instance judgment documents with the cause of action 'property insurance contract disputes'. The facts that influence a case's judgment are usually concentrated in a few paragraphs of the judgment document; those paragraphs accurately describe the full picture of the case and are an important source for information extraction. The case-fact-finding parts are extracted from the text data by a rule-based method, i.e., matching keywords and text structure at paragraph granularity;
1-2) segmenting paragraphs of the fact part in 1-1) to obtain independent sentences, then segmenting words of the text by taking the sentences as units by means of the conventional word segmentation tool jieba, and then performing entity labeling and relationship labeling on the segmented text, wherein the entity types and relationship types predefined in the embodiment are as follows:
[Tables of the predefined entity types and relation types appear here as images in the original document.]
1-3) The sentences obtained in 1-2) are filtered: sentences that do not contain entity pairs are discarded and only those containing entity pairs are kept. Part-of-speech analysis of the filtered sentences uses the existing tool jieba, and dependency syntax analysis uses DDParser (Baidu Dependency Parser), developed by the Baidu team. The result is the special-purpose data set used in this embodiment, in which each sample contains the complete sentence text, head and tail entity positions, parts of speech, the dependency syntax tree, and other information.
For example, the sentence "zhangwei driving sue sedan insures motor vehicle loss at the branch of the suzhou city of human insurance. "whose segmentation and part-of-speech analysis results are [ (" zhanwei ", PER), (" drive ", v), (" u ", u), (" threo car ", n), (" on ", p), (" ORG), ("application", v), ("motor vehicle loss insurance", n), ("log", w) ]. After entity labeling, the corresponding labeling sequence is [ "Natural _ person", "O", "Property", "O", "instrument _ company", "O", "instrument", "O" ], wherein the relationship type between the two entities of the sue sedan car and the motor vehicle loss Insurance is "Insurance". The dependency syntax tree structure corresponding to the sentence sequence is [ ('SBV', 2), ('ATT', 4), ('MT', 2), ('SBV', 7), ('MT', 6), ('ADV', 7), ('HED', 0), ('VOB', 7), ('MT', 7) ].
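The dependency tree stored as (label, head) pairs above can be converted into the symmetric edge set, with self-loops, that the graph convolution consumes; this conversion is a sketch, not the patent's exact code:

```python
# Heads are 1-indexed and 0 marks the root (the 'HED' arc), matching the
# dependency-tree representation shown in the example sentence above.
def tree_to_edges(dep_tree):
    edges = {(k, k) for k in range(len(dep_tree))}       # self-loops
    for i, (_label, head) in enumerate(dep_tree):
        if head > 0:
            j = head - 1
            edges.add((i, j))
            edges.add((j, i))                            # symmetric edge
    return edges

tree = [('SBV', 2), ('ATT', 4), ('MT', 2), ('SBV', 7), ('MT', 6),
        ('ADV', 7), ('HED', 0), ('VOB', 7), ('MT', 7)]
edges = tree_to_edges(tree)
assert (0, 1) in edges and (1, 0) in edges               # word 1 attaches to word 2
assert len(edges) == 25                                  # 9 self-loops + 8 arcs x 2
```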
And 2) encoding the text information and the corresponding dependency syntax tree, and inputting the encoded text information and the corresponding dependency syntax tree into a relation extraction model based on a graph convolution network for feature extraction. The simple structure of the relational extraction model is shown in fig. 2, which is a pipelined 5-layer structure in which data is executed in the order of an embedding layer, a contextualization layer, a graph convolution layer, an attention extraction layer, and an output layer. The embedding layer is used for completing conversion from text to vectors, the contextualization layer enables word vectors to contain more context semantics, the graph convolution layer extracts the characteristics of sentences under the assistance of the dependency syntax tree, the attention extraction layer dynamically aggregates sentence characterization vectors from the multi-layer GCN, and finally the judgment of relationship types is completed at the output layer.
The function performed by each layer:
2-1) taking out sentence samples from the judicial case special data set, converting words in the sentences into real-value vectors by using a static word embedding matrix, encoding the parts of speech of the words by using a randomly initialized embedding matrix, splicing the word vectors of the words and the corresponding parts of speech vectors to obtain initial characteristic vectors of the words, wherein one sentence corresponds to one vector sequence.
For example, we use a word embedding matrix with a single word vector of 300 dimensions, and then apply the example sentence "zhangwei driving sue sedan to secure motor vehicle loss insurance at the branch of suzhou peoples. "convert to a 9 x 300 sentence matrix. The part of speech of the word "zhanwei" is "PER", and the corresponding part of speech embedding vector is [0.1156, -0.3487, -0.0861,0.1310, -0.9013,0.5357,0.0125,0.1414, -0.8653, -0.0487 ]. If part-of-speech features are also added to the sentence representation, the sentence is finally represented as a 9 x 310 matrix.
2-2) the initial characterization vectors of the words are independently coded and lack context information, and Bi-LSTM (bidirectional long-short term memory network) is adopted to model the sentences with the relation to be extracted. Specifically, an initial token vector sequence of a sentence is input into a two-layer LSTM network in a positive sequence and a reverse sequence, and then feature vectors extracted by two layers of LSTMs in different directions are spliced, so that the token vector of a word can contain context information of the sentence.
2-3) Modeling with the graph convolution network, whose structure is shown in FIG. 3: the output of the previous layer serves as the initial input H^(0) of the graph convolution network, and word vector representations are extracted by graph convolution operations. Notably, the graph convolution operation also draws on the dependency syntax information of the sentence: the adjacency matrix A corresponding to the dependency syntax tree acts as auxiliary information that helps the graph convolution network encode the text, which improves the model's ability to extract complex textual information. If there is a dependency relation between node i and node j, the corresponding adjacency-matrix elements are A_{i,j} = 1 and A_{j,i} = 1; otherwise A_{i,j} = 0 and A_{j,i} = 0. So that each node's own features are passed into the vectors of the next hidden layer, a self-loop edge is added to every node, i.e., A_{k,k} = 1 for any word node k.
The process of a graph convolution operation can be expressed as the formula:
H^(l) = f(H^(l-1), A)
where H^(l) denotes the output of the l-th layer of the GCN, H^(l-1) denotes the input of the l-th layer (i.e., the output of layer l-1), and A is the adjacency matrix corresponding to the dependency syntax tree; in particular, H^(0) denotes the initial input of the GCN;
in more detail, taking the t-th hidden layer vector of the l-th layer in the GCN as an example, the calculation formula is as follows:
h_t^(l) = σ( (1/c_t) · Σ_{i=1..n} A_{t,i} · W^(l) · h_i^(l-1) + b^(l) )
where A_{t,i} denotes the element in the t-th row and i-th column of the adjacency matrix A, W^(l) denotes the weight matrix of the l-th layer, c_t is the number of words having a dependency relationship with the t-th word (used for normalization), b^(l) is a bias term, and σ(·) is a nonlinear activation function.
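A minimal numpy version of one such graph convolution layer; ReLU is used for the activation σ as an assumption, since the patent does not name the activation function:

```python
import numpy as np

def gcn_layer(H_prev, A, W, b):
    # h_t^(l) = ReLU( (1/c_t) Σ_i A[t,i] · W h_i^(l-1) + b )
    # c_t = number of neighbours of node t (self-loop included).
    c = A.sum(axis=1, keepdims=True)          # (T, 1) degree normalizer
    Z = (A @ (H_prev @ W.T)) / c + b          # aggregate then normalize
    return np.maximum(Z, 0.0)                 # ReLU nonlinearity (assumption)
```

Stacking several calls to `gcn_layer`, each fed the previous layer's output together with the same A, reproduces the multi-layer structure of the formula H^(l) = f(H^(l-1), A).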
The complete dependency syntax tree contains rich structural information; some of its dependency edges are closely related to relation classification, while others are irrelevant to the task. By pruning the original dependency syntax tree — retaining the dependency edges relevant to classification while removing the irrelevant ones — the noise resistance of the GCN can be enhanced. First, according to the positions of the two entities in the dependency syntax tree, their lowest common ancestor node is determined, and the subtree rooted at this node is taken as the preliminary pruning result; this subtree is called the LCA (Lowest Common Ancestor) tree. Then, the shortest dependency path (SDP) between the two entities is determined in the LCA tree, and the result is expanded outwards from the SDP: every node of the LCA tree at a distance of at most D hops from the SDP is kept as the final pruning result, where D is a tunable parameter.
For example, pruning the dependency syntax tree of the above sentence around its SDP with D = 1 yields [('SBV', -), ('ATT', 4), ('MT', -), ('SBV', 7), ('MT', -), ('ADV', 7), ('HED', 0), ('VOB', 7), ('MT', 7)], where "-" indicates a dependency relationship that has been pruned away.
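The LCA/SDP pruning procedure can be sketched in pure Python. The representation of the tree as a list of 1-based head indices (0 = root) and the helper names are conventions of this sketch:

```python
from collections import deque

def prune_tree(heads, e1, e2, D=1):
    # heads[i]: 1-based head of token i (0 = root); e1, e2: 0-based entity
    # token indices. Returns the set of kept token indices: the nodes of the
    # LCA subtree within D hops of the shortest dependency path (SDP).
    def path_to_root(i):
        p = [i]
        while heads[p[-1]] != 0:
            p.append(heads[p[-1]] - 1)
        return p

    p1, p2 = path_to_root(e1), path_to_root(e2)
    lca = next(n for n in p1 if n in p2)            # lowest common ancestor
    sdp = p1[:p1.index(lca)] + [lca] + list(reversed(p2[:p2.index(lca)]))

    subtree = {lca}                                  # collect the LCA subtree
    changed = True
    while changed:
        changed = False
        for i, h in enumerate(heads):
            if h > 0 and h - 1 in subtree and i not in subtree:
                subtree.add(i)
                changed = True

    nbrs = {i: set() for i in subtree}               # undirected edges inside it
    for i in subtree:
        if heads[i] > 0 and heads[i] - 1 in subtree:
            nbrs[i].add(heads[i] - 1)
            nbrs[heads[i] - 1].add(i)

    kept = set(sdp)                                  # BFS outwards, up to D hops
    frontier = deque((n, 0) for n in sdp)
    while frontier:
        n, d = frontier.popleft()
        if d == D:
            continue
        for m in nbrs[n]:
            if m not in kept:
                kept.add(m)
                frontier.append((m, d + 1))
    return kept
```

With D = 0 only the SDP itself survives; larger D keeps progressively more of the LCA subtree, matching the tunable parameter described above.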
To avoid losing important information that the rule-based pruning method may discard, the graph convolution representation vectors are computed on both the dependency syntax tree before pruning and the one after pruning, and the memory of the pruned dependency relations is then adjusted by weight assignment:
h = β·h_full + (1-β)·h_pruning
where β is the memory coefficient for the pruned dependencies (the weight coefficients of the two intermediate hidden-layer vectors sum to 1), h_full denotes the features extracted by graph convolution on the dependency syntax tree before pruning, and h_pruning denotes the features extracted by graph convolution on the pruned dependency syntax tree.
2-4) The graph convolution component typically stacks multiple GCN layers, and the output of each layer contains information useful for classifying the relation of the entity pair in the text. An attention mechanism, whose structure is shown in FIG. 4, is adopted to extract from each GCN layer the semantic information that matters for entity relation classification. First, a max-pooling operation is used to compute a sentence characterization vector for each GCN layer; then the attention weight distribution is calculated; finally, the final vector representation is obtained by weighted averaging:
e_i = w_h^T · γ_i

α_i = exp(e_i) / Σ_{j=1..K} exp(e_j)

h = Σ_{i=1..K} α_i · γ_i
where γ_i denotes the output vector of the i-th GCN layer after the max-pooling operation, K denotes the total number of GCN layers, and w_h denotes the attention query vector, a trainable parameter.
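A minimal numpy sketch of this layer-wise attention: max-pool each GCN layer's output into a sentence vector, score each vector with the query w_h, softmax the scores, and take the weighted average:

```python
import numpy as np

def layer_attention(layer_outputs, w_h):
    # layer_outputs: list of K matrices of shape (T, d), one per GCN layer.
    # w_h: trainable attention query vector of shape (d,).
    gammas = np.stack([H.max(axis=0) for H in layer_outputs])  # (K, d) pooled
    e = gammas @ w_h                                           # e_i = w_h^T γ_i
    alpha = np.exp(e - e.max())                                # stable softmax
    alpha /= alpha.sum()
    return alpha @ gammas                                      # Σ_i α_i γ_i
```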
2-5) mapping the output of the previous layer to an output layer through nonlinear transformation, and then calculating the probability distribution of each relationship type by adopting a softmax function:
o = σ(W_d·r + b_d)

p_r = exp(o_r) / Σ_{k=1..|R|} exp(o_k)
where |R| denotes the total number of relation categories (|R| = 6 in this embodiment), W_d denotes the discriminant transformation matrix, b_d is a bias vector, o_r denotes the r-th element of the output vector o, and similarly o_k denotes its k-th element.
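A minimal numpy sketch of this output layer; tanh is used for the nonlinearity σ as an assumption, since the patent does not specify which activation is applied before the softmax:

```python
import numpy as np

def classify(r, W_d, b_d):
    # o = σ(W_d r + b_d), then softmax over the |R| relation types.
    o = np.tanh(W_d @ r + b_d)        # tanh as σ (assumption, not from patent)
    p = np.exp(o - o.max())           # numerically stable softmax
    return p / p.sum()
```

The returned vector is the probability distribution over relation types; the predicted relation is its argmax, as in the "insures" example below.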
For example, for the sentence to be predicted, "Zhang Wei's Su-brand sedan is insured for motor vehicle loss insurance at the Suzhou branch of PICC.", the final relation classification probability distribution between the two entities "Su-brand sedan" and "motor vehicle loss insurance" is [0.763, 0.027, 0.075, 0.034, 0.089, 0.012], giving the probability values of the above 6 relation types. The probability of the "insures" relation is the largest, so the relation extraction model assigns the "insures" type to this entity pair.
And 3) iteratively updating the parameters of the model with a back-propagation algorithm to obtain the relation extraction model with the best performance.
The parameters of the relation extraction model are optimized by the stochastic gradient descent method:
L(θ) = -(1/|B|) · Σ_{i=1..|B|} log p(r_i | h_i, t_i; θ)

θ ← θ - α·∇_θ L(θ)
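The stochastic gradient descent update θ ← θ − α·∇L(θ) can be illustrated on a toy linear softmax classifier with a cross-entropy (negative log-likelihood) loss; the data, learning rate, and epoch count here are illustrative only:

```python
import numpy as np

def nll_loss_and_grad(W, X, y):
    # Mini-batch negative log-likelihood of a linear softmax classifier
    # and its gradient w.r.t. W — the two quantities each SGD step needs.
    logits = X @ W.T                              # (B, |R|)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    B = X.shape[0]
    loss = -np.log(P[np.arange(B), y]).mean()
    P[np.arange(B), y] -= 1.0                     # dL/dlogits = P - onehot(y)
    return loss, (P.T @ X) / B

def sgd_train(X, y, n_classes, lr=0.5, epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_classes, X.shape[1]))
    loss = None
    for _ in range(epochs):
        loss, g = nll_loss_and_grad(W, X, y)
        W -= lr * g                               # θ ← θ − α ∇L(θ)
    return W, loss
```

On linearly separable data the loss shrinks toward zero, which is the behavior the iterative parameter update in step 3) relies on.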
and 4) using the trained extraction model for relation extraction of unstructured case texts in the judicial field, inputting unlabeled sentences, performing entity identification by using a named entity identification tool, then performing part-of-speech analysis and dependency syntax analysis, and finally inputting the sentences and corresponding auxiliary information such as part-of-speech and dependency syntax trees into the trained model to predict the relation types of entity pairs to obtain new fact triples.
A judicial domain relation extraction system based on graph convolution network comprises:
constructing a judicial case special data set module, acquiring unstructured text corpora in the judicial field, preprocessing the text to acquire a fact description part related to the case, and further performing word segmentation, part of speech analysis, dependency syntax analysis and label tagging on the text to construct a judicial case special data set;
the characteristic extraction module is used for coding sentence samples in the data set and the dependency syntax trees corresponding to the sentence samples and inputting the coded sentence samples into a relation extraction model based on a graph volume network for characteristic extraction;
the relation extraction model training module based on the graph convolution network carries out iterative updating on the parameters of the relation extraction model based on the graph convolution network by utilizing a back propagation algorithm so as to enable the relation extraction model to achieve the best performance;
and the relation extraction module is used for extracting the relation of the non-structured case text in the judicial field by using the trained relation extraction model based on the graph convolution network, and automatically finishing the extraction of the entity triples.
The implementation of each module in the system is the same as that of the corresponding part of the method described above.
It should be apparent to those skilled in the art that the steps of the graph convolution network based judicial domain relation extraction method, or the modules of the graph convolution network based judicial domain relation extraction system, of the embodiment of the present invention described above can be implemented by a general-purpose computing device. They can be centralized on a single computing device or distributed over a network of computing devices, and can alternatively be implemented as program code executable by the computing devices, so that they can be stored in a storage device and executed by the computing devices; in some cases, the steps shown or described can be executed in an order different from that given here, or they can each be made into separate integrated circuit modules, or several of the modules or steps can be combined into a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.

Claims (10)

1. A judicial domain relation extraction method based on a graph convolution network is characterized by comprising the following steps:
step 1) acquiring unstructured text corpora in the judicial field, performing text preprocessing to acquire a fact description part related to a case, and further performing word segmentation, part of speech analysis, dependency syntactic analysis and label tagging on the text to construct a judicial case special data set;
step 2) coding sentence samples in the data set constructed in the last step and dependency syntax trees corresponding to the sentence samples, and inputting the coded sentence samples into a relation extraction model based on a graph convolution network for feature extraction;
step 3) utilizing a back propagation algorithm to iteratively update parameters of the relation extraction model based on the graph convolution network, so that the relation extraction model achieves the best performance;
and 4) applying the trained relation extraction model based on the graph convolution network to relation extraction of unstructured case texts in the judicial field, and automatically completing extraction of entity triples.
2. The extraction method of judicial domain relations based on graph-convolution network as claimed in claim 1, wherein the construction of the judicial case specific data set of step 1) comprises the following processes:
1-1) extracting case identification fact parts from unstructured text data in the judicial field;
1-2) segmenting the text data of the case identification fact part obtained by the method, and then carrying out entity labeling and relation labeling on the segmented text data by taking a sentence as a unit;
1-3) filtering the sentences subjected to entity labeling and relation labeling in the step 1-2), only reserving the sentences containing entity pairs, and then performing part-of-speech analysis and dependency syntactic analysis on the sentence texts to finally form a judicial case special data set.
3. The judicial domain relation extraction method based on a graph convolution network as claimed in claim 1, wherein the relation extraction model based on the graph convolution network in step 2) refers to the model before training, and the text sentences in the data set together with the dependency syntax trees corresponding to the sentences are encoded and input into the relation extraction model based on the graph convolution network for feature extraction.
4. The method of claim 1, wherein text sentences are extracted from the judicial case specific dataset and converted into real-valued vector sequences, the sentences and the dependency syntax trees are encoded using graph convolution operations, and finally the feature vectors for relation classification are obtained by using an attention mechanism to aggregate the outputs of the multi-layer GCN network; specifically comprising the following steps:
2-1) converting words in a text sentence into real-value vectors by using a static word embedding matrix, coding the part of speech of the words by using a randomly initialized embedding matrix, splicing the word vectors of the words and the corresponding part of speech vectors to obtain initial characterization vectors of the words, wherein one sentence corresponds to one vector sequence;
2-2) modeling the sentences of the relation to be extracted by adopting Bi-LSTM; specifically, an initial characterization vector sequence of a sentence is input into a two-layer LSTM network with a positive sequence and a reverse sequence, and then feature vectors extracted by the two layers of LSTMs in different directions are spliced, so that the characterization vectors of words can contain context information of the sentence;
2-3) modeling with the graph convolution network, taking the output of the Bi-LSTM as the initial input H^(0) of the graph convolution network, and extracting word vector representations of the words using graph convolution operations;
2-4) extracting the semantic information relevant to relation classification from each GCN layer by an attention mechanism; first, a max-pooling operation is used to compute a sentence characterization vector for each GCN layer, then the attention weight distribution is calculated, and finally the final vector representation is obtained by weighted averaging;
2-5) mapping the output of the previous layer to an output layer through nonlinear transformation, and then calculating the probability distribution of each relationship type by adopting a softmax function.
5. The judicial domain relationship extraction method based on graph convolution network according to claim 1, characterized in that the parameters of the relationship extraction model are optimized by an optimization method of random gradient descent:
L(θ) = -(1/|B|) · Σ_{i=1..|B|} log p(r_i | h_i, t_i; θ)

θ ← θ - α·∇_θ L(θ)
where θ generally denotes all parameters of the model, B denotes a training batch containing a fixed number of sentence samples, h_i and t_i respectively denote the head and tail entities of the i-th sentence sample in the batch, and α denotes the learning rate.
6. The method of claim 4, wherein the graph convolution operation further depends on the dependency syntax information of the text sentence, and the adjacency matrix A corresponding to the dependency syntax tree is used as auxiliary information to assist the graph convolution network in encoding the text information.
7. The judicial domain relationship extraction method based on graph convolution network of claim 6, wherein the dependency syntax tree is pruned, the dependency edges related to classification are reserved, and the irrelevant dependency edges are pruned; firstly, determining the lowest common ancestor node of two entities according to the positions of the two entities in a dependency syntax tree, and taking a subtree taking the lowest common ancestor node as a root node as a preliminary pruning result, wherein the subtree can be called an LCA tree; then, determining the shortest dependent path of two entities in the LCA tree, abbreviated as SDP, expanding outwards based on the SDP path, and taking the node on the LCA tree which is less than or equal to D hops away from the SDP path as a final pruning result, wherein D is an adjustable parameter;
and respectively calculating graph convolution expression vectors on the dependency syntax trees before pruning and after pruning, and then adjusting the memory of the pruned dependency relationship in a weight assignment mode.
8. A judicial domain relationship extraction system based on graph convolution network is characterized by comprising:
constructing a data set module special for judicial cases, acquiring unstructured text corpora in the judicial field, preprocessing the texts to acquire fact description parts related to the cases, and further performing word segmentation, part of speech analysis, dependency syntactic analysis and label labeling on the texts to construct a data set special for the judicial cases;
the characteristic extraction module is used for coding sentence samples in the data set and dependency syntax trees corresponding to the sentence samples, and inputting the coded sentence samples into a relation extraction model based on a graph convolution network for characteristic extraction;
the relation extraction model training module based on the graph convolution network carries out iterative updating on the parameters of the relation extraction model based on the graph convolution network by utilizing a back propagation algorithm so as to enable the relation extraction model to achieve the best performance;
and the relation extraction module is used for extracting the relation of the non-structured case text in the judicial field by using the trained relation extraction model based on the graph convolution network, and automatically finishing the extraction of the entity triples.
9. A computer device, characterized by: the computer device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the judicial domain relationship extraction method based on graph volume network according to any one of claims 1-7.
10. A computer-readable storage medium characterized by: the computer readable storage medium stores a computer program for executing the graph volume network-based judicial domain relationship extraction method according to any one of claims 1 to 7.
CN202111374051.7A 2021-11-19 2021-11-19 Judicial domain relation extraction method and system based on graph convolution network Active CN114091450B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111374051.7A CN114091450B (en) 2021-11-19 2021-11-19 Judicial domain relation extraction method and system based on graph convolution network


Publications (2)

Publication Number Publication Date
CN114091450A true CN114091450A (en) 2022-02-25
CN114091450B CN114091450B (en) 2022-11-18

Family

ID=80302120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111374051.7A Active CN114091450B (en) 2021-11-19 2021-11-19 Judicial domain relation extraction method and system based on graph convolution network

Country Status (1)

Country Link
CN (1) CN114091450B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926823A (en) * 2022-05-07 2022-08-19 西南交通大学 WGCN-based vehicle driving behavior prediction method
CN116304748A (en) * 2023-05-17 2023-06-23 成都工业学院 Text similarity calculation method, system, equipment and medium
CN117609519A (en) * 2024-01-22 2024-02-27 云南大学 Entity relation extraction method in electric power carbon emission calculation formula
CN117633245A (en) * 2023-11-24 2024-03-01 重庆赛力斯新能源汽车设计院有限公司 Knowledge graph construction method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977228A (en) * 2019-03-21 2019-07-05 浙江大学 The information identification method of grid equipment defect text
CN111382333A (en) * 2020-03-11 2020-07-07 昆明理工大学 Case element extraction method in news text sentence based on case correlation joint learning and graph convolution
CN112001186A (en) * 2020-08-26 2020-11-27 重庆理工大学 Emotion classification method using graph convolution neural network and Chinese syntax
CN112507699A (en) * 2020-09-16 2021-03-16 东南大学 Remote supervision relation extraction method based on graph convolution network
CN113239186A (en) * 2021-02-26 2021-08-10 中国科学院电子学研究所苏州研究院 Graph convolution network relation extraction method based on multi-dependency relation representation mechanism
CN113449084A (en) * 2021-09-01 2021-09-28 中国科学院自动化研究所 Relationship extraction method based on graph convolution
CN113641820A (en) * 2021-08-10 2021-11-12 福州大学 Visual angle level text emotion classification method and system based on graph convolution neural network


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YUHAO ZHANG 等: "Graph Convolution over Pruned Dependency Trees Improves Relation Extraction", 《ARXIV:1809.10185V1》 *
冯兴杰 等: "基于图神经网络与深度学习的商品推荐算法", 《计算机应用研究》 *
王晓霞: "基于注意力与图卷积网络的关系抽取模型", 《计算机应用》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926823A (en) * 2022-05-07 2022-08-19 西南交通大学 WGCN-based vehicle driving behavior prediction method
CN114926823B (en) * 2022-05-07 2023-04-18 西南交通大学 WGCN-based vehicle driving behavior prediction method
CN116304748A (en) * 2023-05-17 2023-06-23 成都工业学院 Text similarity calculation method, system, equipment and medium
CN117633245A (en) * 2023-11-24 2024-03-01 重庆赛力斯新能源汽车设计院有限公司 Knowledge graph construction method and device, electronic equipment and storage medium
CN117609519A (en) * 2024-01-22 2024-02-27 云南大学 Entity relation extraction method in electric power carbon emission calculation formula
CN117609519B (en) * 2024-01-22 2024-04-19 云南大学 Entity relation extraction method in electric power carbon emission calculation formula

Also Published As

Publication number Publication date
CN114091450B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
CN114091450B (en) Judicial domain relation extraction method and system based on graph convolution network
CN109508462B (en) Neural network Mongolian Chinese machine translation method based on encoder-decoder
CN106980683B (en) Blog text abstract generating method based on deep learning
CN106202010B (en) Method and apparatus based on deep neural network building Law Text syntax tree
CN117076653B (en) Knowledge base question-answering method based on thinking chain and visual lifting context learning
CN111831789B (en) Question-answering text matching method based on multi-layer semantic feature extraction structure
CN113642330A (en) Rail transit standard entity identification method based on catalog topic classification
CN112487812B (en) Nested entity identification method and system based on boundary identification
CN111858940B (en) Multi-head attention-based legal case similarity calculation method and system
WO2021082086A1 (en) Machine reading method, system, device, and storage medium
CN114818717B (en) Chinese named entity recognition method and system integrating vocabulary and syntax information
CN111428511B (en) Event detection method and device
CN113360654B (en) Text classification method, apparatus, electronic device and readable storage medium
CN117291265B (en) Knowledge graph construction method based on text big data
CN114077673A (en) Knowledge graph construction method based on BTBC model
CN113920379A (en) Zero sample image classification method based on knowledge assistance
CN114281982B (en) Book propaganda abstract generation method and system adopting multi-mode fusion technology
CN115033706A (en) Method for automatically complementing and updating knowledge graph
CN114638228A (en) Chinese named entity recognition method based on word set self-attention
CN112926323B (en) Chinese named entity recognition method based on multistage residual convolution and attention mechanism
CN114218921A (en) Problem semantic matching method for optimizing BERT
CN117094325B (en) Named entity identification method in rice pest field
CN113076744A (en) Cultural relic knowledge relation extraction method based on convolutional neural network
CN112949293A (en) Similar text generation method, similar text generation device and intelligent equipment
CN114579605B (en) Table question-answer data processing method, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant