CN112528036A

CN112528036A - Knowledge graph automatic construction method for evidence correlation analysis

Info

Publication number: CN112528036A
Application number: CN202011372006.3A
Authority: CN
Inventors: 孙媛媛; 宋文辉
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2020-11-30
Filing date: 2020-11-30
Publication date: 2021-03-19
Anticipated expiration: 2040-11-30
Also published as: CN112528036B

Abstract

The invention relates to methods for automatically constructing knowledge graphs, and in particular to an automatic knowledge graph construction method for evidence correlation analysis, which comprises the following steps: step 1, constructing an ontology to describe the knowledge graph; step 2, extracting the evidence involved in a case; step 3, extracting the structural elements of the case; step 4, establishing proving relations between the evidence and the case structural elements; step 5, fusing knowledge of high-similarity entities; and step 6, storing the knowledge graph. The evidence field currently lacks a knowledge graph for storing and representing evidence; the method is simple to operate, can construct a high-quality knowledge graph at low labor cost, and improves the efficiency of evidence analysis.

Description

Knowledge graph automatic construction method for evidence correlation analysis

Technical Field

The invention relates to a knowledge graph automatic construction method, in particular to a knowledge graph automatic construction method for evidence correlation analysis.

Background

At present, the country has no dedicated evidence law. A unified and explicit rule on the relevance of evidence is lacking, and legislation contains only a few scattered provisions. Ambiguity in the definition of evidence relevance sometimes makes it difficult for a judge to distinguish relevant evidence from irrelevant evidence, which lowers the efficiency of litigation and makes cases harder to solve. In judicial practice, the parties to a lawsuit may, in order to win, present various kinds of evidence that interfere with the judgment; although such evidence is relevant, admitting it can cause bias, confusion and the like. If the relevance of the evidence is not explained, citizens' trust in the courts, the law and government agencies may be damaged.

Criminal law and technological development have always been closely connected, and investigation and the collection of physical evidence cannot do without technical support. Existing work has broken down the concept of legal artificial intelligence, analyzed the feasibility of applying artificial intelligence to evidence examination in criminal trials together with its application difficulties and limitations, and proposed reasonable application strategies. Artificial intelligence has also been used to help refine criminal evidence standards, giving full play to intelligent technology in evidence verification and in checking for omissions and filling gaps, and laying a foundation for using big data and cloud computing to test the integrity of an evidence chain. Using deep learning, the research and development team of the Shanghai High People's Court sorted out the problems that arise easily, frequently and commonly in the evidence collection stage of judicial practice, formulated evidence standards and evidence rules for them, and provided an intelligent auxiliary case-handling system for criminal cases in Shanghai. These methods focus on evidence standards and lack any judgment of evidence relevance, so it is important to propose a model that fills this technical gap.

Disclosure of Invention

To remedy the deficiencies of the prior art, the invention aims to provide an automatic knowledge graph construction method for evidence correlation analysis. The method can extract information from massive numbers of legal documents, fill information slots according to the designed ontology, and automatically construct a high-quality knowledge graph. The graph provides an electronic database of historical case evidence to assist judicial personnel with evidence-related work.

In order to achieve the purpose of the invention and solve the problems in the prior art, the invention adopts the technical scheme that: an automatic knowledge graph construction method for evidence correlation analysis comprises the following steps:

step 1, constructing an ontology to describe the knowledge graph: a high-quality ontology structure for organizing and expressing the relevant knowledge is built through literature research, consultation of reference materials, and manual design of concepts, attributes and constraints, which specifically comprises the following substeps:

(a) analyzing the evidence descriptions in the proof and cross-examination outline, and dividing the evidence concept into eight subclasses according to the provisions on evidence in the Criminal Procedure Law of the People's Republic of China (2018 Amendment), namely physical evidence; documentary evidence; witness testimony; expert opinions; statements of the victim; confessions and defenses of criminal suspects and defendants; records of inquests, inspections, identifications and investigative experiments; and audio-visual materials and electronic data; at the same time completing the definition of attributes, and mapping the evidence information in the proof and cross-examination outline onto the evidence concepts;

(b) analyzing the structure of the indictment, and dividing the indictment concept into four sub-concepts, namely suspect, criminal facts, evidence set and procuratorate opinion, wherein the first two serve as the main objects to be proved by the evidence; in addition, so that what each piece of evidence proves can be stated more precisely, a natural condition concept is defined to break down the suspect concept;

(c) defining the relation between the natural condition concept and the suspect concept: since the natural condition is used to describe the suspect, the relation is defined according to how it is presented in judicial texts, with the head entity of the relation constrained to be a natural condition and the tail entity constrained to be a suspect;

(d) defining the relation between the evidence concept and the indictment concept: once the evidence and indictment concepts have been constructed, the proving relation is defined, with its head entity constrained to be evidence and its tail entity constrained to be the indictment, which links the two concepts and completes the construction of the knowledge graph ontology;
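The ontology of step 1 can be pictured as a small schema with typed relations. The following is a minimal sketch only: the English concept names, the relation names `describes` and `proves`, and the helper functions are illustrative assumptions and do not come from the source.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative English names for the eight evidence subclasses (assumed identifiers).
EVIDENCE_SUBCLASSES = [
    "physical_evidence",
    "documentary_evidence",
    "witness_testimony",
    "expert_opinion",
    "victim_statement",
    "suspect_or_defendant_statement",
    "inquest_inspection_identification_experiment_record",
    "audio_visual_and_electronic_data",
]
# The four sub-concepts of the indictment (prosecution document).
INDICTMENT_SUBCONCEPTS = ["suspect", "criminal_facts", "evidence_set", "procuratorate_opinion"]


@dataclass(frozen=True)
class RelationSchema:
    """A relation type whose head and tail are constrained to given concepts."""
    name: str
    head: str
    tail: str


# Parent of each sub-concept, so constraints on a concept also cover its subclasses.
PARENT = {c: "evidence" for c in EVIDENCE_SUBCLASSES}
PARENT.update({c: "indictment" for c in INDICTMENT_SUBCONCEPTS})

SCHEMAS = {
    # Substep (c): natural condition -> suspect.
    "describes": RelationSchema("describes", head="natural_condition", tail="suspect"),
    # Substep (d): evidence -> indictment.
    "proves": RelationSchema("proves", head="evidence", tail="indictment"),
}


def is_a(concept: Optional[str], ancestor: str) -> bool:
    """Return True if `concept` is `ancestor` or one of its sub-concepts."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def is_valid_triple(head: str, relation: str, tail: str) -> bool:
    """Check a (head, relation, tail) triple against the head/tail constraints."""
    schema = SCHEMAS.get(relation)
    return schema is not None and is_a(head, schema.head) and is_a(tail, schema.tail)
```

With such a schema, a triple like `("witness_testimony", "proves", "criminal_facts")` passes the check, while the reversed triple is rejected because its head is not an evidence concept.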

step 2, extracting the evidence involved in the case: the evidence presented in the proof and cross-examination outline is extracted using named entity recognition, and the proof direction of each evidence entity is determined automatically by rules, which specifically comprises the following substeps:

(a) constructing an evidence entity recognition data set: the proof and cross-examination outline contains descriptions of the relevant evidence, and the evidence entities in the outline are annotated by a combination of manual labeling and rules to build the training data set for the model;

(b) building a neural network for named entity recognition: a classic encoder-decoder framework is adopted, in which the encoder is a pre-trained model with strong language representation capability and the decoder is a feed-forward neural network, the calculation process being described by formula (1) and formula (2),

h_t = PLM(x_t) (1)

wherein PLM denotes the pre-trained language model adopted, which is trained and then released as open source by research institutions, x_t denotes the input at time t, and h_t denotes the encoded intermediate vector of the input at time t,

y_t = FFN(h_t) (2)

wherein FFN denotes the feed-forward neural network, for which different network structures are selected according to different inputs, and y_t denotes the entity tag at the corresponding position of the input sequence;
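To make the data flow of formulas (1) and (2) concrete, here is a minimal pure-Python sketch. It is not the patented model: `toy_plm` is a hypothetical stand-in for a real pre-trained language model, the tag set `TAGS` is assumed, and the weights are supplied by hand rather than learned.

```python
from typing import Callable, List, Sequence

Vector = List[float]
TAGS = ["O", "B-EVID", "I-EVID"]  # assumed tag set, not taken from the source


def toy_plm(tokens: Sequence[str], dim: int = 4) -> List[Vector]:
    """Stand-in for PLM in formula (1): returns one vector h_t per token x_t.

    A real system would call a pre-trained language model here. This toy
    version only derives numbers from character codes so the sketch runs.
    """
    return [
        [((sum(map(ord, tok)) * (k + 1)) % 7) / 7.0 for k in range(dim)]
        for tok in tokens
    ]


def ffn(h: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Formula (2) as a single linear layer: one score per tag."""
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, bias)]


def tag_sequence(
    tokens: Sequence[str],
    encoder: Callable[[Sequence[str]], List[Vector]],
    weights: List[Vector],
    bias: Vector,
) -> List[str]:
    """Apply formula (1), then formula (2), and keep the highest-scoring tag."""
    tags = []
    for h_t in encoder(tokens):
        scores = ffn(h_t, weights, bias)
        best = max(range(len(scores)), key=scores.__getitem__)
        tags.append(TAGS[best])
    return tags
```

The encoder is passed in as a parameter so that `toy_plm` can be swapped for an actual pre-trained model without changing the decoding step.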

(c) training the neural network model with the annotated data: the data set is first split proportionally into a training set, a validation set and a test set; the training set is then fed into the model, and the precision, recall and F-score of the model are calculated; the number of training epochs, the learning rate and the network structure hyper-parameters are adjusted according to the test results to find the parameter combination with which the model performs best, and these parameters are recorded and the model is saved;
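The data split and the evaluation metrics of this substep can be sketched as follows. The 8:1:1 ratio, the fixed random seed and the span-level scoring are assumptions for illustration; the source only states that the split is proportional.

```python
import random
from typing import Iterable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, entity type)


def split_dataset(samples: Iterable, ratios=(0.8, 0.1, 0.1), seed: int = 0):
    """Shuffle and split into training, validation and test sets."""
    items: List = list(samples)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Score predicted entity spans against gold spans (exact match)."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Hyper-parameter tuning then amounts to repeating training with different settings and keeping the configuration with the best score on held-out data.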

(d) packaging the best model obtained during training: a new input text is preprocessed using the same pre-trained word vectors, which serializes the text into a vector representation the model can compute on; the corresponding tag sequence is obtained by model prediction and is then post-processed with specific rules to determine the entity boundaries and obtain the evidence entities, while the entity type information determines the evidence type of each entity;
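The rule-based post-processing of the tag sequence might look like the sketch below. It assumes a BIO tagging scheme (`B-TYPE`, `I-TYPE`, `O`), which the source does not specify, and the rule of discarding an `I-` tag that does not continue an open entity is likewise an assumption.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str], sep: str = " ") -> List[Tuple[str, str]]:
    """Turn a BIO tag sequence into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        """Close the entity being built, if any."""
        nonlocal current, current_type
        if current:
            entities.append((sep.join(current), current_type))
        current, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()  # a B- tag always starts a new entity
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)  # continue the open entity
        else:
            flush()  # "O" or an inconsistent I- tag ends the open entity
    flush()
    return entities
```

For character-level Chinese input, `sep=""` would join the characters back into the original entity string.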

step 3, extracting the case structural elements: the case structure in the indictment is analyzed by combining a neural network with rules and divided into different structural elements, which specifically comprises the following substeps:

(a) analyzing the indictment texts in the data set, dividing the document structure according to the designed ontology, locating the dividing paragraphs and the keywords within them, and roughly cutting the text with Boolean operations on keyword matches to achieve a coarse-grained division of the text;
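Coarse segmentation by Boolean keyword matching can be sketched as below. The cue phrases in `SECTION_RULES` are hypothetical English examples; real rules would be derived from the wording of actual indictments (prosecution documents).

```python
from typing import Dict, List, Optional

# Hypothetical cue phrases: every "all" keyword AND at least one "any" keyword must occur.
SECTION_RULES = {
    "criminal_facts": {"all": ["investigation"], "any": ["ascertained", "found"]},
    "evidence_set": {"all": ["evidence"], "any": ["as follows", "following"]},
    "procuratorate_opinion": {"all": ["procuratorate"], "any": ["holds", "believes"]},
}


def matches(paragraph: str, rule: Dict[str, List[str]]) -> bool:
    """Boolean combination of keyword hits: AND over 'all', OR over 'any'."""
    text = paragraph.lower()
    return all(k in text for k in rule["all"]) and any(k in text for k in rule["any"])


def find_boundaries(paragraphs: List[str]) -> Dict[str, Optional[int]]:
    """Return the index of the first paragraph opening each section, or None."""
    return {
        section: next((i for i, p in enumerate(paragraphs) if matches(p, rule)), None)
        for section, rule in SECTION_RULES.items()
    }
```

A `None` entry marks a document that the keyword rules could not segment, which is the case handed over to the neural model in the next substep.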

(b) for texts that Boolean operations cannot segment or segment poorly, building a neural network model to achieve the goal: each paragraph in the document is first serialized into a vector by a neural network method, and a logistic regression model is then built to predict whether the paragraph corresponding to each vector is a boundary paragraph, the calculation process being described by formula (3),

lab_i = LR(NN(par_i)) (3)

wherein par_i denotes the text sequence of the i-th paragraph in the document, NN denotes the neural network method that serializes a paragraph of text into a vector, LR denotes the logistic regression model that determines whether the paragraph is a boundary paragraph, and lab_i denotes the label of the i-th paragraph, a result of 1 indicating a boundary paragraph and a result of 0 indicating a non-boundary paragraph;
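Formula (3) composes an encoder with a logistic regression classifier. The sketch below shows only that composition: `toy_nn` is a hand-made two-feature stand-in for the neural encoder, and the weights are passed in rather than trained, both of which are assumptions for illustration.

```python
import math
from typing import Callable, List, Sequence


def toy_nn(paragraph: str) -> List[float]:
    """Stand-in for NN: maps a paragraph to a small feature vector."""
    text = paragraph.lower()
    has_cue = 1.0 if "evidence" in text else 0.0       # hypothetical cue word
    length = min(len(text.split()), 50) / 50.0         # normalized paragraph length
    return [has_cue, length]


def lr(vector: Sequence[float], weights: Sequence[float], bias: float) -> int:
    """Logistic regression: 1 for a boundary paragraph, 0 otherwise."""
    z = sum(w * v for w, v in zip(weights, vector)) + bias
    probability = 1.0 / (1.0 + math.exp(-z))
    return 1 if probability >= 0.5 else 0


def label_paragraphs(
    paragraphs: Sequence[str],
    weights: Sequence[float],
    bias: float,
    encode: Callable[[str], List[float]] = toy_nn,
) -> List[int]:
    """Compute lab_i = LR(NN(par_i)) for every paragraph."""
    return [lr(encode(p), weights, bias) for p in paragraphs]
```

In practice the weights and bias would be fitted on documents whose boundaries are already known, as the following substep describes.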

(c) training the model and predicting on new text data: documents that Boolean operations segmented correctly are fed to the model as labeled data, the model is trained iteratively for multiple rounds, and the number of network layers, the learning rate and the optimizer parameters are adjusted until the model performs best; the model is then applied to documents whose boundaries Boolean operations could not locate to obtain the correct boundaries;

(d) obtaining the indices of the boundary paragraphs in the indictment through the above process, processing the indictment with rules to divide it into four parts, namely suspect, criminal facts, evidence set and procuratorate opinion, mapping the content of the indictment onto the indictment ontology, and instantiating the indictment ontology;
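Once the boundary paragraph indices are known, dividing the document into its four parts is a slicing operation. The sketch below assumes the parts appear in the fixed order suspect, criminal facts, evidence set, procuratorate opinion, and that each boundary index marks the first paragraph of its part; the function and key names are illustrative.

```python
from typing import Dict, List

SECTION_ORDER = ["suspect", "criminal_facts", "evidence_set", "procuratorate_opinion"]


def split_indictment(paragraphs: List[str], boundaries: Dict[str, int]) -> Dict[str, List[str]]:
    """Slice the paragraphs into four parts using the boundary paragraph indices.

    `boundaries` maps each part except "suspect" to the index of its first
    paragraph; the suspect part is assumed to start at paragraph 0.
    """
    starts = [0] + [boundaries.get(name) for name in SECTION_ORDER[1:]]
    if any(s is None for s in starts):
        raise ValueError("every boundary must be located before splitting")
    if any(a >= b for a, b in zip(starts, starts[1:])):
        raise ValueError("boundaries must be strictly increasing")
    ends = starts[1:] + [len(paragraphs)]
    return {name: paragraphs[s:e] for name, s, e in zip(SECTION_ORDER, starts, ends)}
```

The resulting dictionary is one instance of the indictment ontology: each key is a sub-concept and each value holds the text that fills it.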

step 4, establishing proving relations between the evidence and the case structural elements: text matching is used to analyze the similarity between the description of the object of proof and the structural elements, and to judge whether a proving relation exists, which specifically comprises the following substeps:

(a) analyzing the text description of the object of proof of each piece of evidence in the proof and cross-examination outline and the text descriptions of the four structural elements in the corresponding indictment, judging whether each piece of evidence has a proving relation with a structural element, manually designing the annotation rules and framework, carrying out a small amount of manual annotation, and then having a third party verify the annotations manually to ensure their correctness;

(b) building a neural network model to predict the proving relation between evidence and case structural elements: the neural network calculates the similarity between the text description of the evidence's object of proof and the text description of each case structural element, and the relative magnitude of the similarity is used as the basis for judging whether a proving relation exists;

(c) training the model by distant supervision: after the small amount of high-quality data has been annotated in substep (a) of step 4, the data are augmented by distant supervision so that the model can be trained on a large data set, and parameters are adjusted continuously during training until the best-performing model is obtained and saved;

(d) predicting, with the trained model, the relation between the evidence set and the case structural elements of each group of texts: an evidence list and a case structural element list are first extracted from the group of texts relating to a specific case, the Cartesian product of the two sets is then taken, the model calculates an evidence chain label for each pair of evidence entity and structural element, and finally the pairs that have a proving relation are added to the triple set;
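The pairing logic of this substep can be sketched as follows. The scoring function `token_overlap` (Jaccard overlap of words) is a deliberately simple stand-in for the trained matching model, and the fixed `threshold` is an assumption; the source only says that the relative magnitude of the similarity is used.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


def token_overlap(a: str, b: str) -> float:
    """Stand-in for the trained matching model: Jaccard overlap of word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def build_proof_triples(
    evidence: Dict[str, str],
    elements: Dict[str, str],
    score: Callable[[str, str], float] = token_overlap,
    threshold: float = 0.3,
) -> List[Triple]:
    """Score every (evidence, structural element) pair and keep those that match.

    `evidence` maps an evidence entity to the description of its object of proof;
    `elements` maps a structural element to its text.
    """
    triples = []
    for (ev_name, ev_text), (el_name, el_text) in product(evidence.items(), elements.items()):
        if score(ev_text, el_text) >= threshold:
            triples.append((ev_name, "proves", el_name))
    return triples
```

Because the Cartesian product grows with the number of evidence items times the number of structural elements, restricting it to the texts of a single case keeps the number of model calls small.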

step 5, fusing knowledge of high-similarity entities: a neural network is used to calculate the semantic mapping relations among instances from different judicial texts, and the knowledge is fused, which specifically comprises the following substeps:

(a) steps 1 to 4 build a preliminary knowledge graph, but it contains highly similar entities such as "household registration information" and "household registration certificate"; the knowledge of each entity is expanded through distant supervision and then combined with the attribute information of the entity and the information of its related entities, and the three kinds of information are concatenated to form the vector representation of the entity;

(b) building a model to calculate the similarity between the vector representations of entities, and performing entity association in the horizontal direction so that instance data complement one another: if the similarity of two entities is higher than a threshold, the two entities are considered to describe the same information and are linked; otherwise no entity linking is performed and the two entities each describe their own information independently, the calculation process being described by formula (4),

sim = f(x_exp; x_attr; x_adj) (4)

wherein x_exp denotes the knowledge representation of the entity in a third-party knowledge base, x_attr denotes the attribute representation of the entity, x_adj denotes the vector representation of the related entities, f denotes the similarity calculation model, and sim denotes the similarity value calculated by the model;
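A minimal sketch of formula (4) and the threshold rule follows. Cosine similarity is used here as a simple stand-in for the similarity model f, and the threshold value 0.9 is an arbitrary assumption; neither is stated in the source.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def concat(x_exp: Sequence[float], x_attr: Sequence[float], x_adj: Sequence[float]) -> Vector:
    """Concatenate the three kinds of information into one entity vector."""
    return list(x_exp) + list(x_attr) + list(x_adj)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_entities(
    entities: Dict[str, Tuple[Vector, Vector, Vector]], threshold: float = 0.9
) -> List[Tuple[str, str]]:
    """Return the entity pairs whose similarity exceeds the threshold."""
    names = list(entities)
    vectors = {name: concat(*entities[name]) for name in names}
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if cosine(vectors[a], vectors[b]) > threshold
    ]
```

Pairs that fall at or below the threshold are simply left unlinked, so each of those entities keeps describing its own information.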

(c) performing knowledge fusion according to the calculated similarity values: a central entity is first determined in each set of mutually linked entities, the relations and attribute values of the non-central entities are then merged into the central entity, and if a relation or attribute conflict is detected during fusion, the conflict is resolved by a voting-based method;
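Voting-based conflict resolution can be illustrated with attribute values alone. In this sketch the central entity is chosen as the one with the most attributes, and ties in the vote are broken in favor of the central entity's own value; both choices are assumptions, since the source does not say how the central entity is determined or how ties are handled. Relations could be merged in the same way.

```python
from collections import Counter
from typing import Dict

Attributes = Dict[str, str]


def pick_center(cluster: Dict[str, Attributes]) -> str:
    """Assumed heuristic: the entity carrying the most attributes is the center."""
    return max(cluster, key=lambda name: len(cluster[name]))


def fuse(cluster: Dict[str, Attributes], center: str) -> Attributes:
    """Merge all attribute values into the center, resolving conflicts by majority vote."""
    votes: Dict[str, Counter] = {}
    for attrs in cluster.values():
        for key, value in attrs.items():
            votes.setdefault(key, Counter())[value] += 1

    fused: Attributes = {}
    for key, counter in votes.items():
        ranked = counter.most_common()
        best_count = ranked[0][1]
        tied = [value for value, count in ranked if count == best_count]
        center_value = cluster[center].get(key)
        # On a tie, prefer the value already held by the central entity.
        fused[key] = center_value if center_value in tied else tied[0]
    return fused
```

Attributes that appear on only one entity carry over unchanged, since a single vote is trivially the majority.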

step 6, storing the knowledge graph in a graph database to improve query efficiency, which specifically comprises the following substeps:

(a) treating the entities in the knowledge graph as nodes and the relations as labeled edges: knowledge graph data naturally fit a graph model, so the graph-structure-based storage method models the knowledge graph data as a directed graph and represents and stores the data through nodes, edges and attributes;

(b) importing the automatically extracted relation data into the graph database in batches: the data are saved in csv format, with a node file and a relation file defined separately, and are imported with the import command provided by the graph database, which completes the automatic construction of the knowledge graph.
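Writing the node file and the relation file can be sketched with the standard `csv` module. The source does not name a graph database; the header fields used below (`:ID`, `:LABEL`, `:START_ID`, `:TYPE`, `:END_ID`) follow the convention of Neo4j's bulk import tool and are an assumption, as are the file names.

```python
import csv
from pathlib import Path
from typing import Iterable, Tuple


def write_graph_csv(
    nodes: Iterable[Tuple[str, str, str]],
    triples: Iterable[Tuple[str, str, str]],
    out_dir: str,
) -> Tuple[Path, Path]:
    """Write one node file and one relation file for bulk import.

    nodes:   (id, name, label) rows
    triples: (head id, relation type, tail id) rows
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    node_path, rel_path = out / "nodes.csv", out / "relations.csv"

    with node_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id:ID", "name", ":LABEL"])  # assumed Neo4j-style header
        writer.writerows(nodes)

    with rel_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([":START_ID", ":TYPE", ":END_ID"])  # assumed Neo4j-style header
        writer.writerows(triples)

    return node_path, rel_path
```

The two files would then be passed to the database's own import command. For a different graph database, only the header rows would need to change.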

The invention has the following beneficial effects: an automatic knowledge graph construction method for evidence correlation analysis comprises the following steps: step 1, constructing an ontology to describe the knowledge graph; step 2, extracting the evidence involved in a case; step 3, extracting the structural elements of the case; step 4, establishing proving relations between the evidence and the case structural elements; step 5, fusing knowledge of high-similarity entities; and step 6, storing the knowledge graph. Whereas the evidence field currently lacks a knowledge graph for storing and representing evidence, the method is simple to operate, can construct a high-quality knowledge graph at low labor cost, and improves the efficiency of evidence analysis.

Drawings

FIG. 1 is a flow chart of the method steps of the present invention.

FIG. 2 is a diagram of an evidence ontology constructed by the present invention.

FIG. 3 is a representation of an evidence entity identification process of the present invention.

Detailed Description

The invention will be further explained with reference to the drawings.

As shown in fig. 1, an automatic knowledge graph construction method for evidence correlation analysis includes the following steps:

h_t = PLM(x_t) (1)

y_t = FFN(h_t) (2)

lab_i = LR(NN(par_i)) (3)

sim = f(x_exp; x_attr; x_adj) (4)

Claims

1. An automatic knowledge graph construction method for evidence correlation analysis is characterized by comprising the following steps:

h_t = PLM(x_t) (1)

y_t = FFN(h_t) (2)

lab_i = LR(NN(par_i)) (3)

sim = f(x_exp; x_attr; x_adj) (4)