CN112528036A - Knowledge graph automatic construction method for evidence correlation analysis - Google Patents


Info

Publication number
CN112528036A
Authority
CN
China
Prior art keywords
evidence
model
entity
data
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011372006.3A
Other languages
Chinese (zh)
Other versions
CN112528036B (en)
Inventor
孙媛媛
宋文辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN202011372006.3A
Publication of CN112528036A
Application granted
Publication of CN112528036B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3341 Query execution using boolean model
    • G06F16/3347 Query execution using vector based model
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Abstract

The invention relates to an automatic knowledge graph construction method, and in particular to an automatic knowledge graph construction method for evidence correlation analysis, comprising the following steps: step 1, constructing an ontology to describe the knowledge graph; step 2, extracting the evidence involved in a case; step 3, extracting the structural elements of the case; step 4, establishing proving relations between the evidence and the case structural elements; step 5, fusing the knowledge of high-similarity entities; and step 6, storing the knowledge graph. The evidence field currently lacks a knowledge graph for storing and representing evidence; the method is simple to operate, can construct a high-quality knowledge graph at low labor cost, and improves the efficiency of evidence analysis.

Description

Knowledge graph automatic construction method for evidence correlation analysis
Technical Field
The invention relates to an automatic knowledge graph construction method, and in particular to an automatic knowledge graph construction method for evidence correlation analysis.
Background
At present, China has no dedicated evidence law; there is no unified, explicit provision on the relevance of evidence, and legislation contains only a few scattered provisions. The ambiguous definition of evidence relevance sometimes makes it difficult for a judge to distinguish relevant from irrelevant evidence, which lowers litigation efficiency and makes cases harder to solve. In judicial practice, the litigating parties may, in order to win, present all kinds of evidence that interferes with the judgment; although such evidence is relevant, admitting it can cause bias, confusion, and similar problems. If the relevance of evidence is left unexplained, citizens' trust in the courts, the law, and government agencies may be undermined.
Criminal law and technological development have always been closely connected, and investigation and physical evidence cannot do without technical support. Existing work has unpacked the concept of legal artificial intelligence, analyzed the feasibility of using artificial intelligence to examine evidence in criminal trials together with the difficulties and limitations of doing so, and proposed reasonable application strategies; it has used artificial intelligence to help refine criminal evidence standards, giving full play to intelligent technology in evidence verification and in detecting and filling gaps, and laying a foundation for using big data and cloud computing to test the integrity of the evidence chain; and, using deep learning, a research and development team at the Shanghai High People's Court formulated evidence standards and evidence rules targeting the error-prone, frequent, and common problems in evidence collection identified from judicial practice, and delivered an intelligent auxiliary case-handling system for criminal cases in Shanghai. These methods focus mainly on evidence standards and do not assess evidence relevance, so it is important to propose a model that fills this technical gap.
Disclosure of Invention
To remedy the deficiencies of the prior art, the invention aims to provide an automatic knowledge graph construction method for evidence correlation analysis. The method can extract information from massive volumes of legal documents, fill information slots according to the designed ontology, and automatically construct a high-quality knowledge graph. The graph provides an electronic database of historical case evidence to assist judicial personnel with evidence-related work.
To achieve the purpose of the invention and solve the problems in the prior art, the invention adopts the following technical solution: an automatic knowledge graph construction method for evidence correlation analysis, comprising the following steps:
step 1, constructing an ontology to describe the knowledge graph: a high-quality ontology structure for organizing and expressing the relevant knowledge is built through literature research, consultation of reference materials, and manual design of concepts, attributes, and constraints, specifically comprising the following sub-steps:
(a) analyzing the evidence descriptions in the outline for presenting and cross-examining evidence, and dividing the evidence concept into eight subclasses according to the provisions on evidence in the Criminal Procedure Law of the People's Republic of China (2018 Amendment): physical evidence; documentary evidence; witness testimony; expert opinions; victim statements; statements and defenses of criminal suspects and defendants; records of inquests, examinations, identifications, and investigative experiments; and audio-visual materials and electronic data; at the same time defining the attributes, and mapping the evidence information in the outline onto the evidence concept;
(b) analyzing the structure of the indictment, and dividing the indictment concept into four sub-concepts: suspect, criminal fact, evidence set, and procuratorate opinion, the first two of which serve as the main objects that evidence proves; in addition, to make the proof more fine-grained, a natural condition concept is defined to decompose the suspect concept;
(c) defining the relation between the natural condition concept and the suspect concept: since the natural condition is used to describe the suspect, the relation is defined following the wording of judicial texts, with its head entity constrained to be a natural condition and its tail entity constrained to be a suspect;
(d) defining the relation between the evidence concept and the indictment concept: once the two concepts have been constructed, a proving relation is defined whose head entity is constrained to be evidence and whose tail entity is constrained to be the indictment, thereby linking the two concepts and completing the construction of the knowledge graph ontology;
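As a non-limiting illustration, the head and tail constraints of sub-steps (c) and (d) can be sketched as a small schema check. This is a minimal sketch under stated assumptions, not the implementation of the invention: the relation names `describes` and `proves`, the concept labels, and the helper names are all hypothetical, since the text only states which concept each end of a relation must belong to.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical relation schema: relation name -> (required head concept, required tail concept).
# The names "describes" and "proves" are illustrative; the source only states the constraints.
RELATION_SCHEMA = {
    "describes": ("NaturalCondition", "Suspect"),
    "proves": ("Evidence", "Indictment"),
}

# Sub-concepts roll up to their parent concept when a constraint is checked.
PARENT = {
    "PhysicalEvidence": "Evidence",
    "DocumentaryEvidence": "Evidence",
    "Suspect": "Indictment",
    "CriminalFact": "Indictment",
}


@dataclass(frozen=True)
class Entity:
    name: str
    concept: str


def is_a(concept: str, target: str) -> bool:
    """Return True if `concept` equals `target` or descends from it."""
    current: Optional[str] = concept
    while current is not None:
        if current == target:
            return True
        current = PARENT.get(current)
    return False


def valid_triple(head: Entity, relation: str, tail: Entity) -> bool:
    """Check a (head, relation, tail) triple against the head/tail constraints."""
    if relation not in RELATION_SCHEMA:
        return False
    head_type, tail_type = RELATION_SCHEMA[relation]
    return is_a(head.concept, head_type) and is_a(tail.concept, tail_type)


knife = Entity("seized knife", "PhysicalEvidence")
fact = Entity("criminal fact 1", "CriminalFact")
print(valid_triple(knife, "proves", fact))
```

Rejecting triples at this level keeps ill-typed relations, such as a criminal fact "proving" a piece of evidence, out of the graph before any model output is stored.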
step 2, extracting the evidence involved in the case: the evidence presented in the outline for presenting and cross-examining evidence is extracted using named entity recognition, and the proof direction of each evidence entity is determined automatically by rules, specifically comprising the following sub-steps:
(a) constructing an evidence entity recognition data set: the outline for presenting and cross-examining evidence contains descriptions of the relevant evidence; the evidence entities in the outline are annotated by a combination of manual labeling and rules to build the training data set for the model;
(b) building a neural network for named entity recognition: a classic encoder-decoder framework is adopted, in which the encoder is a pre-trained model with strong language representation capability and the decoder is a feed-forward neural network; the computation is described by formula (1) and formula (2),
$h_t = \mathrm{PLM}(x_t)$ (1)
where PLM denotes the adopted pre-trained language model, which was trained and then open-sourced by a research institution, $x_t$ denotes the input at time step $t$, and $h_t$ denotes the encoded intermediate vector of the input at time step $t$,
$y_t = \mathrm{FFN}(h_t)$ (2)
where FFN denotes a feed-forward neural network, whose structure is selected according to the input, and $y_t$ denotes the entity tag at the corresponding position of the input sequence;
(c) training the neural network model with the annotated data: the data set is first split proportionally into a training set, a validation set, and a test set; the training data are then fed into the model, and the model's precision, recall, and F-score are computed; the number of training epochs, the learning rate, and the network-structure hyperparameters are adjusted according to the model's test results to find the parameter combination with which the model performs best; the parameters are recorded and the model is saved;
(d) packaging the best model obtained during training: new input text is preprocessed with the same pre-trained word vectors, so that the text is serialized into a vector representation the model can compute on; the corresponding tag sequence is obtained by model prediction and then post-processed by rules to determine entity boundaries and obtain the evidence entities, while the entity type information determines the evidence type of each entity;
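As a non-limiting illustration of the rule-based post-processing in sub-step (d), the sketch below merges a predicted tag sequence into typed evidence entities. It assumes the tags follow the common BIO scheme (`B-TYPE`, `I-TYPE`, `O`), which the text does not specify; the tag names `DOC` and `PHY` and the function name `decode_bio` are hypothetical.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str], sep: str = " ") -> List[Tuple[str, str]]:
    """Merge B-/I- tagged tokens into (entity_text, entity_type) pairs.

    Assumes a BIO tagging scheme. Use sep="" for character-level Chinese input.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any.
        if current_tokens and current_type is not None:
            entities.append((sep.join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            current_tokens.append(token)
        else:
            # "O" tags and inconsistent I- tags end the current entity.
            flush()
            current_tokens, current_type = [], None
    flush()
    return entities


tokens = ["the", "household", "registration", "certificate", "and", "a", "knife"]
tags = ["O", "B-DOC", "I-DOC", "I-DOC", "O", "O", "B-PHY"]
evidence_entities = decode_bio(tokens, tags)
print(evidence_entities)
```

The entity type carried by each tag (here `DOC` or `PHY`) is what would be mapped to one of the eight evidence subclasses of the ontology.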
step 3, extracting the case structural elements: the case structure in the indictment is analyzed by combining a neural network with rules and divided into different structural elements, specifically comprising the following sub-steps:
(a) analyzing the indictment texts in the data set: the document structure is divided according to the designed ontology, the dividing paragraphs and the keywords within them are located, and the text is cut roughly using Boolean operations over keyword matches, achieving a coarse-grained division of the text;
(b) for texts that Boolean operations cannot segment or segment poorly, building a neural network model: each paragraph in the document is first serialized into a word vector by a neural network method, and a logistic regression model is then built to predict whether the paragraph corresponding to each word vector is a boundary paragraph; the computation is described by formula (3),
$lab_i = \mathrm{LR}(\mathrm{NN}(par_i))$ (3)
where $par_i$ denotes the text sequence of the $i$-th paragraph in the document, NN denotes the neural network method that serializes a paragraph of text into a word vector, LR denotes the logistic regression model that decides whether the paragraph is a boundary paragraph, and $lab_i$ denotes the label of the $i$-th paragraph, a value of 1 indicating a boundary paragraph and a value of 0 a non-boundary paragraph;
(c) training the model and predicting on new text data: documents for which the Boolean operations give correct results are fed into the model as labeled data; the model is trained iteratively for multiple rounds while the number of network layers, the learning rate, and the optimizer parameters are adjusted until it performs best; the model is then applied to the documents whose boundaries the Boolean operations could not locate, to obtain the correct boundaries;
(d) obtaining the indices of the boundary paragraphs in the indictment through the above process, processing the indictment by rules to divide it into four parts, namely suspect, criminal fact, evidence set, and procuratorate opinion, mapping the content of the indictment onto the indictment ontology, and instantiating the indictment ontology;
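As a non-limiting illustration of the coarse-grained division in sub-steps (a) and (d), the sketch below locates boundary paragraphs with Boolean keyword rules and cuts an indictment into its four parts. The English keywords, the rule layout (`all` / `none`), and the function names are hypothetical stand-ins; real rules would be written against the wording of actual indictments. When a boundary cannot be found, the function returns `None`, which corresponds to handing the document to the learned boundary model.

```python
from typing import Dict, List, Optional

SECTION_ORDER = ["suspect", "criminal_fact", "evidence_set", "procuratorate_opinion"]

# Hypothetical Boolean rules: a paragraph opens a section when it contains
# ALL of the "all" keywords AND NONE of the "none" keywords.
BOUNDARY_RULES: Dict[str, Dict[str, List[str]]] = {
    "criminal_fact": {"all": ["investigated", "according to law"], "none": []},
    "evidence_set": {"all": ["evidence", "as follows"], "none": []},
    "procuratorate_opinion": {"all": ["this procuratorate", "holds"], "none": []},
}


def matches(paragraph: str, rule: Dict[str, List[str]]) -> bool:
    """Evaluate the Boolean keyword rule on one paragraph."""
    text = paragraph.lower()
    return all(k in text for k in rule["all"]) and not any(k in text for k in rule["none"])


def split_indictment(paragraphs: List[str]) -> Optional[Dict[str, List[str]]]:
    """Split paragraphs into the four parts; return None if any boundary is missing."""
    boundaries: List[int] = []
    start = 0
    for section in SECTION_ORDER[1:]:
        idx = next(
            (i for i in range(start, len(paragraphs))
             if matches(paragraphs[i], BOUNDARY_RULES[section])),
            None,
        )
        if idx is None:
            return None  # rules failed: fall back to the learned boundary classifier
        boundaries.append(idx)
        start = idx + 1
    cuts = [0] + boundaries + [len(paragraphs)]
    return {name: paragraphs[cuts[i]:cuts[i + 1]] for i, name in enumerate(SECTION_ORDER)}


doc = [
    "Suspect Zhang, male, born in 1990.",
    "After being investigated according to law, it was found that the suspect robbed a shop.",
    "The evidence supporting the above facts is as follows:",
    "1. Physical evidence: a knife.",
    "This procuratorate holds that the suspect committed robbery.",
]
parts = split_indictment(doc)
print(parts)
```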
step 4, establishing proving relations between the evidence and the case structural elements: text matching is used to analyze the similarity between the description of what each piece of evidence proves and the structural elements, and to judge whether a proving relation exists, specifically comprising the following sub-steps:
(a) analyzing, for each piece of evidence in the outline for presenting and cross-examining evidence, the text describing what it proves, analyzing the text descriptions of the four structural elements in the corresponding indictment, and judging whether the evidence has a proving relation with each structural element; annotation rules and an annotation framework are designed manually, a small amount of data is annotated by hand, and a third party then verifies the annotations manually to ensure their correctness;
(b) building a neural network model to predict proving relations between evidence and case structural elements: the neural network computes the similarity between the text describing what the evidence proves and the text describing each case structural element, and the relative magnitude of the similarity is used as the basis for judging whether a proving relation exists;
(c) training the model with distant supervision: starting from the small amount of high-quality data annotated in sub-step (a) of step 4, distant supervision is used to augment the data so that the model can be trained on a large data set; the parameters are adjusted continuously during training, and the best-performing model is saved;
(d) using the trained model to predict the relations between each group of evidence texts and case structural elements: an evidence list and a case structural element list are first extracted from the group of texts belonging to a specific case; the Cartesian product of the two sets is then taken, and the model computes the evidence-chain label between each evidence entity and structural element; finally, the combinations that have a proving relation are added to the triple set;
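As a non-limiting illustration of sub-step (d), the sketch below takes the Cartesian product of an evidence list and a structural element list, scores every pair, and keeps the pairs judged to have a proving relation. The scorer is a deliberately simple token-overlap placeholder standing in for the neural matching model, and the fixed `threshold`, the relation name `proves`, and all function names are assumptions made for the example.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Placeholder scorer (token-set overlap); the method itself uses a neural model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_triples(
    evidence: Dict[str, str],
    elements: Dict[str, str],
    score: Callable[[str, str], float] = jaccard,
    threshold: float = 0.3,  # hypothetical cut-off, not taken from the source
) -> List[Tuple[str, str, str]]:
    """Score every (evidence, element) pair and keep those above the threshold."""
    triples = []
    for (ev_name, ev_desc), (el_name, el_desc) in product(evidence.items(), elements.items()):
        if score(ev_desc, el_desc) >= threshold:
            triples.append((ev_name, "proves", el_name))
    return triples


evidence = {
    "household registration certificate": "proves the suspect identity and age",
    "seized knife": "proves the weapon used in the robbery",
}
elements = {
    "suspect": "the suspect identity age and residence",
    "criminal_fact": "the suspect used a weapon in the robbery",
}
triples = build_triples(evidence, elements)
print(triples)
```

Because every evidence item is paired with every element, the number of model calls per case is the product of the two list lengths, which stays small for a single case file.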
step 5, fusing the knowledge of high-similarity entities: a neural network is used to compute the semantic mapping relations among instances from different judicial texts, and the knowledge is fused, specifically comprising the following sub-steps:
(a) steps 1 to 4 yield a preliminary knowledge graph, which nevertheless contains highly similar entities such as "household registration information" and "household registration certificate"; the knowledge about each entity is expanded by distant supervision and then combined with the entity's attribute information and the information of its related entities, and the three kinds of information are concatenated as the vector representation of the entity;
(b) building a model to compute the similarity between the vector representations of entities, and associating entities horizontally so that instance data complement one another: if the similarity of two entities is above a threshold, the two entities are considered to describe the same information and are linked; if the similarity is low, no entity linking is performed and the two entities describe their respective information independently; the computation is described by formula (4),
$sim = f(x_{exp}; x_{attr}; x_{adj})$ (4)
where $x_{exp}$ denotes the knowledge representation of the entity in a third-party knowledge base, $x_{attr}$ denotes the attribute representation of the entity, $x_{adj}$ denotes the vector representation of its related entities, $f$ denotes the similarity computation model, and $sim$ denotes the similarity value computed by the model;
(c) performing knowledge fusion according to the computed similarity values: a central entity is first determined within each set of mutually linked entities, and the relations and attribute values of the non-central entities are then merged into the central entity; if a relation or attribute conflict is detected during fusion, it is resolved by a voting-based method;
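As a non-limiting illustration of the voting-based conflict resolution in sub-step (c), the sketch below fuses a set of linked entities into one record. Two choices are assumptions, since the text does not specify them: the central entity is taken to be the one carrying the most attributes, and ties in the vote are broken in favor of the central entity's value. The attribute names and the function name `fuse_cluster` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def fuse_cluster(cluster: List[Dict[str, str]]) -> Dict[str, str]:
    """Fuse mutually linked entities into one record by majority vote per attribute."""
    # Assumption: the central entity is the one carrying the most attributes.
    central = max(cluster, key=len)
    fused = dict(central)
    all_keys = sorted({key for entity in cluster for key in entity})
    for key in all_keys:
        votes = Counter(entity[key] for entity in cluster if key in entity)
        top_count = max(votes.values())
        winners = [value for value, count in votes.items() if count == top_count]
        # Assumption: ties go to the central entity's value, else the first value seen.
        fused[key] = central[key] if central.get(key) in winners else winners[0]
    return fused


cluster = [
    {"name": "household registration certificate", "holder": "Zhang", "birth_year": "1990"},
    {"name": "household registration information", "holder": "Zhang",
     "birth_year": "1991", "address": "City X"},
    {"name": "household registration certificate", "birth_year": "1990"},
]
fused_entity = fuse_cluster(cluster)
print(fused_entity)
```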
step 6, storing the knowledge graph: the knowledge graph is stored in a graph database to improve query efficiency, specifically comprising the following sub-steps:
(a) the entities in the knowledge graph are treated as nodes and the relations as labeled edges, so the knowledge graph data naturally fit the graph model; a graph-structured storage method models the knowledge graph data as a directed graph, representing and storing the data through nodes, edges, and attributes;
(b) importing the automatically extracted relational data into the graph database in batches: the data are saved in CSV format, with a node file and a relation file defined separately, and are imported with the graph database's built-in import command, completing the automatic construction of the knowledge graph.
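As a non-limiting illustration of sub-step (b), the sketch below writes a node file and a relation file in CSV form. The text does not name a graph database; the header convention used here (`name:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`) follows the bulk-import CSV format of Neo4j and is chosen purely as an assumption. The function name `write_graph_csv` and the sample data are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, TextIO, Tuple


def write_graph_csv(
    triples: Iterable[Tuple[str, str, str]],
    node_labels: Dict[str, str],
    nodes_out: TextIO,
    rels_out: TextIO,
) -> None:
    """Write one node file and one relation file for batch import."""
    node_writer = csv.writer(nodes_out, lineterminator="\n")
    node_writer.writerow(["name:ID", ":LABEL"])
    for name in sorted(node_labels):
        node_writer.writerow([name, node_labels[name]])

    rel_writer = csv.writer(rels_out, lineterminator="\n")
    rel_writer.writerow([":START_ID", ":END_ID", ":TYPE"])
    for head, relation, tail in triples:
        rel_writer.writerow([head, tail, relation.upper()])


# In-memory buffers stand in for the node and relation files on disk.
nodes_buf, rels_buf = io.StringIO(), io.StringIO()
write_graph_csv(
    [("seized knife", "proves", "criminal fact 1")],
    {"seized knife": "Evidence", "criminal fact 1": "CriminalFact"},
    nodes_buf,
    rels_buf,
)
print(nodes_buf.getvalue())
print(rels_buf.getvalue())
```

Keeping nodes and relations in separate files lets the database create all nodes first and then resolve edge endpoints by identifier during the batch import.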
The beneficial effects of the invention are as follows: an automatic knowledge graph construction method for evidence correlation analysis comprises the following steps: step 1, constructing an ontology to describe the knowledge graph; step 2, extracting the evidence involved in a case; step 3, extracting the structural elements of the case; step 4, establishing proving relations between the evidence and the case structural elements; step 5, fusing the knowledge of high-similarity entities; and step 6, storing the knowledge graph. Whereas the evidence field currently lacks a knowledge graph for storing and representing evidence, the method is simple to operate, can construct a high-quality graph at low labor cost, and improves the efficiency of evidence analysis.
Drawings
FIG. 1 is a flow chart of the method steps of the present invention.
FIG. 2 is a diagram of an evidence ontology constructed by the present invention.
FIG. 3 is a representation of an evidence entity identification process of the present invention.
Detailed Description
The invention will be further explained with reference to the drawings.
As shown in FIG. 1, an automatic knowledge graph construction method for evidence correlation analysis comprises the following steps:
step 1, constructing an ontology to describe the knowledge graph: a high-quality ontology structure for organizing and expressing the relevant knowledge is built through literature research, consultation of reference materials, and manual design of concepts, attributes, and constraints, specifically comprising the following sub-steps:
(a) analyzing the evidence descriptions in the outline for presenting and cross-examining evidence, and dividing the evidence concept into eight subclasses according to the provisions on evidence in the Criminal Procedure Law of the People's Republic of China (2018 Amendment): physical evidence; documentary evidence; witness testimony; expert opinions; victim statements; statements and defenses of criminal suspects and defendants; records of inquests, examinations, identifications, and investigative experiments; and audio-visual materials and electronic data; at the same time defining the attributes, and mapping the evidence information in the outline onto the evidence concept;
(b) analyzing the structure of the indictment, and dividing the indictment concept into four sub-concepts: suspect, criminal fact, evidence set, and procuratorate opinion, the first two of which serve as the main objects that evidence proves; in addition, to make the proof more fine-grained, a natural condition concept is defined to decompose the suspect concept;
(c) defining the relation between the natural condition concept and the suspect concept: since the natural condition is used to describe the suspect, the relation is defined following the wording of judicial texts, with its head entity constrained to be a natural condition and its tail entity constrained to be a suspect;
(d) defining the relation between the evidence concept and the indictment concept: once the two concepts have been constructed, a proving relation is defined whose head entity is constrained to be evidence and whose tail entity is constrained to be the indictment, thereby linking the two concepts and completing the construction of the knowledge graph ontology;
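By way of example only, the eight evidence subclasses of sub-step (a) and the four indictment sub-concepts of sub-step (b) can be written down as enumerations, so that later extraction steps map their output onto a fixed vocabulary. The class, member, and function names are hypothetical; only the category wording comes from the text.

```python
from enum import Enum


class EvidenceType(Enum):
    """The eight evidence subclasses of the ontology."""
    PHYSICAL_EVIDENCE = "physical evidence"
    DOCUMENTARY_EVIDENCE = "documentary evidence"
    WITNESS_TESTIMONY = "witness testimony"
    EXPERT_OPINION = "expert opinion"
    VICTIM_STATEMENT = "victim statement"
    SUSPECT_DEFENDANT_STATEMENT = "statements and defenses of criminal suspects and defendants"
    INVESTIGATION_RECORD = (
        "records of inquests, examinations, identifications, and investigative experiments"
    )
    AUDIOVISUAL_ELECTRONIC = "audio-visual materials and electronic data"


class IndictmentPart(Enum):
    """The four sub-concepts of the indictment."""
    SUSPECT = "suspect"
    CRIMINAL_FACT = "criminal fact"
    EVIDENCE_SET = "evidence set"
    PROCURATORATE_OPINION = "procuratorate opinion"


# The first two parts are the main objects that evidence proves.
MAIN_PROOF_TARGETS = (IndictmentPart.SUSPECT, IndictmentPart.CRIMINAL_FACT)


def parse_evidence_type(label: str) -> EvidenceType:
    """Map a normalized text label onto an evidence subclass (raises ValueError if unknown)."""
    return EvidenceType(label.strip().lower())


print(parse_evidence_type(" Physical Evidence "))
```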
step 2, extracting the evidence involved in the case: the evidence presented in the outline for presenting and cross-examining evidence is extracted using named entity recognition, and the proof direction of each evidence entity is determined automatically by rules, specifically comprising the following sub-steps:
(a) constructing an evidence entity recognition data set: the outline for presenting and cross-examining evidence contains descriptions of the relevant evidence; the evidence entities in the outline are annotated by a combination of manual labeling and rules to build the training data set for the model;
(b) building a neural network for named entity recognition: a classic encoder-decoder framework is adopted, in which the encoder is a pre-trained model with strong language representation capability and the decoder is a feed-forward neural network; the computation is described by formula (1) and formula (2),
$h_t = \mathrm{PLM}(x_t)$ (1)
where PLM denotes the adopted pre-trained language model, which was trained and then open-sourced by a research institution, $x_t$ denotes the input at time step $t$, and $h_t$ denotes the encoded intermediate vector of the input at time step $t$,
$y_t = \mathrm{FFN}(h_t)$ (2)
where FFN denotes a feed-forward neural network, whose structure is selected according to the input, and $y_t$ denotes the entity tag at the corresponding position of the input sequence;
(c) training the neural network model with the annotated data: the data set is first split proportionally into a training set, a validation set, and a test set; the training data are then fed into the model, and the model's precision, recall, and F-score are computed; the number of training epochs, the learning rate, and the network-structure hyperparameters are adjusted according to the model's test results to find the parameter combination with which the model performs best; the parameters are recorded and the model is saved;
(d) packaging the best model obtained during training: new input text is preprocessed with the same pre-trained word vectors, so that the text is serialized into a vector representation the model can compute on; the corresponding tag sequence is obtained by model prediction and then post-processed by rules to determine entity boundaries and obtain the evidence entities, while the entity type information determines the evidence type of each entity;
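By way of example only, the evaluation in sub-step (c) can be sketched as entity-level precision, recall, and F1 over sets of predicted and gold entity spans. Representing an entity as a `(start, end, type)` tuple, requiring an exact match, and using the balanced F1 as the F-score are assumptions made for the example; the text only names the three metrics.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index, entity type); hypothetical layout


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level scores: a prediction counts only if span and type both match."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


gold_spans = {(1, 4, "DOC"), (6, 7, "PHY"), (9, 10, "TES")}
pred_spans = {(1, 4, "DOC"), (6, 7, "DOC")}  # second prediction has the wrong type
precision, recall, f1 = precision_recall_f1(gold_spans, pred_spans)
print(precision, recall, f1)
```

Scores of this kind, computed on the validation set, are what the hyperparameter adjustment in sub-step (c) would compare across training runs.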
step 3, extracting the case structural elements: the case structure in the indictment is analyzed by combining a neural network with rules and divided into different structural elements, specifically comprising the following sub-steps:
(a) analyzing the indictment texts in the data set: the document structure is divided according to the designed ontology, the dividing paragraphs and the keywords within them are located, and the text is cut roughly using Boolean operations over keyword matches, achieving a coarse-grained division of the text;
(b) for texts that Boolean operations cannot segment or segment poorly, building a neural network model: each paragraph in the document is first serialized into a word vector by a neural network method, and a logistic regression model is then built to predict whether the paragraph corresponding to each word vector is a boundary paragraph; the computation is described by formula (3),
$lab_i = \mathrm{LR}(\mathrm{NN}(par_i))$ (3)
where $par_i$ denotes the text sequence of the $i$-th paragraph in the document, NN denotes the neural network method that serializes a paragraph of text into a word vector, LR denotes the logistic regression model that decides whether the paragraph is a boundary paragraph, and $lab_i$ denotes the label of the $i$-th paragraph, a value of 1 indicating a boundary paragraph and a value of 0 a non-boundary paragraph;
(c) training the model and predicting on new text data: documents for which the Boolean operations give correct results are fed into the model as labeled data; the model is trained iteratively for multiple rounds while the number of network layers, the learning rate, and the optimizer parameters are adjusted until it performs best; the model is then applied to the documents whose boundaries the Boolean operations could not locate, to obtain the correct boundaries;
(d) obtaining the indices of the boundary paragraphs in the indictment through the above process, processing the indictment by rules to divide it into four parts, namely suspect, criminal fact, evidence set, and procuratorate opinion, mapping the content of the indictment onto the indictment ontology, and instantiating the indictment ontology;
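By way of example only, formula (3) can be read as a logistic regression applied to a paragraph representation, with the label obtained by thresholding the predicted probability. In the sketch below the `embed` function is a two-feature keyword indicator standing in for NN and is not a neural encoder; the weights, the bias, and the 0.5 threshold are hypothetical values chosen for the example rather than trained parameters.

```python
import math
from typing import List, Sequence


def embed(paragraph: str) -> List[float]:
    """Stand-in for NN(par_i): two keyword-indicator features, NOT a neural encoder."""
    text = paragraph.lower()
    return [float("evidence" in text), float("as follows" in text)]


def predict_boundary(paragraph: str, weights: Sequence[float], bias: float) -> int:
    """lab_i = LR(NN(par_i)): 1 for a boundary paragraph, 0 otherwise."""
    features = embed(paragraph)
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function
    return 1 if probability >= 0.5 else 0


# Hypothetical parameters; in the method they would be learned from documents
# that the Boolean rules already segment correctly.
WEIGHTS = (2.0, 2.0)
BIAS = -3.0

labels = [
    predict_boundary(p, WEIGHTS, BIAS)
    for p in [
        "Suspect Zhang, male, born in 1990.",
        "The evidence supporting the above facts is as follows:",
        "1. Physical evidence: a knife.",
    ]
]
print(labels)
```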
step 4, establishing proof relations between the evidence and the case structural elements: a text matching technique is used to analyze the similarity between the description of the object to be proved and the structural elements and to judge whether a proof relation exists, which specifically comprises the following substeps:
(a) analyzing the text description of the object to be proved for each item of evidence in the evidence presentation and cross-examination outline, analyzing the text descriptions of the four structural elements in the corresponding indictment, and judging whether the evidence has a proof relation with each structural element; annotation rules and an annotation framework are designed manually, a small amount of data is annotated manually, and a third party then verifies the annotations manually to ensure their correctness;
(b) building a neural network model to predict the proof relation between the evidence and the case structural elements: the neural network computes the similarity between the text description of the evidence's object to be proved and the text description of each case structural element, and whether a proof relation exists is judged from the relative magnitude of the similarities;
(c) training the model with a distant supervision method: starting from the small amount of high-quality data annotated in substep (a) of step 4, the data are augmented by distant supervision so that the model can be trained on a large data set, and the parameters are adjusted continuously during training until the optimal model is obtained and saved;
(d) predicting the relations between each group of textual evidence and the case structural elements with the trained model: an evidence list and a case structural element list are first extracted from the group of texts related to a specific case, the Cartesian product of the two sets is then taken, the model computes an evidence chain label for each pair of evidence entity and structural element, and finally the pairs that have a proof relation are added to the triple set;
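The pairing procedure of substep (d) can be sketched as below. The scoring function is only a stand-in: the text uses a trained neural matcher, whereas this sketch assumes a simple token-overlap score and a fixed threshold. The relation name `"proves"`, the function names and the sample data are all hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of whitespace tokens (placeholder for the neural model)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def build_triples(
    evidence: Dict[str, str],
    elements: Dict[str, str],
    score: Callable[[str, str], float] = token_overlap,
    threshold: float = 0.5,  # assumed cut-off, not taken from the source
) -> Set[Triple]:
    """Score every (evidence, element) pair and keep those judged to be linked."""
    triples: Set[Triple] = set()
    # Cartesian product of the evidence list and the structural element list.
    for (ev_name, ev_desc), (el_name, el_desc) in product(evidence.items(), elements.items()):
        if score(ev_desc, el_desc) >= threshold:
            triples.add((ev_name, "proves", el_name))
    return triples


# Made-up example data.
evidence = {
    "household register": "identity and age of the suspect",
    "surveillance video": "the suspect took the phone from the shop",
}
elements = {
    "suspect": "name identity and age of the suspect",
    "criminal_facts": "the suspect took a phone from a shop",
}
triples = build_triples(evidence, elements)
```

Swapping `score` for a trained model's prediction function leaves the rest of the loop unchanged.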
step 5, fusing the knowledge of highly similar entities: a neural network is used to compute the semantic mapping relations among instances from different judicial texts, and the knowledge is fused, which specifically comprises the following substeps:
(a) after steps 1 to 4 a preliminary knowledge graph has been built, but it contains highly similar entities such as "household registration information" and "household registration certificate"; the knowledge about each entity is expanded through distant supervision and then combined with the entity's attribute information and the information of its related entities, and the three kinds of information are concatenated as the vector representation of the entity;
(b) building a model to compute the similarity between the vector representations of entities and associating entities horizontally so that instance data complement each other: if the similarity of two entities is higher than a threshold, the two entities are considered to describe the same information and are linked; otherwise no entity linking is performed and the two entities describe their respective information independently; the calculation is described by formula (4),
$\mathrm{sim} = f(x_{exp}; x_{attr}; x_{adj})$ (4)
wherein $x_{exp}$ denotes the knowledge representation of the entity in a third-party knowledge base, $x_{attr}$ denotes the attribute representation of the entity, $x_{adj}$ denotes the vector representation of the related entities, $f$ denotes the similarity calculation model, and $\mathrm{sim}$ denotes the similarity value computed by the model;
(c) performing knowledge fusion according to the computed similarity values: a central entity is first determined in each set of mutually linked entities, the relations and attribute values of the non-central entities are then merged into the central entity, and if a relation or attribute conflict is detected during fusion, the conflict is resolved with a voting-based method;
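The linking and fusion of step 5 can be sketched as below, under loudly stated assumptions: cosine similarity stands in for the learned model f of formula (4), the central entity is taken to be the one with the most attributes (the text does not say how it is chosen), and ties in a vote go to the value seen first. All names and sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entity_vector(x_exp: Sequence[float], x_attr: Sequence[float], x_adj: Sequence[float]) -> List[float]:
    """Concatenate the three representations, as in substep (a)."""
    return [*x_exp, *x_attr, *x_adj]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; a stand-in for the learned similarity model f."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def should_link(u: Sequence[float], v: Sequence[float], threshold: float = 0.9) -> bool:
    """Link two entities when similarity reaches the (assumed) threshold."""
    return cosine(u, v) >= threshold


def fuse(cluster: List[Dict]) -> Dict:
    """Merge linked entities into a central one, voting on conflicting values."""
    # Assumption: the entity with the most attributes is the central entity.
    central = max(cluster, key=lambda e: len(e["attrs"]))
    keys: Dict[str, None] = {}
    for entity in cluster:
        for key in entity["attrs"]:
            keys.setdefault(key, None)
    merged = {}
    for key in keys:
        votes = Counter(e["attrs"][key] for e in cluster if key in e["attrs"])
        merged[key] = votes.most_common(1)[0][0]  # majority value wins
    return {"name": central["name"], "attrs": merged}


# Made-up cluster of entities already linked to each other.
cluster = [
    {"name": "household registration certificate", "attrs": {"holder": "Zhang", "issuer": "PSB A"}},
    {"name": "household registration information",
     "attrs": {"holder": "Zhang", "issuer": "PSB B", "address": "Dalian"}},
    {"name": "household record", "attrs": {"issuer": "PSB A"}},
]
fused = fuse(cluster)
```

Here the conflicting `issuer` values are settled by two votes against one, while non-conflicting attributes of non-central entities are simply carried over.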
step 6, storing the knowledge graph: the knowledge graph is stored in a graph database to improve query efficiency, which specifically comprises the following substeps:
(a) the entities in the knowledge graph are regarded as nodes and the relations as labeled edges, so the knowledge graph data naturally fit the graph model; with a graph-structure-based storage method, the knowledge graph data are modeled as a directed graph and are represented and stored through nodes, edges and attributes;
(b) importing the automatically extracted relational data into the graph database in batches: the data are saved in csv format, with a node file and a relation file defined separately, and are imported with the import command provided by the graph database, completing the automatic construction of the knowledge graph.
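The csv export of substep (b) can be sketched as below. The text does not name a graph database; this sketch assumes the header convention of Neo4j's bulk import tool (`:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`). The file names, the function name and the sample labels are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, Tuple


def export_graph(
    triples: Iterable[Tuple[str, str, str]],
    node_labels: Dict[str, str],
    out_dir: str,
) -> Tuple[Path, Path]:
    """Write one node file and one relation file for bulk import."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    node_path, rel_path = out / "nodes.csv", out / "relations.csv"

    with node_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name:ID", ":LABEL"])  # assumed Neo4j-style header
        for name in sorted(node_labels):
            writer.writerow([name, node_labels[name]])

    with rel_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([":START_ID", ":END_ID", ":TYPE"])
        for head, relation, tail in sorted(triples):
            writer.writerow([head, tail, relation])  # directed edge head -> tail

    return node_path, rel_path
```

The two files would then be handed to the database's own import command; the exact command depends on the database and its version.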

Claims (1)

1. A method for automatically constructing a knowledge graph for evidence correlation analysis, characterized by comprising the following steps:
step 1, constructing an ontology to describe the knowledge graph: through literature research, consultation of reference materials, and manual design of concepts, attributes and constraints, a high-quality ontology structure is constructed to organize and express the relevant knowledge, which specifically comprises the following substeps:
(a) analyzing the evidence descriptions in the evidence presentation and cross-examination outline and, according to the provisions on evidence in the Criminal Procedure Law of the People's Republic of China (2018 amendment), dividing the evidence concept into eight subclasses, namely physical evidence; documentary evidence; witness testimony; expert opinions; victim statements; statements and defenses of criminal suspects and defendants; records of inquests, examinations, identifications and investigative experiments; and audio-visual materials and electronic data; the attributes are defined at the same time, and the evidence information in the outline is mapped to the evidence concept;
(b) analyzing the structure of the indictment and dividing the indictment concept into four sub-concepts, namely the suspect, the criminal facts, the evidence set and the procuratorate opinion, of which the first two serve as the main objects to be proved by the evidence; in addition, to make the proof more precise, a natural condition concept is newly defined to refine the suspect concept;
(c) defining the relation between the natural condition concept and the suspect concept: since the natural condition is used to describe the suspect, the relation is defined according to the wording of the judicial text, with its head entity constrained to be a natural condition and its tail entity constrained to be a suspect;
(d) defining the relation between the evidence concept and the indictment concept: once the evidence and indictment concepts have been constructed, the proof relation is defined, with its head entity constrained to be evidence and its tail entity constrained to be the indictment, thereby linking the two concepts and completing the construction of the knowledge graph ontology;
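The head and tail constraints of substeps (c) and (d) can be illustrated with a small schema check. This is a sketch under assumptions: the relation names `describes` and `proves`, the concept identifiers and the `PARENT` map are hypothetical, and only a few of the eight evidence subclasses are listed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RelationSchema:
    name: str
    head_type: str  # concept the head entity must belong to
    tail_type: str  # concept the tail entity must belong to


# Sub-concept -> parent concept (hypothetical identifiers, partial list).
PARENT: Dict[str, str] = {
    "Suspect": "Indictment",
    "CriminalFacts": "Indictment",
    "EvidenceSet": "Indictment",
    "ProcuratorateOpinion": "Indictment",
    "PhysicalEvidence": "Evidence",
    "DocumentaryEvidence": "Evidence",
    "WitnessTestimony": "Evidence",
}

SCHEMAS: Dict[str, RelationSchema] = {
    "describes": RelationSchema("describes", "NaturalCondition", "Suspect"),
    "proves": RelationSchema("proves", "Evidence", "Indictment"),
}


def is_a(concept: str, ancestor: str) -> bool:
    """True if `concept` equals `ancestor` or is one of its sub-concepts."""
    while concept != ancestor:
        if concept not in PARENT:
            return False
        concept = PARENT[concept]
    return True


def validate(head_type: str, relation: str, tail_type: str) -> bool:
    """Check a candidate triple against the relation's head/tail constraints."""
    schema = SCHEMAS.get(relation)
    if schema is None:
        return False
    return is_a(head_type, schema.head_type) and is_a(tail_type, schema.tail_type)
```

A check of this kind lets later extraction steps reject triples whose entity types violate the ontology before they reach the graph.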
step 2, extracting the evidence involved in the case: named entity recognition is used to extract the evidence presented in the evidence presentation and cross-examination outline, and rules are used to determine automatically the proof direction of each evidence entity, which specifically comprises the following substeps:
(a) constructing an evidence entity recognition data set: the outline contains descriptions of the relevant evidence; the evidence entities in the outline are annotated by a combination of manual annotation and rules to build the training data set for the model;
(b) building a neural network for named entity recognition: the classic encoder-decoder framework is adopted, with a pre-trained model with strong language representation capability as the encoder and a feed-forward neural network as the decoder; the calculation is described by formula (1) and formula (2),
$h_t = \mathrm{PLM}(x_t)$ (1)
wherein PLM denotes the pre-trained language model adopted, which has been trained and then open-sourced by a research institution, $x_t$ denotes the input at time $t$, and $h_t$ denotes the encoded intermediate vector of the input at time $t$,
$y_t = \mathrm{FFN}(h_t)$ (2)
wherein FFN denotes the feed-forward neural network, whose structure is selected according to the input, and $y_t$ denotes the entity tag at the corresponding position of the input sequence;
(c) training the neural network model with the annotated data: the data set is first split proportionally into a training set, a validation set and a test set; the training set is then input to the model, and the precision, recall and F value of the model are computed; the number of training epochs, the learning rate and the network structure hyperparameters are adjusted according to the test results to obtain the parameter combination with which the model performs best, and the parameters are recorded and the model is saved;
(d) packaging the best model obtained during training: new input text is preprocessed with the same pre-trained word vectors, so that the text is serialized into vector representations the model can compute on; the corresponding tag sequence is obtained by model prediction and is post-processed with rules to determine the entity boundaries and obtain the evidence entities, and the entity type information obtained at the same time determines the evidence type of each entity;
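The rule-based post-processing of the tag sequence in substep (d) can be sketched as below. The text does not state the tagging scheme; this sketch assumes BIO tags (`B-TYPE`, `I-TYPE`, `O`), and the type names in the example are made up.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str], sep: str = " ") -> List[Tuple[str, str]]:
    """Turn a BIO tag sequence into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    buffer: List[str] = []
    etype = None

    def flush() -> None:
        nonlocal buffer, etype
        if buffer:
            entities.append((sep.join(buffer), etype))
        buffer, etype = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()  # close any open entity, then start a new one
            buffer, etype = [token], tag[2:]
        elif tag.startswith("I-") and etype == tag[2:]:
            buffer.append(token)  # continue the open entity
        elif tag.startswith("I-"):
            flush()  # stray I- tag: treat it leniently as a new entity
            buffer, etype = [token], tag[2:]
        else:
            flush()  # an O tag ends the open entity
    flush()
    return entities


tokens = ["the", "household", "register", "and", "witness", "statement"]
tags = ["O", "B-DOC", "I-DOC", "O", "B-TESTIMONY", "I-TESTIMONY"]
entities = decode_bio(tokens, tags)
```

For character-level Chinese input, passing `sep=""` joins the characters without spaces.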
step 3, extracting the case structural elements: the case structure in the indictment is analyzed by combining a neural network with rules and is divided into different structural elements, which specifically comprises the following substeps:
(a) analyzing the indictment texts in the data set and dividing the document structure according to the designed ontology: the paragraphs that mark the divisions and the keywords in them are located, and the text is roughly cut using Boolean operations on keyword matches, achieving a coarse-grained division of the text;
(b) for texts that Boolean operations cannot segment or segment poorly, a neural network model is built: each paragraph in the document is first serialized into a word vector with a neural network method, and a logistic regression model is then built to predict whether the paragraph corresponding to each word vector is a boundary paragraph; the calculation is described by formula (3),
$\mathrm{lab}_i = \mathrm{LR}(\mathrm{NN}(\mathrm{par}_i))$ (3)
wherein $\mathrm{par}_i$ denotes the text sequence of the $i$-th paragraph in the document, NN denotes the neural network method that serializes a paragraph of text into a word vector, LR denotes the logistic regression model that determines whether the paragraph is a boundary paragraph, and $\mathrm{lab}_i$ denotes the label of the $i$-th paragraph, a value of 1 indicating a boundary paragraph and a value of 0 a non-boundary paragraph;
(c) training the model and predicting on new text data: documents whose Boolean-operation segmentation is correct are input to the model as labeled data; the model is trained iteratively for multiple rounds while the number of network layers, the learning rate and the optimizer parameters are adjusted until the model performs best; the model is then applied to documents whose boundaries cannot be located by Boolean operations to obtain the correct boundaries;
(d) obtaining, through the above process, the indices of the boundary paragraphs in the indictment, processing the indictment with rules to divide it into four parts, namely the suspect, the criminal facts, the evidence set and the procuratorate opinion, mapping the content of the indictment to the indictment ontology, and instantiating the indictment ontology;
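The keyword-based Boolean cut of substep (a) can be sketched as below. The keyword rules are assumptions chosen for illustration and are not given in the text; a section whose rule matches no paragraph is reported as `None`, which is the case the text hands over to the neural model of substep (b).

```python
from typing import Callable, Dict, List, Optional, Tuple

# (section name, Boolean rule over a paragraph). All keywords are assumed.
RULES: List[Tuple[str, Callable[[str], bool]]] = [
    ("criminal_facts", lambda p: "审查" in p and "查明" in p),
    ("evidence_set", lambda p: "证据" in p and "如下" in p),
    ("procuratorate_opinion", lambda p: "本院认为" in p),
]


def find_boundaries(paragraphs: List[str]) -> Dict[str, Optional[int]]:
    """Return the index of the first paragraph of each section, or None."""
    found: Dict[str, Optional[int]] = {}
    start = 0
    for section, rule in RULES:
        # Sections appear in order, so search only after the previous boundary.
        idx = next((i for i in range(start, len(paragraphs)) if rule(paragraphs[i])), None)
        found[section] = idx
        if idx is not None:
            start = idx + 1
    return found


# Made-up sample document.
doc = [
    "被告人张某,男,1990年出生。",
    "经依法审查查明:被告人张某盗窃手机一部。",
    "认定上述事实的证据如下:书证,证人证言。",
    "本院认为,被告人张某的行为构成盗窃罪。",
]
boundaries = find_boundaries(doc)
```

Documents for which every rule fires can also serve as labeled training data for the boundary classifier, as substep (c) describes.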
step 4, establishing proof relations between the evidence and the case structural elements: a text matching technique is used to analyze the similarity between the description of the object to be proved and the structural elements and to judge whether a proof relation exists, which specifically comprises the following substeps:
(a) analyzing the text description of the object to be proved for each item of evidence in the evidence presentation and cross-examination outline, analyzing the text descriptions of the four structural elements in the corresponding indictment, and judging whether the evidence has a proof relation with each structural element; annotation rules and an annotation framework are designed manually, a small amount of data is annotated manually, and a third party then verifies the annotations manually to ensure their correctness;
(b) building a neural network model to predict the proof relation between the evidence and the case structural elements: the neural network computes the similarity between the text description of the evidence's object to be proved and the text description of each case structural element, and whether a proof relation exists is judged from the relative magnitude of the similarities;
(c) training the model with a distant supervision method: starting from the small amount of high-quality data annotated in substep (a) of step 4, the data are augmented by distant supervision so that the model can be trained on a large data set, and the parameters are adjusted continuously during training until the optimal model is obtained and saved;
(d) predicting the relations between each group of textual evidence and the case structural elements with the trained model: an evidence list and a case structural element list are first extracted from the group of texts related to a specific case, the Cartesian product of the two sets is then taken, the model computes an evidence chain label for each pair of evidence entity and structural element, and finally the pairs that have a proof relation are added to the triple set;
step 5, fusing the knowledge of highly similar entities: a neural network is used to compute the semantic mapping relations among instances from different judicial texts, and the knowledge is fused, which specifically comprises the following substeps:
(a) after steps 1 to 4 a preliminary knowledge graph has been built, but it contains highly similar entities such as "household registration information" and "household registration certificate"; the knowledge about each entity is expanded through distant supervision and then combined with the entity's attribute information and the information of its related entities, and the three kinds of information are concatenated as the vector representation of the entity;
(b) building a model to compute the similarity between the vector representations of entities and associating entities horizontally so that instance data complement each other: if the similarity of two entities is higher than a threshold, the two entities are considered to describe the same information and are linked; otherwise no entity linking is performed and the two entities describe their respective information independently; the calculation is described by formula (4),
$\mathrm{sim} = f(x_{exp}; x_{attr}; x_{adj})$ (4)
wherein $x_{exp}$ denotes the knowledge representation of the entity in a third-party knowledge base, $x_{attr}$ denotes the attribute representation of the entity, $x_{adj}$ denotes the vector representation of the related entities, $f$ denotes the similarity calculation model, and $\mathrm{sim}$ denotes the similarity value computed by the model;
(c) performing knowledge fusion according to the computed similarity values: a central entity is first determined in each set of mutually linked entities, the relations and attribute values of the non-central entities are then merged into the central entity, and if a relation or attribute conflict is detected during fusion, the conflict is resolved with a voting-based method;
step 6, storing the knowledge graph: the knowledge graph is stored in a graph database to improve query efficiency, which specifically comprises the following substeps:
(a) the entities in the knowledge graph are regarded as nodes and the relations as labeled edges, so the knowledge graph data naturally fit the graph model; with a graph-structure-based storage method, the knowledge graph data are modeled as a directed graph and are represented and stored through nodes, edges and attributes;
(b) importing the automatically extracted relational data into the graph database in batches: the data are saved in csv format, with a node file and a relation file defined separately, and are imported with the import command provided by the graph database, completing the automatic construction of the knowledge graph.
CN202011372006.3A 2020-11-30 2020-11-30 Knowledge graph automatic construction method for evidence correlation analysis Active CN112528036B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011372006.3A CN112528036B (en) 2020-11-30 2020-11-30 Knowledge graph automatic construction method for evidence correlation analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011372006.3A CN112528036B (en) 2020-11-30 2020-11-30 Knowledge graph automatic construction method for evidence correlation analysis

Publications (2)

Publication Number Publication Date
CN112528036A true CN112528036A (en) 2021-03-19
CN112528036B CN112528036B (en) 2021-09-07

Family

ID=74996482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011372006.3A Active CN112528036B (en) 2020-11-30 2020-11-30 Knowledge graph automatic construction method for evidence correlation analysis

Country Status (1)

Country Link
CN (1) CN112528036B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407688A (en) * 2021-06-15 2021-09-17 西安理工大学 Method for establishing knowledge graph-based survey standard intelligent question-answering system
CN113407678A (en) * 2021-06-30 2021-09-17 竹间智能科技(上海)有限公司 Knowledge graph construction method, device and equipment
CN114969384A (en) * 2022-08-02 2022-08-30 联通(四川)产业互联网有限公司 High-value judicial evidence chain acquisition and storage method and device and readable storage medium
CN115238688A (en) * 2022-08-15 2022-10-25 广州市刑事科学技术研究所 Electronic information data association relation analysis method, device, equipment and storage medium
CN116307566A (en) * 2023-03-12 2023-06-23 武汉大学 Dynamic design system for large-scale building construction project construction organization scheme
CN116431835A (en) * 2023-06-06 2023-07-14 中汽数据(天津)有限公司 Automatic knowledge graph construction method, equipment and medium in automobile authentication field
CN116542252A (en) * 2023-07-07 2023-08-04 北京营加品牌管理有限公司 Financial text checking method and system
CN116720786A (en) * 2023-08-01 2023-09-08 中国科学院工程热物理研究所 KG and PLM fusion assembly quality stability prediction method, system and medium
CN116737967A (en) * 2023-08-15 2023-09-12 中国标准化研究院 Knowledge graph construction and perfecting system and method based on natural language
CN117830060A (en) * 2024-03-04 2024-04-05 天津财经大学 Injury crime law enforcement supervision and auxiliary decision-making system based on knowledge graph

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009299A (en) * 2017-12-28 2018-05-08 北京市律典通科技有限公司 Law tries method and device for business processing
CN110837563A (en) * 2018-08-17 2020-02-25 阿里巴巴集团控股有限公司 Case judgment method, device and system
EP3620997A1 (en) * 2018-09-04 2020-03-11 Siemens Aktiengesellschaft Transfer learning of machine-learning models using knowledge graph database
CN110457479A (en) * 2019-08-12 2019-11-15 贵州大学 A kind of judgement document's analysis method based on criminal offence chain
CN111241837A (en) * 2020-01-04 2020-06-05 大连理工大学 Theft case legal document named entity identification method based on anti-migration learning
CN111651557A (en) * 2020-05-09 2020-09-11 清华大学深圳国际研究生院 Automatic text generation method and device and computer readable storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ERWIN FILTZ: "Building and Processing a Knowledge-Graph for Legal Data", 《EUROPEAN SEMANTIC WEB CONFERENCE》 *
洪文兴等: "面向司法案件的案情知识图谱自动构建", 《中文信息学报》 *
邹爱玲: "基于法律的知识图谱构建", 《万方数据》 *
陈彦光等: "基于刑事案例的知识图谱构建技术", 《郑州大学学报(理学版)》 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407688A (en) * 2021-06-15 2021-09-17 西安理工大学 Method for establishing knowledge graph-based survey standard intelligent question-answering system
CN113407688B (en) * 2021-06-15 2022-09-16 西安理工大学 Method for establishing knowledge graph-based survey standard intelligent question-answering system
CN113407678A (en) * 2021-06-30 2021-09-17 竹间智能科技(上海)有限公司 Knowledge graph construction method, device and equipment
CN114969384A (en) * 2022-08-02 2022-08-30 联通(四川)产业互联网有限公司 High-value judicial evidence chain acquisition and storage method and device and readable storage medium
CN114969384B (en) * 2022-08-02 2022-10-21 联通(四川)产业互联网有限公司 High-value judicial evidence chain acquisition and storage method and device and readable storage medium
CN115238688B (en) * 2022-08-15 2023-08-01 广州市刑事科学技术研究所 Method, device, equipment and storage medium for analyzing association relation of electronic information data
CN115238688A (en) * 2022-08-15 2022-10-25 广州市刑事科学技术研究所 Electronic information data association relation analysis method, device, equipment and storage medium
CN116307566A (en) * 2023-03-12 2023-06-23 武汉大学 Dynamic design system for large-scale building construction project construction organization scheme
CN116307566B (en) * 2023-03-12 2024-05-10 武汉大学 Dynamic design system for large-scale building construction project construction organization scheme
CN116431835B (en) * 2023-06-06 2023-09-15 中汽数据(天津)有限公司 Automatic knowledge graph construction method, equipment and medium in automobile authentication field
CN116431835A (en) * 2023-06-06 2023-07-14 中汽数据(天津)有限公司 Automatic knowledge graph construction method, equipment and medium in automobile authentication field
CN116542252A (en) * 2023-07-07 2023-08-04 北京营加品牌管理有限公司 Financial text checking method and system
CN116542252B (en) * 2023-07-07 2023-09-29 北京营加品牌管理有限公司 Financial text checking method and system
CN116720786B (en) * 2023-08-01 2023-10-03 中国科学院工程热物理研究所 KG and PLM fusion assembly quality stability prediction method, system and medium
CN116720786A (en) * 2023-08-01 2023-09-08 中国科学院工程热物理研究所 KG and PLM fusion assembly quality stability prediction method, system and medium
CN116737967A (en) * 2023-08-15 2023-09-12 中国标准化研究院 Knowledge graph construction and perfecting system and method based on natural language
CN116737967B (en) * 2023-08-15 2023-11-21 中国标准化研究院 Knowledge graph construction and perfecting system and method based on natural language
CN117830060A (en) * 2024-03-04 2024-04-05 天津财经大学 Injury crime law enforcement supervision and auxiliary decision-making system based on knowledge graph
CN117830060B (en) * 2024-03-04 2024-05-28 天津财经大学 Injury crime law enforcement supervision and auxiliary decision-making system based on knowledge graph

Also Published As

Publication number Publication date
CN112528036B (en) 2021-09-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant