WO2021164226A1 - Method, apparatus, device, and storage medium for querying a legal case knowledge graph - Google Patents

Method, apparatus, device, and storage medium for querying a legal case knowledge graph

Info

Publication number
WO2021164226A1
Authority
WO
WIPO (PCT)
Prior art keywords
relationship
entity
legal
judgment
preset
Prior art date
Application number
PCT/CN2020/111301
Inventor
刘嘉伟
于修铭
汪伟
陈晨
李可
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2021164226A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/35 Clustering; Classification
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Definitions

  • This application relates to the field of big data technology, and in particular to a method, device, equipment and storage medium for querying a knowledge graph of legal cases.
  • The main purpose of this application is to provide a method, device, equipment, and storage medium for querying a knowledge graph of legal cases, aiming to solve the technical problem of how to construct a legal information database with clear legal logic relationships so as to improve the efficiency of case queries.
  • The first aspect of this application provides a method for querying a knowledge graph of legal cases, including: receiving a query request for legal case information initiated by a client; extracting the query keywords in the query request; and, according to the query keywords, searching for the target keyword entity object in a preset legal case knowledge graph database and outputting the legal case information matching the target keyword entity object to the client; wherein the legal case knowledge graph is constructed by extracting entity objects and entity relationships from judgment document data in combination with legal principle and regulation data and judgment manual data.
  • The second aspect of this application provides a legal case knowledge graph query device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps: receiving a query request for legal case information initiated by a client; extracting the query keywords in the query request; and, according to the query keywords, searching for the target keyword entity object in a preset legal case knowledge graph database and outputting the legal case information matching the target keyword entity object to the client; wherein the legal case knowledge graph is constructed by extracting entity objects and entity relationships from judgment document data in combination with legal principle and regulation data and judgment manual data.
  • The third aspect of this application provides a computer-readable storage medium that stores computer instructions; when the computer instructions run on a computer, the computer executes the following steps: receiving a query request for legal case information initiated by a client; extracting the query keywords in the query request; and, according to the query keywords, searching for the target keyword entity object in a preset legal case knowledge graph database and outputting the legal case information matching the target keyword entity object to the client; wherein the legal case knowledge graph is constructed by extracting entity objects and entity relationships from judgment document data in combination with legal principle and regulation data and judgment manual data.
  • The fourth aspect of this application provides a legal case knowledge graph query device, including: a receiving module for receiving a query request for legal case information initiated by a client; an extraction module for extracting the query keywords in the query request; and a retrieval module for searching, according to the query keywords, for the target keyword entity object in a preset legal case knowledge graph database and outputting the legal case information matching the target keyword entity object to the client; wherein the legal case knowledge graph is constructed by extracting entity objects and entity relationships from judgment document data in combination with legal principle and regulation data and judgment manual data.
  • This application uses a pre-built knowledge graph of legal cases as the case trial database and uses the knowledge graph to sort out the various legal logical relationships of pending cases.
  • The knowledge graph of legal cases in this application is constructed for handling legal cases.
  • The legal case information in the knowledge graph is constructed entirely from judgment documents, legal principles and regulations, and judgment manuals, so the authenticity of the knowledge graph is not in doubt.
  • This application further processes the entity objects and entity relationships to obtain the big fact elements, the small fact elements, and the entity relationships of the small fact elements, which better reflect the case.
  • The knowledge graph of legal cases constructed in this way can more clearly sort out complex case details, such as the relationships between the persons involved and the relationships between evidence and facts, thereby reducing the complexity of case trial and improving the quality and efficiency of trial work.
  • FIG. 1 is a schematic structural diagram of the operating environment of the legal case knowledge graph query device involved in the embodiments of this application;
  • FIG. 2 is a schematic flowchart of a first embodiment of the method for querying a knowledge graph of legal cases of this application;
  • FIG. 3 is a schematic flowchart of a second embodiment of the method for querying a knowledge graph of legal cases of this application;
  • FIG. 4 is a detailed flowchart of an embodiment of step S250 in FIG. 3;
  • FIG. 5 is a schematic flowchart of an embodiment of step S2501 in FIG. 4;
  • FIG. 6 is a schematic flowchart of an embodiment of step S2502 in FIG. 4;
  • FIG. 7 is a schematic flowchart of an embodiment of step S2503 in FIG. 4;
  • FIG. 8 is a schematic diagram of functional modules of an embodiment of an apparatus for querying a knowledge graph of a legal case of the application.
  • The embodiments of this application provide a method, device, equipment, and storage medium for querying a knowledge graph of legal cases. A pre-built knowledge graph of legal cases is used as the case trial database, and the knowledge graph is used to sort out the various legal logical relationships of pending cases. This solves the technical problem of how to construct a legal information database with clear legal logic relationships so as to improve the efficiency of case queries, reduces the complexity of case trial, and improves the quality and efficiency of trial work.
  • This application provides a device for querying knowledge graphs of legal cases.
  • FIG. 1 is a schematic structural diagram of the operating environment of the legal case knowledge graph query device involved in the solution of the embodiment of the application.
  • The legal case knowledge graph query device includes a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • The communication bus 1002 is used to implement connection and communication between these components.
  • The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a magnetic disk memory.
  • The memory 1005 may also be a storage device independent of the aforementioned processor 1001.
  • The hardware structure shown in FIG. 1 does not constitute a limitation on the legal case knowledge graph query device; the device may include more or fewer components than shown in the figure, combine certain components, or use a different arrangement of components.
  • The memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a computer program.
  • The operating system is a program that manages and controls the legal case knowledge graph query device and its software resources, and supports the operation of other software and/or programs.
  • The network interface 1004 is mainly used to access the network; the user interface 1003 is mainly used to detect confirmation instructions and editing instructions.
  • The processor 1001 may be used to call the computer program stored in the memory 1005 and execute the operations of the following embodiments of the method for querying the knowledge graph of legal cases.
  • FIG. 2 is a schematic flowchart of a first embodiment of the method for querying a knowledge graph of legal cases of this application.
  • The method for querying the knowledge graph of legal cases includes the following steps:
  • Step S110: receive a query request for legal case information initiated by the client;
  • Step S120: extract the query keywords in the query request;
  • Step S130: according to the query keywords, search for the target keyword entity object in the preset legal case knowledge graph database, and output the legal case information matching the target keyword entity object to the client;
  • The legal case knowledge graph is constructed by extracting entity objects and entity relationships from the judgment document data in combination with the legal principle and regulation data and the judgment manual data.
  • When querying legal case information, the user initiates a query request through the client. After receiving the query request, the background extracts the query keywords from it, uses them to search the pre-built knowledge graph of legal cases, determines the requested legal case information through keyword matching, and outputs it.
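  • The flow of steps S110 to S130 can be sketched as follows. This is a minimal illustration, not the application's implementation: the in-memory `CaseGraph` class, the substring-based `extract_keywords`, and the sample entity names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CaseGraph:
    """Hypothetical in-memory stand-in for the legal case knowledge graph database."""
    # entity name -> list of (relation, tail) pairs
    edges: dict = field(default_factory=dict)

    def add(self, head, relation, tail):
        self.edges.setdefault(head, []).append((relation, tail))


def extract_keywords(query, graph):
    """S120 (simplified): keep the entity names that literally occur in the query."""
    return [name for name in graph.edges if name in query]


def handle_query(query, graph):
    """S110-S130: take the request, extract keywords, and return the matching entries."""
    return {kw: graph.edges[kw] for kw in extract_keywords(query, graph)}


graph = CaseGraph()
graph.add("loan relationship", "supported_by", "loan contract")
graph.add("guarantee relationship", "supported_by", "guarantee letter")
result = handle_query("Is the loan relationship established?", graph)
```

  • A production system would replace the substring match with proper keyword extraction and the dictionary with a graph database lookup.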
  • In this embodiment, a pre-built knowledge graph of legal cases is used as the case trial database, and the various legal logical relationships of pending cases are sorted out through the knowledge graph.
  • The knowledge graph of legal cases in this embodiment is constructed for handling legal cases.
  • The legal case information in the knowledge graph is constructed entirely from judgment documents, legal principles and regulations, and judgment manuals, so the authenticity of the knowledge graph is not in doubt.
  • The knowledge graph of legal cases proposed in this embodiment can clearly sort out complex case details, such as the relationships between persons and between evidence and facts, thereby reducing the complexity of case trial and improving the quality and efficiency of trial work.
  • FIG. 3 is a schematic flowchart of a second embodiment of the method for querying a knowledge graph of legal cases of this application.
  • In this embodiment, before step S110, the following steps are included:
  • Step S210: obtain the judgment document data;
  • Step S220: based on the preset entity relationship labeling model, perform structured extraction on the judgment document data to obtain the preset target keyword entity objects and the entity relationships of the target keyword entity objects in the judgment document data;
  • The target keyword entity objects include: the plaintiff and the defendant, the plaintiff's evidence and the defendant's evidence, the plaintiff's petition and the defendant's defense, the focus of the dispute, the court's judgment result, the legal basis, and the court's reasons for judgment;
  • In this embodiment, the judgment document data is obtained from a designated website by means of a web crawler.
  • A judgment document records the process and results of a people's court hearing a case, including not only the circumstances of the case but also the process and results of the judgment.
  • This embodiment places no limit on the crawling method. Preferably, Docker containers are used to deploy the specified crawler program on multiple machines, so that multiple machines crawl the specified content. The data of different judgment documents is stored separately, for example by case name.
  • The entity relationship is the basic data structure of the knowledge graph, specifically in the form of <head, relation, tail> triples, where head and tail are entities and relation is the relationship between them. Since the judgment document data is stored as text, structured data needs to be extracted from the text data.
  • For example, the structured data includes <Jia Mouming, occupation, farmer>, <Jia Mouming, type, plaintiff>, <Jia Zaiming, lending relationship, Ying Mouyong>, <Ying Mouming, guarantee relationship, Yang Mouguang>, and so on.
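  • The <head, relation, tail> structure described above maps naturally onto a small record type. The sketch below is illustrative only; the `Triple` class and the `relations_of` helper are names introduced for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One <head, relation, tail> fact extracted from a judgment document."""
    head: str
    relation: str
    tail: str


# Two of the example triples given in the text
triples = [
    Triple("Jia Mouming", "occupation", "farmer"),
    Triple("Jia Mouming", "type", "plaintiff"),
]


def relations_of(entity, triples):
    """Return every (relation, tail) pair whose head is `entity`."""
    return [(t.relation, t.tail) for t in triples if t.head == entity]
```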
  • The labeled content includes: the plaintiff, the defendant, the evidence provided by the plaintiff and the defendant, the plaintiff's petition, the defendant's defense, the dispute focus of the case, the court's judgment result, the legal basis, and the court's reasons for judgment. The manually labeled structured data is then used as a training set to train and generate the entity relationship labeling model.
  • With the entity relationship labeling model, structured data in the form of triples can be automatically extracted from the collected judgment document data.
  • The entity objects specifically include: the plaintiff, the defendant, the plaintiff's evidence, the defendant's evidence, the plaintiff's petition, the defendant's defense, the focus of the dispute, the court's judgment result, the legal basis, and the court's reasons for judgment.
  • The plaintiff has corresponding entity relationships with the plaintiff's evidence and the plaintiff's petition; the defendant has corresponding entity relationships with the defendant's evidence and the defendant's defense; and the focus of the dispute has corresponding entity relationships with the plaintiff's evidence and the defendant's defense.
  • Step S230: generate big fact elements according to the keyword entity objects, and determine the big fact elements as big fact element entity objects;
  • A big fact element is the important information obtained when the corresponding dispute focus is judged and confirmed in law.
  • The big fact elements are preferably generated in the following manner:
  • The dispute focuses are first clustered to obtain the categories to which they belong.
  • Clustering is the process of grouping data into different classes or clusters. Objects in the same cluster are highly similar, while objects in different clusters differ greatly. This embodiment does not limit how clustering is implemented; for example, a hierarchical clustering algorithm may be used.
  • For example, the dispute focuses of the corresponding cases fall into seven categories: whether the loan relationship is established, whether the loan form is reasonable, whether the contract is effective, whether the contract is valid, whether the contract is normally performed, whether the guarantee relationship is established, and whether the loan is a joint debt of husband and wife. All dispute focuses of such cases therefore need to be clustered into seven categories. After clustering is complete and the seven dispute focus categories are obtained, each category is matched one-to-one with the dispute focus of each case. In this embodiment, the dispute focus corresponding to each category is preferably determined by means of human-computer interaction.
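  • As a rough sketch of hierarchical clustering applied to dispute focus texts, the code below performs single-linkage agglomerative merging. The Jaccard distance on keywords, the stopword list, the distance threshold, and the sample texts are all assumptions for illustration; the application does not specify a text representation or distance measure.

```python
STOPWORDS = {"whether", "the", "a", "is", "was"}  # hypothetical stopword list


def keywords(text):
    """Lower-cased words of the text, minus stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def jaccard_distance(a, b):
    sa, sb = keywords(a), keywords(b)
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def agglomerative(texts, max_distance=0.6):
    """Single-linkage clustering: repeatedly merge the two closest clusters
    until the closest pair is farther apart than max_distance."""
    clusters = [[t] for t in texts]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(jaccard_distance(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters


focuses = [
    "whether the loan agreement was signed",
    "whether a loan agreement exists",
    "whether the guarantee letter was signed",
    "whether the guarantee letter is valid",
]
groups = agglomerative(focuses)
```

  • With these sample texts the two loan-related focuses end up in one cluster and the two guarantee-related focuses in another.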
  • A dispute focus of a case is a fact that has not yet been confirmed or denied. Each dispute focus therefore needs to be judged. Specifically, the judgment is made based on the plaintiff's evidence, the defendant's evidence, the plaintiff's petition, the defendant's defense, the court's judgment result, the legal basis, and the court's reasons for judgment extracted from the judgment document, together with preset judgment rules, and a new entity object, namely the big fact element, is then generated.
  • This embodiment places no limit on how the dispute focus is judged and confirmed. For example, the judgment may be made through human-computer interaction, or different judgment rules may be preset for different dispute focuses. For "whether the loan relationship is established", the corresponding "borrowing subject and legal relationship" can be identified through a regular-expression pattern. For "whether the loan is a joint debt of the husband and wife", the judgment can be made by checking whether any information in the evidence relates to a loan taken out by the husband and wife; if so, the loan is determined to be a joint debt of the husband and wife.
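  • The two preset rules mentioned above might look like the following sketch. The sentence pattern in `LOAN_PATTERN`, the English sample wording, and the keyword test in `is_joint_debt` are hypothetical; real rules would be written against the actual wording of judgment documents.

```python
import re

# Hypothetical pattern for sentences of the form "<lender> lent <amount> to <borrower>."
LOAN_PATTERN = re.compile(
    r"(?P<lender>[A-Z][\w ]*?) lent (?P<amount>[\d,]+ yuan) to (?P<borrower>[A-Z][\w ]*?)[.,;]"
)


def loan_relationship(evidence):
    """Identify the borrowing subjects and legal relationship, or None if absent."""
    m = LOAN_PATTERN.search(evidence)
    if m is None:
        return None
    return (m["lender"], "lending relationship", m["borrower"])


def is_joint_debt(evidence_items):
    """Rule from the text: joint debt if any evidence relates to a loan by the couple."""
    return any("husband and wife" in e for e in evidence_items)
```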
  • Step S240: based on preset rules, split each big fact element into multiple small fact elements, and determine the small fact elements as small fact element entity objects, wherein the rules are preset based on legal principle and regulation data and judgment manual data;
  • A big fact element is usually a macro-level summary of a certain type of fact and contains more detailed facts, namely small fact elements. For example, "contract effective" involves the effective time, the effective conditions, and so on; "loan guarantee relationship" involves the basic information of the guarantor, the relationship between the guarantor and the lender, and so on.
  • The court usually sets up some principled judgment rules based on legal rules and judgment manuals, and these judgment rules specifically address the various detailed facts related to the case.
  • For example, determining the big fact element "the loan is a joint debt of husband and wife" requires comprehensive proof of detailed facts such as "marital relationship", "the contract is signed by both husband and wife", and "validity of the loan contract". In other words, this big fact element can be further split into small fact elements such as "marital relationship", "the contract is signed by both husband and wife", and "validity of the loan contract".
  • Judgment rules for splitting the big fact elements are set accordingly, and the big fact element corresponding to each dispute focus is split into multiple small fact elements.
  • The rules used for adjudicating cases are set in advance based on legal principle and regulation data and judgment manual data.
  • Legal principle and regulation data refers to various legal articles, legal principles, and regulations; the judgment manual is a knowledge document formulated by the court to assist in adjudicating cases.
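  • One simple way to picture the preset splitting rules is a lookup table from each big fact element to its small fact elements. In the sketch below, the table contents come from the examples in the text, while the `SPLIT_RULES` name and the "includes" relation label are assumptions.

```python
# Hypothetical rule table: big fact element -> small fact elements
SPLIT_RULES = {
    "the loan is a joint debt of husband and wife": [
        "marital relationship",
        "the contract is signed by both husband and wife",
        "validity of the loan contract",
    ],
    "contract effective": ["effective time", "effective conditions"],
}


def split_big_fact(big_fact):
    """Return (big fact, 'includes', small fact) triples; unknown big facts yield none."""
    return [(big_fact, "includes", small) for small in SPLIT_RULES.get(big_fact, [])]
```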
  • Step S250: obtain the entity relationships of each small fact element entity object from the specified target keyword entity objects;
  • Step S260: construct the knowledge graph of legal cases according to the obtained entity objects and entity relationships.
  • Based on the obtained entity objects and entity relationships, a legal structure diagram based on a graph database, that is, the knowledge graph of legal cases, is constructed; the graph database is preferably Neo4j.
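  • To illustrate how triples could be loaded into a Neo4j graph, the sketch below renders each triple as a Cypher `MERGE` statement. It only builds strings and does not connect to a database. The `Entity` label, the `RELATED` relationship type, and the use of `json.dumps` for string quoting are assumptions; in practice, parameterized queries sent through a driver would be preferable.

```python
import json


def to_cypher(head, relation, tail):
    """Render one triple as a Cypher MERGE statement (no database connection is made)."""
    h, r, t = (json.dumps(x, ensure_ascii=False) for x in (head, relation, tail))
    return (
        f"MERGE (h:Entity {{name: {h}}}) "
        f"MERGE (t:Entity {{name: {t}}}) "
        f"MERGE (h)-[:RELATED {{type: {r}}}]->(t)"
    )


statement = to_cypher("Jia Mouming", "occupation", "farmer")
```

  • `MERGE` is used rather than `CREATE` so that loading the same triple twice does not duplicate nodes or edges.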
  • In this embodiment, a pre-built knowledge graph of legal cases is used as the case trial database, and the various legal logical relationships of pending cases are sorted out through the knowledge graph.
  • The knowledge graph of legal cases in this embodiment is constructed for handling legal cases, for example cases whose cause of action is private lending.
  • The information in the knowledge graph is derived entirely from judgment documents, legal principles and regulations, and judgment manuals, so the authenticity of the knowledge graph is not in doubt.
  • This application further processes the entity objects and entity relationships to obtain the big fact elements, the small fact elements, and the entity relationships of the small fact elements, which better reflect the case.
  • The knowledge graph of legal cases constructed in this way can more clearly sort out complex case details, such as the relationships between the persons involved and the relationships between evidence and facts, thereby reducing the complexity of case trial and improving the quality and efficiency of trial work.
  • FIG. 4 is a detailed flowchart of an embodiment of step S250 in FIG. 3.
  • In this embodiment, step S250 further includes:
  • Step S2501: perform entity relationship extraction on the court's reasons for judgment to obtain multiple entity relationship triples, wherein the entity relationship triples contain the entity relationships between the small fact elements and, respectively, the plaintiff's evidence and the defendant's evidence;
  • The court's reasons for judgment include judgment evidence, judgment facts, and judgment relationships. Through entity relationship extraction, triples of the form <judgment evidence, judgment relationship, judgment fact> can therefore be obtained.
  • Step S2502: cluster the relationships in the entity relationship triples to obtain a relationship hierarchy structure matrix;
  • The relationships between entities in the triples are clustered to obtain the relationship hierarchy structure matrix, which includes:
  • The relation cluster layer rc, the topmost layer in the relationship hierarchy;
  • The relation layer r', the middle layer in the relationship hierarchy, composed of all relationships;
  • The relation subclass layer rs, the lowest layer in the relationship hierarchy.
  • Step S2503: according to the relationship hierarchy structure matrix, use a preset entity relationship alignment algorithm to determine the entity relationships between each small fact element and the plaintiff's evidence and the defendant's evidence.
  • The entity relationship alignment algorithm includes: vectorizing the relationships and calculating the distance between a newly added relationship and the existing relationships. The closer the distance, the higher the similarity.
  • A threshold is set; relationships whose similarity exceeds the threshold are merged, and if no existing relationship is similar, the newly added relationship is treated as a new relationship.
  • The embedding from the semantic space to the vector space can be obtained as:
  • The vector distance between the newly added relationship and the other relationships is then calculated, preferably as the cosine distance between the two relation vectors, and this distance is used as the similarity to determine the entity relationships between the small fact elements and the plaintiff's evidence and the defendant's evidence. The entity relationship here specifically refers to an affirmative relationship or a negative relationship.
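  • The alignment step can be sketched as a nearest-neighbour search under cosine similarity with a merge threshold. The function names, the threshold value of 0.9, and the two-dimensional sample vectors are assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(new_name, new_vec, known, threshold=0.9):
    """Merge the new relation into the most similar known relation if the similarity
    exceeds the threshold; otherwise register it as a new relation.
    Returns the name the new relation was aligned to."""
    best_name, best_sim = None, -1.0
    for name, vec in known.items():
        sim = cosine(new_vec, vec)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_name is not None and best_sim > threshold:
        return best_name
    known[new_name] = list(new_vec)
    return new_name


known = {"affirms": [1.0, 0.0], "negates": [-1.0, 0.0]}
merged_into = align("supports", [0.9, 0.1], known)
added_as = align("unrelated", [0.0, 1.0], known)
```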
  • FIG. 5 is a schematic flowchart of an embodiment of step S2501 in FIG. 4.
  • In this embodiment, step S2501 further includes:
  • Step S101: perform sentence splitting and word segmentation on the court's reasons for judgment to obtain the word sequence corresponding to each sentence;
  • The text is first split into single sentences, and each single sentence is then divided into a word sequence through a word segmentation operation; the word sequence contains multiple words arranged in order.
  • For word segmentation, the jieba word segmentation tool is used to divide a sentence into multiple words arranged in order.
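  • Step S101 can be sketched as a sentence splitter followed by a pluggable tokenizer. The punctuation set and the whitespace default are assumptions that keep the example dependency-free; for Chinese text, a segmenter such as `jieba.lcut` could be passed as `tokenize` instead.

```python
import re


def split_sentences(text):
    """Split on Chinese and Western sentence-ending punctuation (a simplification:
    this would also split decimal numbers)."""
    return [s.strip() for s in re.split(r"[。！？!?.]+", text) if s.strip()]


def segment(text, tokenize=str.split):
    """Return one word sequence per sentence, using the supplied tokenizer."""
    return [tokenize(s) for s in split_sentences(text)]


sequences = segment("The plaintiff lent money. The defendant did not repay.")
```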
  • Step S102: use a preset combined part-of-speech tagger to tag each word sequence and obtain the part-of-speech tagging result of each word sequence;
  • A part-of-speech tagger is a tool that processes a word sequence and attaches a part-of-speech tag to each word.
  • For example, a part-of-speech tagger based on a hidden Markov model, or one based on a neural network algorithm, can be used to tag the word sequence.
  • The combined part-of-speech tagger used in this embodiment includes multiple part-of-speech taggers, such as a regular expression tagger, a bigram tagger, and a unigram tagger. For example, the regular expression tagger is tried first; if it cannot find a tag, the bigram tagger is tried, and so on, until the part-of-speech tagging result of each word sequence is obtained.
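  • The fallback behaviour of the combined tagger can be sketched as a chain in which each tagger either returns a tag or defers to the next. All names, tag labels, and lookup tables below are hypothetical; in particular, this simplified bigram tagger conditions on the previous word, whereas library implementations typically condition on the previous tag.

```python
import re


def regex_tagger(patterns):
    """Tag a word by the first regular expression that fully matches it."""
    compiled = [(re.compile(p), tag) for p, tag in patterns]

    def tag(words, i):
        for rx, t in compiled:
            if rx.fullmatch(words[i]):
                return t
        return None
    return tag


def bigram_tagger(table):
    """Tag a word from a (previous word, word) lookup table."""
    def tag(words, i):
        prev = words[i - 1] if i > 0 else None
        return table.get((prev, words[i]))
    return tag


def unigram_tagger(table):
    """Tag a word from a per-word lookup table."""
    return lambda words, i: table.get(words[i])


def tag_sequence(words, taggers, default="x"):
    """Try the taggers in order for each word; the first non-None tag wins."""
    result = []
    for i in range(len(words)):
        tags = (t(words, i) for t in taggers)
        result.append((words[i], next((t for t in tags if t is not None), default)))
    return result


taggers = [
    regex_tagger([(r"\d+(\.\d+)?", "m")]),
    bigram_tagger({("the", "loan"): "n"}),
    unigram_tagger({"repaid": "v", "the": "r"}),
]
tagged = tag_sequence(["repaid", "the", "loan", "5000"], taggers)
```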
  • Step S103: according to the part-of-speech tagging result and the preset dependency tagging table, identify the dependency relationships between the words in each word sequence;
  • After part-of-speech tagging, the part of speech of each word is known, and the dependency relationships between the words in each word sequence are then identified according to the preset dependency tagging table.
  • The dependency between words is mainly reflected in the grammatical relationship between them. For example, for "Zhang San / likes / running", the parts of speech are tagged as noun + verb + noun, and the corresponding grammatical relationship is the subject-predicate-object relationship; for "Zhang San / works at / school", the parts of speech are tagged as noun + preposition + noun, and the corresponding grammatical relationship is the preposition-object relationship.
  • The grammatical relationships in a sentence also include dependency relationships such as the fronted-object, attribute-head, indirect-object, preposition-object, coordinate, verb-object, subject-predicate, and head (core) relationships. The dependencies between the words in a sentence can therefore be identified from the part-of-speech tags and the dependency tagging table.
  • For example, the parts of speech of the phrase "Hotel General Manager Zhang San" have the structure noun + noun + noun. According to the dependency tagging table, this combination corresponds to the attribute-head relationship.
  • The core noun of "Hotel General Manager Zhang San" is "Zhang San", and "Hotel" and "General Manager" modify "Zhang San". The dependency relationship of these three words is therefore marked as ATT (the attribute-head relationship).
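  • A dependency tagging table of the kind described can be pictured as a mapping from part-of-speech patterns to relation labels. The table below contains only the three patterns mentioned in the text, and the tag abbreviations (`n`, `v`, `p`) are assumptions.

```python
# Hypothetical dependency tagging table: part-of-speech pattern -> relation label
DEPENDENCY_TABLE = {
    ("n", "v", "n"): "subject-predicate-object",
    ("n", "p", "n"): "preposition-object",
    ("n", "n", "n"): "ATT",  # attribute-head
}


def dependency_of(tagged):
    """Look up the relation for a tagged word sequence; None if the pattern is unknown."""
    return DEPENDENCY_TABLE.get(tuple(tag for _, tag in tagged))


phrase = [("Hotel", "n"), ("General Manager", "n"), ("Zhang San", "n")]
```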
  • Step S104: based on the dependency relationships between the words in each word sequence, construct the corresponding syntax analysis tree;
  • The dependency relationships between the words in a sentence can be expressed by constructing a syntax analysis tree. Building a syntax analysis tree usually involves the following four sets:
  • A finite set of non-terminal grammar identifiers, that is, the set of non-leaf nodes of the syntax analysis tree;
  • The start identifier, that is, the start node of the syntax analysis tree;
  • A finite set of terminal identifiers, that is, the set of all words in the sentence, located at the leaf nodes of the syntax analysis tree, where leaf nodes are allowed to be empty;
  • Based on these sets, a syntax analysis tree corresponding to a word sequence (i.e., a sentence) can be constructed, in which a dependency relationship exists between each child node and its parent node.
  • Step S105: traverse the syntax analysis tree and, based on preset Chinese grammar rules, identify the core word in the syntax analysis tree and the subject and object corresponding to the core word;
  • The traversal starts from the root node. During the traversal, the core word and its corresponding subject and object are identified based on the preset Chinese grammar rules.
  • The relation words are not preset categories but words that exist in the current sentence.
  • For example, the predefined relationship of a sentence may be "Zhang San: founder", yet the word "founder" does not exist in the sentence, while the similar word "founding" does. In syntactic analysis, the core word "Chuang" (found) can therefore be extracted. The core word is preceded by the noun "Guangzhou", which is in turn preceded by the preposition "Zai" (in), so "Zai Guangzhou" (in Guangzhou) is a prepositional phrase with a dependency relationship.
  • Step S106: using the core word as the entity relationship and the subject and object corresponding to the core word as named entity objects, construct an entity relationship triple, wherein the triple describes the named entity objects in the court's reasons for judgment and the entity relationships between them.
  • The extracted core word is the entity relationship, and the subject and object corresponding to the core word are the named entity objects, from which an entity relationship triple is constructed. The triples describe each named entity object in the court's reasons for judgment and the entity relationships between them.
  • The court's reasons for judgment specifically include the judgment facts and judgment evidence, as well as the relationship between facts and evidence, such as an affirmative relationship or a negative relationship.
  • The entity relationship triples constructed in this way therefore include the relationships between the small fact elements and the various pieces of evidence.
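  • Steps S105 and S106 can be sketched on a flat list of dependency arcs: find the core word, then pick its subject and object. The arc format and the `HED`, `SBV`, and `VOB` labels (head, subject-verb, verb-object) are assumptions for this example and are not taken from the application.

```python
def extract_triple(arcs):
    """arcs: list of (word, head_index, relation); the core word carries relation 'HED'.
    Returns (subject, core word, object), or None if any of the three is missing."""
    core = next((i for i, (_, _, rel) in enumerate(arcs) if rel == "HED"), None)
    if core is None:
        return None
    subject = next((w for w, h, rel in arcs if h == core and rel == "SBV"), None)
    obj = next((w for w, h, rel in arcs if h == core and rel == "VOB"), None)
    if subject is None or obj is None:
        return None
    return (subject, arcs[core][0], obj)


# "receipt proves repayment": <judgment evidence, judgment relationship, judgment fact>
arcs = [("receipt", 1, "SBV"), ("proves", -1, "HED"), ("repayment", 1, "VOB")]
triple = extract_triple(arcs)
```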
  • FIG. 6 is a schematic flowchart of an embodiment of step S2502 in FIG. 4.
  • the above step S2502 further includes:
  • Step S201: perform relation vector conversion on the data in each entity relationship triple through a preset vector conversion model to obtain relation vectors;
  • the basic data structure of a knowledge graph is the entity relationship triple.
  • in a triple (h, r, t), h is the subject, t is the object, and r is the relationship.
  • for example, in the triple (New York, belongs to, the United States), "New York" is the subject, "the United States" is the object, and "belongs to" is the relationship.
  • a triple is an intuitive data structure.
  • the subject and the object are collectively called entities, and the relationship is irreversible: within a triple, the subject and the object cannot be interchanged.
  • the triple data is converted through the preset vector conversion model to obtain the relation vectors.
  • the preset vector conversion model converts character-type triple data into vector-type triple data: the character-type relationship data into a relation vector, the character-type subject data into a subject vector, and the character-type object data into an object vector.
  • the preset vector conversion model is the translating embedding (TransE) model.
  • Step S202: using a preset clustering algorithm, cluster all relation vectors, and separately cluster the relation vectors belonging to each relationship, to obtain relation cluster vectors and the relation sub-vectors of each relationship, respectively;
  • different relationships may have the same meaning. For example, "country" and "nationality" have the same meaning. Relationships that express the same meaning therefore need to be grouped into one category, and a relation cluster vector is set for each such category.
  • within a category, the Euclidean distance between each relationship's relation vector and that category's relation cluster vector is minimal.
  • Step S203: construct a relationship hierarchy structure matrix based on the relation vectors and the corresponding relation cluster vectors and relation sub-vectors;
  • the relationship hierarchy structure matrix is composed of a top relation cluster layer, a middle relation layer, and a bottom relation sub-class layer.
  • the relation cluster layer is formed by all relation cluster vectors, the relation layer by all relation vectors, and the relation sub-class layer by all relation sub-vectors of each relationship.
  • for a triple (h, r, t), the TransE model assumes that t − h ≈ r, so an offset vector t − h can be defined for each triple, where t and h both come from the embedding layer of the TransE model. For each relationship, all of its offset vectors are collected and clustered with the K-means algorithm into sub-classes of that relationship. The sub-classes of all relationships constitute the lowest level of the relationship hierarchy, that is, the relation sub-class layer r_s.
  • FIG. 7 is a schematic flowchart of an embodiment of step S2503 in FIG. 4.
  • the above step S2503 further includes:
  • Step S301: calculate the relationship similarity between any two relationships across all entity relationship triples according to the relation vectors, the relation cluster vectors, and the relation sub-vectors of each relationship in the relationship hierarchy structure matrix;
  • combining the relation vector, relation cluster vector, and relation sub-vector when calculating relationship similarity makes use of the relationship's hierarchical structure, so it can be determined more accurately whether the relationships in different triples have the same meaning; this helps classify relationships and improves the accuracy of relationship identification.
  • vector distance is preferably used to calculate the relationship similarity: each relationship's embedding is the sum of its relation cluster vector, its initial relation vector, and its relation sub-vector (r = r_c + r′ + r_s), and the similarity of two relationships is the cosine distance between their embeddings.
  • Step S302: take any one relationship among all entity relationship triples as the reference relationship for similarity comparison, and determine in turn whether the relationship similarity between each other relationship and the reference relationship exceeds a preset threshold;
  • Step S303: if so, determine that the relationship being compared is similar to the reference relationship and merge the two into one category; otherwise, treat the relationship being compared as belonging to a new category;
  • Step S304: select any relationship from the remaining uncompared relationships as the new reference relationship and continue the similarity comparison until all pairs of relationships have been compared.
  • the hierarchical structure of relationships is built by clustering, and the information in the hierarchy is fully used to vectorize the relationships; the similarity between relationships is measured by the distance between a newly added relationship and the existing ones. The closer the distance, the higher the similarity.
  • a similarity threshold is set: if the similarity between a newly added relationship and an existing relationship exceeds the threshold, the two are merged; if the newly added relationship is not similar to any existing relationship, it becomes a new relationship.
  • the entity relationship between a small fact element and the plaintiff's evidence or the defendant's evidence is either an affirmative relationship or a negative relationship.
  • with relationship alignment, the relationship between evidence and small fact elements can be extracted; the relationship carries an affirmation (or negation) together with the reason for the affirmation (or negation).
  • for example, an extracted triple may be: [loan contract - [loan affirmed {reason: signed}] -> whether the loan contract bears the borrower's signature or addendum], where [loan contract] is the evidence, [whether the loan contract bears the borrower's signature or addendum] is the small fact element, and [loan affirmed {reason: signed}] is the relationship between the evidence and the small fact element; that is, the entity relationship between the small fact element and the plaintiff's evidence is an affirmative relationship.
  • The present application also provides a legal case knowledge graph query apparatus.
  • FIG. 8 is a schematic diagram of the functional modules of an embodiment of the legal case knowledge graph query apparatus of the present application.
  • the legal case knowledge graph query apparatus includes:
  • a receiving module 10, configured to receive a query request for legal case information initiated by a client;
  • an extraction module 20, configured to extract the query keywords from the query request;
  • a retrieval module 30, configured to retrieve a target keyword entity object in a preset legal case knowledge graph library according to the query keywords, and to output the legal case information matching the target keyword entity object to the client;
  • the legal case knowledge graph is constructed by extracting entity objects and entity object relationships from the judgment document data in combination with the legal principles and regulations data and the judgment manual data.
  • the present application also provides a legal case knowledge graph query device, including a memory and at least one processor, wherein the memory stores instructions and the memory and the at least one processor are interconnected by a line; the at least one processor invokes the instructions in the memory so that the legal case knowledge graph query device executes the steps of the above legal case knowledge graph query method.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to execute the following steps:
  • receive a query request for legal case information initiated by a client; extract the query keywords from the query request; according to the query keywords, retrieve a target keyword entity object in a preset legal case knowledge graph library, and output the legal case information matching the target keyword entity object to the client;
  • the legal case knowledge graph is constructed by extracting entity objects and entity object relationships from the judgment document data in combination with the legal principles and regulations data and the judgment manual data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

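The present application discloses a legal case knowledge graph query method, including: receiving a query request for legal case information initiated by a client; extracting the query keywords from the query request; according to the query keywords, retrieving a target keyword entity object in a preset legal case knowledge graph library, and outputting the legal case information matching the target keyword entity object to the client; wherein the legal case knowledge graph is constructed by extracting entity objects and entity object relationships from judgment document data in combination with legal principles and regulations data and judgment manual data. The present application also discloses a legal case knowledge graph query apparatus, a device, and a computer-readable storage medium. The present application can clearly sort out the various relationships in a case, which reduces the complexity of hearing a case and improves the quality and efficiency of adjudication.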

Description

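Legal case knowledge graph query method, apparatus, device, and storage medium
This application claims priority to Chinese patent application No. 202010103656.1, entitled "法律案件知识图谱查询方法、装置、设备及存储介质" (Legal case knowledge graph query method, apparatus, device, and storage medium), filed with the China Patent Office on February 20, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of big data technology, and in particular to a legal case knowledge graph query method, apparatus, device, and storage medium.
Background
With rapid social and economic development, civil and criminal disputes of all kinds are increasingly common. Handling them involves a great deal of legal knowledge, so demand for intelligent applications of legal knowledge is growing.
The inventors realized that the legal knowledge system is very complex, combining many kinds of logic, and that traditional approaches to legal intelligence are not practical. For example, traditional legal information databases usually store raw, unprocessed case information, such as entire judgment documents, and then run keyword queries directly over those documents. The information retrieved is fragmented and lacks clear logical relationships, so neither query efficiency nor accuracy meets users' actual needs. Methods based on traditional natural language processing face great challenges in accuracy because law is a highly vertical domain; moreover, natural language processing techniques cannot explain their own results, which makes them unconvincing in a field as serious as law. In short, traditional intelligent processing of legal information databases performs poorly and leaves legal logical relationships unclear, so it does not improve case query efficiency well.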
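Summary
The main purpose of the present application is to provide a legal case knowledge graph query method, apparatus, device, and storage medium, aiming to solve the technical problem of how to construct a legal information database with clear legal logical relationships so as to improve case query efficiency.
To achieve the above purpose, a first aspect of the present application provides a legal case knowledge graph query method, including: receiving a query request for legal case information initiated by a client; extracting the query keywords from the query request; according to the query keywords, retrieving a target keyword entity object in a preset legal case knowledge graph library, and outputting the legal case information matching the target keyword entity object to the client; wherein the legal case knowledge graph is constructed by extracting entity objects and entity object relationships from judgment document data in combination with legal principles and regulations data and judgment manual data.
A second aspect of the present application provides a legal case knowledge graph query device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions: receiving a query request for legal case information initiated by a client; extracting the query keywords from the query request; according to the query keywords, retrieving a target keyword entity object in a preset legal case knowledge graph library, and outputting the legal case information matching the target keyword entity object to the client; wherein the legal case knowledge graph is constructed by extracting entity objects and entity object relationships from judgment document data in combination with legal principles and regulations data and judgment manual data.
A third aspect of the present application provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to execute the following steps: receiving a query request for legal case information initiated by a client; extracting the query keywords from the query request; according to the query keywords, retrieving a target keyword entity object in a preset legal case knowledge graph library, and outputting the legal case information matching the target keyword entity object to the client; wherein the legal case knowledge graph is constructed by extracting entity objects and entity object relationships from judgment document data in combination with legal principles and regulations data and judgment manual data.
A fourth aspect of the present application provides a legal case knowledge graph query apparatus, including: a receiving module, configured to receive a query request for legal case information initiated by a client; an extraction module, configured to extract the query keywords from the query request; and a retrieval module, configured to retrieve a target keyword entity object in a preset legal case knowledge graph library according to the query keywords and to output the legal case information matching the target keyword entity object to the client; wherein the legal case knowledge graph is constructed by extracting entity objects and entity object relationships from judgment document data in combination with legal principles and regulations data and judgment manual data.
The present application uses a pre-built legal case knowledge graph as the database for hearing cases and uses it to sort out the various legal logical relationships of the case being heard. To query legal case information, the user only needs to enter the keywords; the corresponding legal case information is output by searching the legal case knowledge graph built by the present application. The knowledge graph is built to resolve law-related cases, and the legal case information in it is built entirely from judgment documents, legal principles and regulations, and judgment manuals, so the authenticity of the knowledge graph is not in doubt. By further processing the entity objects and entity relationships, the present application obtains large fact elements, small fact elements, and the entity relationships of the small fact elements, which better reflect the case. The resulting knowledge graph can more clearly sort out complex case circumstances such as the relationships between persons and between evidence and facts, which reduces the complexity of hearing a case and improves the quality and efficiency of adjudication.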
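Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the operating environment of the legal case knowledge graph query device involved in the embodiments of the present application;
FIG. 2 is a schematic flowchart of the first embodiment of the legal case knowledge graph query method of the present application;
FIG. 3 is a schematic flowchart of the second embodiment of the legal case knowledge graph query method of the present application;
FIG. 4 is a detailed schematic flowchart of an embodiment of step S250 in FIG. 3;
FIG. 5 is a schematic flowchart of an embodiment of step S2501 in FIG. 4;
FIG. 6 is a schematic flowchart of an embodiment of step S2502 in FIG. 4;
FIG. 7 is a schematic flowchart of an embodiment of step S2503 in FIG. 4;
FIG. 8 is a schematic diagram of the functional modules of an embodiment of the legal case knowledge graph query apparatus of the present application.
The realization of the purpose, the functional characteristics, and the advantages of the present application will be further described with reference to the embodiments and the accompanying drawings.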
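Detailed Description
The embodiments of the present application provide a legal case knowledge graph query method, apparatus, device, and storage medium, which use a pre-built legal case knowledge graph as the database for hearing cases and use the knowledge graph to sort out the various legal logical relationships of the case being heard. This solves the technical problem of how to construct a legal information database with clear legal logical relationships so as to improve case query efficiency, reduces the complexity of hearing a case, and improves the quality and efficiency of adjudication.
It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
The present application provides a legal case knowledge graph query device.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of the operating environment of the legal case knowledge graph query device involved in the embodiments of the present application.
As shown in FIG. 1, the legal case knowledge graph query device includes a processor 1001 (for example, a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as disk storage. The memory 1005 may optionally also be a storage apparatus independent of the processor 1001.
Those skilled in the art will understand that the hardware structure shown in FIG. 1 does not limit the legal case knowledge graph query device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a computer program. The operating system is a program that manages and controls the hardware and software resources of the legal case knowledge graph query device and supports the running of other software and/or programs.
In the hardware structure shown in FIG. 1, the network interface 1004 is mainly used to access the network, and the user interface 1003 is mainly used to detect confirmation instructions, editing instructions, and the like. The processor 1001 may be used to invoke the computer program stored in the memory 1005 and perform the operations of the following embodiments of the legal case knowledge graph query method.
Based on the above hardware structure of the legal case knowledge graph query device, the embodiments of the legal case knowledge graph query method of the present application are proposed.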
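Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the legal case knowledge graph query method of the present application. In this embodiment, the method includes the following steps:
Step S110: receive a query request for legal case information initiated by a client;
Step S120: extract the query keywords from the query request;
Step S130: according to the query keywords, retrieve a target keyword entity object in a preset legal case knowledge graph library, and output the legal case information matching the target keyword entity object to the client;
wherein the legal case knowledge graph is constructed by extracting entity objects and entity object relationships from judgment document data in combination with legal principles and regulations data and judgment manual data.
In this embodiment, to query legal case information, a user initiates a query request through the client. After receiving the request, the back end extracts the query keywords from it, uses them to search the pre-built legal case knowledge graph, and determines and outputs the requested legal case information by keyword matching.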
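Steps S110 to S130 can be pictured with a minimal sketch. It assumes a small in-memory list of triples in place of the knowledge graph library, and a keyword extractor that simply looks for known entity names inside the query text; `TRIPLES`, `build_index`, `extract_keywords` and `handle_query` are hypothetical names introduced here for illustration, not part of the application.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the legal case knowledge graph library.
# Each triple is (head entity, relation, tail entity).
TRIPLES = [
    ("Jia Mouming", "type", "plaintiff"),
    ("Jia Mouming", "lending relationship", "Ying Mouyong"),
    ("Ying Mouyong", "guarantee relationship", "Yang Mouguang"),
]

def build_index(triples):
    """Index every triple under both of its entities for keyword lookup."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((head, relation, tail))
        index[tail].append((head, relation, tail))
    return index

def extract_keywords(query, known_entities):
    """Step S120 (simplified): keep the known entity names that occur in the query."""
    return [name for name in known_entities if name in query]

def handle_query(query, index):
    """Steps S110-S130: take a query, extract keywords, return the matching triples."""
    results = []
    for keyword in extract_keywords(query, index.keys()):
        for triple in index[keyword]:
            if triple not in results:
                results.append(triple)
    return results

index = build_index(TRIPLES)
print(handle_query("Who guaranteed the loan to Ying Mouyong?", index))
```

A production system would replace the substring match with proper keyword extraction and the dictionary with a graph database query; the sketch only shows the order of the three steps.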
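This embodiment uses a pre-built legal case knowledge graph as the database for hearing cases and uses it to sort out the various legal logical relationships of the case being heard. To query legal case information, the user only needs to enter the keywords; the corresponding legal case information is output by searching the legal case knowledge graph built by the present application. The knowledge graph in this embodiment is built to resolve law-related cases, and the legal case information in it is built entirely from judgment documents, legal principles and regulations, and judgment manuals, so the authenticity of the knowledge graph is not in doubt. In addition, the knowledge graph proposed in this embodiment can clearly sort out complex case circumstances such as the relationships between persons and between evidence and facts, which reduces the complexity of hearing a case and improves the quality and efficiency of adjudication.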
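Referring to FIG. 3, FIG. 3 is a schematic flowchart of the second embodiment of the legal case knowledge graph query method of the present application. In this embodiment, the following steps are performed before step S110:
Step S210: obtain judgment document data;
Step S220: based on a preset entity relationship annotation model, perform structured extraction on the judgment document data to obtain the preset target keyword entity objects in the judgment document data and the entity relationships of the target keyword entity objects;
wherein the target keyword entity objects include: the plaintiff and the defendant, the plaintiff's evidence and the defendant's evidence, the plaintiff's claims and the defendant's defense, the focus of dispute, the court's decision, the statutory basis, and the court's reasons for judgment;
This embodiment preferably collects judgment document data from designated websites with a web crawler. A judgment document records the process and outcome of a people's court's hearing of a case; it contains both the facts of the case and the adjudication process and result.
This embodiment does not limit the crawling method. Preferably, Docker containers are used to deploy the designated crawler program across multiple machines so that the designated content is crawled by multiple machines. Note further that the data of different judgment documents is stored separately, for example by case name.
The entity relationship is the basic data structure of the knowledge graph and takes the form of a <head, relation, tail> triple, where head and tail are entities and relation is the relationship between them. Because judgment document data is stored as text, structured data must be extracted from the text.
For example, if a judgment document states "Plaintiff: Jia Mouming, farmer" and "On March 21, 2009 the plaintiff lent 100,000 yuan to the defendant Ying Mouyong, guaranteed by the defendant Yang Mouguang", the structured data extracted from this content includes <Jia Mouming, occupation, farmer>, <Jia Mouming, type, plaintiff>, <Jia Mouming, lending relationship, Ying Mouyong>, <Ying Mouyong, guarantee relationship, Yang Mouguang>, and so on.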
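The <head, relation, tail> structure from the example above can be held in a small typed record. The following is only an illustrative sketch: the `Triple` class and the `relations_of` helper are names assumed here, and the party names are transliterations of the anonymized names in the example.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    head: str      # entity
    relation: str  # relationship between the two entities
    tail: str      # entity

# Triples from the judgment document example (names transliterated).
triples = [
    Triple("Jia Mouming", "occupation", "farmer"),
    Triple("Jia Mouming", "type", "plaintiff"),
    Triple("Jia Mouming", "lending relationship", "Ying Mouyong"),
    Triple("Ying Mouyong", "guarantee relationship", "Yang Mouguang"),
]

def relations_of(entity, triples):
    """Return every (relation, other entity) pair in which `entity` is the head."""
    return [(t.relation, t.tail) for t in triples if t.head == entity]

print(relations_of("Jia Mouming", triples))
```

Because the relationship is directional, `relations_of` only follows triples in which the entity is the head; a reverse lookup would need a separate helper.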
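In this embodiment, a corresponding mathematical model needs to be trained in order to extract judgment document data automatically. For example, a large number of judgment documents are first annotated by sequence labeling to obtain a structured training corpus. The annotated items are: the plaintiff, the defendant, the evidence provided by the plaintiff and the defendant, the plaintiff's claims, the defendant's defense, the focus of dispute of the case, the court's decision, the legal principles and regulations relied on, the court's reasons for judgment, and so on. The manually annotated structured data is then used as the training set to train the entity relationship annotation model.
In this embodiment, the preset entity relationship annotation model automatically extracts structured data in triple form from the collected judgment document data. The entity objects specifically include: the plaintiff, the defendant, the plaintiff's evidence, the defendant's evidence, the plaintiff's claims, the defendant's defense, the focus of dispute, the court's decision, the statutory basis, and the court's reasons for judgment. Among them, the plaintiff has corresponding entity relationships with the plaintiff's evidence and the plaintiff's claims; the defendant has corresponding entity relationships with the defendant's evidence and the defendant's defense; and the focus of dispute has corresponding entity relationships with the plaintiff's claims and the defendant's defense.
Step S230: generate large fact elements according to the keyword entity objects, and determine the large fact elements as large fact element entity objects;
In this embodiment, a large fact element is the key information used when the corresponding focus of dispute is judged and confirmed under the law. This embodiment preferably generates large fact elements as follows:
(1) cluster the focuses of dispute to obtain multiple dispute focus categories, and determine the preset case dispute focus corresponding to each category, where a case dispute focus is a fact that has not yet been confirmed;
(2) according to the plaintiff's evidence and the defendant's evidence, the plaintiff's claims and the defendant's defense, the court's decision, the statutory basis, and the court's reasons for judgment, confirm the unconfirmed fact corresponding to the case dispute focus, take the information used in the confirmation as a large fact element, and determine the large fact element as a large fact element entity object.
A case usually has many focuses of dispute, and they differ with the type of case. This embodiment therefore first clusters the focuses of dispute to obtain the categories they belong to. Clustering is the process of grouping data into classes or clusters such that objects in the same cluster are highly similar while objects in different clusters differ greatly. This embodiment does not limit how clustering is implemented; for example, a hierarchical clustering algorithm may be used.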
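After the focuses of dispute have been divided into categories by clustering, the case dispute focus corresponding to each category must be determined, which amounts to naming the categories. Different kinds of cases usually have different case dispute focuses. For private lending cases, for example, there are seven main categories: whether the lending relationship is established, whether the form of lending is reasonable, whether the contract has taken effect, whether the contract is valid, whether the contract was performed normally, whether the guarantee relationship is established, and whether the loan is a joint debt of husband and wife. All focuses of dispute in such a case are therefore clustered into seven categories, and once the seven categories are obtained, each is matched one-to-one with a case dispute focus. This embodiment preferably determines the case dispute focus of each category through human-computer interaction.
In this embodiment, a case dispute focus is a fact that has been neither affirmed nor denied, so each case dispute focus must be further judged. The judgment is based on the plaintiff's evidence, the defendant's evidence, the plaintiff's claims, the defendant's defense, the court's decision, the statutory basis, and the court's reasons for judgment extracted from the judgment document, together with preset judgment rules, and it produces new entity objects, namely the large fact elements.
For example, taking the case dispute focuses of private lending, the large fact elements formed through judgment take the following form:
(1) [Whether the lending relationship is established: lending parties and legal relationship]. The large fact element for this case dispute focus is the content of "lending parties and legal relationship", which serves as the key information for judging and confirming "whether the lending relationship is established".
(2) [Whether the form of lending is reasonable: form of lending and main terms]. The large fact element for this case dispute focus is the content of "form of lending and main terms", which serves as the key information for judging and confirming "whether the form of lending is reasonable".
(3) [Whether the contract has taken effect: the contract has taken effect]. The large fact element for this case dispute focus is "the contract has taken effect", that is, "whether the contract has taken effect" has been confirmed in the affirmative.
(4) [Whether the contract is valid: validity of the contract]. The large fact element for this case dispute focus is the content of "validity of the contract", which serves as the key information for judging and confirming "whether the contract is valid".
(5) [Whether the contract was performed normally: performance of the contract]. The large fact element for this case dispute focus is the content of "performance of the contract", which serves as the key information for judging and confirming "whether the contract was performed normally".
(6) [Whether the guarantee relationship is established: guarantee relationship of the loan]. The large fact element for this case dispute focus is the content of "guarantee relationship of the loan", which serves as the key information for judging and confirming "whether the guarantee relationship is established".
(7) [Whether the loan is a joint debt of husband and wife: the loan is a joint debt of husband and wife]. The large fact element for this case dispute focus is "the loan is a joint debt of husband and wife", that is, "whether the loan is a joint debt of husband and wife" has been confirmed in the affirmative.
This embodiment does not limit how case dispute focuses are judged and confirmed. The judgment may be made through human-computer interaction, or different judgment rules may be preset for different case dispute focuses. For example, for "whether the lending relationship is established", the corresponding "lending parties and legal relationship" can be identified with regular-expression patterns; for "whether the loan is a joint debt of husband and wife", the evidence can be checked for information about the couple borrowing together, and if such information exists, the loan is determined to be a joint debt of husband and wife.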
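Step S240: based on preset rules, split each large fact element into multiple small fact elements, and determine the small fact elements as small fact element entity objects, wherein the rules are preset according to legal principles and regulations data and judgment manual data;
In this embodiment, a large fact element is usually a high-level summary of a class of facts and contains more detailed facts, namely small fact elements. For example, "the contract has taken effect" involves the time it took effect, the conditions for taking effect, and so on; "guarantee relationship of the loan" involves the guarantor's basic information, the relationship between the guarantor and the borrower, and so on. During adjudication, in order to hear a case truthfully, accurately, and comprehensively, the court usually sets principled adjudication rules according to legal principles and regulations and the judgment manual, and these rules address the various detailed facts related to the case. For example, establishing the large fact element "the loan is a joint debt of husband and wife" requires the combined proof of detailed facts such as "marital relationship", "the contract bears both spouses' signatures", and "validity of the loan contract"; in other words, this large fact element can be further split into those small fact elements.
In this embodiment, adjudication rules for splitting large fact elements are set based on the adjudication principles in legal principles and regulations and in the court's judgment manual, and the large fact element of each case dispute focus is split into multiple small fact elements. The rules are preset according to legal principles and regulations data and judgment manual data. Legal principles and regulations data refers to statutes, legal doctrine, and regulations, while the judgment manual is a knowledge document formulated by the court to assist adjudication.
Step S250: obtain the entity relationships of each small fact element entity object from a designated target keyword entity object;
In legal cases, the court's reasons for judgment usually set out a large number of case facts and the corresponding evidence; that is, the reasons contain the entity relationships between small fact elements and the plaintiff's and defendant's evidence. This embodiment therefore preferably takes the court's reasons for judgment as the designated target keyword entity object from which to obtain the entity relationships of each small fact element entity object.
Step S260: construct the legal case knowledge graph according to the obtained entity objects and entity relationships.
In this embodiment, a legal structure graph based on a graph database, namely the legal case knowledge graph, is constructed from the obtained entity objects and the relationships between them; the graph database is preferably Neo4j.
This embodiment uses a pre-built legal case knowledge graph as the database for hearing cases and uses it to sort out the various legal logical relationships of the case being heard. To query legal case information, the user only needs to enter the keywords; the corresponding legal case information is output by searching the legal case knowledge graph built by the present application. The knowledge graph of this embodiment is built to resolve law-related cases, such as private lending causes of action, and its information comes entirely from judgment documents, legal principles and regulations, and judgment manuals, so the authenticity of the knowledge graph is not in doubt. By further processing the entity objects and entity relationships, the present application obtains large fact elements, small fact elements, and the entity relationships of the small fact elements, which better reflect the case. The resulting knowledge graph can more clearly sort out complex case circumstances such as the relationships between persons and between evidence and facts, which reduces the complexity of hearing a case and improves the quality and efficiency of adjudication.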
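Referring to FIG. 4, FIG. 4 is a detailed schematic flowchart of an embodiment of step S250 in FIG. 3. In this embodiment, step S250 further includes:
Step S2501: perform entity relationship extraction on the court's reasons for judgment to obtain multiple entity relationship triples, wherein the triples include the entity relationships between small fact elements and the plaintiff's evidence and the defendant's evidence;
In this embodiment, the court's reasons for judgment contain adjudicated evidence, adjudicated facts, and the determined relationships between them, so entity relationship extraction yields <adjudicated evidence, determined relationship, adjudicated fact> triples.
Step S2502: cluster the relationships in the entity relationship triples to obtain a relationship hierarchy structure matrix;
In this embodiment, the relationships between the entities in the triples are further clustered to obtain the relationship hierarchy structure matrix, which includes:
A. the relation cluster layer $\mathbf{r}_c$, the top layer of the relationship hierarchy;
B. the relation layer $\mathbf{r}'$, the middle layer of the relationship hierarchy, formed by all relationships;
C. the relation sub-class layer $\mathbf{r}_s$, the bottom layer of the relationship hierarchy.
Step S2503: according to the relationship hierarchy structure matrix, use a preset entity relationship alignment algorithm to determine the entity relationship between each small fact element and the plaintiff's evidence and the defendant's evidence.
In this embodiment, the entity relationship alignment algorithm includes: vectorizing the relationships; calculating the distance between a newly added relationship and the existing relationships, where a shorter distance means higher similarity; and setting a threshold, merging the new relationship with an existing relationship whose similarity exceeds the threshold, or making it a new relationship if no existing relationship is similar.
According to the relationship hierarchy matrix, for the relationship in a triple (h, r, t), its embedding, that is, its mapping from semantic space to vector space, is:
$$\mathbf{r} = \mathbf{r}_c + \mathbf{r}' + \mathbf{r}_s$$
Based on this embedding, the vector distance between the newly added relationship and each other relationship is calculated, preferably the cosine distance between the two relation vectors, and this distance is used as the similarity. This determines the entity relationship between each small fact element and the plaintiff's and defendant's evidence, which is specifically an affirmative relationship or a negative relationship.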
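Referring to FIG. 5, FIG. 5 is a schematic flowchart of an embodiment of step S2501 in FIG. 4. In this embodiment, step S2501 further includes:
Step S101: split the court's reasons for judgment into sentences and segment them into words to obtain the word sequence of each sentence;
In this embodiment, to better identify named entity objects and entity relationships, the text of the court's reasons for judgment is first split into sentences, using commas and full stops as delimiters, so that the whole document is divided into multiple sentences. Each sentence is then segmented into a word sequence containing multiple words in order, for example with the jieba segmenter.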
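A minimal sketch of step S101 follows. Sentence splitting uses only the standard library; the word segmenter is a toy forward-maximum-matching routine over a hand-written vocabulary, standing in for a real tool such as jieba. The function names and the vocabulary are assumptions made for illustration.

```python
import re

def split_sentences(text):
    """Split on Chinese/ASCII commas and full stops, dropping empty pieces."""
    pieces = re.split(r"[,。,.]", text)
    return [p.strip() for p in pieces if p.strip()]

def segment(sentence, vocabulary):
    """Toy forward-maximum-matching segmenter; a stand-in for a real segmenter."""
    words, i = [], 0
    max_len = max(len(w) for w in vocabulary)
    while i < len(sentence):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical vocabulary covering the example sentence used later in the text.
vocab = {"张三", "广州", "创办", "一家", "酒店"}
for s in split_sentences("张三在广州创办了一家酒店。"):
    print(segment(s, vocab))
```

Splitting on the ASCII full stop would also break decimal numbers, so a real pipeline would need a more careful delimiter rule.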
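Step S102: use a preset combined part-of-speech tagger to tag each word sequence and obtain the part-of-speech tagging result of each word sequence;
A part-of-speech tagger is a tool that processes a word sequence and attaches a part-of-speech tag to each word. For example, a tagger based on a hidden Markov model, or one based on a neural network algorithm, can be used to tag a word sequence.
The combined tagger used in this embodiment contains several taggers, such as a regular-expression tagger, a bigram tagger, and a unigram tagger. For example, the regular-expression tagger is tried first; if it cannot find a tag, the bigram tagger is tried, and so on, until the tagging result of each word sequence is obtained.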
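The fallback chain described above can be sketched as follows. This is a simplified illustration, not the tagger used in the application: the lookup tables are hand-written rather than learned from a corpus, the bigram tagger here conditions on the previous word rather than the previous tag, and the tag names (`nr`, `p`, `ns`, `v`, `n`, `m`) are assumed for the example.

```python
import re

def regex_tagger(words, i):
    """Tag by surface pattern; returns None when no pattern matches."""
    patterns = [(r"^\d+(\.\d+)?$", "m")]  # plain numbers
    for pattern, tag in patterns:
        if re.match(pattern, words[i]):
            return tag
    return None

# Hypothetical lookup tables; in practice they would be estimated from a tagged corpus.
BIGRAM = {("在", "广州"): "ns"}
UNIGRAM = {"张三": "nr", "在": "p", "创办": "v", "酒店": "n"}

def bigram_tagger(words, i):
    """Tag word i given the previous word; None at the start or for unseen pairs."""
    return BIGRAM.get((words[i - 1], words[i])) if i > 0 else None

def unigram_tagger(words, i):
    """Tag word i from its own most likely tag; None for unknown words."""
    return UNIGRAM.get(words[i])

def combined_tag(words, taggers, default="n"):
    """Try each tagger in order and fall back to the next one when it returns None."""
    tagged = []
    for i, word in enumerate(words):
        tag = next((t for t in (tagger(words, i) for tagger in taggers) if t is not None), default)
        tagged.append((word, tag))
    return tagged

print(combined_tag(["张三", "在", "广州", "创办", "酒店"],
                   [regex_tagger, bigram_tagger, unigram_tagger]))
```

The order of the list passed to `combined_tag` is the fallback order, so the most specific tagger goes first and the most general last.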
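Step S103: according to the part-of-speech tagging results and a preset dependency annotation table, identify the dependency relationships between the words in each word sequence;
In this embodiment, after part-of-speech tagging of the word sequence of each sentence, the part of speech of each word is known, and the dependency relationships between the words are then identified from the preset dependency annotation table. The dependency relationships between words are mainly reflected in their grammatical relationships. For example, 张三/喜欢/跑步 ("Zhang San / likes / running") is tagged noun + modal verb + noun, and the corresponding grammatical relationship is subject-verb-object; 张三/就职于/学校 ("Zhang San / works at / a school") is tagged noun + preposition + noun, and the corresponding grammatical relationship is preposition-object.
Other grammatical relationships in a sentence include fronted object, attributive, indirect object, preposition-object, coordination, verb-object, subject-verb, and head (core) relationships. Therefore, given the part-of-speech tags and the dependency annotation table, the dependency relationships between the words in a sentence can be identified.
For example, the words of "酒店总经理张三" ("hotel general manager Zhang San") are tagged noun + noun + noun. The dependency annotation table shows that this combination is an attributive relationship, in which the earlier noun modifies the following noun as an attribute. The head noun of the phrase is therefore "张三", which "酒店" and "总经理" modify, so the dependency relationship among these three words is labeled ATT (attributive).
Step S104: construct the corresponding syntax analysis tree based on the dependency relationships between the words in each word sequence;
In this embodiment, the dependency relationships between the words of a sentence can be expressed by constructing a syntax analysis tree. Constructing a syntax analysis tree usually involves the following four sets:
(1) a finite set of non-terminal grammar symbols, that is, the set of non-leaf nodes of the syntax analysis tree;
(2) a start symbol, that is, the position of the start node of the syntax analysis tree;
(3) a finite set of terminal symbols, that is, the set of all words in the sentence, located at the leaf nodes of the syntax analysis tree, where leaf nodes are allowed to be empty;
(4) a finite set of rules for building the syntax tree, which describe its construction process.
In this way the syntax analysis tree of a word sequence (that is, of a sentence) can be constructed, in which a dependency relationship exists between each child node and its parent node.
Step S105: traverse the syntax analysis tree and, based on preset Chinese grammar rules, identify the core word in the syntax analysis tree and the subject and object corresponding to the core word;
In this embodiment, once the syntax analysis tree of the whole sentence has been built, it is traversed from the root node, and during the traversal the core word and its corresponding subject and object are identified based on the preset Chinese grammar rules.
In the dependency-based entity relationship extraction model, relation words are not preset categories but are taken from the current sentence. Take "张三在广州创办了一家酒店" ("Zhang San founded a hotel in Guangzhou"). Based on Chinese grammar rules, the predefined relationship of this sentence might be "张三:创始人" ("Zhang San: founder"); the word "创始人" does not appear in the sentence, but the similar word "创办" ("to found") does. Syntactic analysis can therefore extract the core word "创办". It is preceded by the noun "广州" ("Guangzhou"), which is preceded by the preposition "在" ("in"), so "在广州" is a prepositional phrase whose dependency relationship is labeled POB (preposition-object); the subject of "创办" is therefore "张三", not "广州". The particle "了" after "创办" can be omitted, and it is followed by the noun "酒店" ("hotel"), so "创办酒店" is a verb-object relationship, VOB. The meaning of the sentence is thus "张三创办酒店" ("Zhang San founded a hotel"): the core word "创办" is the entity relationship, and "张三" and "酒店" are the two named entity objects.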
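The traversal in step S105 can be sketched on the example sentence. The dependency arcs below are written by hand rather than produced by a parser; POB, VOB and ATT are the labels named in the text, while SBV (subject-verb), ADV (adverbial), RAD (right adjunct) and HED (head) are labels assumed here in the same style. `children` and `extract_triple` are hypothetical helper names.

```python
# Each token: (word, index of its head token or -1 for the root, dependency label).
TOKENS = [
    ("张三", 3, "SBV"),   # subject of 创办
    ("在", 3, "ADV"),     # adverbial modifier of 创办
    ("广州", 1, "POB"),   # object of the preposition 在
    ("创办", -1, "HED"),  # core word (root of the tree)
    ("了", 3, "RAD"),     # particle attached to 创办
    ("一家", 6, "ATT"),   # attribute of 酒店
    ("酒店", 3, "VOB"),   # object of 创办
]

def children(tokens, head_index):
    """Indices of the tokens whose head is `head_index`."""
    return [i for i, (_, head, _) in enumerate(tokens) if head == head_index]

def extract_triple(tokens):
    """Start at the root and read off (subject, core word, object) from its children."""
    root = children(tokens, -1)[0]
    subject = obj = None
    for i in children(tokens, root):
        word, _, label = tokens[i]
        if label == "SBV":
            subject = word
        elif label == "VOB":
            obj = word
    if subject is None or obj is None:
        return None  # the sentence does not yield a complete triple
    return (subject, tokens[root][0], obj)

print(extract_triple(TOKENS))  # ('张三', '创办', '酒店')
```

"广州" is never considered as a subject because it hangs under the preposition "在" rather than directly under the core word, which mirrors the reasoning in the paragraph above.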
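Step S106: with the core word as the entity relationship and the subject and object corresponding to the core word as named entity objects, construct an entity relationship triple, wherein the triple describes the named entity objects in the court's reasons for judgment and the entity relationships between them.
In this embodiment, after the named entity objects in a sentence and the entity relationships between them have been identified, the extracted core word is taken as the entity relationship and its subject and object as named entity objects to construct entity relationship triples, which describe each named entity object in the court's reasons for judgment and the entity relationships between them.
Note that the court's reasons for judgment specifically contain the adjudicated facts and the adjudicated evidence, as well as the relationship between fact and evidence, such as an affirmative or a negative relationship. In this embodiment, the entity relationship triples constructed as described above include the relationships between small fact elements and the various items of evidence.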
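Referring to FIG. 6, FIG. 6 is a schematic flowchart of an embodiment of step S2502 in FIG. 4. In this embodiment, step S2502 further includes:
Step S201: perform relation vector conversion on the data in each entity relationship triple through a preset vector conversion model to obtain relation vectors;
The basic data structure of a knowledge graph is the entity relationship triple. In a triple (h, r, t), h is the subject, t is the object, and r is the relationship; for example, in the triple (New York, belongs to, the United States), "New York" is the subject, "the United States" is the object, and "belongs to" is the relationship. A triple is an intuitive data structure. The subject and the object are collectively called entities, and the relationship is irreversible: within a triple, the subject and the object cannot be interchanged.
The triple data is converted by the preset vector conversion model to obtain the relation vectors. The preset vector conversion model converts character-type triple data into vector-type triple data: the character-type relationship data is converted into a relation vector $\mathbf{r}$, the character-type subject data into a subject vector $\mathbf{h}$, and the character-type object data into an object vector $\mathbf{t}$.
Note that the preset vector conversion model is the translating embedding (TransE) model, which converts triple data into vector form as follows:
(1) map the subject and the object of the triple to low-dimensional vectors $\mathbf{h}$ and $\mathbf{t}$;
(2) adjust $\mathbf{h}$ and $\mathbf{t}$ through a preset loss function until $\lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$ is minimal, where $\mathbf{r} \approx \mathbf{t} - \mathbf{h}$;
(3) when $\lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$ is minimal, take $(\mathbf{h}, \mathbf{r}, \mathbf{t})$ as the vector-type triple, in which $\mathbf{r}$ is the relation vector.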
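The TransE idea that a true triple should satisfy h + r ≈ t can be sketched numerically. The text only speaks of a "preset loss function"; the margin ranking loss below is the form commonly used with TransE and is an assumption here, as are the two-dimensional toy embeddings and the function names.

```python
import math

def transe_distance(h, r, t):
    """Euclidean norm of h + r - t; small values mean the triple is plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def margin_loss(positive, negative, margin=1.0):
    """Margin ranking loss for one (true triple, corrupted triple) pair."""
    return max(0.0, margin + transe_distance(*positive) - transe_distance(*negative))

# Hypothetical 2-dimensional embeddings for (New York, belongs to, the United States).
new_york, usa, paris = [0.0, 1.0], [1.0, 1.0], [3.0, 0.0]
belongs_to = [1.0, 0.0]

print(transe_distance(new_york, belongs_to, usa))  # 0.0
print(margin_loss((new_york, belongs_to, usa), (paris, belongs_to, usa)))
```

Training would repeat this over many true and corrupted triples and move the vectors by gradient descent until the loss stops decreasing; the sketch only evaluates the loss once.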
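Step S202: using a preset clustering algorithm, cluster all relation vectors, and separately cluster the relation vectors belonging to each relationship, to obtain relation cluster vectors and the relation sub-vectors of each relationship, respectively;
In a knowledge graph, different relationships may have the same meaning; for example, "country" and "nationality" have the same meaning. Relationships that express the same meaning therefore need to be grouped into one category, and a relation cluster vector is set for each such category. Within a category, the Euclidean distance between each relationship's relation vector and that category's relation cluster vector is minimal.
Step S203: construct a relationship hierarchy structure matrix based on the relation vectors and the corresponding relation cluster vectors and relation sub-vectors;
In this embodiment, the relationship hierarchy structure matrix is composed of a top relation cluster layer, a middle relation layer, and a bottom relation sub-class layer, wherein the relation cluster layer is formed by all relation cluster vectors, the relation layer by all relation vectors, and the relation sub-class layer by all relation sub-vectors of each relationship.
For a triple (h, r, t), the TransE model assumes that $\mathbf{t} - \mathbf{h} \approx \mathbf{r}$, so for each triple one can define the offset vector $\mathbf{r}_{(h,t)} = \mathbf{t} - \mathbf{h}$, where $\mathbf{t}$ and $\mathbf{h}$ both come from the embedding layer of the TransE model. For each relationship, all of its offset vectors $\mathbf{r}_{(h,t)}$ are collected and then clustered with the K-means algorithm into sub-classes of that relationship. The sub-classes of all relationships constitute the lowest level of the relationship hierarchy, that is, the relation sub-class layer $\mathbf{r}_s$.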
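The sub-class construction can be sketched as follows: compute t − h for every (head, tail) pair of one relationship, then run K-means on those offsets. The plain-Python K-means below uses a deterministic initialisation (the first k points) to keep the example reproducible; the entity embeddings, pair list and function names are assumptions for illustration.

```python
def offsets(pairs, entity_vec):
    """t - h for every (head, tail) pair of one relationship."""
    return [[b - a for a, b in zip(entity_vec[h], entity_vec[t])] for h, t in pairs]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iterations=10):
    """Plain K-means with deterministic initialisation (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

# Hypothetical entity embeddings and (head, tail) pairs of a single relationship.
entity_vec = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 0.1],
              "d": [1.1, 0.1], "e": [0.0, 2.0], "f": [0.0, 4.0]}
pairs = [("a", "b"), ("e", "f"), ("c", "d")]
centroids, labels = kmeans(offsets(pairs, entity_vec), k=2)
print(labels)  # [0, 1, 0]
```

Each resulting centroid plays the role of a relation sub-vector for one sub-class of the relationship.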
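Referring to FIG. 7, FIG. 7 is a schematic flowchart of an embodiment of step S2503 in FIG. 4. In this embodiment, step S2503 further includes:
Step S301: calculate the relationship similarity between any two relationships across all entity relationship triples according to the relation vectors, the relation cluster vectors, and the relation sub-vectors of each relationship in the relationship hierarchy structure matrix;
Combining the relation vector, relation cluster vector, and relation sub-vector when calculating relationship similarity makes use of the relationship's hierarchical structure, so it can be determined more accurately whether the relationships in different triples have the same meaning; this helps classify relationships and improves the accuracy of relationship identification.
This embodiment preferably uses vector distance to calculate the relationship similarity, as follows:
First, the embedding of the relationship in each triple is obtained through the preset formula:
$$\mathbf{r} = \mathbf{r}_c + \mathbf{r}' + \mathbf{r}_s$$
where $\mathbf{r}$ denotes the embedding of the relationship, $\mathbf{r}_c$ the relation cluster vector, $\mathbf{r}'$ the initial relation vector, and $\mathbf{r}_s$ the relation sub-vector.
Then the relationship similarity is calculated with the following cosine distance, where $\mathbf{r}_i$ and $\mathbf{r}_j$ denote any two relation vectors:
$$\cos(\mathbf{r}_i, \mathbf{r}_j) = \frac{\mathbf{r}_i \cdot \mathbf{r}_j}{\lVert \mathbf{r}_i \rVert \, \lVert \mathbf{r}_j \rVert}$$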
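A small numerical sketch of step S301: each relationship's embedding is formed as r = r_c + r' + r_s and two embeddings are compared by cosine similarity. The layer vectors below are invented for the example, and `add` and `cosine_similarity` are hypothetical helper names.

```python
import math

def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(components) for components in zip(*vectors)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical layer vectors; "country" and "nationality" share a cluster vector.
r_c = [1.0, 0.0]
country = add(r_c, [0.0, 1.0], [0.0, 0.0])        # r = r_c + r' + r_s
nationality = add(r_c, [0.0, 1.0], [0.0, 0.0])
unrelated = add([0.0, -1.0], [-1.0, 0.0], [0.0, 0.0])

print(cosine_similarity(country, nationality))
print(cosine_similarity(country, unrelated))
```

Relationships that share a cluster vector and have nearby relation vectors end up with a similarity close to 1, which is what the threshold test in the following steps relies on.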
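Step S302: take any one relationship among all entity relationship triples as the reference relationship for similarity comparison, and determine in turn whether the relationship similarity between each other relationship and the reference relationship exceeds a preset threshold;
Step S303: if so, determine that the relationship being compared is similar to the reference relationship and merge the two into one category; otherwise, treat the relationship being compared as belonging to a new category;
Step S304: select any relationship from the remaining uncompared relationships as the new reference relationship and continue the similarity comparison until all pairs of relationships have been compared.
The hierarchical structure of relationships is built by clustering, and the information in the hierarchy is fully used to vectorize the relationships; the similarity between relationships is measured by the distance between a newly added relationship and the existing ones. The closer the distance, the higher the similarity. A similarity threshold is also set: if the similarity between a newly added relationship and an existing relationship exceeds the threshold, the two are merged; if the newly added relationship is not similar to any existing relationship, it becomes a new relationship.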
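Steps S302 to S304 amount to a greedy grouping loop, sketched below. The text allows the reference relationship to be chosen arbitrarily; this sketch takes relationships in insertion order so that the result is reproducible. The embeddings, the threshold of 0.9 and the name `align_relations` are assumptions for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def align_relations(embeddings, threshold=0.9):
    """Group relationship names whose similarity to a reference exceeds the threshold."""
    remaining = list(embeddings)          # relationships not yet assigned to a category
    categories = []
    while remaining:
        reference = remaining.pop(0)      # S302 / S304: next reference relationship
        category, unmatched = [reference], []
        for name in remaining:
            sim = cosine_similarity(embeddings[reference], embeddings[name])
            # S303: merge when similar, otherwise leave for a later category.
            (category if sim > threshold else unmatched).append(name)
        categories.append(category)
        remaining = unmatched
    return categories

# Hypothetical relationship embeddings.
embeddings = {
    "country": [1.0, 1.0],
    "nationality": [1.0, 0.9],
    "guarantee": [-1.0, 0.2],
}
print(align_relations(embeddings))  # [['country', 'nationality'], ['guarantee']]
```

Because each relationship is only compared with the current reference, the grouping depends on the order in which references are chosen; that is a property of the greedy procedure itself.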
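Specifically, the entity relationship between a small fact element and the plaintiff's evidence or the defendant's evidence is either an affirmative relationship or a negative relationship. With relationship alignment, the relationship between evidence and small fact elements can be extracted; the relationship carries an affirmation (or negation) together with the reason for the affirmation (or negation). For example, an extracted triple may be: [loan contract - [loan affirmed {reason: signed}] -> whether the loan contract bears the borrower's signature or addendum], where [loan contract] is the evidence, [whether the loan contract bears the borrower's signature or addendum] is the small fact element, and [loan affirmed {reason: signed}] is the relationship between the evidence and the small fact element; it follows that the entity relationship between the small fact element and the plaintiff's evidence is an affirmative relationship.
The present application also provides a legal case knowledge graph query apparatus.
Referring to FIG. 8, FIG. 8 is a schematic diagram of the functional modules of an embodiment of the legal case knowledge graph query apparatus of the present application. In this embodiment, the legal case knowledge graph query apparatus includes:
a receiving module 10, configured to receive a query request for legal case information initiated by a client;
an extraction module 20, configured to extract the query keywords from the query request;
a retrieval module 30, configured to retrieve a target keyword entity object in a preset legal case knowledge graph library according to the query keywords, and to output the legal case information matching the target keyword entity object to the client;
wherein the legal case knowledge graph is constructed by extracting entity objects and entity object relationships from judgment document data in combination with legal principles and regulations data and judgment manual data.
Because the apparatus embodiment rests on the same description as the method embodiments above, its content is not repeated here.
The present application also provides a legal case knowledge graph query device, including a memory and at least one processor, wherein the memory stores instructions and the memory and the at least one processor are interconnected by a line; the at least one processor invokes the instructions in the memory so that the legal case knowledge graph query device executes the steps of the above legal case knowledge graph query method.
The present application also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to execute the following steps:
receiving a query request for legal case information initiated by a client;
extracting the query keywords from the query request;
according to the query keywords, retrieving a target keyword entity object in a preset legal case knowledge graph library, and outputting the legal case information matching the target keyword entity object to the client;
wherein the legal case knowledge graph is constructed by extracting entity objects and entity object relationships from judgment document data in combination with legal principles and regulations data and judgment manual data.
From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied as a software product stored in a storage medium (such as ROM/RAM) and including a number of instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.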

Claims (20)

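  1. A legal case knowledge graph query method, comprising:
    receiving a query request for legal case information initiated by a client;
    extracting the query keywords from the query request;
    according to the query keywords, retrieving a target keyword entity object in a preset legal case knowledge graph library, and outputting the legal case information matching the target keyword entity object to the client;
    wherein the legal case knowledge graph is constructed by extracting entity objects and entity object relationships from judgment document data in combination with legal principles and regulations data and judgment manual data.
  2. The legal case knowledge graph query method according to claim 1, wherein, before the step of receiving a query request for legal case information initiated by a client, the method further comprises:
    obtaining judgment document data of legal cases;
    based on a preset entity relationship annotation model, performing structured extraction on the judgment document data to obtain the preset target keyword entity objects in the judgment document data and the entity relationships of the target keyword entity objects, wherein the target keyword entity objects include: the plaintiff and the defendant, the plaintiff's evidence and the defendant's evidence, the plaintiff's claims and the defendant's defense, the focus of dispute, the court's decision, the statutory basis, and the court's reasons for judgment;
    generating large fact elements according to the keyword entity objects, and determining the large fact elements as large fact element entity objects;
    based on preset rules, splitting each large fact element into multiple small fact elements, and determining the small fact elements as small fact element entity objects, wherein the rules are preset according to legal principles and regulations data and judgment manual data;
    obtaining the entity relationships of each small fact element entity object from a designated target keyword entity object;
    constructing the legal case knowledge graph according to the obtained entity objects and entity relationships.
  3. The legal case knowledge graph query method according to claim 2, wherein generating large fact elements according to the keyword entity objects and determining the large fact elements as large fact element entity objects comprises:
    clustering the focuses of dispute to obtain multiple dispute focus categories, and determining the preset case dispute focus corresponding to each dispute focus category, wherein a case dispute focus is a fact that has not yet been confirmed;
    according to the plaintiff's evidence and the defendant's evidence, the plaintiff's claims and the defendant's defense, the court's decision, the statutory basis, and the court's reasons for judgment, confirming the unconfirmed fact corresponding to the case dispute focus, taking the information used in the confirmation as a large fact element, and determining the large fact element as a large fact element entity object.
  4. The legal case knowledge graph query method according to claim 2, wherein obtaining the entity relationships of each small fact element entity object from a designated target keyword entity object comprises:
    performing entity relationship extraction on the court's reasons for judgment to obtain multiple entity relationship triples, wherein the entity relationship triples include the entity relationships between small fact elements and the plaintiff's evidence and the defendant's evidence;
    clustering the relationships in the entity relationship triples to obtain a relationship hierarchy structure matrix;
    according to the relationship hierarchy structure matrix, using a preset entity relationship alignment algorithm to determine the entity relationship between each small fact element and the plaintiff's evidence and the defendant's evidence.
  5. The legal case knowledge graph query method according to claim 4, wherein performing entity relationship extraction on the court's reasons for judgment to obtain multiple entity relationship triples comprises:
    splitting the court's reasons for judgment into sentences and segmenting them into words to obtain the word sequence of each sentence;
    using a preset combined part-of-speech tagger to tag each word sequence and obtain the part-of-speech tagging result of each word sequence;
    according to the part-of-speech tagging results and a preset dependency annotation table, identifying the dependency relationships between the words in each word sequence;
    constructing the corresponding syntax analysis tree based on the dependency relationships between the words in each word sequence;
    traversing the syntax analysis tree and, based on preset Chinese grammar rules, identifying the core word in the syntax analysis tree and the subject and object corresponding to the core word;
    with the core word as the entity relationship and the subject and object corresponding to the core word as named entity objects, constructing an entity relationship triple, wherein the entity relationship triple describes the named entity objects in the court's reasons for judgment and the entity relationships between the named entity objects.
  6. The legal case knowledge graph query method according to claim 4, wherein clustering the relationships in the entity relationship triples to obtain a relationship hierarchy structure matrix comprises:
    performing relation vector conversion on the data in each entity relationship triple through a preset vector conversion model to obtain relation vectors;
    using a preset clustering algorithm to cluster all relation vectors, and separately the relation vectors of each relationship, to obtain relation cluster vectors and the relation sub-vectors of each relationship, respectively;
    constructing the relationship hierarchy structure matrix based on the relation vectors and the corresponding relation cluster vectors and relation sub-vectors;
    wherein the relationship hierarchy structure matrix is composed of a top relation cluster layer, a middle relation layer, and a bottom relation sub-class layer, the relation cluster layer being formed by all relation cluster vectors, the relation layer by all relation vectors, and the relation sub-class layer by all relation sub-vectors of each relationship.
  7. The legal case knowledge graph query method according to claim 6, wherein using a preset entity relationship alignment algorithm according to the relationship hierarchy structure matrix to determine the entity relationship between each small fact element and the plaintiff's evidence and the defendant's evidence comprises:
    calculating the relationship similarity between any two relationships across all entity relationship triples according to the relation vectors, the relation cluster vectors, and the relation sub-vectors of each relationship in the relationship hierarchy structure matrix;
    taking any one relationship among all entity relationship triples as the reference relationship for similarity comparison, and determining in turn whether the relationship similarity between each other relationship and the reference relationship exceeds a preset threshold;
    if so, determining that the relationship being compared is similar to the reference relationship and merging the two into one category; otherwise, treating the relationship being compared as belonging to a new category;
    selecting any relationship from the remaining uncompared relationships as the new reference relationship and continuing the similarity comparison until all pairs of relationships have been compared.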
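  8. A legal case knowledge graph query device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
    receiving a query request for legal case information initiated by a client;
    extracting the query keywords from the query request;
    according to the query keywords, retrieving a target keyword entity object in a preset legal case knowledge graph library, and outputting the legal case information matching the target keyword entity object to the client;
    wherein the legal case knowledge graph is constructed by extracting entity objects and entity object relationships from judgment document data in combination with legal principles and regulations data and judgment manual data.
  9. The legal case knowledge graph query device according to claim 8, wherein the processor further implements the following steps when executing the computer-readable instructions:
    obtaining judgment document data of legal cases;
    based on a preset entity relationship annotation model, performing structured extraction on the judgment document data to obtain the preset target keyword entity objects in the judgment document data and the entity relationships of the target keyword entity objects, wherein the target keyword entity objects include: the plaintiff and the defendant, the plaintiff's evidence and the defendant's evidence, the plaintiff's claims and the defendant's defense, the focus of dispute, the court's decision, the statutory basis, and the court's reasons for judgment;
    generating large fact elements according to the keyword entity objects, and determining the large fact elements as large fact element entity objects;
    based on preset rules, splitting each large fact element into multiple small fact elements, and determining the small fact elements as small fact element entity objects, wherein the rules are preset according to legal principles and regulations data and judgment manual data;
    obtaining the entity relationships of each small fact element entity object from a designated target keyword entity object;
    constructing the legal case knowledge graph according to the obtained entity objects and entity relationships.
  10. The legal case knowledge graph query device according to claim 9, wherein the processor further implements the following steps when executing the computer-readable instructions:
    clustering the focuses of dispute to obtain multiple dispute focus categories, and determining the preset case dispute focus corresponding to each dispute focus category, wherein a case dispute focus is a fact that has not yet been confirmed;
    according to the plaintiff's evidence and the defendant's evidence, the plaintiff's claims and the defendant's defense, the court's decision, the statutory basis, and the court's reasons for judgment, confirming the unconfirmed fact corresponding to the case dispute focus, taking the information used in the confirmation as a large fact element, and determining the large fact element as a large fact element entity object.
  11. The legal case knowledge graph query device according to claim 9, wherein the processor further implements the following steps when executing the computer-readable instructions:
    performing entity relationship extraction on the court's reasons for judgment to obtain multiple entity relationship triples, wherein the entity relationship triples include the entity relationships between small fact elements and the plaintiff's evidence and the defendant's evidence;
    clustering the relationships in the entity relationship triples to obtain a relationship hierarchy structure matrix;
    according to the relationship hierarchy structure matrix, using a preset entity relationship alignment algorithm to determine the entity relationship between each small fact element and the plaintiff's evidence and the defendant's evidence.
  12. The legal case knowledge graph query device according to claim 11, wherein the processor further implements the following steps when executing the computer-readable instructions:
    splitting the court's reasons for judgment into sentences and segmenting them into words to obtain the word sequence of each sentence;
    using a preset combined part-of-speech tagger to tag each word sequence and obtain the part-of-speech tagging result of each word sequence;
    according to the part-of-speech tagging results and a preset dependency annotation table, identifying the dependency relationships between the words in each word sequence;
    constructing the corresponding syntax analysis tree based on the dependency relationships between the words in each word sequence;
    traversing the syntax analysis tree and, based on preset Chinese grammar rules, identifying the core word in the syntax analysis tree and the subject and object corresponding to the core word;
    with the core word as the entity relationship and the subject and object corresponding to the core word as named entity objects, constructing an entity relationship triple, wherein the entity relationship triple describes the named entity objects in the court's reasons for judgment and the entity relationships between the named entity objects.
  13. The legal case knowledge graph query device according to claim 11, wherein the processor further implements the following steps when executing the computer-readable instructions:
    performing relation vector conversion on the data in each entity relationship triple through a preset vector conversion model to obtain relation vectors;
    using a preset clustering algorithm to cluster all relation vectors, and separately the relation vectors of each relationship, to obtain relation cluster vectors and the relation sub-vectors of each relationship, respectively;
    constructing the relationship hierarchy structure matrix based on the relation vectors and the corresponding relation cluster vectors and relation sub-vectors;
    wherein the relationship hierarchy structure matrix is composed of a top relation cluster layer, a middle relation layer, and a bottom relation sub-class layer, the relation cluster layer being formed by all relation cluster vectors, the relation layer by all relation vectors, and the relation sub-class layer by all relation sub-vectors of each relationship.
  14. The legal case knowledge graph query device according to claim 13, wherein the processor further implements the following steps when executing the computer-readable instructions:
    calculating the relationship similarity between any two relationships across all entity relationship triples according to the relation vectors, the relation cluster vectors, and the relation sub-vectors of each relationship in the relationship hierarchy structure matrix;
    taking any one relationship among all entity relationship triples as the reference relationship for similarity comparison, and determining in turn whether the relationship similarity between each other relationship and the reference relationship exceeds a preset threshold;
    if so, determining that the relationship being compared is similar to the reference relationship and merging the two into one category; otherwise, treating the relationship being compared as belonging to a new category;
    selecting any relationship from the remaining uncompared relationships as the new reference relationship and continuing the similarity comparison until all pairs of relationships have been compared.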
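  15. A computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to execute the following steps:
    receiving a query request for legal case information initiated by a client;
    extracting the query keywords from the query request;
    according to the query keywords, retrieving a target keyword entity object in a preset legal case knowledge graph library, and outputting the legal case information matching the target keyword entity object to the client;
    wherein the legal case knowledge graph is constructed by extracting entity objects and entity object relationships from judgment document data in combination with legal principles and regulations data and judgment manual data.
  16. The computer-readable storage medium according to claim 15, wherein the computer instructions, when run on a computer, further cause the computer to execute the following steps:
    obtaining judgment document data of legal cases;
    based on a preset entity relationship annotation model, performing structured extraction on the judgment document data to obtain the preset target keyword entity objects in the judgment document data and the entity relationships of the target keyword entity objects, wherein the target keyword entity objects include: the plaintiff and the defendant, the plaintiff's evidence and the defendant's evidence, the plaintiff's claims and the defendant's defense, the focus of dispute, the court's decision, the statutory basis, and the court's reasons for judgment;
    generating large fact elements according to the keyword entity objects, and determining the large fact elements as large fact element entity objects;
    based on preset rules, splitting each large fact element into multiple small fact elements, and determining the small fact elements as small fact element entity objects, wherein the rules are preset according to legal principles and regulations data and judgment manual data;
    obtaining the entity relationships of each small fact element entity object from a designated target keyword entity object;
    constructing the legal case knowledge graph according to the obtained entity objects and entity relationships.
  17. The computer-readable storage medium according to claim 16, wherein the computer instructions, when run on a computer, further cause the computer to execute the following steps:
    clustering the focuses of dispute to obtain multiple dispute focus categories, and determining the preset case dispute focus corresponding to each dispute focus category, wherein a case dispute focus is a fact that has not yet been confirmed;
    according to the plaintiff's evidence and the defendant's evidence, the plaintiff's claims and the defendant's defense, the court's decision, the statutory basis, and the court's reasons for judgment, confirming the unconfirmed fact corresponding to the case dispute focus, taking the information used in the confirmation as a large fact element, and determining the large fact element as a large fact element entity object.
  18. The computer-readable storage medium according to claim 16, wherein the computer instructions, when run on a computer, further cause the computer to execute the following steps:
    performing entity relationship extraction on the court's reasons for judgment to obtain multiple entity relationship triples, wherein the entity relationship triples include the entity relationships between small fact elements and the plaintiff's evidence and the defendant's evidence;
    clustering the relationships in the entity relationship triples to obtain a relationship hierarchy structure matrix;
    according to the relationship hierarchy structure matrix, using a preset entity relationship alignment algorithm to determine the entity relationship between each small fact element and the plaintiff's evidence and the defendant's evidence.
  19. The computer-readable storage medium according to claim 18, wherein the computer instructions, when run on a computer, further cause the computer to execute the following steps:
    splitting the court's reasons for judgment into sentences and segmenting them into words to obtain the word sequence of each sentence;
    using a preset combined part-of-speech tagger to tag each word sequence and obtain the part-of-speech tagging result of each word sequence;
    according to the part-of-speech tagging results and a preset dependency annotation table, identifying the dependency relationships between the words in each word sequence;
    constructing the corresponding syntax analysis tree based on the dependency relationships between the words in each word sequence;
    traversing the syntax analysis tree and, based on preset Chinese grammar rules, identifying the core word in the syntax analysis tree and the subject and object corresponding to the core word;
    with the core word as the entity relationship and the subject and object corresponding to the core word as named entity objects, constructing an entity relationship triple, wherein the entity relationship triple describes the named entity objects in the court's reasons for judgment and the entity relationships between the named entity objects.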
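  20. A legal case knowledge graph query apparatus, wherein the legal case knowledge graph query apparatus comprises:
    a receiving module, configured to receive a query request for legal case information initiated by a client;
    an extraction module, configured to extract the query keywords from the query request;
    a retrieval module, configured to retrieve a target keyword entity object in a preset legal case knowledge graph library according to the query keywords, and to output the legal case information matching the target keyword entity object to the client;
    wherein the legal case knowledge graph is constructed by extracting entity objects and entity object relationships from judgment document data in combination with legal principles and regulations data and judgment manual data.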
PCT/CN2020/111301 2020-02-20 2020-08-26 法律案件知识图谱查询方法、装置、设备及存储介质 WO2021164226A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010103656.1A CN111291161A (zh) 2020-02-20 2020-02-20 法律案件知识图谱查询方法、装置、设备及存储介质
CN202010103656.1 2020-02-20

Publications (1)

Publication Number Publication Date
WO2021164226A1 true WO2021164226A1 (zh) 2021-08-26

Family

ID=71024635

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111301 WO2021164226A1 (zh) 2020-02-20 2020-08-26 法律案件知识图谱查询方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN111291161A (zh)
WO (1) WO2021164226A1 (zh)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113742494A (zh) * 2021-09-06 2021-12-03 湘潭大学 一种基于标签图转化的领域文本相似度计算方法及系统
CN113779358A (zh) * 2021-09-14 2021-12-10 支付宝(杭州)信息技术有限公司 一种事件检测方法和系统
CN113821647A (zh) * 2021-11-22 2021-12-21 山东捷瑞数字科技股份有限公司 一种工程机械行业知识图谱构建方法及系统
CN114237829A (zh) * 2021-12-27 2022-03-25 南方电网物资有限公司 一种电力设备的数据采集与处理方法
CN114238418A (zh) * 2022-02-24 2022-03-25 佛山市禅城区人民法院 信用卡要素表生成方法、系统和可读存储介质
CN114547345A (zh) * 2022-04-18 2022-05-27 支付宝(杭州)信息技术有限公司 结合图谱模式的输入提示方法及装置
CN114780083A (zh) * 2022-06-17 2022-07-22 之江实验室 一种知识图谱系统的可视化构建方法及装置
CN115238688A (zh) * 2022-08-15 2022-10-25 广州市刑事科学技术研究所 电子信息数据关联关系分析方法、装置、设备和存储介质
CN115809256A (zh) * 2023-02-22 2023-03-17 中关村科学城城市大脑股份有限公司 治安管理综合信息系统和可视化展示方法
CN115952290A (zh) * 2023-03-09 2023-04-11 太极计算机股份有限公司 基于主动学习和半监督学习的案情特征标注方法、装置和设备
CN115982388A (zh) * 2023-03-06 2023-04-18 共道网络科技有限公司 案件质控图谱建立、案件文书质检方法、设备及存储介质
WO2023060633A1 (zh) * 2021-10-12 2023-04-20 深圳前海环融联易信息科技服务有限公司 增强语义的关系抽取方法、装置、计算机设备及存储介质
CN116484010A (zh) * 2023-03-15 2023-07-25 北京擎盾信息科技有限公司 知识图谱构建方法、装置、存储介质及电子装置
CN116629258A (zh) * 2023-07-24 2023-08-22 北明成功软件(山东)有限公司 基于复杂信息项数据的司法文书的结构化分析方法及系统
CN116756324A (zh) * 2023-08-14 2023-09-15 北京分音塔科技有限公司 基于庭审音频的关联度挖掘方法、装置、设备及存储介质
CN117057425A (zh) * 2023-10-11 2023-11-14 人民法院信息技术服务中心 一种规律型知识分析方法及装置
CN117149821A (zh) * 2023-10-19 2023-12-01 北京人大金仓信息技术股份有限公司 一种查询优化方法、存储介质与计算机设备
CN117540799A (zh) * 2023-10-20 2024-02-09 上海歆广数据科技有限公司 一种个案图谱创建生成方法及系统
CN117609440A (zh) * 2023-10-27 2024-02-27 中国司法大数据研究院有限公司 一种面向裁判文书的文档级智能问答实现方法
CN117609519A (zh) * 2024-01-22 2024-02-27 云南大学 一种电力碳排放计算公式中的实体关系抽取方法
CN117763156A (zh) * 2023-11-24 2024-03-26 上海歆广数据科技有限公司 一种动态全息个案管理系统

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
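CN111291161A (zh) * 2020-02-20 2020-06-16 平安科技(深圳)有限公司 Legal case knowledge graph query method, apparatus, device, and storage medium
CN111753025A (zh) * 2020-06-24 2020-10-09 南方科技大学 Method, apparatus, device, and storage medium for automatically acquiring case information
CN111753517A (zh) * 2020-06-30 2020-10-09 北京来也网络科技有限公司 RPA- and AI-based document comparison method, apparatus, device, and medium
CN111798344B (zh) * 2020-07-01 2023-09-22 北京金堤科技有限公司 Subject name determination method and apparatus, electronic device, and storage medium
CN111859969B (zh) * 2020-07-20 2024-05-03 航天科工智慧产业发展有限公司 Data analysis method and apparatus, electronic device, and storage medium
CN111797246B (zh) * 2020-09-08 2020-12-29 共道网络科技有限公司 Court trial method, apparatus, electronic device, and machine-readable storage medium
CN111932413B (zh) * 2020-09-14 2021-01-12 平安国际智慧城市科技股份有限公司 Case element extraction method, apparatus, device, and medium
CN112487146B (zh) * 2020-12-02 2022-05-31 重庆邮电大学 Method, apparatus, and computer device for acquiring the dispute focus of legal cases
CN112632226B (zh) * 2020-12-29 2021-10-26 天津汇智星源信息技术有限公司 Semantic search method, apparatus, and electronic device based on a legal knowledge graph
CN112632225B (zh) * 2020-12-29 2022-08-30 天津汇智星源信息技术有限公司 Semantic search method, apparatus, and electronic device based on a case-event knowledge graph
CN112883196B (zh) * 2021-02-01 2022-08-16 上海交通大学 Knowledge graph-based case assignment method, system, medium, and electronic device
CN113486187A (zh) * 2021-03-24 2021-10-08 平安科技(深圳)有限公司 Buddhist studies knowledge graph construction method, apparatus, device, and storage medium
CN113239130A (zh) * 2021-06-18 2021-08-10 广东博维创远科技有限公司 Method and apparatus for constructing a knowledge graph based on criminal judicial documents, electronic device, and storage medium
CN113868391B (zh) * 2021-09-27 2024-05-07 平安国际智慧城市科技股份有限公司 Knowledge graph-based legal document generation method, apparatus, device, and medium
TWI800971B (zh) * 2021-11-03 2023-05-01 財團法人資訊工業策進會 Automatic disability level determination device and automatic disability level determination method
CN114092119A (zh) * 2021-11-29 2022-02-25 北京金堤科技有限公司 Supply relationship acquisition method, apparatus, storage medium, and electronic device
CN114239561B (zh) * 2021-12-10 2023-04-28 北京天眼查科技有限公司 Supply relationship acquisition method, apparatus, storage medium, and electronic device
CN114187143A (zh) * 2021-12-21 2022-03-15 厦门大学 Artificial intelligence-based risk review method and system for building construction contracts
CN114637822A (zh) * 2022-03-15 2022-06-17 平安国际智慧城市科技股份有限公司 Legal information query method, apparatus, device, and storage medium
CN115269879B (zh) * 2022-09-05 2023-05-05 北京百度网讯科技有限公司 Knowledge structure data generation method, data search method, and risk warning method
CN115730078A (zh) * 2022-11-04 2023-03-03 南京擎盾信息科技有限公司 Event knowledge graph construction method, apparatus, and electronic device for similar-case retrieval
CN116304019B (zh) * 2023-01-09 2023-09-12 中国司法大数据研究院有限公司 Dispute focus system construction and identification method
CN117743590A (zh) * 2023-11-30 2024-03-22 北京汉勃科技有限公司 Large language model-based legal assistance method and system
CN117725235B (zh) * 2023-12-25 2024-04-30 武汉百智诚远科技有限公司 Legal knowledge-enhanced retrieval system and method based on artificial intelligence algorithms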

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160063093A1 (en) * 2014-08-27 2016-03-03 Facebook, Inc. Keyword Search Queries on Online Social Networks
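CN106991092A (zh) * 2016-01-20 2017-07-28 阿里巴巴集团控股有限公司 Method and device for mining similar judgment documents based on big data
CN108614860A (zh) * 2018-03-27 2018-10-02 成都律云科技有限公司 Lawyer information processing method and system
CN108681977A (zh) * 2018-03-27 2018-10-19 成都律云科技有限公司 Lawyer information processing method and system
CN111291161A (zh) * 2020-02-20 2020-06-16 平安科技(深圳)有限公司 Legal case knowledge graph query method, apparatus, device, and storage medium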

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
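CN113742494B (zh) * 2021-09-06 2024-03-15 湘潭大学 Domain text similarity calculation method and system based on label graph conversion
CN113742494A (zh) * 2021-09-06 2021-12-03 湘潭大学 Domain text similarity calculation method and system based on label graph conversion
CN113779358A (zh) * 2021-09-14 2021-12-10 支付宝(杭州)信息技术有限公司 Event detection method and system
CN113779358B (zh) * 2021-09-14 2024-05-24 支付宝(杭州)信息技术有限公司 Event detection method and system
WO2023060633A1 (zh) * 2021-10-12 2023-04-20 深圳前海环融联易信息科技服务有限公司 Semantically enhanced relation extraction method, apparatus, computer device, and storage medium
CN113821647A (zh) * 2021-11-22 2021-12-21 山东捷瑞数字科技股份有限公司 Method and system for constructing a knowledge graph for the construction machinery industry
CN113821647B (zh) * 2021-11-22 2022-02-22 山东捷瑞数字科技股份有限公司 Method and system for constructing a knowledge graph for the construction machinery industry
CN114237829A (zh) * 2021-12-27 2022-03-25 南方电网物资有限公司 Data acquisition and processing method for power equipment
CN114238418B (zh) * 2022-02-24 2022-05-10 佛山市禅城区人民法院 Credit card element table generation method, system, and readable storage medium
CN114238418A (zh) * 2022-02-24 2022-03-25 佛山市禅城区人民法院 Credit card element table generation method, system, and readable storage medium
CN114547345A (zh) * 2022-04-18 2022-05-27 支付宝(杭州)信息技术有限公司 Input prompting method and apparatus combined with graph schema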
US11907390B2 (en) 2022-06-17 2024-02-20 Zhejiang Lab Method and apparatus for visual construction of knowledge graph system
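CN114780083A (zh) * 2022-06-17 2022-07-22 之江实验室 Method and apparatus for visual construction of knowledge graph system
CN115238688A (zh) * 2022-08-15 2022-10-25 广州市刑事科学技术研究所 Electronic information data association relationship analysis method, apparatus, device, and storage medium
CN115238688B (zh) * 2022-08-15 2023-08-01 广州市刑事科学技术研究所 Electronic information data association relationship analysis method, apparatus, device, and storage medium
CN115809256A (zh) * 2023-02-22 2023-03-17 中关村科学城城市大脑股份有限公司 Public security management integrated information system and visual display method
CN115809256B (zh) * 2023-02-22 2023-06-06 中关村科学城城市大脑股份有限公司 Public security management integrated information system and visual display method
CN115982388A (zh) * 2023-03-06 2023-04-18 共道网络科技有限公司 Case quality control graph establishment and case document quality inspection method, device, and storage medium
CN115982388B (zh) * 2023-03-06 2024-04-19 共道网络科技有限公司 Case quality control graph establishment and case document quality inspection method, device, and storage medium
CN115952290B (zh) * 2023-03-09 2023-06-02 太极计算机股份有限公司 Case feature labeling method, apparatus, and device based on active learning and semi-supervised learning
CN115952290A (zh) * 2023-03-09 2023-04-11 太极计算机股份有限公司 Case feature labeling method, apparatus, and device based on active learning and semi-supervised learning
CN116484010A (zh) * 2023-03-15 2023-07-25 北京擎盾信息科技有限公司 Knowledge graph construction method, apparatus, storage medium, and electronic apparatus
CN116484010B (zh) * 2023-03-15 2024-01-16 北京擎盾信息科技有限公司 Knowledge graph construction method, apparatus, storage medium, and electronic apparatus
CN116629258A (zh) * 2023-07-24 2023-08-22 北明成功软件(山东)有限公司 Structured analysis method and system for judicial documents based on complex information item data
CN116629258B (zh) * 2023-07-24 2023-10-13 北明成功软件(山东)有限公司 Structured analysis method and system for judicial documents based on complex information item data
CN116756324B (zh) * 2023-08-14 2023-10-27 北京分音塔科技有限公司 Relevance mining method, apparatus, device, and storage medium based on court trial audio
CN116756324A (zh) * 2023-08-14 2023-09-15 北京分音塔科技有限公司 Relevance mining method, apparatus, device, and storage medium based on court trial audio
CN117057425A (zh) * 2023-10-11 2023-11-14 人民法院信息技术服务中心 Regularity-type knowledge analysis method and apparatus
CN117057425B (zh) * 2023-10-11 2023-12-22 人民法院信息技术服务中心 Regularity-type knowledge analysis method and apparatus
CN117149821A (zh) * 2023-10-19 2023-12-01 北京人大金仓信息技术股份有限公司 Query optimization method, storage medium, and computer device
CN117149821B (zh) * 2023-10-19 2024-01-30 北京人大金仓信息技术股份有限公司 Query optimization method, storage medium, and computer device
CN117540799B (zh) * 2023-10-20 2024-04-09 上海歆广数据科技有限公司 Method and system for creating and generating an individual-case graph
CN117540799A (zh) * 2023-10-20 2024-02-09 上海歆广数据科技有限公司 Method and system for creating and generating an individual-case graph
CN117609440A (zh) * 2023-10-27 2024-02-27 中国司法大数据研究院有限公司 Document-level intelligent question answering implementation method for judgment documents
CN117763156A (zh) * 2023-11-24 2024-03-26 上海歆广数据科技有限公司 Dynamic holographic individual-case management system
CN117763156B (zh) * 2023-11-24 2024-05-07 上海歆广数据科技有限公司 Dynamic holographic individual-case management system
CN117609519A (zh) * 2024-01-22 2024-02-27 云南大学 Entity relation extraction method for electric power carbon emission calculation formulas
CN117609519B (zh) * 2024-01-22 2024-04-19 云南大学 Entity relation extraction method for electric power carbon emission calculation formulas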

Also Published As

Publication number Publication date
CN111291161A (zh) 2020-06-16

Similar Documents

Publication Publication Date Title
WO2021164226A1 (zh) Legal case knowledge graph query method, apparatus, device, and storage medium
US10289717B2 (en) Semantic search apparatus and method using mobile terminal
US9424294B2 (en) Method for facet searching and search suggestions
US20180032930A1 (en) System and method to Generate Queries for a Business Database
CN110704743B (zh) Knowledge graph-based semantic search method and apparatus
US9280535B2 (en) Natural language querying with cascaded conditional random fields
JP5936698B2 (ja) Word semantic relation extraction device
US20150120738A1 (en) System and method for document classification based on semantic analysis of the document
US20120136649A1 (en) Natural Language Interface
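CN110502642B (zh) Entity relation extraction method based on dependency parsing and rules
CN110097278B (zh) Intelligent sharing and fusion training system and application system for scientific and technological resources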
US20160147878A1 (en) Semantic search engine
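CN102622453A (zh) Ontology-based semantic retrieval system for food safety events
CN116501875B (zh) Document processing method and system based on natural language and knowledge graph
JP2016192202A (ja) Collation processing system, method, and program
CN114997288A (zh) Design resource association method
CN112486919A (zh) Document management method, system, and storage medium
CN113157887A (zh) Knowledge question-answering intent recognition method, apparatus, and computer device
KR20220074576A (ko) Deep learning-based neologism extraction method and apparatus for building a marketing knowledge graph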
Song et al. Semantic query graph based SPARQL generation from natural language questions
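CN112183110A (zh) Data center-based artificial intelligence data application system and application method
CN114091464B (zh) Highly universal many-to-many relation triple extraction method fusing five-dimensional features
CN116303923A (zh) Knowledge graph question answering method, apparatus, computer device, and storage medium
CN114881019A (zh) Hybrid data storage method and apparatus for multimodal networks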
Hoshiai et al. A Semantic Category Matching Approach to Ontology Alignment.

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20920135

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20920135

Country of ref document: EP

Kind code of ref document: A1