CN111475623B - Case Information Semantic Retrieval Method and Device Based on Knowledge Graph

Info

Publication number: CN111475623B
Application number: CN202010273401.XA
Authority: CN
Inventors: 赵文; 张君福; 王靖琨; 李皓辰
Original assignee: Beijing Peking University Software Engineering Co ltd
Current assignee: Beijing Peking University Software Engineering Co ltd
Priority date: 2020-04-09
Filing date: 2020-04-09
Publication date: 2023-08-22
Anticipated expiration: 2040-04-09
Also published as: CN111475623A

Abstract

The application relates to a case information semantic retrieval method and device based on a knowledge graph. The method comprises: constructing a legal knowledge graph from legal documents; performing simple recognition and intention recognition on a question input by a user; defining SPARQL query templates, matching the corresponding SPARQL query template according to the intention recognition result, performing a first retrieval in the legal knowledge graph, and assigning a first confidence to the first retrieval result; building a full-text search engine, performing a second retrieval on the simple recognition result in the full-text search engine, and assigning a second confidence to the second retrieval result; and outputting a final retrieval result according to the first confidence and the second confidence. The application can fully mine the relationships between entities and complete complex multi-hop semantic retrieval, and even when semantic retrieval over the legal knowledge graph cannot produce an output that meets the requirement, it can still return a retrieval result from the full-text search engine, thereby improving retrieval efficiency and accuracy.

Description

Case information semantic retrieval method and device based on knowledge graph

Technical Field

The application belongs to the technical field of smart rule of law, and particularly relates to a case information semantic retrieval method and device based on a knowledge graph.

Background

In the information age, smart rule of law has become an important direction of development in the legal field. Smart rule of law means building an intelligent environment by means of intelligent technology. As the volume of judicial case data grows, the relationships among the data become increasingly complex, so it is currently difficult to search information efficiently and to mine the relationships between pieces of information across a large number of legal documents, and the text information is not effectively used.

The concept of the "knowledge graph" defines a new way of organizing knowledge. Starting from the data itself, it attempts to transform unstructured data into structured data and to link the various data together into a graph model containing a vast amount of structured data. Structured graph data offers a new direction for retrieval and question-answering systems in the legal field: because such systems can exploit the structured data in a knowledge graph to fully mine the connections between data, they can give users concise and accurate answers and provide an effective means of information retrieval in the legal field.

At present, some work has been done on knowledge-graph-based retrieval and question-answering systems in the legal field, but this work has the following problems:

1. In knowledge graph retrieval and question-answering systems, most retrieval is based on simple one-hop queries; complex semantic retrieval involving multiple hops often fails to produce good results and cannot fully mine the connections between entities.

2. The current knowledge graph application field is mainly encyclopedic knowledge question and answer in the general field, and cannot be effectively used in the specific field.

3. The search system is only based on semantic search of the knowledge graph, and can not stably output the query result.

Disclosure of Invention

In order to overcome, at least to some extent, the problems of retrieval and question-answering systems in the legal field, the application provides a case information semantic retrieval method and device based on a knowledge graph.

In a first aspect, the present application provides a case information semantic retrieval method based on a knowledge graph, including:

constructing a legal knowledge graph according to legal documents;

performing simple recognition and intention recognition on a question input by a user;

defining SPARQL query templates, matching the corresponding SPARQL query template according to the intention recognition result, performing a first retrieval in the legal knowledge graph, and assigning a first confidence to the first retrieval result;

building a full-text search engine, performing a second retrieval on the simple recognition result in the full-text search engine, and assigning a second confidence to the second retrieval result;

and outputting a final retrieval result according to the first confidence and the second confidence.

Further, the method further comprises:

generating a third search result by using a neural network algorithm;

assigning the first confidence to the third search result and the first search result together.

Further, the generating the third search result by using the neural network algorithm includes:

collecting user search questions, and labeling the questions to prepare question-answer pairs, wherein the question-answer pairs comprise a question corpus Q and answer candidates A;

respectively representing the corpus Q and the answer candidates A of the questions as two vectors f (Q) and g (A), calculating the distance between f (Q) and g (A) on a vector space, and scoring different answer candidates A according to the distance;

training a neural network according to the question-answer pairs and the scoring;

inputting the questions input by the user into a trained neural network to obtain the best matching answer vector;

and converting the best matching answer vector into a third retrieval result in a natural language form.

Further, constructing the legal knowledge graph according to legal documents includes:

extracting key information from the case description, the fact part and the constituent elements in the legal documents, wherein the constituent elements comprise at least one of the laws cited in each case, the crime of which the defendant is convicted, and the sentence length;

and generating triples according to the key information, and constructing the legal knowledge graph from the triples.

Further, the simple recognition of the problem input by the user includes:

performing spelling error correction on the problem input by the user by combining the n-gram language model and the confusion set;

adding all entity names and attribute names into a custom word segmentation dictionary;

setting a recognition result confidence level for each of an entity matching method, a template segmentation method, a synonym dictionary query method, a similarity calculation method and a longest common substring matching method;

and taking, as the simple recognition result, the recognition result selected by confidence priority.

Further, when the simple recognition result is an attribute value, the method further comprises:

counting the attribute names corresponding to the attribute value and determining the most frequent attribute name;

when the attribute name is omitted from the query statement, using the most frequent attribute name as the completed attribute name.

Further, the first confidence = intent template confidence × entity confidence × predicate confidence.

Further, performing the second retrieval on the simple recognition result in the full-text search engine includes:

presetting entity and attribute level policies;

obtaining entity and attribute pairs and corresponding grades thereof according to the preset entity and attribute grade strategy by using an entity identification method;

inputting entity and attribute pairs of different grades into a full text search engine to obtain an answer list;

scoring answers in the answer list according to the entity attribute and the corresponding grade;

and screening out a second search result according to the scoring result.

Further, defining the SPARQL query templates includes:

defining a SPARQL query template according to the relationship between an entity and an attribute and the relationship between entities;

and/or,

defining a SPARQL query template based on the query terms.

In a second aspect, the present application provides a case information semantic retrieval apparatus based on a knowledge graph, including:

the construction module is used for constructing a legal knowledge graph according to legal documents;

the recognition module is used for performing simple recognition and intention recognition on the question input by the user;

the first retrieval module is used for defining SPARQL query templates, matching the corresponding SPARQL query template according to the intention recognition result, performing a first retrieval in the legal knowledge graph, and assigning a first confidence to the first retrieval result;

the second search module is used for building a full-text search engine, performing second search on the simple identification result in the full-text search engine, and endowing a second confidence degree to the second search result;

and the output module is used for outputting a final search result according to the first confidence coefficient and the second confidence coefficient.

The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:

According to the case information semantic retrieval method and device based on the knowledge graph, a legal knowledge graph is constructed, and simple recognition and intention recognition are performed on the question input by the user. The corresponding SPARQL query template is matched according to the intention recognition result, a first retrieval is performed in the legal knowledge graph, and a first confidence is assigned to the first retrieval result. A second retrieval is performed on the simple recognition result in the full-text search engine, and a second confidence is assigned to the second retrieval result. The final retrieval result is output according to the first confidence and the second confidence. Because the legal knowledge graph can fully mine the relationships between entities, complex multi-hop semantic retrieval can be completed; and even when semantic retrieval over the legal knowledge graph cannot produce an output that meets the requirement, a retrieval result can still be returned from the full-text search engine, thereby improving retrieval efficiency and accuracy.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.

Fig. 1 is a flowchart of a case information semantic retrieval method based on a knowledge graph according to an embodiment of the present application.

Fig. 2 is a flowchart of a case information semantic retrieval method based on a knowledge graph according to another embodiment of the present application.

Fig. 3 is a flowchart of a case information semantic retrieval method based on a knowledge graph according to another embodiment of the present application.

Fig. 4 is a legal knowledge graph in a case information semantic retrieval method based on a knowledge graph according to an embodiment of the present application.

Fig. 5 is a flowchart of a case information semantic retrieval method based on a knowledge graph according to another embodiment of the present application.

Fig. 6 is a flowchart of an error correction method in a case information semantic retrieval method based on a knowledge graph according to an embodiment of the present application.

Fig. 7 is a functional block diagram of a case information semantic retrieval device based on a knowledge graph according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail below. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments obtained on the basis of the embodiments herein fall within the scope of the application as defined by the claims.

Fig. 1 is a flowchart of a case information semantic retrieval method based on a knowledge graph according to an embodiment of the present application. As shown in fig. 1, the case information semantic retrieval method based on a knowledge graph includes:

S11: constructing a legal knowledge graph according to legal documents;

S12: performing simple recognition and intention recognition on a question input by a user;

S13: defining SPARQL query templates, matching the corresponding SPARQL query template according to the intention recognition result, performing a first retrieval in the legal knowledge graph, and assigning a first confidence to the first retrieval result;

S14: building a full-text search engine, performing a second retrieval on the simple recognition result in the full-text search engine, and assigning a second confidence to the second retrieval result;

S15: outputting a final retrieval result according to the first confidence and the second confidence.

Conventionally, there are many problems in a knowledge-graph-based retrieval and question-answering system in the legal field, for example, the field of knowledge-graph application is mainly encyclopedic knowledge question-answering in the general field, and cannot be effectively used in a specific field; most of searches are based on simple one-hop queries, and complex semantic searches involving multiple hops often cannot obtain good results and cannot fully mine the links between entities; the retrieval system is only based on semantic retrieval of the knowledge graph, and can not stably output the query result when the semantic retrieval cannot find a proper answer.

In the legal field, the information in legal documents is very important, but case content in legal documents is complex, which makes it difficult for users to accurately match the content they want to query by particular features during retrieval. In this embodiment, a legal knowledge graph is constructed from legal documents, yielding a knowledge graph specific to the legal field that can be applied effectively in that field. The legal knowledge graph can fully identify the relationships among entities and thus complete complex multi-hop semantic retrieval. A confidence threshold is preset for the first confidence: when the first confidence exceeds the threshold, the first retrieval result obtained from the legal knowledge graph is taken as the final retrieval result; otherwise, the second retrieval result obtained from a full-text search engine such as Elasticsearch is taken as the final retrieval result. Therefore, when the first retrieval result does not reach the confidence threshold, that is, when the first retrieval result is not satisfactory, the system outputs the second retrieval result as the final retrieval result, so that the system produces output reliably.

It should be noted that the confidence threshold may be selected according to actual needs, and the present application is not limited.
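The threshold-based fallback described above can be sketched as follows. This is a minimal illustration only: the `Retrieval` type, the `select_final_result` function and the default threshold of 0.7 are assumptions made for the example, since the application leaves the threshold to be chosen according to actual needs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Retrieval:
    """A retrieval result together with its confidence (hypothetical type)."""
    answer: str
    confidence: float


def select_final_result(first: Optional[Retrieval],
                        second: Optional[Retrieval],
                        threshold: float = 0.7) -> Optional[Retrieval]:
    """Prefer the knowledge-graph result when its confidence exceeds the
    threshold; otherwise fall back to the full-text search result.
    The value 0.7 is an illustrative assumption, not taken from the source."""
    if first is not None and first.confidence > threshold:
        return first
    return second


# Illustrative values only.
kg_hit = Retrieval("answer from the legal knowledge graph", 0.9)
es_hit = Retrieval("answer from the full-text search engine", 0.6)

final = select_final_result(kg_hit, es_hit)                      # knowledge graph wins
weak = select_final_result(Retrieval("uncertain", 0.0), es_hit)  # falls back
```

Because the second retrieval is always available as a fallback, a low-confidence or missing first result never leaves the user without an answer.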

In this embodiment, a legal knowledge graph is constructed, and simple recognition and intention recognition are performed on the question input by the user. The corresponding SPARQL query template is matched according to the intention recognition result to perform the first retrieval in the legal knowledge graph, and a first confidence is assigned to the first retrieval result. A second retrieval is performed on the simple recognition result in the full-text search engine, and a second confidence is assigned to the second retrieval result. The final retrieval result is output according to the first confidence and the second confidence. Because the legal knowledge graph can fully mine the relationships between entities, complex multi-hop semantic retrieval can be completed; and even when semantic retrieval over the legal knowledge graph fails to produce the required output, a retrieval result can be returned from the full-text search engine, thereby improving retrieval efficiency and accuracy.

The embodiment of the application provides another case information semantic retrieval method based on a knowledge graph, as shown in a flow chart of fig. 2, wherein the case information semantic retrieval method based on the knowledge graph comprises the following steps:

S21: collecting user search questions, and labeling the questions to prepare question-answer pairs, wherein the question-answer pairs comprise a question corpus Q and answer candidates A;

S22: representing the question corpus Q and the answer candidates A as two vectors f(Q) and g(A) respectively, calculating the distance between f(Q) and g(A) in the vector space, and scoring the different answer candidates A according to the distance;

Using the question-answer pairs and a word-embedding space, the word-segmented question is vectorized with an LSTM to obtain the question vector representation f(Q); the knowledge graph candidate list obtained after entity extraction from the question is vectorized with the TransE method to obtain the knowledge graph vector representation g(A). Word embedding, LSTM and TransE are described in the prior art and are not detailed here.

The distance between f(Q) and g(A), i.e., the score of answer candidate A, is calculated in the vector space as:

$$S(Q, A) = f(Q)^{\top} g(A)$$
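As a small worked illustration of the score S(Q, A) = f(Q)^T g(A), the sketch below computes the inner product for a few candidates and ranks them. The vectors and the candidate names `A1` to `A3` are made up for the example; in the described method they would come from the LSTM and TransE representations.

```python
from typing import Dict, List, Sequence


def score(f_q: Sequence[float], g_a: Sequence[float]) -> float:
    """S(Q, A) = f(Q)^T g(A): the inner product of the two vectors."""
    if len(f_q) != len(g_a):
        raise ValueError("question and answer vectors must have the same length")
    return sum(x * y for x, y in zip(f_q, g_a))


def rank_candidates(f_q: Sequence[float],
                    candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Order candidate answers from highest to lowest score."""
    return sorted(candidates, key=lambda name: score(f_q, candidates[name]),
                  reverse=True)


# Toy vectors, purely illustrative.
f_q = [1.0, 0.0, 2.0]
candidates = {
    "A1": [0.5, 1.0, 0.0],   # score 0.5
    "A2": [1.0, 0.0, 1.0],   # score 3.0
    "A3": [0.0, 3.0, 0.5],   # score 1.0
}
ranking = rank_candidates(f_q, candidates)
```

With an inner-product score, a larger value means a better match, so the best candidate is the first element of the ranking.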

S23: training the neural network according to the question-answer pairs and the scores, which comprises the following step:

a negative sampling method is used to generate several wrong answers for each question-answer pair in the training set; the aim of training is for the score S(Q, A) of the correct answer A to exceed the score of each wrong answer by as large a margin as possible.
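The training objective can be sketched as below. The application does not name a loss function, so the margin-based hinge loss used here is an assumption, as are the helper names `sample_negatives` and `margin_ranking_loss` and the margin of 1.0. The sketch only computes the loss for given vectors; it does not train a network.

```python
import random
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product, matching S(Q, A) = f(Q)^T g(A)."""
    return sum(a * b for a, b in zip(u, v))


def sample_negatives(correct: str, answer_pool: List[str], k: int,
                     rng: random.Random) -> List[str]:
    """Negative sampling: draw up to k wrong answers for one question-answer pair."""
    wrong = [a for a in answer_pool if a != correct]
    return rng.sample(wrong, min(k, len(wrong)))


def margin_ranking_loss(f_q: Sequence[float], g_pos: Sequence[float],
                        g_negs: List[Sequence[float]], margin: float = 1.0) -> float:
    """Hinge loss (an assumed choice): zero once the correct answer outscores
    every sampled wrong answer by at least `margin`."""
    pos = dot(f_q, g_pos)
    return sum(max(0.0, margin - pos + dot(f_q, g_neg)) for g_neg in g_negs)


rng = random.Random(0)
negatives = sample_negatives("a1", ["a1", "a2", "a3", "a4"], k=2, rng=rng)

# The correct answer scores 2.0; the wrong answers score 0.5 and 3.0.
loss = margin_ranking_loss([1.0, 0.0], [2.0, 0.0], [[0.5, 0.0], [3.0, 0.0]])
```

Minimizing such a loss pushes the score of the correct answer above the scores of the sampled wrong answers, which is the stated aim of the training step.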

S24: inputting the question input by the user into the trained neural network to obtain the best matching answer vector, which comprises the following steps:

performing spelling error correction, word segmentation and entity extraction on the user question, and matching predicates and types;

finding out the corresponding entity relation in the knowledge graph from the named entity set in the question sentence to obtain an answer vector;

and converting the answer vector into a pre-edited knowledge graph entity and a relation number through a trained TransE model, and obtaining the best matching answer vector through a trained neural network.

S25: converting the best matching answer vector into a third retrieval result in a natural language form;

and restoring the best matching answer vector into a third retrieval result of natural language through the knowledge graph query interface.

S26: assigning a first confidence coefficient for the third search result and the first search result together;

S27: outputting a final retrieval result according to the first confidence coefficient and the second confidence coefficient.

It should be noted that the first search result, the second search result and the third search result may each be a search list. When a search result is a list, its entries are sorted by confidence. If, after the third search result and the first search result are sorted together, the top-ranked entry has a confidence higher than the preset confidence threshold, the entry with the highest confidence is output as the final search result.

In this embodiment, once a sufficient number of user questions has accumulated, outputting retrieval results with the neural network can further improve the accuracy of semantic retrieval.

The embodiment of the application provides another case information semantic retrieval method based on a knowledge graph, as shown in a flow chart of fig. 3, wherein the case information semantic retrieval method based on the knowledge graph comprises the following steps:

S31: extracting key information from the case description, the fact part and the constituent elements in the legal documents;

the constituent elements comprise at least one of the laws cited in each case, the crime of which the defendant is convicted, and the sentence length;

extracting key information can expand the entities, relationships and attributes in the legal knowledge graph.

S32: generating a triplet according to the key information, and constructing a legal knowledge graph according to the triplet;

Triples include, but are not limited to: triples in which a descriptive word attached to an entity serves as the relation word; triples generated from the subject-predicate-object and subject-copula-predicative sentence forms; and triples that combine a relation word describing an entity with the subject-predicate-object form.

The legal knowledge graph is shown in fig. 4, and includes the law enforcement body ID, law enforcement personnel ID, administrative law enforcement case ID, party ID, legal provision ID, specific content, and the like.
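To make the triple-based construction concrete, the sketch below stores triples as plain tuples, builds an in-memory graph, and follows a two-hop path. All identifiers and relation names (`case_001`, `handled_by`, `belongs_to`, and so on) are hypothetical and loosely modeled on the node types listed for fig. 4; a real system would hold the triples in an RDF store.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def build_graph(triples: List[Triple]) -> Dict[str, Dict[str, List[str]]]:
    """Index triples as graph[subject][predicate] -> list of objects."""
    graph: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        graph[subject][predicate].append(obj)
    return graph


def follow(graph: Dict[str, Dict[str, List[str]]], start: str,
           path: List[str]) -> List[str]:
    """Multi-hop lookup: apply each predicate in `path` in turn."""
    frontier = [start]
    for predicate in path:
        frontier = [obj
                    for node in frontier
                    for obj in graph.get(node, {}).get(predicate, [])]
    return frontier


# Hypothetical triples for illustration.
triples = [
    ("case_001", "handled_by", "officer_17"),
    ("officer_17", "belongs_to", "agency_3"),
    ("case_001", "cites", "provision_264"),
    ("case_001", "has_party", "party_42"),
]
graph = build_graph(triples)
agencies = follow(graph, "case_001", ["handled_by", "belongs_to"])
```

The two-hop path from a case to the agency of its handling officer is the kind of multi-hop relationship that a single keyword search cannot express.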

S33: performing simple recognition on the question input by the user;

in some embodiments, specific implementations of S33 include, but are not limited to:

The recognition result comprises an entity, an attribute and an attribute value recognition result, and the setting of the confidence level priority of the recognition result comprises the following steps:

Assigning a corresponding confidence to each entity, specifically comprising:

an entity that exactly matches an entity in the legal knowledge graph has a confidence of 1;

an entity matched through the entity synonym dictionary has a confidence of 0.9;

an entity obtained by fuzzy-matching similarity calculation or longest common substring matching has a confidence of 0.8.

Assigning a corresponding confidence to each predicate, specifically comprising:

a predicate that exactly matches an attribute in the legal knowledge graph has a confidence of 1;

a predicate matched through the predicate synonym dictionary has a confidence of 0.9;

a predicate obtained by fuzzy-matching similarity calculation or longest common substring matching has a confidence of 0.8.
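The three confidence tiers above can be sketched as a cascade that tries exact matching, then the synonym dictionary, then fuzzy matching. The function names, the use of Python's `difflib` as the similarity measure, and the cut-off `min_similarity = 0.6` are assumptions for illustration; the application does not specify the similarity function or a cut-off.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


def recognise(mention: str, names: List[str], synonyms: Dict[str, str],
              min_similarity: float = 0.6) -> Optional[Tuple[str, float]]:
    """Map a mention to (name, confidence) using the tiers 1.0 / 0.9 / 0.8."""
    if mention in names:                     # exact match
        return mention, 1.0
    if mention in synonyms:                  # synonym dictionary match
        return synonyms[mention], 0.9
    best_name, best_sim = None, 0.0
    for name in names:                       # fuzzy match
        ratio = SequenceMatcher(None, mention, name, autojunk=False).ratio()
        shared = len(longest_common_substring(mention, name))
        overlap = shared / (max(len(mention), len(name)) or 1)
        similarity = max(ratio, overlap)
        if similarity > best_sim:
            best_name, best_sim = name, similarity
    if best_name is not None and best_sim >= min_similarity:
        return best_name, 0.8
    return None


# Illustrative vocabulary.
names = ["theft", "fraud", "robbery"]
synonyms = {"stealing": "theft"}

exact = recognise("theft", names, synonyms)
via_synonym = recognise("stealing", names, synonyms)
fuzzy = recognise("robery", names, synonyms)
unknown = recognise("xyz", names, synonyms)
```

The same cascade applies to predicates by passing attribute names and the predicate synonym dictionary instead of entity names.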

In some embodiments, when the simple recognition result is an attribute value, handling the attribute value specifically includes: counting the attribute names corresponding to the attribute value to determine the most frequent one; when the attribute name is omitted from the query statement, the most frequent attribute name is used as the completed attribute name.

The type matching includes: establishing a type synonym matching dictionary for matching the types of the entities in the legal knowledge graph.
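The attribute-name completion step amounts to a frequency count per attribute value. A minimal sketch, assuming the knowledge graph is available as (entity, attribute name, attribute value) tuples; the function names and the sample data are hypothetical:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def build_value_index(triples: List[Tuple[str, str, str]]) -> Dict[str, Counter]:
    """For each attribute value, count how often each attribute name carries it."""
    index: Dict[str, Counter] = defaultdict(Counter)
    for _entity, attribute, value in triples:
        index[value][attribute] += 1
    return index


def complete_attribute(value: str, index: Dict[str, Counter]) -> Optional[str]:
    """Return the most frequent attribute name for a value, or None if unseen."""
    counts = index.get(value)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Illustrative data: "3 years" appears mostly as a sentence length.
triples = [
    ("case_1", "sentence_length", "3 years"),
    ("case_2", "sentence_length", "3 years"),
    ("case_3", "probation_period", "3 years"),
]
index = build_value_index(triples)
completed = complete_attribute("3 years", index)
```

When a question mentions only the value, the query can then be completed with the returned attribute name before a template is filled in.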

S34: performing second retrieval on the simple recognition result in the full-text search engine, wherein the second retrieval comprises the following steps:

presetting entity and attribute level policies;

and screening out a second search result according to the scoring result.

As an alternative implementation of the present application, defining the SPARQL query templates includes, but is not limited to:

first, defining a SPARQL query template according to the relationship between an entity and an attribute and the relationship between entities;

and second, defining a SPARQL query template based on the query words.
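A template library of this kind can be sketched as functions that fill slots in SPARQL query strings. The namespace `http://example.org/legal#`, the intent labels and the slot values below are all hypothetical, because the application does not reproduce its templates here; only the general shape (a one-hop entity-attribute pattern and a two-hop entity-entity pattern) follows the text.

```python
import re

# Hypothetical namespace; the source does not give the graph's IRIs.
PREFIX = "PREFIX ex: <http://example.org/legal#>\n"
_LOCAL_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(*slots: str) -> None:
    """Accept only plain local names so user text cannot alter the query."""
    for slot in slots:
        if not _LOCAL_NAME.match(slot):
            raise ValueError(f"invalid slot value: {slot!r}")


def entity_attribute_query(entity: str, attribute: str) -> str:
    """One-hop template: the value of an attribute of an entity."""
    _check(entity, attribute)
    return PREFIX + f"SELECT ?value WHERE {{ ex:{entity} ex:{attribute} ?value . }}"


def two_hop_query(entity: str, relation1: str, relation2: str) -> str:
    """Two-hop template: entity -> intermediate entity -> target."""
    _check(entity, relation1, relation2)
    return PREFIX + (
        f"SELECT ?target WHERE {{ ex:{entity} ex:{relation1} ?mid . "
        f"?mid ex:{relation2} ?target . }}"
    )


# Intent labels (assumed names) mapped to templates.
TEMPLATES = {
    "attribute_of_entity": entity_attribute_query,
    "entity_via_entity": two_hop_query,
}


def build_query(intent: str, *slots: str) -> str:
    """Pick the template for a recognised intent and fill its slots."""
    return TEMPLATES[intent](*slots)


one_hop = build_query("attribute_of_entity", "case_001", "sentence_length")
two_hop = build_query("entity_via_entity", "case_001", "handled_by", "belongs_to")
```

Keeping templates as small functions keyed by intent means a new question pattern only requires one new entry in the mapping.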

In this way, general templates are constructed for the questions input by users. Applying a neural network algorithm to question answering and retrieval over a knowledge graph requires a large corpus; in real projects it is hard to find enough domain-specific data to train an effective model, and training results are hard to transfer. This embodiment therefore combines templates with a neural network, so that a large volume of user question data can be fully used while the accuracy of the retrieval results is ensured.

It should be noted that, although the retrieval method of the present application is customized for legal data, knowledge graphs for other fields may be constructed as needed using the same or a similar construction method, so that the retrieval method is applicable to other fields as well.

S35: performing intention recognition on the question input by the user;

S36: defining SPARQL query templates, matching the corresponding SPARQL query template according to the intention recognition result, performing a first retrieval in the legal knowledge graph, and assigning a first confidence to the first retrieval result;

Assigning an intent template confidence according to the intention recognition result specifically comprises the following:

a complete match with an intent template has a confidence of 1;

an incomplete match with an intent template has a confidence of 0.

Entity recognition is performed on the user question to obtain a list of possible subjects, and each subject is assigned a corresponding entity confidence according to a preset rule.

Attribute and attribute value recognition is performed on the user question to obtain a list of possible predicates and attribute values, and each predicate and attribute value is assigned a corresponding predicate confidence according to a preset rule.

First confidence = intent template confidence × entity confidence × predicate confidence.
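The product formula can be applied to every combination of candidate subject and candidate predicate to rank the possible interpretations of a question. In the sketch below the function names and the sample candidates are assumptions; the confidence values reuse the tiers 1, 0.9 and 0.8 given in the text.

```python
from itertools import product
from typing import List, Tuple


def first_confidence(intent_conf: float, entity_conf: float,
                     predicate_conf: float) -> float:
    """First confidence = intent template confidence x entity confidence
    x predicate confidence."""
    for value in (intent_conf, entity_conf, predicate_conf):
        if not 0.0 <= value <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
    return intent_conf * entity_conf * predicate_conf


def rank_interpretations(intent_conf: float,
                         subjects: List[Tuple[str, float]],
                         predicates: List[Tuple[str, float]]):
    """Score every (subject, predicate) pair and sort by first confidence."""
    scored = [((s, p), first_confidence(intent_conf, s_conf, p_conf))
              for (s, s_conf), (p, p_conf) in product(subjects, predicates)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative candidates.
subjects = [("theft", 1.0), ("robbery", 0.8)]
predicates = [("sentence_length", 0.9), ("cited_provision", 0.8)]

ranked = rank_interpretations(1.0, subjects, predicates)
unmatched = rank_interpretations(0.0, subjects, predicates)
```

Because the intent template confidence is either 1 or 0, an incomplete template match zeroes the first confidence, which in turn routes the question to the full-text search result.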

Because semantic retrieval based on the knowledge graph alone sometimes fails to return a result, this embodiment combines the legal knowledge graph with the full-text search engine, scores and ranks the results obtained by the two retrieval modes, and outputs the highest-ranked retrieval result, so that every question reliably obtains a retrieval result.

As shown in fig. 5, an embodiment of the present application provides a flowchart of a case information semantic retrieval method based on a knowledge graph, where the case information semantic retrieval method based on the knowledge graph includes:

Step 1: establishing a legal knowledge graph based on legal documents, a question template library, and a search based on the Elasticsearch search engine.

In some embodiments, establishing the legal knowledge graph based on legal documents specifically includes: crawling criminal legal documents from China Judgments Online and filtering out web page elements such as HTML tags and text unrelated to the knowledge; and extracting entities and attributes, using the legal documents of a given region within a given time period as the corpus, to construct a structured legal knowledge graph.

Some embodiments further comprise constructing a basic education knowledge graph, which specifically includes:

refining the ontology with reference to general-domain knowledge graphs such as schema.org and DBpedia;

determining concepts and relationships between the concepts and their constraints based on legal knowledge;

inviting law domain experts and teachers to audit and complete the ontology construction process;

information extraction from text is performed by using a machine learning method, including entity set expansion, relation extraction and the like.

Constructing knowledge graphs of various forms for the legal field allows the knowledge graph to be used effectively in that field.

Establishing the question template library from the existing legal knowledge graph specifically includes: establishing SPARQL query templates for queries over the relationships between entities and attributes in the legal knowledge graph and over the multi-hop relationships between entities. The correspondence of SPARQL query templates is shown in Table 1.

TABLE 1 sparql language query template correspondence

In some embodiments, establishing the search based on the Elasticsearch search engine specifically includes:

An extensible open-source full-text search and analytics engine is built, which provides a distributed, multi-user full-text search engine to support instant query and retrieval over massive text.

The triples are added to the Elasticsearch index in the Elasticsearch index format.
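One plausible reading of "the Elasticsearch index format" is the newline-delimited JSON body accepted by the Elasticsearch bulk API, in which each document is preceded by an action line. The sketch below only builds that body; the index name `legal_triples` and the field names `subject`, `predicate` and `object` are assumptions, and sending the body to a cluster is left out.

```python
import json
from typing import List, Tuple


def triples_to_bulk(triples: List[Tuple[str, str, str]],
                    index_name: str = "legal_triples") -> str:
    """Serialise triples as a bulk request body: one action line followed by
    one document line per triple, terminated by a newline."""
    lines = []
    for i, (subject, predicate, obj) in enumerate(triples):
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(i)}}))
        lines.append(json.dumps(
            {"subject": subject, "predicate": predicate, "object": obj},
            ensure_ascii=False))  # keep Chinese text readable in the payload
    return "\n".join(lines) + "\n"


# Hypothetical triples for illustration.
triples = [
    ("case_001", "cites", "provision_264"),
    ("case_001", "has_party", "party_42"),
]
body = triples_to_bulk(triples)
```

Indexing subject, predicate and object as separate fields lets the second retrieval match recognised entities and attributes against the corresponding field.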

Step 2: performing spelling error correction on the user input problem;

Because most data in the legal field is unlabeled and manual labeling is time-consuming and labor-intensive, an unsupervised method is chosen for spelling error correction. Chinese spelling error correction is implemented by combining an n-gram language model with a confusion set.

As shown in fig. 6, the error correction method specifically includes: acquiring the user question and segmenting it with a word segmentation tool combined with a user-defined dictionary to obtain segmented text; for single characters, outputting character-level replacement candidate sentences using a character-level confusion set; outputting character-correction candidate sentences based on the trained n-gram language model; for words, outputting word-correction candidate sentences by word-for-word substitution using a word-level confusion set; and outputting the error correction result based on the n-gram language model score.
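The combination of a confusion set and an n-gram language model can be sketched as follows: the confusion set proposes replacement candidates, and the language model picks the highest-scoring candidate sentence. This is a simplified illustration, assuming a bigram model with add-one smoothing and English tokens in place of Chinese characters and words; the class and function names and the toy corpus are not from the source.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List


class BigramModel:
    """Bigram language model with add-one smoothing (an assumed choice)."""

    def __init__(self, sentences: List[List[str]]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        self.vocab_size = len(self.unigrams) or 1

    def log_prob(self, tokens: List[str]) -> float:
        """Smoothed log-probability of a token sequence."""
        padded = ["<s>"] + tokens + ["</s>"]
        return sum(
            math.log((self.bigrams[(prev, cur)] + 1)
                     / (self.unigrams[prev] + self.vocab_size))
            for prev, cur in zip(padded, padded[1:]))


def correct(tokens: List[str], confusion: Dict[str, List[str]],
            model: BigramModel) -> List[str]:
    """Generate candidates from the confusion set and keep the best-scoring one.
    Enumerating all combinations is acceptable only for short queries."""
    options = [[tok] + sorted(confusion.get(tok, [])) for tok in tokens]
    return max((list(c) for c in product(*options)), key=model.log_prob)


# Toy corpus and confusion set, purely illustrative.
corpus = [
    ["defendant", "was", "sentenced", "for", "theft"],
    ["defendant", "was", "sentenced", "for", "fraud"],
    ["sentence", "length", "for", "theft"],
]
confusion = {"thief": ["theft"], "sentensed": ["sentenced"]}
model = BigramModel(corpus)
fixed = correct(["sentensed", "for", "thief"], confusion, model)
```

No labeled error data is needed: the language model is trained on ordinary text, which matches the stated reason for choosing an unsupervised method.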

Step 3: the method for identifying the attribute of the entity by the word segmentation, the part of speech tagging for the problem sentence specifically comprises the following steps:

for the attribute names and the entity names, to ensure the correctness, all the entity names and the attribute names are added into a custom word segmentation dictionary, and the custom word segmentation dictionary comprises an attribute synonym dictionary, a type synonym matching dictionary and the like.

Two entities, or an entity and an attribute value, are searched together in a general search engine (e.g., a must search engine), and the characters appearing between the entity and the attribute in the results are collected into the attribute synonym dictionary.

For attribute values, a fuzzy matching method may be used, or an n-gram language model lookup after word segmentation. Once a phrase is judged to be an attribute value, the attribute names under which that value occurs are counted to find the most frequent attribute name for the value. When the attribute name is omitted from the query statement, this most frequent attribute name is used as the completed attribute name.
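The attribute-name completion described above amounts to a frequency count over the triples in the knowledge base. A minimal sketch, assuming the triples are available as `(subject, attribute_name, attribute_value)` tuples (the sample data is hypothetical):

```python
from collections import Counter, defaultdict


def build_value_index(triples):
    """Map each attribute value to a Counter of the attribute names it occurs under."""
    index = defaultdict(Counter)
    for _subject, attribute_name, attribute_value in triples:
        index[attribute_value][attribute_name] += 1
    return index


def complete_attribute_name(value, index):
    """Return the most frequent attribute name for value, or None if the value is unknown."""
    counts = index.get(value)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical triples, for illustration only.
index = build_value_index([
    ("staff_053", "name", "Li Yan"),
    ("staff_101", "name", "Li Yan"),
    ("case_c39", "reporter", "Li Yan"),
])
default_name = complete_attribute_name("Li Yan", index)
```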

For types, a type synonym matching dictionary is established for matching the types of the entities in the knowledge base.

Entity and attribute recognition comprises entity recognition, attribute recognition, and attribute value recognition. It is implemented with methods such as knowledge base matching, template segmentation, synonym dictionary lookup, similarity calculation, and longest common substring matching; priorities are set according to the confidence of each method, and a candidate entity set is obtained.

The entity identification priority setting rule is as follows:

an exact match with an entity in the knowledge base, obtained through template segmentation, has a confidence of 1;

matches obtained by similarity calculation or by longest common substring matching have a confidence of 0.8.

The attribute identification priority setting rule is as follows:

a list of possible predicates is obtained from the user question, and each predicate is given a confidence according to preset rules:

an exact match with an attribute in the knowledge base has a confidence of 1;

a match based on the predicate synonym dictionary has a confidence of 0.9;

a match obtained by fuzzy matching, that is, by similarity calculation or longest common substring matching, has a confidence of 0.8.

The attribute value identification priority setting rule is as follows:

an exact match with an attribute value in the knowledge base has a confidence of 1;
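The priority rules for attribute recognition can be sketched as a cascade that tries the most reliable method first. In the sketch below, the confidences 1, 0.9, and 0.8 come from the rules above; the overlap ratio used to accept a fuzzy match (`min_overlap`) and the sample dictionaries are assumptions, since the application does not give a threshold.

```python
def longest_common_substring(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def match_attribute(mention, kb_attributes, synonyms, min_overlap=0.5):
    """Return (attribute, confidence) following the priority order, or None."""
    if mention in kb_attributes:
        return mention, 1.0                     # exact knowledge base match
    if mention in synonyms:
        return synonyms[mention], 0.9           # synonym dictionary match
    best_attr, best_ratio = None, 0.0
    for attr in kb_attributes:
        overlap = longest_common_substring(mention, attr)
        ratio = overlap / max(len(mention), len(attr))
        if ratio > best_ratio:
            best_attr, best_ratio = attr, ratio
    # min_overlap is an assumed acceptance threshold for the fuzzy match.
    if best_attr is not None and best_ratio >= min_overlap:
        return best_attr, 0.8                   # fuzzy match
    return None


# Hypothetical attribute list and synonym dictionary.
kb_attributes = ["case name", "law enforcement number"]
synonyms = {"title": "case name"}
```

For example, `match_attribute("title", kb_attributes, synonyms)` resolves through the synonym dictionary, while a near miss such as "case names" falls through to the fuzzy step.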

Step 4: performing intent recognition on the question sentence, specifically comprising:

performing entity recognition on the user question to obtain a list of possible subjects, and giving each subject a confidence according to preset rules.

performing attribute and attribute value recognition on the user question to obtain a list of possible predicates and attribute values, and giving each predicate and attribute value a confidence according to preset rules.

Step 5: generating a structured query statement by looking up and matching a question template, specifically comprising:

after identifying all entity names, attribute names and attribute values in the query, determining the type of the query according to the number and the positions of the entity names, the attribute names and the attribute values, wherein the flow is as follows:

Step 51: if there is only an entity name, the query is an entity query.

Step 52: if there is one entity name and only one attribute name, for example "What is the (name) of (case_c39)?", the query is a direct entity-attribute query.

Step 53: if there is one entity name, multiple attribute names:

The question may ask for multiple attributes of the entity, for example "What are the (name) and (law enforcement number) of (staff_c39)?"

select ?o where { { <entity name> <attribute name> ?o } union { <entity name> <attribute name> ?o } }

The question may instead be a multi-hop attribute lookup, for example "What is the (case name) of the (law enforcement) of (staff_053)?"

select ?o where { <entity name> <attribute name> ?s . ?s <attribute name> ?o . }

If this query returns no result, the question may have skipped one relationship, for example "What is the (name of the report) of the (law enforcement) of (staff_053)?". In that case an intermediate hop is added and the search is run again:

select ?o where { <entity name> <attribute name> ?s . ?s ?p ?s2 . ?s2 <attribute name> ?o }

In some embodiments, the query spans at most three hops; if there is still no result, no further hops are added.
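The retry with an added hop can be sketched as a query builder plus a loop. This is an illustrative sketch only: `run_query` is a hypothetical callable that sends a SPARQL string to the knowledge graph and returns a list of results, the variable names `?s0`, `?p0`, and so on are chosen for the example, and the bracketed names are placeholders in the same style as the templates above rather than full IRIs.

```python
def build_multi_hop_query(entity, first_attr, last_attr, extra_hops=0):
    """Build a SPARQL query with extra_hops unconstrained hops between the two attributes."""
    patterns = [f"<{entity}> <{first_attr}> ?s0 ."]
    for i in range(extra_hops):
        # Intermediate hop whose predicate ?p{i} is left unconstrained.
        patterns.append(f"?s{i} ?p{i} ?s{i + 1} .")
    patterns.append(f"?s{extra_hops} <{last_attr}> ?o .")
    return "select ?o where { " + " ".join(patterns) + " }"


def search_with_fallback(entity, first_attr, last_attr, run_query, max_hops=3):
    """Add one intermediate hop at a time until results appear or max_hops is reached."""
    # The base query already spans two hops, so extra hops range over 0 .. max_hops - 2.
    for extra in range(max_hops - 1):
        results = run_query(build_multi_hop_query(entity, first_attr, last_attr, extra))
        if results:
            return results
    return []


two_hop = build_multi_hop_query("staff_053", "law_enforcement", "case_name")
three_hop = build_multi_hop_query("staff_053", "law_enforcement", "case_name", extra_hops=1)
```

With `max_hops=3`, the loop issues the two-hop query first and the three-hop query only if the first returns nothing, which mirrors the limit stated above.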

Step 54: if there are only attribute names and attribute values:

One or more attribute names and attribute values (the numbers of attribute names and attribute values need not be equal), with a type identified. For example, "Which (cases) in (2019) had (fines) of (1000 yuan)?" Here the entity is looked up according to its attributes. The specific query statement depends on how many attribute names appear in the question.

select ?s where { ?s <attribute name> <attribute value> . ?s <type> <case> }

One or more attribute names and one attribute value, with no type identified. In this case, either other attributes of the entity are looked up based on the entity attribute, or attributes of another entity are looked up based on the entity attribute.

In this case, the corresponding entity is searched according to the attribute value, and then sequentially searched according to the attribute name.

Looking up another attribute based on an entity attribute, for example "What is the (law enforcement number) of (Li Yan)?"

select ?o where { ?s ?p <attribute value> . ?s <attribute name> ?o . }

Looking up an attribute of another entity based on an entity attribute, for example "What is the (case name) of the (law enforcement) of (Li Yan)?"

select ?o where { ?s ?p <attribute value> . ?s <attribute name> ?s2 . ?s2 <attribute name> ?o }

One attribute value and multiple attribute names, where the attribute names are intermediate to the attribute value, for example "What is the (case name) of the (law enforcement) of (Li Yan)?"

In this case a multi-hop lookup is performed starting from the attribute value: select ?o where { ?s <attribute name> <attribute value> . ?s <attribute name> ?o . }

Step 55: if there are only attribute values, for example "What cases occurred (in the area of the sea) in (2019)?"

According to the type dictionary, the type "case" is identified.

select ?s where { ?s <type> <case> . ?s ?p <attribute value> . ?s ?p2 <attribute value> }

Step 56: if there are two entities, another entity is found according to the entity and the relationship between the entities.
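The branching in Steps 51 to 56 can be summarized as a small dispatch function over the recognized slots. The sketch below is one possible reading of those steps; the returned labels are hypothetical names introduced for this example, and combinations the steps do not cover are reported as "unknown".

```python
def classify_query(entities, attribute_names, attribute_values, type_name=None):
    """Map the recognized slots of a question to a query type (Steps 51 to 56)."""
    n_ent = len(entities)
    n_attr = len(attribute_names)
    n_val = len(attribute_values)

    if n_ent == 2:
        return "entity_relation"              # Step 56: two entities
    if n_ent == 1 and n_val == 0:
        if n_attr == 0:
            return "entity"                   # Step 51: entity name only
        if n_attr == 1:
            return "entity_attribute"         # Step 52: direct attribute query
        return "entity_multi_attribute"       # Step 53: several attribute names
    if n_ent == 0 and n_val > 0:
        if n_attr == 0:
            return "value_only"               # Step 55: attribute values only
        # Step 54: attribute names and values, with or without an identified type.
        return "attribute_typed" if type_name else "attribute_untyped"
    return "unknown"


kind = classify_query([], ["fines"], ["2019", "1000 yuan"], type_name="case")
```

Each label would then select the corresponding query template, for instance the union template for "entity_multi_attribute" before falling back to the multi-hop template.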

Step 6: returning the answer list, selecting the answer with the highest score, and displaying it with the matching answer template.

A SPARQL statement is generated from the SPARQL language query template and run against the legal knowledge graph to obtain an answer list; each answer is given a score according to preset rules, and the answers from the legal knowledge graph are ranked by score;

entity and attribute pairs of different levels are obtained from the question by the entity recognition method according to a preset strategy.

The entity and attribute pairs of different levels from the semantic analysis step are input into the Elasticsearch engine to obtain an answer list; the answers are scored according to the entity and attribute levels; and the answers from the full-text search engine are ranked by score;

if the highest scoring answer based on the legal knowledge graph source exceeds a preset threshold, returning the answer.

And if the highest scoring answer based on the legal knowledge graph source does not exceed the preset threshold, returning the highest scoring answer based on the full-text search engine.
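The selection between the two answer sources can be sketched as follows. The product form of the first confidence follows claim 7; the threshold value of 0.7 and the representation of answers as `(answer, score)` pairs are assumptions for this example, since the application only refers to a preset threshold.

```python
def first_confidence(template_conf, entity_conf, predicate_conf):
    """First confidence as the product of the three component confidences (claim 7)."""
    return template_conf * entity_conf * predicate_conf


def select_answer(kg_answers, fulltext_answers, threshold=0.7):
    """Pick the final answer from two lists of (answer, score) pairs.

    threshold is an assumed value standing in for the preset threshold.
    """
    best_kg = max(kg_answers, key=lambda a: a[1], default=None)
    if best_kg is not None and best_kg[1] > threshold:
        return best_kg[0], "knowledge_graph"
    best_ft = max(fulltext_answers, key=lambda a: a[1], default=None)
    if best_ft is not None:
        return best_ft[0], "full_text"
    return None, "none"


kg_score = first_confidence(1.0, 0.8, 0.9)
choice = select_answer([("answer from graph", kg_score)], [("answer from index", 0.5)])
```

Returning the source label alongside the answer makes it easy to see which retrieval path produced the result.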

Step 7: after a certain number of user questions have accumulated, labeling the questions to produce question-answer pairs;

Step 8: representing the question corpus Q and the answer candidate A as two vectors f(Q) and g(A), calculating the distance between Q and A in the vector space, and training a neural network with the question-answer pairs;

Step 9: converting the best-matching answer vector obtained through the neural network into a search result in natural language form, and adding it to the answer list so that it participates in scoring and ranking together with the answers from the legal knowledge graph.
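The matching in Steps 8 and 9 can be illustrated with a distance function over already-computed vectors. This sketch assumes cosine distance and a margin-based ranking loss, neither of which is named in the application, and it takes the vectors f(Q) and g(A) as given rather than training the encoders that produce them.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of u and v; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def best_answer(question_vec, candidates):
    """Return the text of the candidate whose vector is closest to the question vector.

    candidates is a list of (answer_text, answer_vec) pairs.
    """
    return min(candidates, key=lambda c: cosine_distance(question_vec, c[1]))[0]


def margin_loss(question_vec, positive_vec, negative_vec, margin=0.2):
    """Assumed training objective: the labeled answer should be closer than a wrong one by margin."""
    d_pos = cosine_distance(question_vec, positive_vec)
    d_neg = cosine_distance(question_vec, negative_vec)
    return max(0.0, margin + d_pos - d_neg)


# Toy vectors, assumed for illustration.
picked = best_answer([1.0, 0.0], [("answer a", [1.0, 0.1]), ("answer b", [0.0, 1.0])])
```

During training, the loss is zero once the labeled answer is closer to the question than the wrong answer by at least the margin, so only poorly separated pairs contribute to the update.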

Fig. 7 is a functional block diagram of a case information semantic retrieving apparatus based on a knowledge graph according to an embodiment of the present application, as shown in fig. 7, the case information semantic retrieving apparatus based on a knowledge graph includes:

a construction module 71, configured to construct a legal knowledge graph according to legal documents;

an identification module 72, configured to perform simple recognition and intent recognition on a question input by a user;

the first search module 73 is configured to define a SPARQL language query template, match the corresponding SPARQL language query template according to the intent recognition result, perform a first search in the legal knowledge graph, and assign a first confidence to the first search result;

a second search module 74, configured to build a full-text search engine, perform a second search on the simple recognition result in the full-text search engine, and assign a second confidence to the second search result;

and an output module 75 for outputting a final search result according to the first confidence and the second confidence.

A third retrieval module 76 for generating a third retrieval result using a neural network algorithm.

The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.

In the knowledge-graph-based case information semantic retrieval device, the construction module constructs the legal knowledge graph from legal documents; the recognition module performs simple recognition and intent recognition on the question input by the user; the first retrieval module defines SPARQL language query templates, matches the corresponding template according to the intent recognition result, and performs the first retrieval in the legal knowledge graph; the second retrieval module builds a full-text search engine and performs the second retrieval on the simple recognition result in that engine; the third retrieval module generates a third retrieval result using a neural network algorithm; and the output module outputs the final retrieval result. The device can thus complete complex multi-hop semantic retrieval, and can still return a result from the full-text search engine when semantic retrieval over the legal knowledge graph fails to produce output that meets the requirement. In addition, generating the third retrieval result through the neural network further improves retrieval efficiency and accuracy.

It is to be understood that the same or similar parts in the above embodiments may be referred to each other, and that in some embodiments, the same or similar parts in other embodiments may be referred to.

It should be noted that in the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present application, unless otherwise indicated, the meaning of "plurality" means at least two.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.

It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, the steps or methods may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.

Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.

The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

It should be noted that the present application is not limited to the above-mentioned preferred embodiments, and those skilled in the art can obtain other products in various forms without departing from the scope of the present application, however, any changes in shape or structure of the present application, and all technical solutions that are the same or similar to the present application, fall within the scope of the present application.

Claims

1. A case information semantic retrieval method based on a knowledge graph is characterized by comprising the following steps:

constructing a legal knowledge graph according to legal documents;

the simple identification includes:

taking the recognition result corresponding to the confidence level screened according to the priority as a simple recognition result;

the intent recognition includes:

performing entity recognition on the user question to obtain a list of possible subjects, and giving each subject a confidence according to preset rules;

performing attribute and attribute value recognition on the user question to obtain a list of possible predicates and attribute values, and giving each predicate and attribute value a confidence according to preset rules;

2. The knowledge-graph-based case information semantic retrieval method according to claim 1, further comprising:

generating a third search result by using a neural network algorithm;

3. The knowledge-graph-based case information semantic retrieval method according to claim 2, wherein the generating a third retrieval result using a neural network algorithm comprises:

4. The knowledge-graph-based case information semantic retrieval method according to claim 1, wherein the constructing a legal knowledge graph according to legal documents comprises:

5. The knowledge-graph-based case information semantic retrieval method according to claim 1, wherein the simple recognition of the question input by the user comprises:

6. The knowledge-graph-based case information semantic retrieval method according to claim 5, further comprising, after the simple recognition result is the attribute value:

7. The knowledge-graph-based case information semantic retrieval method according to claim 1, wherein the first confidence = intent template confidence × entity confidence × predicate confidence.

8. The knowledge-graph-based case information semantic retrieval method according to claim 1, wherein the performing a second retrieval of the simple recognition result in the full-text search engine comprises:

presetting entity and attribute level policies;

and screening out a second search result according to the scoring result.

9. The knowledge-graph-based case information semantic retrieval method according to claim 1, wherein the defining SPARQL language query templates includes:

and/or,

the SPARQL language query template is defined based on the query terms.

10. The case information semantic retrieval device based on the knowledge graph is characterized by comprising:

the simple identification includes:

the intent recognition includes: