CN111475623A

CN111475623A - Case information semantic retrieval method and device based on knowledge graph

Info

Publication number: CN111475623A
Application number: CN202010273401.XA
Authority: CN
Inventors: 赵文; 张君福; 王靖琨; 李皓辰
Original assignee: Beijing Peking University Software Engineering Co ltd
Current assignee: Beijing Peking University Software Engineering Co ltd
Priority date: 2020-04-09
Filing date: 2020-04-09
Publication date: 2020-07-31
Anticipated expiration: 2040-04-09
Also published as: CN111475623B

Abstract

The application relates to a case information semantic retrieval method and device based on a knowledge graph. The method comprises: constructing a legal knowledge graph from legal documents; performing simple recognition and intention recognition on a question input by a user; defining SPARQL query templates, matching the corresponding SPARQL query template according to the intention recognition result, performing a first retrieval in the legal knowledge graph, and assigning a first confidence to the first retrieval result; building a full-text search engine, performing a second retrieval on the simple recognition result in the full-text search engine, and assigning a second confidence to the second retrieval result; and outputting a final retrieval result according to the first confidence and the second confidence. The method and device can fully mine the relations among entities and complete complex multi-hop semantic retrieval, and can still return a retrieval result from the full-text search engine when the semantic retrieval over the legal knowledge graph fails to meet the output requirement, thereby further improving retrieval efficiency and accuracy.

Description

Case information semantic retrieval method and device based on knowledge graph

Technical Field

The application belongs to the technical field of intelligent rule of law, and in particular relates to a case information semantic retrieval method and device based on a knowledge graph.

Background

Against the background of the information age, intelligent rule of law has become an important form of development in the rule-of-law field. Intelligent rule of law means building an intelligent environment by intelligent technical means. As the volume of judicial case data keeps growing, the relations among the data become increasingly complex. At present it is difficult to search a large number of legal documents efficiently and quickly and to mine the relations among the pieces of information they contain, so the text information is not used effectively.

The concept of the "knowledge graph" defines a completely new way of organizing knowledge. Starting from the data itself, it attempts to convert unstructured data into structured data and to link the various data together into a graph model containing a large amount of structured data. Structured graph-model data provides a new direction for legal-domain retrieval and question-answering systems: because the structure of the knowledge graph can be exploited to mine the relations among the data, the user can be given a concise and accurate answer, which offers an effective approach to information retrieval in the legal domain.

At present, some work has been carried out on knowledge-graph-based retrieval and question-answering systems in the legal field, but the work has the following problems:

1. In knowledge graph retrieval and question-answering systems, most retrieval is based on simple one-hop queries; complex semantic retrieval involving multiple hops often gives poor results, and the relations between entities cannot be fully mined.

2. At present, knowledge graphs are mainly applied to encyclopedic question answering in the general domain and cannot be used effectively in a specific domain.

3. The retrieval system relies only on semantic retrieval over the knowledge graph and cannot output query results stably.

Disclosure of Invention

In order to overcome, at least to some extent, the problems of related legal-domain retrieval and question-answering systems, the application provides a case information semantic retrieval method and device based on a knowledge graph.

In a first aspect, the present application provides a case information semantic retrieval method based on a knowledge graph, including:

constructing a legal knowledge graph according to legal documents;

performing simple recognition and intention recognition on a question input by a user;

defining SPARQL query templates, matching the corresponding SPARQL query template according to the intention recognition result, performing a first retrieval in the legal knowledge graph, and assigning a first confidence to the first retrieval result;

building a full-text search engine, performing a second retrieval on the simple recognition result in the full-text search engine, and assigning a second confidence to the second retrieval result;

and outputting a final retrieval result according to the first confidence and the second confidence.

Further, the method further comprises:

generating a third retrieval result by using a neural network algorithm;

assigning a first confidence to the third retrieval result and the first retrieval result together;

Further, the generating a third search result by using a neural network algorithm includes:

collecting user search questions, labeling the questions to make question-answer pairs, wherein the question-answer pairs comprise question corpus Q and answer candidates A;

expressing question corpus Q and answer candidate A as two vectors f (Q) and g (A) respectively, calculating the distances of f (Q) and g (A) in a vector space, and scoring different answer candidates A according to the distances;

training a neural network according to the question-answer pairs and the scores;

inputting the questions input by the user into the trained neural network to obtain the best matching answer vector;

and converting the best matching answer vector into a third retrieval result in a natural language form.

Further, constructing the legal knowledge graph according to the legal documents comprises:

extracting key information according to the case description, the fact part and the constituent elements in a legal document, wherein the constituent elements comprise at least one of the legal provisions involved in each case, the offence of which the defendant is convicted, and the sentence length;

generating triples according to the key information, and constructing the legal knowledge graph according to the triples.

Further, the simple recognition of the question input by the user comprises:

performing spelling error correction on the question input by the user by combining an n-gram language model with a confusion set;

adding all entity names and attribute names to a user-defined word segmentation dictionary;

setting confidence priorities for the recognition results of an entity matching method, a template segmentation method, a synonym dictionary query method, a similarity calculation method and a longest common substring matching method;

and taking the recognition result whose confidence is selected according to the priorities as the simple recognition result.

Further, when the simple recognition result is an attribute value, the method further comprises:

counting the attribute names corresponding to the attribute value to find the most frequent attribute name;

and when the attribute name is missing from the query statement, taking the most frequent attribute name as the completed attribute name.

Further, the first confidence = intent template confidence × entity confidence × predicate confidence.

Further, performing the second retrieval on the simple recognition result in the full-text search engine comprises:

presetting a grading strategy for entity-attribute pairs;

obtaining entity-attribute pairs and their corresponding grades by using an entity recognition method according to the preset grading strategy;

inputting the entity-attribute pairs of different grades into the full-text search engine to obtain an answer list;

scoring the answers in the answer list according to the grades of the corresponding entity-attribute pairs;

and screening out the second retrieval result according to the scoring result.

Further, defining the SPARQL query templates comprises:

defining SPARQL query templates according to entity-attribute relations;

and/or,

defining SPARQL query templates based on the query words.

In a second aspect, the present application provides a case information semantic retrieval device based on knowledge graph, including:

a construction module, configured to construct a legal knowledge graph according to legal documents;

a recognition module, configured to perform simple recognition and intention recognition on a question input by a user;

a first retrieval module, configured to define SPARQL query templates, match the corresponding SPARQL query template according to the intention recognition result, perform a first retrieval in the legal knowledge graph, and assign a first confidence to the first retrieval result;

a second retrieval module, configured to build a full-text search engine, perform a second retrieval on the simple recognition result in the full-text search engine, and assign a second confidence to the second retrieval result;

and an output module, configured to output a final retrieval result according to the first confidence and the second confidence.

The technical scheme provided by the embodiment of the application can have the following beneficial effects:

According to the case information semantic retrieval method and device based on a knowledge graph provided by the embodiments of the invention, a legal knowledge graph is constructed; simple recognition and intention recognition are performed on the question input by the user; the corresponding SPARQL query template is matched according to the intention recognition result to perform a first retrieval in the legal knowledge graph, and a first confidence is assigned to the first retrieval result; a second retrieval is performed on the simple recognition result in a full-text search engine, and a second confidence is assigned to the second retrieval result; and a final retrieval result is output according to the first confidence and the second confidence. Because the legal knowledge graph can fully mine the relations among entities, complex multi-hop semantic retrieval can be completed. In addition, even when the semantic retrieval over the legal knowledge graph produces output that does not meet the requirements, a retrieval result can still be returned from the full-text search engine, which further improves retrieval efficiency and accuracy.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

Fig. 1 is a flowchart of a case information semantic retrieval method based on a knowledge graph according to an embodiment of the present application.

Fig. 2 is a flowchart of a case information semantic retrieval method based on a knowledge graph according to another embodiment of the present application.

Fig. 3 is a flowchart of a semantic case information retrieval method based on a knowledge-graph according to another embodiment of the present application.

Fig. 4 shows a legal knowledge graph used in a case information semantic retrieval method based on a knowledge graph according to an embodiment of the present application.

Fig. 5 is a flowchart of a case information semantic retrieval method based on a knowledge graph according to another embodiment of the present application.

Fig. 6 is a flowchart of an error correction method in a case information semantic retrieval method based on a knowledge graph according to an embodiment of the present application.

Fig. 7 is a functional block diagram of a case information semantic retrieval device based on a knowledge-graph according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail below. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without making any creative effort, shall fall within the protection scope of the present application.

Fig. 1 is a flowchart of a case information semantic retrieval method based on a knowledge-graph according to an embodiment of the present application, and as shown in fig. 1, the case information semantic retrieval method based on a knowledge-graph includes:

S11: constructing a legal knowledge graph according to legal documents;

S12: performing simple recognition and intention recognition on a question input by a user;

S13: defining SPARQL query templates, matching the corresponding SPARQL query template according to the intention recognition result, performing a first retrieval in the legal knowledge graph, and assigning a first confidence to the first retrieval result;

S14: building a full-text search engine, performing a second retrieval on the simple recognition result in the full-text search engine, and assigning a second confidence to the second retrieval result;

S15: outputting a final retrieval result according to the first confidence and the second confidence.

Conventional knowledge-graph-based retrieval and question-answering systems in the legal field have several problems. Knowledge graphs are mainly applied to encyclopedic question answering in the general domain and cannot be used effectively in a specific domain. Most retrieval is based on simple one-hop queries, multi-hop complex semantic retrieval gives poor results, and the relations between entities cannot be fully mined. The retrieval system relies only on semantic retrieval over the knowledge graph, so it cannot output query results stably when the semantic retrieval finds no suitable answer.

In the legal field, the information in legal documents is very important, but because the facts of a case set out in a legal document are complicated, it is difficult for a user to match precisely the content to be queried from a few features during retrieval. In this application a legal knowledge graph is built from legal documents, so the knowledge graph is specific to the legal field and can be applied effectively there; the legal knowledge graph can fully capture the relations among entities and so complete complex multi-hop semantic retrieval. A confidence threshold is preset for the first confidence. When the first confidence exceeds the confidence threshold, the first retrieval result, obtained by the first retrieval in the legal knowledge graph, is taken as the final retrieval result; otherwise, the second retrieval result, obtained by the second retrieval in a full-text search engine such as Elasticsearch, is taken as the final retrieval result. Thus, when the first retrieval result cannot reach the confidence threshold, that is, when it is unsatisfactory, the system outputs the second retrieval result as the final retrieval result, so the system produces output stably.

It should be noted that the confidence threshold may be selected according to actual needs, and the present application is not limited thereto.
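
The threshold rule described above can be sketched as follows. This is a minimal illustration only: the `Scored` type, the function name `select_final_result` and the default threshold of 0.7 are assumptions made for the example, since the application leaves the threshold to be chosen according to actual needs.

```python
from typing import NamedTuple


class Scored(NamedTuple):
    """A retrieval result together with its confidence (hypothetical type)."""
    answer: str
    confidence: float


def select_final_result(first: Scored, second: Scored, threshold: float = 0.7) -> Scored:
    """Prefer the knowledge-graph result; fall back to the full-text result.

    The first (knowledge graph) result is returned only when its confidence
    exceeds the preset threshold; otherwise the second (full-text) result is
    returned, so that some result is always output.
    """
    if first.confidence > threshold:
        return first
    return second


if __name__ == "__main__":
    kg_result = Scored("three years", 0.72)
    fulltext_result = Scored("document excerpt", 0.55)
    print(select_final_result(kg_result, fulltext_result))
```

Lowering the threshold makes the system rely more on the knowledge graph; raising it shifts more questions to the full-text fallback.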

In this embodiment, a legal knowledge graph is constructed, simple recognition and intention recognition are performed on the question input by the user, the corresponding SPARQL query template is matched according to the intention recognition result to perform the first retrieval in the legal knowledge graph, a first confidence is assigned to the first retrieval result, a second retrieval is performed on the simple recognition result in the full-text search engine, a second confidence is assigned to the second retrieval result, and the final retrieval result is output according to the first confidence and the second confidence.

An embodiment of the invention provides another case information semantic retrieval method based on a knowledge graph. As shown in the flowchart of Fig. 2, the method comprises:

S21: collecting user search questions and labeling them to make question-answer pairs, wherein the question-answer pairs comprise question corpus Q and answer candidates A;

S22: representing question corpus Q and answer candidate A as two vectors f(Q) and g(A) respectively, calculating the distance between f(Q) and g(A) in the vector space, and scoring the different answer candidates A according to the distance;

An LSTM is applied, using the question-answer pairs and a word embedding space, to the word-segmented question to obtain the question vector representation f(Q). The TransE method is applied to the knowledge graph candidate list obtained after the question entity has been extracted to obtain the knowledge graph vector representation g(A). Word embedding, LSTM and TransE are prior art and are not described in detail here.

The distance between f(Q) and g(A) in the vector space, that is, the score of answer candidate A, is calculated as:

$$S(Q, A) = f(Q)^{\top} g(A)$$
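
As a small illustration of this score, the sketch below computes the inner product of two already-computed vectors and ranks candidates by it. The vectors are made-up toy values; in the described method f(Q) would come from the LSTM and g(A) from TransE. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def score(f_q: Sequence[float], g_a: Sequence[float]) -> float:
    """Inner product S(Q, A) = f(Q)^T g(A) of two equal-length vectors."""
    if len(f_q) != len(g_a):
        raise ValueError("f(Q) and g(A) must have the same dimension")
    return sum(q * a for q, a in zip(f_q, g_a))


def rank_candidates(
    f_q: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs sorted from highest to lowest score."""
    scored = [(name, score(f_q, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    question_vec = [1.0, 0.0]
    answers = {"candidate_a": [0.2, 0.9], "candidate_b": [0.8, 0.1]}
    print(rank_candidates(question_vec, answers))
```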

S23: training the neural network according to the question-answer pairs and the scores, which specifically comprises:

generating a number of wrong answers for each question-answer pair of the training set by negative sampling; the training objective is that the score S(Q, A) of question Q with its correct answer A exceeds, by a margin, the score of Q with each wrong answer.

S24: inputting the question input by the user into the trained neural network to obtain the best-matching answer vector, which specifically comprises:

performing spelling error correction, word segmentation, entity extraction, predicate matching and type matching on the user's question;

finding the corresponding entity relations in the knowledge graph from the set of named entities in the question to obtain an answer vector;

and converting the answer vector into the pre-assigned entity and relation numbers of the knowledge graph through the trained TransE model, and obtaining the best-matching answer vector through the trained neural network.
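
One common way to express such an objective is a margin-based ranking (hinge) loss over negatively sampled answers. The sketch below is an assumption about how this could be written, not the application's stated loss; the margin value, the function names and the toy vectors are all hypothetical.

```python
import random
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product used as the question-answer score."""
    return sum(a * b for a, b in zip(u, v))


def sample_negatives(correct: str, pool: List[str], k: int, rng: random.Random) -> List[str]:
    """Draw up to k wrong answers for one question from the answer pool."""
    wrong = [answer for answer in pool if answer != correct]
    return rng.sample(wrong, min(k, len(wrong)))


def margin_ranking_loss(
    f_q: Sequence[float],
    g_pos: Sequence[float],
    g_negs: List[Sequence[float]],
    margin: float = 1.0,
) -> float:
    """Sum of hinge terms max(0, margin - S(Q, A+) + S(Q, A-)).

    The loss is zero once the correct answer outscores every sampled
    wrong answer by at least the margin.
    """
    pos = dot(f_q, g_pos)
    return sum(max(0.0, margin - pos + dot(f_q, g_neg)) for g_neg in g_negs)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_negatives("theft", ["theft", "fraud", "robbery"], 2, rng))
    print(margin_ranking_loss([1.0, 0.0], [2.0, 0.0], [[0.0, 1.0]]))
```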

S25: converting the best matching answer vector into a third retrieval result in a natural language form;

The best-matching answer vector is restored to a third retrieval result in natural language through a knowledge graph query interface.

S26: assigning first confidences to the third retrieval result and the first retrieval result together;

S27: outputting a final retrieval result according to the first confidence and the second confidence.

It should be noted that the first, second and third retrieval results may each be a retrieval list. When a retrieval result is a retrieval list, its entries need to be sorted by confidence. If, after the third retrieval result and the first retrieval result are sorted together, the highest first confidence exceeds the preset confidence threshold, the retrieval result with the highest confidence is output as the final retrieval result.

In this embodiment, once a certain number of user questions have accumulated, using the neural network to output retrieval results can further improve the accuracy of semantic retrieval.

An embodiment of the invention provides another case information semantic retrieval method based on a knowledge graph. As shown in the flowchart of Fig. 3, the method comprises:

S31: extracting key information according to the case description, the fact part and the constituent elements in the legal document;

the constituent elements comprise at least one of the legal provisions involved in each case, the offence of which the defendant is convicted, and the sentence length;

extracting key information can expand the entities, relations and attributes in the legal knowledge graph.

S32: generating triples according to the key information, and constructing the legal knowledge graph according to the triples;

The triples include, but are not limited to: triples in which a descriptive word dependent on an entity serves as the relation word; triples generated from "subject-predicate-object" and "subject-linking verb-predicative" structures; and triples formed by combining a modifier describing an entity with a "subject-predicate" structure.

The legal knowledge graph is shown in Fig. 4; it includes a law enforcement agent ID, a law enforcement officer ID, an administrative law enforcement case ID, a party ID, a legal provision ID, specific contents, and the like.
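
To make steps S31 and S32 concrete, the sketch below turns already-extracted key information into (subject, predicate, object) triples and indexes them as a simple adjacency structure. The field names (`case_id`, `defendant`, `offence`, `sentence`, `articles`) and the predicate names are hypothetical, and the extraction itself is assumed to have happened upstream.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def build_triples(case: Dict[str, object]) -> List[Triple]:
    """Create triples from the key information extracted for one case."""
    case_id = str(case["case_id"])
    triples: List[Triple] = [
        (case_id, "defendant", str(case["defendant"])),
        (case_id, "convicted_of", str(case["offence"])),
        (case_id, "sentence_length", str(case["sentence"])),
    ]
    # One triple per legal provision involved in the case.
    for article in case["articles"]:
        triples.append((case_id, "cites_article", str(article)))
    return triples


def build_graph(triples: List[Triple]) -> Dict[str, Dict[str, List[str]]]:
    """Index triples as subject -> predicate -> list of objects."""
    graph: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        graph[subject][predicate].append(obj)
    return graph


if __name__ == "__main__":
    example = {
        "case_id": "case-001",
        "defendant": "defendant-A",
        "offence": "theft",
        "sentence": "3 years",
        "articles": ["Article 264"],
    }
    graph = build_graph(build_triples(example))
    print(graph["case-001"]["convicted_of"])
```

A production system would more likely store the triples in an RDF store so that they can be queried with SPARQL; the dictionary here only illustrates the data shape.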

S33: performing simple recognition on the question input by the user;

in some embodiments, specific implementations of S33 include, but are not limited to:

The recognition results comprise entity, attribute and attribute value recognition results, and setting the confidence priorities of the recognition results comprises the following.

Assigning the corresponding confidence to each entity specifically includes:

an exact match with an entity in the legal knowledge graph has a confidence of 1;

a match obtained through the entity synonym dictionary has a confidence of 0.9;

and an entity obtained by fuzzy-matching similarity calculation or longest common substring matching has a confidence of 0.8.

Assigning the corresponding confidence to each predicate specifically includes:

an exact match with an attribute in the legal knowledge graph has a confidence of 1;

a match obtained through the predicate synonym dictionary has a confidence of 0.9;

and a predicate obtained by fuzzy-matching similarity calculation or longest common substring matching has a confidence of 0.8.
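
The priority rules above can be sketched as a cascade that tries the most reliable method first. The confidences 1, 0.9 and 0.8 come from the text; everything else is an assumption for illustration, including the use of Python's `difflib.SequenceMatcher` as the similarity measure and the acceptance thresholds `min_ratio` and `min_lcs`.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] (stand-in for the similarity calculation)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common substring of a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def recognize(
    mention: str,
    names: List[str],
    synonyms: Dict[str, str],
    min_ratio: float = 0.6,
    min_lcs: int = 4,
) -> Optional[Tuple[str, float]]:
    """Return (matched name, confidence), or None when nothing matches."""
    if mention in names:  # exact match with the knowledge graph
        return mention, 1.0
    if mention in synonyms:  # synonym dictionary match
        return synonyms[mention], 0.9
    best_name, best_key = None, (0.0, 0)
    for name in names:  # fuzzy match: similarity, then common substring
        key = (similarity(mention, name), lcs_length(mention, name))
        if key > best_key:
            best_name, best_key = name, key
    if best_name is not None and (best_key[0] >= min_ratio or best_key[1] >= min_lcs):
        return best_name, 0.8
    return None


if __name__ == "__main__":
    known = ["theft", "robbery", "fraud"]
    print(recognize("stealing", known, {"stealing": "theft"}))
    print(recognize("robery", known, {}))
```

The same cascade applies to predicates by passing attribute names and the predicate synonym dictionary instead.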

In some embodiments, when the simple recognition result is an attribute value, handling the attribute value specifically includes: counting the attribute names corresponding to the attribute value to find the most frequent one; and, when the attribute name is missing from the query statement, using the most frequent attribute name as the completed attribute name.

Type matching comprises: establishing a type synonym matching dictionary for matching the types of the entities in the legal knowledge graph.
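
The attribute name completion just described amounts to a frequency count over the triples whose object equals the recognized attribute value. A minimal sketch, with hypothetical triples and predicate names:

```python
from collections import Counter
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def most_frequent_attribute(value: str, triples: List[Triple]) -> Optional[str]:
    """Return the attribute name most often paired with the given value.

    Returns None when the value never occurs as an object in the triples.
    """
    counts = Counter(predicate for _, predicate, obj in triples if obj == value)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    kg = [
        ("case-001", "convicted_of", "theft"),
        ("case-002", "convicted_of", "theft"),
        ("case-003", "charged_with", "theft"),
    ]
    # A query that mentions only "theft" is completed with this attribute name.
    print(most_frequent_attribute("theft", kg))
```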

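S34: performing a second retrieval on the simple recognition result in the full-text search engine, which comprises:

presetting a grading strategy for entity-attribute pairs;

obtaining entity-attribute pairs and their corresponding grades by using an entity recognition method according to the preset grading strategy;

inputting the entity-attribute pairs of different grades into the full-text search engine to obtain an answer list;

scoring the answers in the answer list according to the grades of the corresponding entity-attribute pairs;

and screening out the second retrieval result according to the scoring result.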
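
The second retrieval can be illustrated with an in-memory stand-in for the full-text search engine. In this sketch each entity-attribute pair carries a grade, every answer it retrieves inherits that grade as its score, and the best-scored answer is kept. The index layout, the substring matching and the grade values are assumptions for illustration; a real deployment would issue the queries to the search engine instead.

```python
from typing import Dict, List, Optional, Tuple

Document = Dict[str, str]
GradedPair = Tuple[str, str, float]  # (entity, attribute, grade)


def search(index: List[Document], entity: str, attribute: str) -> List[str]:
    """Toy full-text lookup: substring match on subject and predicate."""
    return [
        doc["object"]
        for doc in index
        if entity in doc["subject"] and attribute in doc["predicate"]
    ]


def second_retrieval(
    index: List[Document], graded_pairs: List[GradedPair]
) -> Optional[Tuple[str, float]]:
    """Score answers by the grade of the pair that found them; keep the best."""
    scores: Dict[str, float] = {}
    for entity, attribute, grade in graded_pairs:
        for answer in search(index, entity, attribute):
            scores[answer] = max(scores.get(answer, 0.0), grade)
    if not scores:
        return None
    best = max(scores, key=lambda answer: scores[answer])
    return best, scores[best]


if __name__ == "__main__":
    toy_index = [
        {"subject": "case-001", "predicate": "sentence_length", "object": "3 years"},
        {"subject": "case-001", "predicate": "convicted_of", "object": "theft"},
    ]
    pairs = [("case-001", "sentence", 1.0), ("case-001", "convicted", 0.5)]
    print(second_retrieval(toy_index, pairs))
```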

As an optional implementation of the invention, defining the SPARQL query templates includes but is not limited to:

first, defining SPARQL query templates according to entity-attribute relations and entity-entity relations;

second, defining SPARQL query templates based on the query words.

In this way, general templates for the questions input by users are constructed. A neural network algorithm needs a large corpus when applied to knowledge graph question answering and retrieval; in real projects it is often hard to obtain enough domain-specific data to train an effective model, and training results are hard to transfer. This embodiment therefore combines templates with a neural network, which makes full use of a large amount of user question data while also ensuring the accuracy of the retrieval results.

Although the retrieval method of this application is tailored to legal data, knowledge graphs for other domains can be constructed, as actual needs require, by the same or a similar method as the one used here for the legal knowledge graph, so the retrieval method can be applied generally across domains.
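
As a hedged illustration of what such templates could look like, the sketch below keeps one template for an entity-attribute (one-hop) question and one for an entity-relation-attribute (two-hop) question, and fills in the slots found by recognition. The prefix `http://example.org/law/`, the intent names and the slot names are invented for the example; the application's actual templates are not reproduced in this text.

```python
ONE_HOP = """PREFIX ex: <http://example.org/law/>
SELECT ?value WHERE {{
  ex:{entity} ex:{attribute} ?value .
}}"""

TWO_HOP = """PREFIX ex: <http://example.org/law/>
SELECT ?value WHERE {{
  ex:{entity} ex:{relation} ?mid .
  ?mid ex:{attribute} ?value .
}}"""

# Hypothetical mapping from a recognized intent to its query template.
TEMPLATES = {
    "entity_attribute": ONE_HOP,
    "entity_relation_attribute": TWO_HOP,
}


def fill_template(intent: str, **slots: str) -> str:
    """Select the template for the intent and substitute the slots."""
    return TEMPLATES[intent].format(**slots)


if __name__ == "__main__":
    print(fill_template(
        "entity_relation_attribute",
        entity="case001",
        relation="defendant",
        attribute="offence",
    ))
```

The two-hop template shows how a multi-hop question is answered by chaining triple patterns through an intermediate variable.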

S35: performing intention recognition on a question input by a user;

S36: defining SPARQL query templates, matching the corresponding SPARQL query template according to the intention recognition result, performing a first retrieval in the legal knowledge graph, and assigning a first confidence to the first retrieval result;

Assigning the intent template confidence to the intention recognition result specifically includes:

a complete match with an intent template has a confidence of 1;

an incomplete match with an intent template has a confidence of 0.

Entity recognition is performed on the user's question to obtain a list of possible subjects, and each subject is assigned a corresponding entity confidence according to a preset rule.

Attribute and attribute value recognition is performed on the user's question to obtain lists of possible predicates and attribute values, and each predicate and attribute value is assigned a corresponding predicate confidence according to a preset rule.

First confidence = intent template confidence × entity confidence × predicate confidence.

Because semantic retrieval based on the knowledge graph alone sometimes cannot produce results stably, this embodiment combines the legal knowledge graph with a full-text search engine: the results obtained by the two retrieval modes are scored and ranked, and the highest-ranked retrieval result is output, so that a retrieval result is obtained stably for every question.
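
A short worked example of this product, with a hypothetical function name: a fully matched template (1), a synonym-matched entity (0.9) and a fuzzy-matched predicate (0.8) give a first confidence of about 0.72, while any template mismatch (0) drives the product to zero.

```python
def first_confidence(template_conf: float, entity_conf: float, predicate_conf: float) -> float:
    """First confidence = intent template x entity x predicate confidence."""
    return template_conf * entity_conf * predicate_conf


if __name__ == "__main__":
    print(first_confidence(1.0, 0.9, 0.8))  # roughly 0.72
    print(first_confidence(0.0, 1.0, 1.0))  # template mismatch gives 0.0
```

Because the factors multiply, one weak link lowers the whole score, which is what lets a preset threshold route poorly matched questions to the full-text search engine.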

As shown in Fig. 5, an embodiment of the invention provides a case information semantic retrieval method based on a knowledge graph, which comprises:

Step 1: establishing a legal knowledge graph based on legal documents, a question template library, and retrieval based on the Elasticsearch engine.

In some embodiments, establishing a legal knowledge graph based on legal documents specifically comprises: crawling criminal judgment documents from China Judgments Online, and filtering out web page elements such as HTML tags and text irrelevant to the knowledge; then taking the legal documents of a certain region within a certain period as the corpus, extracting entities and attributes, and constructing a structured legal knowledge graph.

In some embodiments, further comprising: constructing a basic education knowledge graph, which specifically comprises the following steps:

referring to knowledge graphs in general fields such as schema.

Determining concepts and relationships between the concepts and constraints thereof according to legal knowledge;

inviting experts and teachers in the legal field to review the result and complete the ontology construction process;

and extracting information from the text by using a machine learning method, wherein the information comprises entity set expansion, relation extraction and the like.

By constructing the knowledge graph for the legal field in these various ways, effective use of the knowledge graph in the legal field is further ensured.

Establishing a question template library according to the existing legal knowledge graph specifically comprises: establishing the corresponding SPARQL query templates for the relations between entities and attributes in the legal knowledge graph and for the multi-hop relations between entities. The correspondence of the SPARQL query templates is shown in Table 1.

Table 1. SPARQL query template correspondences

In some embodiments, establishing retrieval based on the Elasticsearch engine specifically includes:

setting up Elasticsearch, an extensible open-source full-text search and analytics engine; Elasticsearch provides a distributed, multi-user full-text search engine that supports real-time query and retrieval over massive amounts of text;

and adding the triples to the Elasticsearch index according to the Elasticsearch index format.
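
One way to prepare triples for indexing is to serialize them as newline-delimited JSON in the shape expected by the Elasticsearch Bulk API, where each document is preceded by an action line. The sketch below only builds the request body; the index name `legal_triples` and the field names are assumptions, and sending the body to a cluster is left out.

```python
import json
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def triples_to_bulk(triples: Iterable[Triple], index_name: str = "legal_triples") -> str:
    """Serialize triples as an NDJSON body for the Bulk API.

    Each triple yields an action line followed by a source line; the body
    ends with a newline, as the Bulk API requires.
    """
    lines = []
    for doc_id, (subject, predicate, obj) in enumerate(triples):
        action = {"index": {"_index": index_name, "_id": str(doc_id)}}
        source = {"subject": subject, "predicate": predicate, "object": obj}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = triples_to_bulk([("case-001", "convicted_of", "theft")])
    print(body, end="")
```

The resulting string can be posted to the `_bulk` endpoint with the `application/x-ndjson` content type.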

Step 2: performing spelling error correction on the question input by the user;

Most data in the legal field is unlabeled, and manual labeling is time-consuming and labor-intensive, so an unsupervised method is chosen for spelling error correction. Chinese spelling error correction is implemented by combining an n-gram language model with a confusion set.

As shown in Fig. 6, the error correction method specifically includes: obtaining the user's question and segmenting it with a word segmentation tool combined with a user-defined dictionary to obtain segmented text; for single characters, using a character-level confusion set to output character-level replacement candidate sentences, and outputting wrong-character correction candidate sentences based on the trained n-gram language model; for words, using word-to-pronunciation conversion and a word-level confusion set to output wrong-word correction candidate sentences; and outputting the error correction result based on n-gram language model scoring.
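
The character-level part of this procedure can be sketched with a toy bigram model. This is an illustration under simplifying assumptions: it uses a character bigram model with add-one smoothing, tries only single-character substitutions from the confusion set, and uses English strings for readability; the word-level and pronunciation-based candidates are not covered. The class and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


class BigramModel:
    """Character bigram language model with add-one smoothing."""

    def __init__(self, corpus: Iterable[str]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sentence in corpus:
            chars = ["<s>"] + list(sentence) + ["</s>"]
            self.unigrams.update(chars)
            self.bigrams.update(zip(chars, chars[1:]))
        self.vocab_size = len(self.unigrams)

    def log_prob(self, sentence: str) -> float:
        """Smoothed log-probability of the whole sentence."""
        chars = ["<s>"] + list(sentence) + ["</s>"]
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def correct(sentence: str, model: BigramModel, confusion: Dict[str, List[str]]) -> str:
    """Return the best-scoring sentence among the original and its candidates."""
    candidates = [sentence]
    for i, ch in enumerate(sentence):
        for alt in confusion.get(ch, []):
            candidates.append(sentence[:i] + alt + sentence[i + 1:])
    return max(candidates, key=model.log_prob)


if __name__ == "__main__":
    lm = BigramModel(["theft case", "theft law", "fraud case"])
    print(correct("thevt case", lm, {"v": ["f"]}))
```

For Chinese input, the confusion set would map each character to visually or phonetically similar characters, and the model would be trained on a legal-domain corpus.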

And step 3: performing word segmentation, part of speech tagging and entity attribute identification on question sentences, and specifically comprising the following steps:

for the attribute names and the entity names, all the entity names and the attribute names are added into a custom segmentation dictionary to ensure correctness, and the custom segmentation dictionary comprises an attribute synonym dictionary, a type synonym matching dictionary and the like.

Two entities, entities and attribute values are retrieved by a general search engine (e.g., a necessity search engine), and characters appearing between entities and attributes are used as an attribute synonym dictionary.

For the attribute value, a fuzzy matching method can be adopted, a method of n-gram language model retrieval after word segmentation can also be adopted, after a phrase is judged to be the attribute value, the attribute name corresponding to the attribute value is counted to obtain the most frequent attribute name corresponding to the attribute value, and when the attribute name is absent in the query sentence, the most frequent attribute name corresponding to the attribute value is used as the completed attribute name.

For type, a type synonym matching dictionary is established for matching the types of entities in the knowledge base.
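The attribute-name completion described above for attribute values can be sketched as a frequency count over knowledge-base triples. The triple layout `(entity, attribute name, attribute value)`, the sample data, and the helper names `build_value_index` and `complete_attribute_name` are hypothetical and serve only to illustrate the counting step.

```python
from collections import Counter, defaultdict

def build_value_index(triples):
    """Map each attribute value to a Counter of the attribute names it appears under."""
    index = defaultdict(Counter)
    for _entity, attr_name, attr_value in triples:
        index[attr_value][attr_name] += 1
    return index

def complete_attribute_name(attr_value, index):
    """Return the most frequent attribute name for a value, or None if the value is unseen."""
    counts = index.get(attr_value)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical knowledge-base triples.
triples = [
    ("case_1", "penalty", "1000 yuan"),
    ("case_2", "penalty", "1000 yuan"),
    ("case_3", "illegal income", "1000 yuan"),
]
index = build_value_index(triples)
completed = complete_attribute_name("1000 yuan", index)
```

Here "1000 yuan" occurs twice under "penalty" and once under "illegal income", so a question that mentions only the value would be completed with "penalty".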

Entity attribute recognition comprises entity recognition, attribute recognition and attribute value recognition. It is realized with methods such as knowledge base matching, template segmentation, synonym dictionary query, similarity calculation and longest common substring matching; priorities are set according to the confidence of each method to obtain a candidate entity set.

The entity identification priority setting rule is as follows:

a complete match with an entity in the knowledge base, obtained through template segmentation, has a confidence of 1;

a match obtained by similarity calculation or by longest common substring matching has a confidence of 0.8.

The attribute identification priority setting rule is as follows:

A list of possible predicates is obtained from the user question, and each predicate is given a confidence according to the following preset rules:

a complete match with an attribute in the knowledge base has a confidence of 1;

a match based on the predicate synonym dictionary has a confidence of 0.9;

a match obtained by fuzzy matching (similarity calculation or longest common substring matching) has a confidence of 0.8.

The attribute value identification priority setting rule is as follows:

a complete match with an attribute value in the knowledge base has a confidence of 1.
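The attribute priority rules above (1 for a complete match, 0.9 for a synonym dictionary match, 0.8 for a fuzzy match) can be sketched as follows. The fuzzy test here uses the longest common substring length relative to the longer string, with a cutoff of 0.6; both the ratio and the cutoff are assumptions, since the application does not specify how a fuzzy match is accepted. The function names are hypothetical.

```python
from difflib import SequenceMatcher

def longest_common_substring_ratio(a, b):
    """Length of the longest common substring divided by the longer string's length."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / max(len(a), len(b))

def attribute_confidence(mention, kb_attributes, synonyms, fuzzy_threshold=0.6):
    """Return (attribute, confidence) pairs, highest confidence first."""
    candidates = {}
    for attr in kb_attributes:
        if mention == attr:
            conf = 1.0   # complete match with a knowledge-base attribute
        elif mention in synonyms.get(attr, ()):
            conf = 0.9   # predicate synonym dictionary match
        elif longest_common_substring_ratio(mention, attr) >= fuzzy_threshold:
            conf = 0.8   # fuzzy match (assumed acceptance rule)
        else:
            continue
        candidates[attr] = conf
    return sorted(candidates.items(), key=lambda kv: -kv[1])

# Hypothetical attribute names and synonym dictionary.
kb_attributes = ["case name", "penalty amount"]
synonyms = {"penalty amount": ["fine"]}
exact = attribute_confidence("case name", kb_attributes, synonyms)
by_synonym = attribute_confidence("fine", kb_attributes, synonyms)
fuzzy = attribute_confidence("penalty amoun", kb_attributes, synonyms)
```

The same pattern would apply to entity recognition, with the confidence values listed in the entity priority rules.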

Step 4: performing intention recognition on the question, specifically comprising:

Entity recognition is performed on the user question to obtain a list of possible subjects, and each subject is given a confidence according to a preset rule.

Attribute recognition and attribute value recognition are performed on the user question to obtain lists of possible predicates and attribute values, and each predicate and attribute value is given a confidence according to a preset rule.

Step 5: generating a structured query statement through query classification and question template matching, specifically comprising the following steps:

After all entity names, attribute names and attribute values in the query have been identified, the type of the query is determined from their number and positions, as follows:

Step 51: If only an entity name is present, an entity query is performed.

Step 52: If there is one entity name and only one attribute name, such as "What is the (name) of (case_c39)?", the query is a direct entity-attribute query.

Step 53: If there is one entity name and multiple attribute names:

The question may look up several attributes of the entity, such as "What are the (name) and (law enforcement certificate number) of (staff_c39)?"

select ?o where { { <entity name> <attribute name> ?o } union { <entity name> <attribute name> ?o } }

Alternatively, it may be a multi-hop attribute lookup, such as "What is the (case name) of the (law enforcement) of (staff_053)?"

select ?o where { <entity name> <attribute name> ?s . ?s <attribute name> ?o }

If the query returns no result, a one-hop relation may have been omitted from the question, as in "What is the (subject name) of the (law enforcement) of (staff_053)?"; the missing hop is added and the retrieval is run again:

select ?o where { <entity name> <attribute name> ?s . ?s ?p ?s2 . ?s2 <attribute name> ?o }

In some embodiments, a query spans at most three hops; if there is still no result, no further hops are added.
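The three templates of Step 53 can be generated by plain string construction, as in the following sketch. The helper names `union_query` and `chain_query`, the variable naming scheme (`?s1`, `?s2`), and the `extra_hop_at` parameter are hypothetical; the entity and attribute strings are placeholders rather than full IRIs.

```python
def union_query(entity, attributes):
    """One entity, several independent attributes: a union of one-hop patterns."""
    branches = " union ".join(f"{{ <{entity}> <{a}> ?o }}" for a in attributes)
    return f"select ?o where {{ {branches} }}"

def chain_query(entity, attributes, extra_hop_at=None):
    """One entity, attributes followed hop by hop.

    If extra_hop_at is an index, an unnamed hop (?p) is inserted before that
    attribute, covering a relation that the question omitted.
    """
    patterns, subject, var_id = [], f"<{entity}>", 0
    for i, attr in enumerate(attributes):
        if extra_hop_at == i:
            var_id += 1
            patterns.append(f"{subject} ?p ?s{var_id}")
            subject = f"?s{var_id}"
        if i == len(attributes) - 1:
            obj = "?o"           # the final hop binds the answer variable
        else:
            var_id += 1
            obj = f"?s{var_id}"  # intermediate entity on the path
        patterns.append(f"{subject} <{attr}> {obj}")
        subject = obj
    return "select ?o where { " + " . ".join(patterns) + " }"

q_union = union_query("staff_c39", ["name", "certificate_number"])
q_chain = chain_query("staff_053", ["law_enforcement", "case_name"])
q_retry = chain_query("staff_053", ["law_enforcement", "subject_name"], extra_hop_at=1)
```

A caller would run `q_chain` first and fall back to the variant with an extra hop only when the first query returns nothing, stopping once the path reaches three hops.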

Step 54: If there are only attribute names and attribute values:

There may be one or more attribute names and attribute values (their numbers need not be equal) together with an identified type, such as "Which (cases) in (2019) had a (fine) of (1000 yuan)?". Entities are then looked up by their attributes, and the specific query statement depends on the attribute names in the question.

select ?s where { ?s <attribute name> <attribute value> . ?s <type> <case> }

There may be one or more attribute names and one attribute value, with no type identified. In that case, either other attributes of an entity are looked up from an attribute of that entity, or attributes of another entity are looked up from an attribute of the entity.

In this case, the corresponding entity is first found from the attribute value, and the attribute names are then followed in sequence.

Looking up another attribute from an entity attribute, such as a question asking for an attribute of "(Li ...)":

select ?o where { ?s ?p <attribute value> . ?s <attribute name> ?o }

Looking up an attribute of another entity from an entity attribute, such as "What is the (case name) of the (law enforcement) of (Li ...)?"

select ?o where { ?s ?p <attribute value> . ?s <attribute name> ?s2 . ?s2 <attribute name> ?o }

There may be one attribute value and several attribute names, where one of the attribute names is the attribute name of the attribute value, e.g., "What is the (case name) of the (law enforcement) of (Li ...)?"

A multi-hop search is then performed starting from the attribute value: select ?o where { ?s <attribute name> <attribute value> . ?s <attribute name> ? . }

Step 55: If there are only attribute values, such as "What cases occurred in (2019) in (Haidian District)?"

From the type dictionary, the type "case" is recognized.

select ?s where { ?s <type> <case> . ?s ?p1 <attribute value> . ?s ?p2 <attribute value> }

Step 56: if there are two entities, then find another entity according to the entity and the relationship between the entities.
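The dispatch over Steps 51 to 56 amounts to a decision on how many entity names, attribute names and attribute values were recognized. The sketch below shows one way to express it; the function name `classify_query`, the category labels, and the `type_found` flag are hypothetical, and combinations not discussed in the text fall through to the nearest branch or to "unknown".

```python
def classify_query(entities, attr_names, attr_values, type_found=False):
    """Map the recognized items to the query categories of Steps 51 to 56."""
    if len(entities) >= 2:
        return "entity_relation"             # Step 56: two entities
    if len(entities) == 1:
        if not attr_names:
            return "entity"                  # Step 51: entity name only
        if len(attr_names) == 1:
            return "entity_attribute"        # Step 52: direct attribute query
        return "entity_multi_attribute"      # Step 53: union or multi-hop
    if attr_names and attr_values:
        # Step 54: with a type, entities are looked up; without one, attributes are.
        return "attributes_to_entity" if type_found else "attributes_to_attribute"
    if attr_values:
        return "values_only"                 # Step 55: attribute values only
    return "unknown"

kind_52 = classify_query(["case_c39"], ["name"], [])
kind_54 = classify_query([], ["fine"], ["2019", "1000 yuan"], type_found=True)
kind_55 = classify_query([], [], ["2019", "Haidian District"])
```

Each returned label would then select the matching SPARQL language query template.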

Step 6: returning an answer list, selecting the answer with the highest score, and displaying it with the matched answer template.

SPARQL statements are generated from the SPARQL language query template and run against the legal knowledge graph to obtain an answer list; each answer is given a score according to a preset rule, and the knowledge-graph-based answers are sorted by score.

Entity-attribute pairs at different levels are obtained from the question according to a preset strategy, using the entity recognition method.

The entity-attribute pairs at different levels obtained in the semantic parsing step are input into the Elasticsearch engine to obtain an answer list; the answers are scored according to the entity attributes, and the answers based on the full-text search engine are sorted by score.

If the highest-scoring answer from the legal knowledge graph exceeds a preset threshold, that answer is returned.

If the highest-scoring answer from the legal knowledge graph does not exceed the preset threshold, the highest-scoring answer from the full-text search engine is returned.
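The threshold-based fallback between the two answer sources can be sketched as below. The product form of the first confidence follows claim 7 (intention template confidence × entity confidence × predicate confidence); the threshold value 0.7, the `(answer, score)` tuple layout, and the function names are assumptions for illustration.

```python
def first_confidence(template_conf, entity_conf, predicate_conf):
    """Product form stated in claim 7 for a knowledge-graph retrieval result."""
    return template_conf * entity_conf * predicate_conf

def select_answer(kg_answers, fulltext_answers, threshold=0.7):
    """Return the best knowledge-graph answer if it exceeds the threshold, else fall back."""
    best_kg = max(kg_answers, key=lambda a: a[1], default=None)
    if best_kg is not None and best_kg[1] > threshold:
        return best_kg[0], "knowledge_graph"
    best_ft = max(fulltext_answers, key=lambda a: a[1], default=None)
    if best_ft is not None:
        return best_ft[0], "full_text"
    return None, "none"

# Hypothetical scored answers as (answer, score) pairs.
fulltext = [("Doc 1", 0.5), ("Doc 2", 0.6)]
strong_kg = [("Case A", first_confidence(1.0, 1.0, 0.9))]
weak_kg = [("Case B", first_confidence(1.0, 0.8, 0.8))]
picked_strong = select_answer(strong_kg, fulltext)
picked_weak = select_answer(weak_kg, fulltext)
```

With these numbers the first call keeps the knowledge-graph answer (0.9 exceeds 0.7), while the second falls back to the full-text answer because 0.8 × 0.8 stays below the threshold.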

Step 7: after a certain number of user questions have accumulated, labeling the questions to produce question-answer pairs;

Step 8: representing the question corpus Q and the answer candidate A as two vectors f(Q) and g(A) respectively, calculating the distance between Q and A in the vector space, and training a neural network with the question-answer pairs;

Step 9: converting the best-matching answer vector obtained through the neural network into a retrieval result in natural language form, and adding that result to the answer list based on the legal knowledge graph so that it takes part in scoring and sorting.
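The distance-based matching of Step 8 can be illustrated with a small sketch. The application does not specify the encoders f and g, the distance, or the training loss, so everything here is an assumption: `embed` is a bag-of-words stand-in rather than a neural encoder, the distance is cosine distance, and `hinge_loss` is a margin ranking loss of the kind commonly used to train on question-answer pairs.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in for f(Q) and g(A): a bag-of-words count vector, not a neural encoder."""
    return Counter(text.lower().split())

def cosine_distance(u, v):
    """1 minus the cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def hinge_loss(question, positive, negative, margin=0.2):
    """Margin ranking loss: the correct answer should be closer than the wrong one."""
    d_pos = cosine_distance(embed(question), embed(positive))
    d_neg = cosine_distance(embed(question), embed(negative))
    return max(0.0, margin + d_pos - d_neg)

def best_answer(question, candidates):
    """Return the candidate answer with the smallest distance to the question."""
    q = embed(question)
    return min(candidates, key=lambda a: cosine_distance(q, embed(a)))

# Hypothetical question and answer candidates.
question = "penalty amount of case 12"
good = "the penalty amount of case 12 is 1000 yuan"
bad = "staff 053 handled the inspection"
chosen = best_answer(question, [bad, good])
loss_ok = hinge_loss(question, good, bad)
loss_swapped = hinge_loss(question, bad, good)
```

In a trained system the two encoders would be learned so that the loss is driven toward zero on the labeled question-answer pairs; the selection step at query time stays the same.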

Fig. 7 is a functional block diagram of a case information semantic retrieval device based on a knowledge-graph according to an embodiment of the present application, and as shown in fig. 7, the case information semantic retrieval device based on a knowledge-graph includes:

the construction module 71 is used for constructing a legal knowledge graph according to the legal documents;

an identification module 72 for performing simple identification and intention identification on the question input by the user;

the first retrieval module 73 is used for defining a SPARQL language query template, matching the corresponding SPARQL language query template according to the intention recognition result, performing first retrieval in the legal knowledge graph, and giving a first confidence coefficient to the first retrieval result;

the second retrieval module 74 is used for building a full-text search engine, performing second retrieval on the simple identification result in the full-text search engine, and giving a second confidence coefficient to the second retrieval result;

and an output module 75, configured to output a final retrieval result according to the first confidence level and the second confidence level.

The device further includes a third retrieval module 76 for generating a third retrieval result by using a neural network algorithm.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

In the case information semantic retrieval device based on the knowledge graph provided by this embodiment, the construction module constructs a legal knowledge graph according to legal documents; the identification module performs simple identification and intention identification on the question input by the user; the first retrieval module defines a SPARQL language query template and matches the corresponding SPARQL language query template according to the intention identification result to perform first retrieval in the legal knowledge graph; the second retrieval module builds a full-text search engine and performs second retrieval on the simple identification result in the full-text search engine; the third retrieval module generates a third retrieval result by using a neural network algorithm; and the output module outputs the final retrieval result according to the first confidence and the second confidence. The device can complete complex multi-hop semantic retrieval, can return retrieval results based on the full-text search engine even when semantic retrieval over the legal knowledge graph does not produce output that meets the requirements, and further improves retrieval efficiency and accuracy by generating a third retrieval result through a neural network.

It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.

It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, "a plurality" means at least two unless otherwise specified.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.

It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.

In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

It should be noted that the present invention is not limited to the above preferred embodiments. Those skilled in the art can obtain other products in various forms without departing from the spirit of the present invention; however, whatever changes are made in shape or structure, any technical solution that is the same as or similar to that of the present application falls within the protection scope of the present invention.

Claims

1. A knowledge graph-based case information semantic retrieval method is characterized by comprising the following steps:

constructing a legal knowledge map according to the legal documents;

2. The knowledge-graph-based case information semantic retrieval method according to claim 1, characterized by further comprising:

generating a third retrieval result by using a neural network algorithm;

3. The knowledge-graph-based case information semantic retrieval method according to claim 2, wherein the generating of the third retrieval result by using a neural network algorithm comprises:

4. The case information semantic retrieval method based on knowledge-graph according to claim 1, characterized in that the construction of legal knowledge-graph according to legal documents comprises:

5. The knowledge-graph-based case information semantic retrieval method according to claim 1, wherein the simple recognition of the user-input question comprises:

6. The knowledge-graph-based case information semantic retrieval method according to claim 5, characterized in that after the simple recognition result is an attribute value, the method further comprises:

7. The knowledge-graph-based case information semantic retrieval method according to claim 1, characterized in that the first confidence degree = intention template confidence degree × entity confidence degree × predicate confidence degree.

8. The knowledge-graph-based case information semantic retrieval method according to claim 1, wherein the second retrieval of the simple recognition result in the full-text search engine comprises:

presetting a strategy for entity and attribute pairs;

and screening out a second retrieval result according to the grading result.

9. The knowledge-graph-based case information semantic retrieval method according to claim 1, wherein the defining a SPARQL language query template comprises:

and/or,

defining SPARQL language query templates based on the query terms.

10. A knowledge graph-based case information semantic retrieval device is characterized by comprising: