CN111291152A - Case document recommendation method, device, equipment and storage medium - Google Patents

Publication number
CN111291152A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811491656.2A
Other languages
Chinese (zh)
Inventor
李亚博
谢海华
陈雪飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Original Assignee
Pku Founder Information Industry Group Co ltd
Peking University Founder Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

Abstract

The invention discloses a case document recommendation method, apparatus, device and storage medium. A user query sentence is received; a label word is extracted from the query sentence; a graph database is searched according to the label word to obtain a case document ID set corresponding to the label word; and a document database is queried according to the case document ID set to obtain the corresponding case document set. The scheme applies natural language understanding and graph database technology: the user's query intention is determined by extracting the key information of the query sentence and interacting with the graph database, the relevant document IDs are returned accurately from the strictly organized graph database, and the corresponding case document set is then returned from the document database, so that the user's needs are met.

Description

Case document recommendation method, device, equipment and storage medium
Technical Field
The invention relates to the field of information retrieval and text information processing, in particular to a case document recommendation method, device, equipment and storage medium.
Background
The informatization and public disclosure of case documents have become an important direction for promoting judicial reform in recent years. How to use artificial intelligence (AI) technology to provide accurate legal consulting services for legal practitioners and the general public has become a focus of attention in the science and technology field. A case document mainly refers to a complete judgment, which includes the trial process, the claims and defenses of the plaintiff and the defendant, the relevant laws, the statement of the facts found, and the court's judgment and its basis. The case document is therefore first-hand information for consulting similar cases.
However, accurately returning relevant documents from millions or even tens of millions of case documents according to the querier's intention faces two difficulties. First, the querier may not be able to compose query sentences using accurate legal terms. For example, a party to a case involving a multi-vehicle rear-end collision may not know that a professional legal term such as "multiple vehicle injury" should be used when searching for similar cases. Second, the efficiency of entering case document data into the database and the organization of the data need to be improved: when a new case document is entered into the database, it still has to be labeled manually to determine its category, and the classification categories of case documents are too coarse relative to the querier's needs, so the querier is forced to compose query sentences without guidance. Because current service platforms that provide similar case documents determine the querier's intention by keyword search, the query results cannot meet the requirement if the querier cannot provide accurate query words.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, a device, and a storage medium for recommending case documents, so as to analyze one or more tag words through multiple rounds of dialog with an inquirer by using natural language understanding and graph database technologies, and perform accurate search in a legal document database by using the tag words and return a corresponding document.
In a first aspect, an embodiment of the present invention provides a method for recommending a case document, including:
receiving a user query statement;
extracting tag words from the query sentence;
searching in a graphic database according to the label words, and acquiring a case document ID set corresponding to the label words;
inquiring a document database according to the case document ID set to obtain a case document set corresponding to the case document ID set;
wherein, the graph database stores label words and corresponding case document IDs thereof; the document database stores the ID of each case document and the corresponding original case document.
In a possible implementation manner, in the foregoing method provided in an embodiment of the present invention, before receiving the user query statement, the method further includes:
establishing a one-to-one corresponding relation between the case documents and the case document IDs, and storing the case documents and the case document IDs in the document database;
extracting a case document title from the original case document, and constructing case document metadata according to a case document ID and the case document title corresponding to the original case document;
identifying and labeling the case documents by using the trained multi-label classification model to generate corresponding case document labels;
and establishing a corresponding relation between the case document metadata and the case document label, and inserting the case document metadata and the case document label into a graphic database.
In a possible implementation manner, in the method provided in an embodiment of the present invention, before the identifying and labeling the case document by using the trained multi-label classification model, the method further includes:
extracting information paragraphs from the original case document to obtain plain text paragraphs of the case document;
performing word segmentation processing on the pure text paragraphs of the case document according to a legal vocabulary word segmentation dictionary to obtain a first word bag;
carrying out special word replacement on words in the first word bag to obtain a second word bag;
constructing a first text vector by using a vector constructor according to the second word bag, and carrying out denoising processing on the first text vector to obtain a second text vector;
dividing the second text vector into a training data set and a test data set;
and training and testing and evaluating the multi-label classification model based on machine learning by using the training data set and the testing data set to obtain the trained multi-label classification model.
In a possible implementation manner, in the method provided in an embodiment of the present invention, the extracting a tag word from the query statement specifically includes:
segmenting the query sentence by using a word segmentation dictionary stored with keywords to generate a third word bag, and generating a keyword set according to the third word bag;
judging whether the keyword set is empty or not;
if the keyword set is not empty, judging whether each keyword in the keyword set is the same as a tag word;
and if a keyword is the same as a label word, taking that keyword in the keyword set as the label word.
In a possible implementation manner, in the method provided in an embodiment of the present invention, after determining whether each keyword in the keyword set is the same as a tag word if the keyword set is not empty, the method further includes:
if not, entering a graphic database to search for a corresponding label node, wherein the label node is associated with the case document metadata;
constructing a new keyword set according to the keywords corresponding to the child nodes of the label nodes;
constructing a reply sentence comprising the new keyword set;
and sending the reply sentence to a user.
In a second aspect, an embodiment of the present invention provides an apparatus for recommending case documents, including:
the receiving module is used for receiving a user query statement;
the extraction module is used for extracting the label words from the query sentences;
the retrieval module is used for retrieving in a graphic database according to the label words and acquiring a case document ID set corresponding to the label words;
the query obtaining module is used for querying a document database according to the case document ID set to obtain a case document set corresponding to the case document ID set;
wherein, the graph database stores label words and corresponding case document IDs thereof; the document database stores the ID of each case document and the corresponding original case document.
In a possible implementation manner, in the apparatus provided in an embodiment of the present invention, the apparatus further includes:
the establishing module is used for establishing a one-to-one corresponding relation between the case documents and the case document IDs before the receiving module receives the user query sentences, and storing the one-to-one corresponding relation in the document database; extracting a case document title from the original case document, and constructing case document metadata according to a case document ID and the case document title corresponding to the original case document; identifying and labeling the case documents by using the trained multi-label classification model to generate corresponding case document labels; and establishing a corresponding relation between the case document metadata and the case document label, and inserting the case document metadata and the case document label into a graphic database.
In a possible implementation manner, in the apparatus provided in an embodiment of the present invention, the apparatus further includes: a classification model training module, used for, before the establishing module identifies and labels the case document by using the trained multi-label classification model, extracting information paragraphs from the original case document to obtain plain text paragraphs of the case document; performing word segmentation processing on the plain text paragraphs of the case document according to a legal vocabulary word segmentation dictionary to obtain a first word bag; carrying out special word replacement on words in the first word bag to obtain a second word bag; constructing a first text vector by using a vector constructor according to the second word bag, and carrying out denoising processing on the first text vector to obtain a second text vector; dividing the second text vector into a training data set and a test data set; and training, testing and evaluating the machine-learning-based multi-label classification model by using the training data set and the test data set to obtain the trained multi-label classification model.
In a possible implementation manner, in the foregoing apparatus provided in an embodiment of the present invention, the extracting module includes:
the keyword unit is used for segmenting the query sentence by utilizing a segmentation dictionary which stores keywords to generate a third word bag, and generating a keyword set according to the third word bag;
the label word unit is used for judging whether the keyword set is empty or not; if the keyword set is not empty, judging whether each keyword in the keyword set is the same as a label word; and if a keyword is the same as a label word, taking that keyword in the keyword set as the label word.
In a possible implementation manner, in the apparatus provided in an embodiment of the present invention, the apparatus further includes:
the construction module is used for, after the label word unit judges whether each keyword in the keyword set is the same as a label word, if not, entering a graph database to search for a corresponding label node, wherein the label node is associated with the case document metadata; and constructing a new keyword set according to the keywords corresponding to the child nodes of the label node;
a reply module for constructing a reply sentence including the new keyword set; and sending the reply sentence to a user.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory and a processor;
the memory for storing a computer program;
wherein the processor executes the computer program in the memory to implement the method of any one of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program is used for implementing the method in any one of the first aspect when executed by a processor.
According to the case document recommendation method, apparatus, device and storage medium provided by the embodiments of the present invention, a user query sentence is received, a label word is extracted from the query sentence, a graph database is searched according to the label word to obtain a case document ID set corresponding to the label word, and a document database is queried according to the case document ID set to obtain the corresponding case document set. The graph database stores label words and their corresponding case document IDs, and the document database stores the case document IDs and the corresponding original case documents. The scheme applies natural language understanding and graph database technology: the user's query intention is determined by extracting the key information of the query sentence and interacting with the graph database, the relevant document IDs are returned accurately from the strictly organized graph database, and the corresponding case document set is then returned from the document database, so that the user's needs are met.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.
Fig. 1 is a schematic flowchart of a case document recommendation method according to an embodiment of the present invention;
Fig. 1A is a schematic flowchart of step S102 in a case document recommendation method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the process of creating a graph database according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a multi-label classification model training process according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a graph database of the hit-and-run class according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a case document recommendation apparatus according to a second embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other examples obtained based on the examples in the present invention are within the scope of the present invention.
Example one
Fig. 1 is a schematic flow chart of a case document recommendation method according to an embodiment of the present invention, and as shown in fig. 1, the method may include the following steps:
S101, receiving a user query statement.
In practical applications, the execution subject of this embodiment may be a recommendation device for case documents. The recommendation device may be implemented as a virtual device, such as software code; as a physical device on which the relevant execution code is written, such as a USB flash drive; or as a physical device in which the relevant execution code is integrated, such as a chip or an intelligent terminal.
Specifically, a user seeking legal advice about a case usually looks up information on similar cases. The user therefore composes a query sentence according to his or her own query intention and inputs it into the recommendation device for case documents.
S102, extracting the label words from the query sentence.
The user may not have a deep legal background and therefore the vocabulary selected in describing the problem is too colloquial. For example: a case party involved in multiple vehicle tailgating events may not know at all that a similar "multiple vehicle injury" professional legal term should be used to search when retrieving similar cases. In order to determine the query intention of the user, according to an embodiment of the present invention, as shown in fig. 1A, step S102 may specifically include the following steps:
S102a, segmenting the query sentence by using a word segmentation dictionary in which keywords are stored to generate a third word bag, and generating a keyword set according to the third word bag.
Specifically, the user query sentence is segmented into a word bag by using a word segmentation dictionary in which keywords are stored. A word embedding technique is used to find words or phrases in the word bag whose meaning is the same as or similar to a keyword; these are converted into the corresponding keywords and put into the keyword set. If the word bag contains a keyword itself, that keyword is put into the keyword set directly.
S102b, judging whether the keyword set is empty. If it is empty, a reply sentence is constructed and sent to the user.
Specifically, if the keyword set obtained by word segmentation is empty, no case-related information can be obtained from the user query sentence. A reply sentence is then constructed and sent to the user to prompt the user to re-enter the query sentence.
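The mapping from colloquial words to stored keywords can be illustrated with a minimal Python sketch. The embedding table, the keyword list, the cosine-similarity measure and the 0.9 threshold below are all assumptions made for illustration; the source only states that a word embedding technique is used.

```python
import math

# Hypothetical toy embeddings; a real system would load trained word vectors.
EMBEDDINGS = {
    "traffic": [0.9, 0.1, 0.0],
    "road": [0.8, 0.2, 0.1],
    "compensation": [0.1, 0.9, 0.1],
    "payout": [0.2, 0.8, 0.0],
    "weather": [0.0, 0.1, 0.9],
}
# Hypothetical keywords stored in the word segmentation dictionary.
KEYWORDS = {"traffic", "compensation"}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def to_keyword_set(bag, threshold=0.9):
    """Keep keywords as they are; map similar words to their nearest keyword."""
    keywords = set()
    for word in bag:
        if word in KEYWORDS:
            keywords.add(word)
            continue
        vec = EMBEDDINGS.get(word)
        if vec is None:
            continue  # no embedding available, so the word is ignored
        best = max(KEYWORDS, key=lambda k: cosine(vec, EMBEDDINGS[k]))
        if cosine(vec, EMBEDDINGS[best]) >= threshold:
            keywords.add(best)
    return keywords
```

In this sketch, words that are neither keywords nor close to one are dropped, so a query with no recognizable content yields an empty keyword set.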
S102c, if the keyword set is not empty, judging whether each keyword in the keyword set is the same as a label word. If so, the process proceeds to step S102d. If not, the process proceeds to step S102e.
S102d, taking the keywords in the keyword set that are the same as label words as the label words.
S102e, entering the graph database to search for the corresponding label node, wherein the label node is associated with case document metadata; constructing a new keyword set according to the keywords corresponding to the child nodes of the label node; constructing a reply sentence including the new keyword set; and sending the reply sentence to the user.
Specifically, if no label word is found in the keyword set, the graph database is searched for the non-label nodes that correspond to the keywords in the keyword set. After a corresponding node is found, its next-level child nodes are found according to the relationships of that node, and a new keyword set is generated from the names of the child nodes. A reply sentence including the new keyword set is then constructed and sent to the user to prompt the user to confirm the label words. Through such multiple rounds of dialog based on natural language understanding and multiple rounds of interaction with the graph database, the user's query intention can be confirmed.
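The decision flow of steps S102b to S102e can be sketched in Python as follows. The hierarchy, the label names and the reply wording are hypothetical placeholders, and a plain dictionary stands in for the graph database lookup.

```python
# Hypothetical keyword hierarchy: parent keyword -> child keywords.
CHILDREN = {
    "traffic": ["responsibility subject", "responsibility mode"],
    "responsibility mode": ["full responsibility", "no responsibility"],
}
# Hypothetical names of label nodes, i.e. the label words.
LABEL_WORDS = {"full responsibility", "no responsibility"}

RETRY = "Please rephrase your query."


def handle_keywords(keywords):
    """Return ("labels", [...]) when label words were found, else ("reply", text)."""
    if not keywords:                                   # S102b: empty keyword set
        return ("reply", RETRY)
    labels = sorted(k for k in keywords if k in LABEL_WORDS)  # S102c
    if labels:                                         # S102d: label words found
        return ("labels", labels)
    options = []                                       # S102e: offer child keywords
    for keyword in sorted(keywords):
        options.extend(CHILDREN.get(keyword, []))
    if not options:
        return ("reply", RETRY)
    return ("reply", "Please choose one of: " + "; ".join(options))
```

Calling `handle_keywords` repeatedly with the keywords from each new user sentence reproduces the multi-round narrowing: each reply lists the next level of keywords until a label word is named.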
For example, with the recommendation device for case documents abbreviated as "recommendation device", a multi-round dialog based on natural language understanding proceeds as follows:
The user: I want to see case documents of the traffic class.
The recommendation device: Traffic offences can be subdivided into five subclasses: responsibility subject, responsibility constitution, grounds for mitigation or exemption of responsibility, responsibility mode, and litigation procedure. Which subclass are you interested in?
The user: I want to see what the responsibility mode includes.
The recommendation device: The responsibility mode includes one classification basis, loss compensation, and six labels: accident responsibility cannot be determined, defendant bears primary responsibility, defendant bears secondary responsibility, defendant bears full responsibility, both parties bear equal responsibility, and defendant bears no responsibility.
The user: Please help me find documents in which the defendant bears full responsibility.
S103, searching in a graph database according to the label words, and acquiring a case document ID set corresponding to the label words.
Specifically, the graph database stores label words and the corresponding case document IDs. A case document ID is an identification code generated for a case document by a universally unique identifier (UUID) generator and corresponds one-to-one with that document. Searching the graph database by the label words yields the case document ID set corresponding to the label words.
S104, querying a document database according to the case document ID set to obtain a case document set corresponding to the case document ID set.
Specifically, the document database stores each case document ID together with the original case document that corresponds to it one-to-one. Querying the document database according to the case document ID set yields the corresponding case document set.
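The two-stage lookup of steps S103 and S104 can be sketched in Python with two dictionaries standing in for the graph database and the document database. The data is hypothetical, and intersecting the ID sets when several label words are given is an assumption; the source does not say how multiple label words are combined.

```python
# Hypothetical stand-in for the graph database: label word -> case document IDs.
LABEL_TO_IDS = {
    "full responsibility": {"id-1", "id-2"},
    "multiple vehicle injury": {"id-2", "id-3"},
}
# Hypothetical stand-in for the document database: ID -> original case document.
DOCUMENTS = {
    "id-1": "<doc>case one</doc>",
    "id-2": "<doc>case two</doc>",
    "id-3": "<doc>case three</doc>",
}


def retrieve_ids(label_words):
    """S103: collect the IDs of documents carrying every requested label word."""
    id_sets = [LABEL_TO_IDS.get(word, set()) for word in label_words]
    return set.intersection(*id_sets) if id_sets else set()


def fetch_documents(doc_ids):
    """S104: look up the original documents for the retrieved IDs."""
    return {i: DOCUMENTS[i] for i in sorted(doc_ids) if i in DOCUMENTS}
```

Keeping only IDs in the graph side and full text in the document side mirrors the separation described above: the graph lookup stays small, and the original documents are fetched only for the final ID set.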
Optionally, the graph database is pre-established, and according to an embodiment of the present invention, as shown in fig. 2, before performing the step S101, the method may further include the following steps:
S201, establishing a one-to-one corresponding relation between the case documents and the case document IDs, and storing the case documents and the case document IDs in the document database.
S202, extracting a case document title from the original case document, and constructing case document metadata according to a case document ID and the case document title corresponding to the original case document.
Specifically, a case document title can be extracted from the case document in its original XML format, and case document metadata is constructed according to the case document ID and the case document title corresponding to the original case document. When a metadata node is created in the graph database, the node type is specified so that it can be distinguished from other types of nodes, for example: "MERGE (n:File {ID: 'case document ID', title: 'case document title'})".
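Steps S201 and S202 can be sketched in Python as follows. The in-memory dictionary, the `title` element name and the helper names are hypothetical; the sketch only builds a parameterized form of the MERGE statement and does not call any graph database driver.

```python
import uuid
import xml.etree.ElementTree as ET

# Hypothetical in-memory document database: case document ID -> original XML.
document_db = {}


def register_document(xml_text):
    """S201/S202: store the document under a fresh UUID and build its metadata."""
    doc_id = str(uuid.uuid4())
    document_db[doc_id] = xml_text
    # Assumes the title is held in a direct child element named "title".
    title = ET.fromstring(xml_text).findtext("title", default="")
    return {"ID": doc_id, "title": title}


def metadata_statement(metadata):
    """Return a parameterized Cypher statement and its parameters."""
    query = "MERGE (n:File {ID: $ID, title: $title})"
    return query, dict(metadata)
```

Passing the ID and title as parameters rather than splicing them into the statement text is a design choice of this sketch; it avoids quoting problems when a title contains apostrophes.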
S203, identifying and labeling the case documents by using the trained multi-label classification model to generate corresponding case document labels.
S204, establishing a corresponding relation between the case document metadata and the case document labels, and inserting the case document metadata and the case document labels into the graph database.
Specifically, in the graph database, the case document metadata is matched with the label word set predicted by the trained multi-label classification model, and the relationship between the case document and each label word is established, for example: "MATCH (a:File), (b:Tag) WHERE b.name = 'label name' CREATE (a)-[:TAGGED_BY]->(b)".
The trained multi-label classification model is obtained by training the multi-label classification model in advance, and the multi-label classification model is used for labeling the existing or newly added case documents in the future so as to assist in creating metadata of the case documents stored in the graphic database.
The multi-label classification model comprises a series of binary classification and multi-class classification models. Taking hit-and-run case documents as an example, as shown in Fig. 4, they can be subdivided into five subclasses, and these subclasses can be further subdivided into 42 more specific labels. Each label belongs to a subclass, and the subclasses differ greatly in the aspects of a case document that they describe. Taking the "responsibility mode" subclass as an example, such case documents carry six different label words, such as "accident responsibility cannot be determined", "defendant bears primary responsibility" and "defendant bears secondary responsibility". A case document cannot carry several of these labels at the same time; only one is possible, so a multi-class classification model can be trained for these labels. For some labels, such as "multiple vehicle injury", a binary classification model is suitable.
Fig. 4 illustrates a graph database of the hit-and-run class, which contains five types of nodes: case type, subclass, classification basis, label, and case metadata. There are four relationships between nodes: a case type contains (:CONTAINS) subclasses; a subclass is based on (:BASED_ON) a classification basis; subclasses and classification bases have (:HAS) labels; and case metadata is tagged by (:TAGGED_BY) labels. The names of the case type, subclass, classification basis and label nodes are attributes of those nodes and are collectively called keywords; the name of a label node is also called a label word.
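The node and relationship types described for Fig. 4 can be modelled in Python as a typed edge list. The node names below are hypothetical examples rather than the contents of Fig. 4, and the two helper functions are only one assumed way of traversing such a structure.

```python
# Each edge is (source node name, relationship type, target node name).
# All node names are hypothetical examples.
EDGES = [
    ("traffic", "CONTAINS", "responsibility mode"),           # case type -> subclass
    ("responsibility mode", "BASED_ON", "loss compensation"),  # subclass -> basis
    ("loss compensation", "HAS", "full responsibility"),       # basis -> label
    ("responsibility mode", "HAS", "no responsibility"),       # subclass -> label
    ("file-001", "TAGGED_BY", "full responsibility"),          # metadata -> label
    ("file-002", "TAGGED_BY", "full responsibility"),
]

HIERARCHY_RELATIONS = {"CONTAINS", "BASED_ON", "HAS"}


def child_keywords(name):
    """Keywords one level below `name` in the keyword hierarchy."""
    return [dst for src, rel, dst in EDGES
            if src == name and rel in HIERARCHY_RELATIONS]


def files_tagged_by(label_word):
    """Case metadata nodes connected to a label by TAGGED_BY."""
    return sorted(src for src, rel, dst in EDGES
                  if rel == "TAGGED_BY" and dst == label_word)
```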
Optionally, according to an embodiment of the present invention, as shown in fig. 3, before performing step S203, the method may further include the following steps:
S301, extracting information paragraphs from the original case document to obtain plain text paragraphs of the case document.
S302, performing word segmentation processing on the case document plain text paragraphs according to the legal vocabulary word segmentation dictionary to obtain a first word bag.
Specifically, the extraction of information paragraphs directly affects the quality of the training data. A case document in the original XML format contains many information paragraphs that are useless for text classification learning, such as <title>, <case number>, <judgment date> and <judge>; using them as corpus would add noise to the training data. By contrast, paragraphs such as <trial proceedings>, <facts found by the court> and <opinion of the court> summarize the case well and contain the vocabulary that matters for characterizing the case, so extracting them as the corpus yields better text vectors and better text classification learning.
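The paragraph extraction of step S301 can be sketched in Python with the standard XML parser. The element names `trial`, `findings` and `opinion` are hypothetical English stand-ins for the informative paragraphs; the actual tag names of the source documents are not given here.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for the paragraphs worth keeping as corpus.
INFORMATIVE_TAGS = ("trial", "findings", "opinion")


def extract_plain_text(xml_text):
    """Return the text of informative paragraphs, skipping all other elements."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for tag in INFORMATIVE_TAGS:
        for element in root.iter(tag):
            # itertext() also collects text nested in child elements.
            text = "".join(element.itertext()).strip()
            if text:
                paragraphs.append(text)
    return paragraphs
```

Elements such as a title or case number are never visited, so they cannot leak into the training corpus.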
S303, performing special word replacement on the words in the first word bag to obtain a second word bag.
Specifically, for different classification problems, certain words and phrases in the case document influence classification learning more than others. For example, in "responsibility mode" documents, the frequency and position of "plaintiff" and "defendant" strongly influence whether the case document is classified as "defendant bears full responsibility", "defendant bears primary responsibility" or "defendant bears secondary responsibility". In some case documents, however, only the parties' names are used and the designations "plaintiff" and "defendant" are omitted. In this case, when the corpus is constructed, each party's name is matched to plaintiff or defendant by semantic analysis of the <party information> paragraph, and the names are then replaced accordingly in the paragraphs extracted into the corpus. Similarly, in "multiple vehicle injury" documents, different license plates are replaced by fixed replacement words such as "license plate 1", "license plate 2" and "license plate 3", which are added to the word segmentation dictionary; the expected effect is similar to that for the responsibility mode. Special word replacement can therefore be performed on the words in the first word bag.
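The special word replacement of step S303 can be sketched in Python as follows. The plate pattern is a made-up format used only for illustration, and the mapping from party names to roles is assumed to be supplied by the earlier analysis of the party information paragraph.

```python
import re

# Hypothetical plate format (two letters, four digits); real formats differ.
PLATE_PATTERN = re.compile(r"[A-Z]{2}\d{4}")


def replace_special_words(words, parties):
    """Replace party names with roles and plates with numbered placeholders.

    `parties` maps a party name to a role word such as "plaintiff".
    """
    plate_ids = {}
    replaced = []
    for word in words:
        if word in parties:
            replaced.append(parties[word])
        elif PLATE_PATTERN.fullmatch(word):
            # The same plate always maps to the same placeholder.
            plate_ids.setdefault(word, f"license plate {len(plate_ids) + 1}")
            replaced.append(plate_ids[word])
        else:
            replaced.append(word)
    return replaced
```

Numbering plates in order of first appearance keeps the placeholders consistent within a document while removing the plate values themselves, which carry no signal for classification.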
S304, according to the second word bag, a first text vector is constructed by using a vector constructor, and the first text vector is subjected to denoising processing to obtain a second text vector.
S305, dividing the second text vector into a training data set and a testing data set.
S306, training and testing evaluation are carried out on the multi-label classification model based on machine learning by utilizing the training data set and the testing data set, and the trained multi-label classification model is obtained.
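Steps S304 and S305 can be sketched in plain Python. The source does not specify the vector constructor or the denoising method, so this sketch assumes count vectors and treats denoising as dropping words that occur in fewer than `min_doc_freq` word bags; model training itself (S306) is omitted.

```python
import random


def build_vectors(bags):
    """S304, first part: build count vectors over the full vocabulary."""
    vocabulary = sorted({word for bag in bags for word in bag})
    vectors = [[bag.count(word) for word in vocabulary] for bag in bags]
    return vocabulary, vectors


def denoise(vocabulary, vectors, min_doc_freq=2):
    """S304, second part (assumed): drop columns for rarely occurring words."""
    keep = [i for i in range(len(vocabulary))
            if sum(1 for v in vectors if v[i] > 0) >= min_doc_freq]
    return [vocabulary[i] for i in keep], [[v[i] for i in keep] for v in vectors]


def split_dataset(samples, test_ratio=0.25, seed=0):
    """S305: shuffle reproducibly, then split into (training set, test set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]
```

A fixed seed makes the split reproducible, which helps when comparing evaluation results across training runs.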
The case document recommendation method provided in this embodiment receives a user query sentence, extracts a label word from the query sentence, searches a graph database according to the label word to obtain a case document ID set corresponding to the label word, and queries a document database according to the case document ID set to obtain the corresponding case document set. The graph database stores label words and their corresponding case document IDs, and the document database stores the case document IDs and the corresponding original case documents. The scheme applies natural language understanding and graph database technology: the user's query intention is determined by extracting the key information of the query sentence and interacting with the graph database, the relevant document IDs are returned accurately from the strictly organized graph database, and the corresponding case document set is then returned from the document database, so that the user's needs are met.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
Fig. 5 is a schematic structural diagram of a case document recommendation apparatus according to a second embodiment of the present invention. As shown in Fig. 5, the apparatus includes:
A receiving module 510, configured to receive a user query sentence.
An extracting module 520, configured to extract a label word from the query sentence.
A retrieval module 530, configured to search a graph database according to the label word and obtain a case document ID set corresponding to the label word.
A query obtaining module 540, configured to query the document database according to the case document ID set to obtain a case document set corresponding to the case document ID set.
The graph database stores label words and their corresponding case document IDs. The document database stores each case document ID and its corresponding original case document.
According to an embodiment of the present invention, the apparatus may further include:
An establishing module 550, configured to, before the receiving module receives the user query sentence: establish a one-to-one correspondence between the case documents and the case document IDs, and store the correspondence in the document database; extract a case document title from each original case document, and construct case document metadata from the case document ID and the case document title corresponding to the original case document; identify and label the case documents by using the trained multi-label classification model to generate corresponding case document labels; and establish a correspondence between the case document metadata and the case document labels, and insert the case document metadata and the case document labels into the graph database.
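A minimal sketch of what the establishing module does at indexing time is shown below. The ID format, the assumption that the title is the first line of a document, and the toy `keyword_classifier` used in place of the trained multi-label classification model are all hypothetical; a plain dictionary stands in for the graph database.

```python
def build_indexes(raw_documents, classify):
    """Populate a document store and a label graph from raw case documents."""
    document_db = {}  # case document ID -> original case document
    graph = {}        # label -> list of metadata records (ID and title)
    for number, text in enumerate(raw_documents, start=1):
        doc_id = f"D{number:03d}"              # one ID per document
        document_db[doc_id] = text
        title = text.splitlines()[0].strip()   # assumed: title is the first line
        metadata = {"id": doc_id, "title": title}
        for label in classify(text):
            graph.setdefault(label, []).append(metadata)
    return document_db, graph

def keyword_classifier(text):
    """Toy stand-in for the trained multi-label classification model."""
    rules = {"loan": "loan dispute", "theft": "theft"}
    return sorted({label for word, label in rules.items() if word in text.lower()})

docs = [
    "Zhang v. Li\nThe parties dispute repayment of a loan.",
    "People v. Wang\nThe defendant was charged with theft after a loan default.",
]
document_db, graph = build_indexes(docs, keyword_classifier)
```

Because the second document receives two labels, its metadata record is attached to both label entries, which is what lets one document be reached from several label words.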
According to an embodiment of the present invention, the apparatus may further include a classification model training module 560, configured to, before the establishing module identifies and labels the case documents by using the trained multi-label classification model: extract information paragraphs from the original case document to obtain plain text paragraphs of the case document; perform word segmentation on the plain text paragraphs of the case document according to a legal vocabulary word segmentation dictionary to obtain a first word bag; perform special word replacement on the words in the first word bag to obtain a second word bag; construct a first text vector from the second word bag by using a vector constructor, and denoise the first text vector to obtain a second text vector; divide the second text vector into a training data set and a test data set; and train, test and evaluate the machine-learning-based multi-label classification model by using the training data set and the test data set to obtain the trained multi-label classification model.
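The two preprocessing steps that produce the first and second word bags can be illustrated as follows. Forward maximum matching is used here as one common dictionary-based segmentation technique; the description does not name the algorithm. Treating "special word replacement" as mapping specific words (such as a party's name) to a placeholder token is likewise an assumption, and the dictionary contents are invented for the example.

```python
def segment(text, dictionary):
    """Forward maximum matching: always take the longest dictionary word."""
    max_len = max(len(word) for word in dictionary)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when nothing longer matches.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

def replace_special_words(tokens, special_words):
    """Swap each special word for its placeholder; keep other tokens as-is."""
    return [special_words.get(token, token) for token in tokens]

# Hypothetical legal vocabulary dictionary and special-word table.
legal_dictionary = {"被告人", "张三", "盗窃罪"}
special_words = {"张三": "<PERSON>"}

first_word_bag = segment("被告人张三犯盗窃罪", legal_dictionary)
second_word_bag = replace_special_words(first_word_bag, special_words)
```

Replacing names with a placeholder keeps case-specific tokens from inflating the vocabulary that the later vector construction step works with.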
According to an embodiment of the present invention, the extracting module 520 may include:
A keyword unit 521, configured to segment the query sentence by using a word segmentation dictionary in which keywords are stored to generate a third word bag, and generate a keyword set according to the third word bag.
A label word unit 522, configured to determine whether the keyword set is empty; if the keyword set is not empty, determine whether each keyword in the keyword set is the same as a label word; and if so, take each keyword in the keyword set that is the same as a label word as a label word.
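The decision logic of the two units can be sketched as below. Substring lookup replaces real word segmentation to keep the example short, and the function names, the sample dictionary, and the sample label words are assumptions.

```python
def extract_keywords(query, keyword_dictionary):
    """Keep dictionary keywords found in the query, in order of appearance."""
    hits = [(query.find(k), k) for k in keyword_dictionary if k in query]
    return [k for _, k in sorted(hits)]

def match_label_words(keywords, label_words):
    """Return (keywords that are label words, keywords that are not)."""
    if not keywords:  # empty keyword set: nothing to match
        return [], []
    matched = [k for k in keywords if k in label_words]
    unmatched = [k for k in keywords if k not in label_words]
    return matched, unmatched

# Hypothetical keyword dictionary and label words.
keyword_dictionary = {"theft", "sentencing", "loan"}
label_words = {"theft", "loan"}

keywords = extract_keywords("sentencing in theft cases", keyword_dictionary)
matched, unmatched = match_label_words(keywords, label_words)
```

Returning the unmatched keywords alongside the matched ones keeps the information needed for any follow-up handling of keywords that are not label words.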
According to an embodiment of the present invention, the apparatus may further include:
A construction module 570, configured to, after the label word unit determines whether each keyword in the keyword set is the same as a label word, if not, search the graph database for a corresponding label node, where the label node is associated with the case document metadata; and construct a new keyword set according to the keywords corresponding to the child nodes of the label node.
A reply module 580, configured to construct a reply sentence including the new keyword set, and send the reply sentence to the user.
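The fallback handled by the construction and reply modules might look like the following sketch. How a keyword is mapped to its "corresponding label node" is not spelled out here, so the example assumes the keyword is itself the name of a parent label node; the tree contents and the wording of the reply are also invented.

```python
# Hypothetical label hierarchy: label node -> keywords of its child nodes.
LABEL_TREE = {
    "property crime": ["theft", "robbery", "fraud"],
    "contract": ["loan dispute", "sales contract"],
}

def suggest_keywords(unmatched_keywords, label_tree):
    """Collect child-node keywords for every keyword that names a label node."""
    new_keywords = []
    for keyword in unmatched_keywords:
        for child in label_tree.get(keyword, []):
            if child not in new_keywords:  # keep order, drop duplicates
                new_keywords.append(child)
    return new_keywords

def build_reply(new_keywords):
    """Turn the new keyword set into a sentence sent back to the user."""
    if not new_keywords:
        return "No related labels were found. Please rephrase your query."
    return "Did you mean one of: " + ", ".join(new_keywords) + "?"

suggestions = suggest_keywords(["property crime"], LABEL_TREE)
reply = build_reply(suggestions)
```

The reply narrows a broad query toward label words that the graph database can actually resolve, so the user's next query is more likely to match directly.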
Example three
Fig. 6 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. As shown in Fig. 6, the electronic device includes a memory 610 and a processor 620.
The memory 610 is configured to store a computer program.
The processor 620 executes the computer program in the memory to implement the method of the first embodiment.
Example four
The fourth embodiment of the present invention provides a computer-readable storage medium in which a computer program is stored; when executed by a processor, the computer program implements the method of the first embodiment.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A method for recommending case documents, comprising:
receiving a user query sentence;
extracting a label word from the query sentence;
searching a graph database according to the label word to obtain a case document ID set corresponding to the label word;
querying a document database according to the case document ID set to obtain a case document set corresponding to the case document ID set;
wherein the graph database stores label words and their corresponding case document IDs, and the document database stores each case document ID and its corresponding original case document.
2. The method of claim 1, further comprising, prior to receiving the user query sentence:
establishing a one-to-one correspondence between the case documents and the case document IDs, and storing the correspondence in the document database;
extracting a case document title from the original case document, and constructing case document metadata according to the case document ID and the case document title corresponding to the original case document;
identifying and labeling the case documents by using the trained multi-label classification model to generate corresponding case document labels;
and establishing a correspondence between the case document metadata and the case document labels, and inserting the case document metadata and the case document labels into the graph database.
3. The method of claim 2, wherein before identifying and labeling the case documents by using the trained multi-label classification model, the method further comprises:
extracting information paragraphs from the original case document to obtain plain text paragraphs of the case document;
performing word segmentation on the plain text paragraphs of the case document according to a legal vocabulary word segmentation dictionary to obtain a first word bag;
performing special word replacement on the words in the first word bag to obtain a second word bag;
constructing a first text vector from the second word bag by using a vector constructor, and denoising the first text vector to obtain a second text vector;
dividing the second text vector into a training data set and a test data set;
and training, testing and evaluating the machine-learning-based multi-label classification model by using the training data set and the test data set to obtain the trained multi-label classification model.
4. The method according to claim 2, wherein extracting the label word from the query sentence specifically includes:
segmenting the query sentence by using a word segmentation dictionary in which keywords are stored to generate a third word bag, and generating a keyword set according to the third word bag;
determining whether the keyword set is empty;
if the keyword set is not empty, determining whether each keyword in the keyword set is the same as a label word;
and if so, taking each keyword in the keyword set that is the same as a label word as a label word.
5. The method of claim 4, wherein after determining whether each keyword in the keyword set is the same as a label word, the method further comprises:
if not, searching the graph database for a corresponding label node, wherein the label node is associated with the case document metadata;
constructing a new keyword set according to the keywords corresponding to the child nodes of the label nodes;
constructing a reply sentence comprising the new keyword set;
and sending the reply sentence to a user.
6. An apparatus for recommending case documents, comprising:
a receiving module, configured to receive a user query sentence;
an extraction module, configured to extract a label word from the query sentence;
a retrieval module, configured to search a graph database according to the label word and obtain a case document ID set corresponding to the label word;
a query obtaining module, configured to query a document database according to the case document ID set to obtain a case document set corresponding to the case document ID set;
wherein the graph database stores label words and their corresponding case document IDs, and the document database stores each case document ID and its corresponding original case document.
7. The apparatus of claim 6, further comprising:
an establishing module, configured to, before the receiving module receives the user query sentence: establish a one-to-one correspondence between the case documents and the case document IDs, and store the correspondence in the document database; extract a case document title from the original case document, and construct case document metadata according to the case document ID and the case document title corresponding to the original case document; identify and label the case documents by using the trained multi-label classification model to generate corresponding case document labels; and establish a correspondence between the case document metadata and the case document labels, and insert the case document metadata and the case document labels into the graph database.
8. The apparatus of claim 7, further comprising: a classification model training module, configured to, before the establishing module identifies and labels the case documents by using the trained multi-label classification model: extract information paragraphs from the original case document to obtain plain text paragraphs of the case document; perform word segmentation on the plain text paragraphs of the case document according to a legal vocabulary word segmentation dictionary to obtain a first word bag; perform special word replacement on the words in the first word bag to obtain a second word bag; construct a first text vector from the second word bag by using a vector constructor, and denoise the first text vector to obtain a second text vector; divide the second text vector into a training data set and a test data set; and train, test and evaluate the machine-learning-based multi-label classification model by using the training data set and the test data set to obtain the trained multi-label classification model.
9. The apparatus of claim 7, wherein the extraction module comprises:
a keyword unit, configured to segment the query sentence by using a word segmentation dictionary in which keywords are stored to generate a third word bag, and generate a keyword set according to the third word bag;
a label word unit, configured to determine whether the keyword set is empty; if the keyword set is not empty, determine whether each keyword in the keyword set is the same as a label word; and if so, take each keyword in the keyword set that is the same as a label word as a label word.
10. The apparatus of claim 9, further comprising:
a construction module, configured to, after the label word unit determines whether each keyword in the keyword set is the same as a label word, if not, search the graph database for a corresponding label node, wherein the label node is associated with the case document metadata; and construct a new keyword set according to the keywords corresponding to the child nodes of the label node;
a reply module, configured to construct a reply sentence including the new keyword set, and send the reply sentence to the user.
11. An electronic device, comprising: a memory and a processor;
the memory for storing a computer program;
wherein the processor executes the computer program in the memory to implement the method of any one of claims 1-5.
12. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the method according to any one of claims 1-5.
CN201811491656.2A 2018-12-07 2018-12-07 Case document recommendation method, device, equipment and storage medium Pending CN111291152A (en)

Publications (1)

Publication Number Publication Date
CN111291152A true CN111291152A (en) 2020-06-16

Family

ID=71022932

Country Status (1)

Country Link
CN (1) CN111291152A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1300026A (en) * 1999-12-14 2001-06-20 三菱电机株式会社 Text searching apparatus and text searching method
US20030079183A1 (en) * 2001-03-23 2003-04-24 Hiroyuki Tada Document data processing device, server device, terminal device, and document processing system
US7529756B1 (en) * 1998-07-21 2009-05-05 West Services, Inc. System and method for processing formatted text documents in a database
CN106815263A (en) * 2015-12-01 2017-06-09 北京国双科技有限公司 The searching method and device of legal provision
CN106991092A (en) * 2016-01-20 2017-07-28 阿里巴巴集团控股有限公司 The method and apparatus that similar judgement document is excavated based on big data
CN107092681A (en) * 2017-04-21 2017-08-25 安徽富驰信息技术有限公司 A kind of judicial retrieval result based on user behavior feature learns sort method automatically
CN108255877A (en) * 2016-12-29 2018-07-06 北京国双科技有限公司 The storage method and device of judgement document
CN108334588A (en) * 2018-01-29 2018-07-27 北京搜狐新媒体信息技术有限公司 A kind of user tag construction method and device
CN108334500A (en) * 2018-03-05 2018-07-27 上海思贤信息技术股份有限公司 A kind of judgement document's mask method and device based on machine learning algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
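Yang Bo: "Design and Implementation of an Archive Management System: A Case Study of Chengxian College, Southeast University", China Master's Theses Full-text Database *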

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434157A (en) * 2020-11-05 2021-03-02 平安直通咨询有限公司上海分公司 Document multi-label classification method and device, electronic equipment and storage medium
CN112732865A (en) * 2020-12-29 2021-04-30 长春市把手科技有限公司 Method and device for measuring and calculating criminal period influence ratio of criminal case plots
CN112732865B (en) * 2020-12-29 2022-11-29 长春市把手科技有限公司 Method and device for measuring and calculating criminal period influence ratio of criminal case plots
CN113434506A (en) * 2021-06-29 2021-09-24 平安科技(深圳)有限公司 Data management and retrieval method and device, computer equipment and readable storage medium
CN117093604A (en) * 2023-10-20 2023-11-21 中信证券股份有限公司 Search information generation method, apparatus, electronic device, and computer-readable medium
CN117093604B (en) * 2023-10-20 2024-02-02 中信证券股份有限公司 Search information generation method, apparatus, electronic device, and computer-readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230627

Address after: 3007, Hengqin International Financial Center Building, No. 58 Huajin Street, Hengqin New District, Zhuhai City, Guangdong Province, 519030

Applicant after: New founder holdings development Co.,Ltd.

Address before: 100871, Beijing, Haidian District, Cheng Fu Road, No. 298, Zhongguancun Fangzheng building, 9 floor

Applicant before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Applicant before: PKU FOUNDER INFORMATION INDUSTRY GROUP CO.,LTD.

AD01 Patent right deemed abandoned

Effective date of abandoning: 20231208