CN111008262A

CN111008262A - Lawyer evaluation method and recommendation method based on knowledge graph

Info

Publication number: CN111008262A
Application number: CN201911160895.4A
Authority: CN
Inventors: 刘飞; 陈文平; 陈亿熙; 黄伟民
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2019-11-24
Filing date: 2019-11-24
Publication date: 2020-04-14
Anticipated expiration: 2039-11-24
Also published as: CN111008262B

Abstract

The invention discloses a knowledge-graph-based lawyer evaluation method and lawyer recommendation method. The lawyer evaluation method comprises the following steps: S1, collecting publicly available judgment documents to form a judgment document library; S2, preprocessing the judgment documents to form a valid database; S3, constructing a knowledge graph of the judgment documents and identifying the knowledge graph elements of each judgment document; S4, obtaining a professional quality evaluation score for each lawyer through a machine learning model according to the knowledge graph of step S3; and S5, compiling statistics by case type to obtain the case types each lawyer is good at, and writing this case type data into a database. The lawyer recommendation method, which builds on the lawyer evaluation method, comprises the following steps: A, acquiring the case type, case region and case description input by a user; B, finding the corresponding case library according to the case type and case region input by the user; and C, finding similar cases and their lawyers and returning them to the user side.

Description

Lawyer evaluation method and recommendation method based on knowledge graph

Technical Field

The invention belongs to the field of internet and data retrieval recommendation, and particularly relates to a lawyer evaluation method and a lawyer recommendation method based on a knowledge graph.

Background

In the internet era, many industries have developed rapidly with the help of "Internet Plus" and artificial intelligence algorithms. The legal field, however, is highly specialized, and ordinary users have relatively little legal knowledge, which makes them inefficient at handling legal problems. This is particularly true of vulnerable groups that need legal aid. Compared with other groups, vulnerable groups face problems such as limited access to information and fewer available resources. When they encounter a legal problem, they cannot resolve it by drawing on personal connections or by hiring a lawyer at high cost, as other people can. They can only apply for assistance through a legal aid center, which costs a great deal of manpower and energy and does not guarantee finding a suitable and satisfactory lawyer.

At present, various internet lawyer recommendation platforms are available on the market. A user only needs to submit basic case information, such as the case type, to retrieve lawyers who meet the user's legal needs, which simplifies the process of seeking legal services to a certain extent. However, the functions of existing lawyer recommendation platforms are relatively simple. On the one hand, the lawyer recommendation algorithms of existing platforms are mostly based on a lawyer's historical case-handling rate and lack a professional, objective lawyer evaluation system. For example, the lawyer scoring method described in Chinese patent CN109409645A relies mainly on the lawyer's case-handling rate and some basic lawyer information, such as law firm information and the lawyer's personal information; it lacks a comprehensive professional evaluation of how the lawyer handled the case and therefore does not reflect the lawyer's professional level well. On the other hand, the prior art does not model the semantics of the whole case when calculating case similarity. For example, Chinese patent CN107563912A only calculates the cosine similarity between the user's case and each case in the case library and then obtains a similarity score with the KNN method; because the case as a whole is not modeled, the semantic information expressed by the case cannot be extracted, which degrades the lawyer recommendation.

Disclosure of Invention

To solve the problems of existing methods, the invention aims to provide a knowledge-graph-based lawyer evaluation method and recommendation method that can objectively and accurately evaluate a lawyer's professional level and areas of expertise, accurately recommend suitable lawyers to a user, and effectively improve the user experience.

The invention is realized by at least one of the following technical schemes.

A lawyer evaluation method based on a knowledge graph comprises the following steps:

S1, collecting publicly available judgment documents to form a judgment document library;

S2, preprocessing the judgment documents in the judgment document library of step S1 and removing invalid data to form a valid database;

S3, constructing a knowledge graph of the judgment documents in the valid database of step S2, performing knowledge graph element recognition on each judgment document, and writing the extracted elements into a judgment document element database;

S4, evaluating the professional quality of the lawyers in each case according to the knowledge graph of step S3, obtaining a professional quality evaluation score for each lawyer through a machine learning model, and writing the lawyer scores into a lawyer evaluation database;

S5, grouping the professional quality evaluation scores of each lawyer from step S4 by case type and compiling statistics to obtain the case types each lawyer is good at, and writing this case type data into a database;

S6, applying the database obtained in step S5 to the legal aid lawyer evaluation systems of legal aid centers at all levels, so as to quantify legal aid lawyer evaluation data and make legal aid lawyer evaluation more professional and objective.

Preferably, the preprocessing comprises the following specific steps:

S201, checking the content integrity of the judgment documents in the judgment document library of step S1: checking whether each document was downloaded completely and whether its content includes lawyer information, the plaintiff's claims, the plaintiff's evidence, the facts found by the court and the court's judgment, and removing incompletely downloaded documents and documents with missing content from the library to form a preliminary preprocessed case library;

S202, dividing each case in the preliminary preprocessed case library of step S201 into five text sections: lawyer information, the plaintiff's claims, the plaintiff's evidence, the facts found by the court and the court's judgment;

S203, extracting the corresponding element information from the five text sections of step S202, the element information comprising lawyer information, pre-trial representation, in-court performance and case outcome, to form the valid database.
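The completeness check and sectioning of steps S201 and S202 can be sketched as follows. This is a minimal illustration, assuming each downloaded document has already been parsed into a dictionary keyed by section; the section keys and function names are hypothetical and do not come from the source.

```python
from typing import Dict, List, Optional

# Hypothetical keys for the five sections that the source requires in each document.
REQUIRED_SECTIONS = [
    "lawyer_information",
    "plaintiff_claims",
    "plaintiff_evidence",
    "facts_found_by_court",
    "court_judgment",
]


def is_complete(document: Dict[str, Optional[str]]) -> bool:
    """S201: keep a document only if every required section is present and non-empty."""
    return all((document.get(name) or "").strip() for name in REQUIRED_SECTIONS)


def build_preliminary_library(
    documents: List[Dict[str, Optional[str]]]
) -> List[Dict[str, str]]:
    """Drop incomplete documents and keep exactly the five text sections (S201-S202)."""
    library = []
    for doc in documents:
        if is_complete(doc):
            library.append({name: doc[name].strip() for name in REQUIRED_SECTIONS})
    return library
```

How a raw document is split into these sections in the first place is not specified here, so that step is left out of the sketch.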

Preferably, the step S3 includes the following steps:

S301, constructing a knowledge graph of the judgment documents in the valid database of step S2, wherein the knowledge graph comprises a case entity, a lawyer entity, a pre-trial representation entity, an in-court performance entity and a case outcome entity; the case entity has three attributes: the case location, the court name and the case number; the lawyer entity has three attributes: the lawyer's name, law firm information and the lawyer's location; the pre-trial representation entity has three attributes: the number of items of evidence submitted by the plaintiff, the cross-examination of evidence and the number of litigation claim items; the in-court performance entity has three attributes: the lawyer's court attendance, factual claims and liability claims; the case outcome entity has five attributes: fact finding, admission of evidence, determination of compensation items, determination of compensation liability and the degree to which the court supports the plaintiff's claim items;

S302, sampling the judgment documents according to the knowledge graph produced in step S301, building a rule base of regular expressions for the knowledge graph elements, assigning priorities to the syntactic rules for the same element, matching the rules against the judgment documents in priority order, and writing the matching results into a database;

S303, sampling the judgment documents according to the knowledge graph produced in step S301 and constructing a training data set, then training a bidirectional long short-term memory (Bi-LSTM) classification model on the two attributes of liability claims and determination of compensation liability, so as to obtain more accurate element information that further verifies the element matching results of step S302, and writing the more accurate element results back into the judgment document element database.
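The five entities and their attributes listed in step S301 can be written down as a small schema. The sketch below uses Python dataclasses; the class names, field names and field types are assumptions made for illustration, since the source only names the entities and attributes.

```python
from dataclasses import dataclass, asdict


@dataclass
class Case:
    location: str
    court_name: str
    case_number: str


@dataclass
class Lawyer:
    name: str
    law_firm: str
    location: str


@dataclass
class PreTrialRepresentation:
    plaintiff_evidence_count: int      # number of items of evidence submitted
    evidence_cross_examination: str    # cross-examination of evidence
    claim_item_count: int              # number of litigation claim items


@dataclass
class InCourtPerformance:
    court_attendance: str
    factual_claims: str
    liability_claims: str


@dataclass
class CaseOutcome:
    fact_finding: str
    evidence_admission: str
    compensation_items: str
    compensation_liability: str
    claim_support_degree: str


@dataclass
class JudgmentDocumentGraph:
    """One judgment document expressed as the five entities of step S301."""
    case: Case
    lawyer: Lawyer
    pre_trial: PreTrialRepresentation
    performance: InCourtPerformance
    outcome: CaseOutcome
```

Calling `asdict` on a `JudgmentDocumentGraph` yields a nested dictionary that could be written to an element database row by row.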

Preferably, the step S4 includes the following steps:

S401, sampling 50% of the documents from the case library of step S201, having the lawyers in each case scored manually according to the content of the judgment document, discarding the highest and lowest scores, and taking the mean of the remaining scores as the prior professional quality evaluation score of the lawyers in that case;

S402, converting the more accurate element results of step S303 into low-dimensional tensors through a word embedding model;

S403, taking the gradient boosting decision tree (GBDT) as the base algorithm and training a supervised model with the Light Gradient Boosting Machine (LightGBM) framework to automate lawyer professional quality evaluation, wherein the low-dimensional tensors obtained in step S402 are the model input, the professional quality evaluation scores obtained in step S401 serve as prior knowledge, the mean squared error (MSE) is the evaluation metric, and the optimal model parameters are selected through iteration and the model is saved;

S404, evaluating the professional quality of the lawyers in the judgment documents with the model obtained in step S403, and writing the scores into a database.
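The prior score of step S401 is a trimmed mean: the highest and lowest manual scores are dropped and the rest are averaged. A minimal sketch, with a hypothetical function name and the assumption that exactly one highest and one lowest score are removed:

```python
from typing import Sequence


def prior_score(manual_scores: Sequence[float]) -> float:
    """Drop one highest and one lowest score, then average the remainder (S401)."""
    if len(manual_scores) < 3:
        raise ValueError("need at least three scores to drop the highest and lowest")
    trimmed = sorted(manual_scores)[1:-1]
    return sum(trimmed) / len(trimmed)
```

For example, four reviewers giving 60, 80, 90 and 100 produce a prior score of 85, because 60 and 100 are discarded.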

Preferably, the step S5 includes the following steps:

S501, grouping each lawyer's per-case professional quality evaluation scores by case type;

S502, calculating the mean professional quality evaluation score of each lawyer for each case type;

S503, sorting the means obtained in step S502 in descending order, taking the top three case types as the case types the lawyer is good at, and writing them into a database.
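Steps S501 to S503 amount to a group-by, a mean and a top-three selection. The sketch below shows this for a single lawyer; the function name and the `(case_type, score)` input shape are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def best_case_types(
    scored_cases: Iterable[Tuple[str, float]], top_n: int = 3
) -> List[str]:
    """Return the case types with the highest mean score for one lawyer."""
    by_type: Dict[str, List[float]] = defaultdict(list)
    for case_type, score in scored_cases:          # S501: group by case type
        by_type[case_type].append(score)
    means = {t: sum(s) / len(s) for t, s in by_type.items()}   # S502: mean per type
    ranked = sorted(means, key=lambda t: means[t], reverse=True)  # S503: descending
    return ranked[:top_n]
```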

The lawyer recommendation method based on the knowledge-graph-based lawyer evaluation method comprises the following steps:

A. acquiring the case type, case region and case description input by a user;

B. finding the corresponding case library according to the case type and case region input by the user;

C. extracting information from the case description input by the user, and finding similar cases and their lawyers in the case library of step B to form a lawyer recommendation candidate set;

D. returning the lawyer recommendation candidate set obtained in step C to the legal aid lawyer recommendation system, so as to improve the accuracy and effectiveness of legal aid lawyer recommendation.

Preferably, in step B, the corresponding case library is found through the following specific steps:

B01. screening the cases that match the case type input by the user to obtain a preliminary case library;

B02. comparing the region information of each case in the preliminary case library obtained in step B01 with the region input by the user, and keeping the cases that match to form the corresponding case library.
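Steps B01 and B02 are two successive filters. The following sketch assumes a simple record type and exact string matching on case type and region; both the `CaseRecord` fields and the exact-match rule are illustrative assumptions, since the source does not say how regions are compared.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseRecord:
    case_id: str
    case_type: str
    region: str
    lawyer: str


def find_case_library(
    cases: List[CaseRecord], case_type: str, region: str
) -> List[CaseRecord]:
    """Filter by case type (B01), then by region (B02)."""
    preliminary = [c for c in cases if c.case_type == case_type]   # B01
    return [c for c in preliminary if c.region == region]          # B02
```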

Preferably, in step C, finding similar cases and their lawyers specifically comprises the following steps:

C01. segmenting the text of the case description input by the user into words and filtering out stop words to obtain valid keywords related to the case;

C02. one-hot encoding the valid keywords obtained in step C01 and feeding them into a word embedding model, which converts the high-dimensional sparse information into low-dimensional tensors;

C03. calculating the similarity between the low-dimensional tensors obtained in step C02 and each case in the case library generated in step B02, one by one, and retaining the cases whose similarity exceeds a threshold together with their similarity values;

C04. calculating a recommendation score with the case similarity obtained in step C03 as the weight coefficient and the professional quality score of the lawyer for that case as the variable; if the same lawyer has several similar cases, taking the arithmetic mean as the recommendation score;

C05. sorting by the recommendation scores obtained in step C04 in descending order to give the recommendation order of the lawyers and form the lawyer recommendation candidate set, wherein the information about each lawyer in the candidate set comprises the lawyer's name, the law firm to which the lawyer belongs, the recommendation score and the similar cases.

Preferably, the word embedding model in step C02 is a word2vec model trained with the skip-gram model and negative sampling, and the objective function is as follows:

where θ is the target parameter to be optimized; D_p is the positive sample set and d is a positive sample; D_n is the negative sample set and d′ is a randomly sampled negative sample; D_{m,n} is the set of negative samples of the same case type as the positive sample and d_{m,n} is a negative sample of the same case type as the positive sample; v_d is the word vector of the positive sample, v_{d′} is the word vector of the randomly sampled negative sample, and v_{d_{m,n}} is the word vector of the negative sample of the same case type as the positive sample.
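The distinctive point of this training scheme is that negatives come from two pools: random samples and samples of the same case type as the positive sample. The sketch below illustrates only that sampling step, under the assumption that each sample carries a case-type label; the function name, the pool sizes and the dictionary layout are hypothetical, and the objective function itself is not reproduced.

```python
import random
from typing import Dict, List


def draw_negatives(
    positive_id: str,
    case_type_of: Dict[str, str],
    n_random: int,
    n_same_type: int,
    rng: random.Random,
) -> Dict[str, List[str]]:
    """Draw random negatives (d') and same-case-type negatives (d_{m,n})."""
    others = [s for s in case_type_of if s != positive_id]
    same_type = [s for s in others if case_type_of[s] == case_type_of[positive_id]]
    return {
        # Random negatives are drawn from every sample except the positive one.
        "random": rng.sample(others, min(n_random, len(others))),
        # Same-type negatives share the positive sample's case type.
        "same_type": rng.sample(same_type, min(n_same_type, len(same_type))),
    }
```

Passing a seeded `random.Random` keeps the draw reproducible, which helps when comparing training runs.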

Preferably, the similarity algorithm of step C03 is specifically as follows:

normalizing the one-hot encoded case tensor to obtain a tensor x, referred to as document one; denoting the normalized tensor of any document in the case library as x′, referred to as document two; and defining a sparse transfer matrix T ∈ R^{n×n}, where R denotes the real numbers and n denotes the number of words in the two documents, and c(i, j) denotes the cost of moving the i-th word of document one to the j-th word of document two; according to the Word Mover's Distance (WMD), the similarity sim between cases is expressed as:

where i denotes the i-th word in document one and x_i is the word vector of the i-th word in document one; j denotes the j-th word in document two and x′_j is the word vector of the j-th word in document two; and T_ij indicates how much of the i-th word in document one is moved to the j-th word in document two.
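For orientation, the Word Mover's Distance as it is commonly stated in the literature is the following linear program, written with the symbols used in this section. The weights w_i and w′_j (the normalized word weights of documents one and two) are notation introduced here for illustration, and the Euclidean cost is the usual choice rather than something the text confirms; this is a sketch of the standard formulation, not necessarily the exact expression of the source.

```latex
\[
\begin{aligned}
\mathrm{WMD}(x, x') &= \min_{T \ge 0} \sum_{i=1}^{n} \sum_{j=1}^{n} T_{ij}\, c(i, j),
\qquad c(i, j) = \lVert x_i - x'_j \rVert_2, \\
\text{subject to}\quad
&\sum_{j=1}^{n} T_{ij} = w_i \quad (i = 1, \dots, n), \qquad
\sum_{i=1}^{n} T_{ij} = w'_j \quad (j = 1, \dots, n).
\end{aligned}
\]
```

Since WMD is a distance, a smaller value means two case descriptions are closer.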

Preferably, the arithmetic mean of step C04 is calculated as follows:

In the above formula, m denotes the number of similar cases handled by the lawyer, S_pi is the professional quality score of the lawyer for the i-th such case in the case library, and S_r is the lawyer's score for this recommendation.
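One reading that is consistent with step C04 (similarity as the weight coefficient, professional quality score as the variable, arithmetic mean over the m similar cases) is given below. The symbol sim_i for the similarity of the i-th similar case is introduced here as an assumption, and the expression should be taken as an illustrative sketch rather than the verbatim formula of the source.

```latex
\[
S_r = \frac{1}{m} \sum_{i=1}^{m} \mathrm{sim}_i \, S_{p_i}
\]
```

For example, with m = 2, similarities 0.75 and 0.5, and scores 80 and 90, this gives S_r = (60 + 45) / 2 = 52.5.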

The invention has the beneficial effects that:

1) By processing the data in judgment documents, the invention effectively evaluates each lawyer's professionalism and areas of expertise, and identifies lawyers with strong professional ability in different fields. The entities and their attributes are quantified according to the knowledge graph, and each lawyer's professional evaluation score is obtained through a machine learning model, so that the professional level of each lawyer can be evaluated accurately from high-dimensional, large-scale data;

2) In the lawyer recommendation of the invention, a word embedding model is applied to the case description so that the word vectors carry semantic information. During model training, the negative samples come not only from random sampling but also from cases of the same type. Compared with using only random negative samples, the word vectors carry richer features and distinguish between cases of the same type as well as between cases of different types, which facilitates similarity calculation;

3) The similarity calculation uses the WMD algorithm. Compared with existing methods such as cosine similarity and edit distance, it makes better use of the transferability of word2vec; in addition, the problem is converted into a linear program, which has a globally optimal solution. The method therefore calculates the similarity between texts more accurately and further improves the recommendation;

4) The method can recommend suitable lawyers according to the user's case description, so that lawyers handle legal cases in their own areas of expertise; this improves the user experience, gives high recommendation accuracy, and makes the method suitable for wide use;

5) The system can be applied to the legal aid lawyer evaluation and recommendation systems of judicial departments and legal aid centers at all levels. It can strongly promote objective and fair evaluation of lawyers' professional ability, greatly improve the precision of legal aid lawyer recommendation, and improve the efficiency and accuracy of legal aid centers. As China pays increasing attention to legal aid, the invention has significant application value and market potential, and can help reduce social conflict and promote judicial fairness and justice.

Drawings

Fig. 1 is a flowchart of the knowledge-graph-based lawyer evaluation method of this embodiment;

Fig. 2 is a flowchart of the knowledge-graph-based lawyer recommendation method of this embodiment.

Detailed Description

The invention is further explained below with reference to the drawings and the specific embodiments:

As shown in Fig. 1, a lawyer evaluation method based on a knowledge graph comprises the following steps:

S1, collecting publicly available judgment documents to form a judgment document library;

In this embodiment, the data in step S1 comes from the public judgment documents website, which effectively guarantees the objectivity and authenticity of the data.

S2, preprocessing the documents in the judgment document library of step S1 and removing invalid data to form a valid database;

In this embodiment, the specific preprocessing steps of step S2 are as follows:

S201, checking the completeness of the cases in the case library of step S1: checking whether each document was downloaded completely and whether its content includes lawyer information, the plaintiff's claims, the plaintiff's evidence, the facts found by the court and the court's judgment, and removing incompletely downloaded documents and documents with missing content to form a preliminary preprocessed case library;

S202, dividing each case in the case library of step S201 into five text sections: lawyer information, the plaintiff's claims, the plaintiff's evidence, the facts found by the court and the court's judgment;

S203, extracting the corresponding element information from the five text sections of step S202 to form four parts, namely lawyer information, pre-trial representation, in-court performance and case outcome, which together form a direct and meaningful valid database.

S3, designing a knowledge graph of the judgment documents for the valid judgment document database of step S2, performing knowledge graph element recognition on each judgment document, and writing the extracted knowledge graph elements into the database;

In this embodiment, the specific steps of step S3 are as follows:

S301, according to the valid database obtained in step S2, the judgment document knowledge graph comprises five entities in total: a case entity, a lawyer entity, a pre-trial representation entity, an in-court performance entity and a case outcome entity. The case entity has three attributes: the case location, the court name and the case number; the lawyer entity has three attributes: the lawyer's name, law firm information and the lawyer's location; the pre-trial representation entity has three attributes: the number of items of evidence submitted by the plaintiff, the cross-examination of evidence and the number of litigation claim items; the in-court performance entity has three attributes: the lawyer's court attendance, factual claims and liability claims; the case outcome entity has five attributes: fact finding, admission of evidence, determination of compensation items, determination of compensation liability and the degree to which the court supports the plaintiff's claim items;

S302, sampling the judgment documents according to the knowledge graph produced in step S301, building a rule base of regular expressions for the knowledge graph elements, and assigning priorities to the syntactic rules for the same element. For example, to extract the lawyer name element, the sentence rule "entrusted litigation agent: XXX" can be used, taking the characters after "entrusted litigation agent:" as the lawyer's name. The rules are matched against the judgment documents in priority order, and the matching results are written into a database;

S303, sampling the judgment documents according to the knowledge graph produced in step S301 and constructing a training data set, segmenting the data set into words with the jieba word segmentation tool and removing stop words; encoding the processed data set with the word2vec word vector tool open-sourced by Google in 2013; and training a Bi-LSTM (bidirectional long short-term memory) classification model on the two attributes of liability claims and determination of compensation liability, so as to identify more accurate element information that further verifies the element results of step S302, and writing the more accurate element results back into the database.
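The prioritized rule matching of step S302 can be sketched with Python's `re` module. The two patterns below are hypothetical examples for the lawyer name element: the Chinese phrases are assumed renderings of "entrusted litigation agent" and a shorter variant, and the priorities are illustrative rather than taken from the source.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rule base for one element; a lower number means higher priority.
LAWYER_NAME_RULES: List[Tuple[int, "re.Pattern"]] = [
    (1, re.compile(r"委托诉讼代理人[:：]\s*([^，,。；;\s]+)")),
    (2, re.compile(r"委托代理人[:：]\s*([^，,。；;\s]+)")),
]


def extract_element(text: str, rules: List[Tuple[int, "re.Pattern"]]) -> Optional[str]:
    """Try the rules in priority order and return the first captured value."""
    for _, pattern in sorted(rules, key=lambda rule: rule[0]):
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None
```

Keeping the priority next to each pattern lets new rules be added to the rule base without changing the matching loop.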

S4, constructing a feature list for lawyer professional quality evaluation according to the knowledge graph of step S3, obtaining a professional quality evaluation score for each lawyer through a machine learning model, and writing the lawyer evaluation information for each case into a database; the specific steps are as follows:

S401, sampling 50% of the preprocessed data from the case library of step S201, having several professionals score the professional quality of the lawyers in each case (out of 100) according to the content of the judgment document, discarding the highest and lowest scores, and taking the mean of the remaining scores as the prior professional quality evaluation score of the lawyers in that case;

S402, segmenting the knowledge graph elements obtained in step S302 into words with the jieba word segmentation tool and removing stop words; the processed data is encoded with a word2vec model, which converts the high-dimensional sparse vectors into low-dimensional dense tensors;

S403, taking GBDT (gradient boosting decision tree) as the base algorithm and training a supervised model with the LightGBM (Light Gradient Boosting Machine) framework to automate lawyer professional quality evaluation. A regression task is constructed with the low-dimensional tensors obtained in step S402 as the model input and the professional quality evaluation scores obtained in step S401 as prior knowledge, and the model is trained and optimized. A 10-fold cross-validation experiment is carried out, and the model parameters that validate best are selected and saved;

S404, evaluating the lawyers in the processed data with the model obtained in step S403, and writing the evaluation results into a database.
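The 10-fold cross-validation with MSE as the metric in step S403 can be sketched without committing to a particular model. In the sketch below the regressor is passed in as a callable, so a LightGBM model could be plugged in; the function names and the interleaved fold assignment are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, held-out) pairs."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    splits = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        splits.append((train, held_out))
    return splits


def cross_validated_mse(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    fit_predict: Callable[[list, list, list], List[float]],
    k: int = 10,
) -> float:
    """Average held-out MSE over k folds; fit_predict trains and predicts in one call."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(targets), k):
        predictions = fit_predict(
            [features[i] for i in train_idx],
            [targets[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        scores.append(mean_squared_error([targets[i] for i in test_idx], predictions))
    return sum(scores) / len(scores)
```

Running `cross_validated_mse` once per candidate parameter set and keeping the set with the lowest value corresponds to selecting the parameters that validate best.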

S5, grouping the professional quality evaluation scores of each lawyer from step S4 by case type and compiling statistics to obtain the case types each lawyer is good at, and writing this case type data into a database; the specific steps are as follows:

S501, grouping each lawyer's per-case professional quality evaluation scores by case type, where case types are divided according to the three-level causes of action;

As shown in Fig. 2, the lawyer recommendation method builds on the knowledge-graph-based lawyer evaluation method: the professional level and areas of expertise of the lawyers in the valid database are known from published judgment documents, and the lawyer recommendation method of the invention is a continuation and development of information retrieval. Lawyer recommendation first determines the type of the user's case and the region where the user is located in order to select a matching case subset. The case description input by the user is then segmented into words, stop words are removed, and the result is converted into word vectors; the case description of each case in the subset is likewise converted into word vectors. The similarity between the user's case description vectors and the word vectors of each case in the subset is calculated and used as the weight of the lawyer's evaluation score for that case to obtain a final overall ranking score, and the best lawyers are recommended after sorting in descending order. The method specifically comprises the following steps:

A. acquiring, through the interface, the case type, case region and case description input by the user. Case types are divided according to the three-level causes of action; case regions are divided into three administrative levels, namely province, city and county (district).

B. finding the corresponding case library according to the case type and case region input by the user; the specific steps are as follows:

B02. comparing the region information of each case in the preliminary case library obtained in step B01 with the region input by the user, and keeping the cases that match to form the corresponding case library;

C. extracting information from the case description input by the user, finding similar cases and their lawyers in the case library of step B, and returning them to the user side; the specific steps are as follows:

C01. segmenting the text of the case description input by the user into words and filtering out stop words to obtain valid keywords related to the case, with jieba used as the word segmentation tool;

C02. one-hot encoding the valid keywords obtained in step C01 and feeding them into a word embedding model, which converts the high-dimensional sparse information into low-dimensional tensors.

In this embodiment, in step C02, the word embedding model is specifically as follows:

The word embedding model is an improved version of the word2vec model open-sourced by Google in 2013. On top of the original version, negative sample information from the same case type is added to the objective function, so that the word vectors capture the differences between cases within a type. Training uses skip-gram with negative sampling, and the objective function is as follows:

C03. calculating the similarity between the low-dimensional tensors obtained in step C02 and each case in the case library generated in step B02, one by one, and retaining the cases whose similarity exceeds the threshold together with their similarity values.

In this embodiment, the similarity calculation method of step C03 is specifically as follows:

normalizing the one-hot encoded case tensor to obtain a tensor x (document one); denoting the normalized tensor of any document in the case library as x′ (document two); and defining a sparse transfer matrix T ∈ R^{n×n} (R denotes the real numbers and n denotes the number of words in the two documents), where c(i, j) denotes the cost of moving the i-th word of document one to the j-th word of document two; according to the WMD (Word Mover's Distance), the similarity sim between cases is expressed as

Considering recommendation performance in practice, in this embodiment the WMD (Word Mover's Distance) is computed with the optimized RWMD (Relaxed Word Mover's Distance) algorithm, namely the open-source FastWMD (Fast Word Mover's Distance) implementation.
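The relaxed variant can be sketched in a few lines. In the usual description of RWMD, one of the two flow constraints is dropped, so every word simply sends all of its weight to the nearest word in the other document; the larger of the two one-sided costs is a lower bound on the full WMD. The code below is a plain-Python illustration of that idea with hypothetical function names; it is not the FastWMD implementation mentioned above.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def one_sided_cost(
    weights_a: Sequence[float],
    vectors_a: Sequence[Vector],
    vectors_b: Sequence[Vector],
) -> float:
    """Each word of document A moves all of its weight to its nearest word in B."""
    return sum(
        weight * min(euclidean(vec, other) for other in vectors_b)
        for weight, vec in zip(weights_a, vectors_a)
    )


def relaxed_wmd(
    weights_a: Sequence[float],
    vectors_a: Sequence[Vector],
    weights_b: Sequence[float],
    vectors_b: Sequence[Vector],
) -> float:
    """Relaxed Word Mover's Distance: the tighter of the two one-sided bounds."""
    return max(
        one_sided_cost(weights_a, vectors_a, vectors_b),
        one_sided_cost(weights_b, vectors_b, vectors_a),
    )
```

Each one-sided cost needs only a nearest-neighbour search per word instead of solving a linear program, which is where the speed-up comes from.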

C04. calculating a recommendation score with the case similarity obtained in step C03 as the weight coefficient and the professional quality score of the lawyer for that case as the variable; if the same lawyer has several similar cases, taking the arithmetic mean as the recommendation score.

In this embodiment, the arithmetic mean of step C04 is calculated as follows:

the professional quality score of the lawyer for a case in the case library is S_pi, and the lawyer's score for this recommendation is

In the above formula, m denotes the number of similar cases handled by the lawyer, and S_r is the final recommendation score. The lawyers to be recommended are sorted by S_r in descending order, and the lawyer information and recommendation scores are sent to the client for the user's reference and selection.

C05. sorting by the recommendation scores obtained in step C04 in descending order to give the recommendation order of the lawyers, and returning the lawyer information to the user side, wherein the lawyer information comprises the lawyer's name, the law firm to which the lawyer belongs, the recommendation score and the similar cases.
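Steps C03 to C05 can be combined into one small ranking routine. The sketch assumes each similar case is given as a `(lawyer, similarity, professional_score)` tuple and that the weighted score of a case is the product of similarity and professional score; the function name and the input shape are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_lawyers(
    similar_cases: Iterable[Tuple[str, float, float]], threshold: float
) -> List[Tuple[str, float]]:
    """Return (lawyer, recommendation score) pairs sorted in descending order."""
    weighted: Dict[str, List[float]] = defaultdict(list)
    for lawyer, similarity, professional_score in similar_cases:
        if similarity > threshold:                  # C03: keep cases above the threshold
            weighted[lawyer].append(similarity * professional_score)
    # C04: arithmetic mean of the weighted scores over each lawyer's similar cases.
    scores = {lawyer: sum(v) / len(v) for lawyer, v in weighted.items()}
    # C05: descending order gives the recommendation order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A lawyer whose only matching cases fall below the threshold does not appear in the result at all.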

The above example represents only one embodiment of the invention. Although it is described specifically and in detail, it is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of protection of the invention. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims

1. A lawyer evaluation method based on a knowledge graph, characterized by comprising the following steps:

S1, collecting publicly available judgment documents to form a judgment document library;

2. The method for lawyer assessment based on knowledge-graph as claimed in claim 1, wherein the specific steps of the preprocessing in step S2 are as follows:

3. The method for lawyer assessment based on a knowledge-graph of claim 1, wherein said step S3 comprises the following steps:

4. The method for lawyer assessment based on a knowledge-graph of claim 1, wherein said step S4 comprises the following steps:

5. The method for lawyer assessment based on a knowledge-graph of claim 1, wherein said step S5 comprises the following steps:

6. A lawyer recommendation method based on the knowledge-graph-based lawyer evaluation method of claim 1, characterized by comprising the following steps:

A. acquiring a case type, a case area and a case description input by a user;

7. The lawyer recommendation method of claim 6, wherein the step B of searching the corresponding case library comprises the following steps:

8. The lawyer recommendation method of claim 6, wherein the step C of finding similar cases and their lawyers comprises the following steps:

9. The lawyer recommendation method of claim 8, wherein the word embedding model of step C02 is a word2vec model trained with the skip-gram model and negative sampling, and the objective function is:

where θ is the target parameter to be optimized; D_p is the positive sample set and d is a positive sample; D_n is the negative sample set and d′ is a randomly sampled negative sample; D_{m,n} is the set of negative samples of the same case type as the positive sample and d_{m,n} is a negative sample of the same case type as the positive sample; v_d is the word vector of the positive sample, v_{d′} is the word vector of the randomly sampled negative sample,

10. The lawyer recommendation method of claim 8, wherein the similarity algorithm of step C03 is specifically as follows:

normalizing the one-hot encoded case tensor to obtain a tensor x, referred to as document one; denoting the normalized tensor of any document in the case library as x′, referred to as document two; and defining a sparse transfer matrix T ∈ R^{n×n}, where R denotes the real numbers and n denotes the number of words in the two documents, and c(i, j) denotes the cost of moving the i-th word of document one to the j-th word of document two; according to the WMD (Word Mover's Distance), the similarity sim between cases is expressed as:

the arithmetic mean of step C04 is calculated as follows:

in the above formula, m denotes the number of similar cases handled by the lawyer, S_pi is the professional quality score of the lawyer for the i-th such case in the case library, and S_r is the lawyer's score for this recommendation.