CN108595525B

CN108595525B - Lawyer information processing method and system

Info

Publication number: CN108595525B
Application number: CN201810260735.6A
Authority: CN
Inventors: 彭帅
Original assignee: Chengdu Lyuyun Technology Co ltd
Current assignee: Chengdu Lyuyun Technology Co ltd
Priority date: 2018-03-27
Filing date: 2018-03-27
Publication date: 2021-11-23
Anticipated expiration: 2038-03-27
Also published as: CN108595525A

Abstract

The invention relates to a lawyer information processing method and system, in particular a lawyer matching method and system, comprising the following steps: obtaining a case classification and keywords from case information; acquiring judgment documents and obtaining lawyer practice evaluation information for each type of judgment document; obtaining lawyer-related evaluation information from other preset lawyer-related information; obtaining comprehensive lawyer evaluation information from the lawyer practice evaluation information and the lawyer-related evaluation information; and matching suitable lawyers according to the comprehensive lawyer evaluation information, the case classification, and the keywords.

Description

Lawyer information processing method and system

Technical Field

The invention relates to a lawyer information processing method and system, in particular to a method and system for recommending lawyers based on a user's case.

Background

With the continuing advance of the rule of law in China, more and more legal case documents are published on China Judgments Online, and individuals and organizations of all kinds increasingly rely on legal means to protect their rights and interests. However, because laws and regulations are normative and strict, it is difficult for a non-professional to apply them in practice to defend his or her own rights. Such a person therefore wants professional legal services from a legal practitioner, ideally the most suitable lawyer team or individual lawyer, so that the law can actually be used to protect and guarantee his or her legitimate rights.

To provide better targeted services to users seeking legal help, suitable lawyers should be selected according to the actual case and the user's requirements; that is, matching lawyers are searched for based on the user's case description and recommended to the user. A typical existing lawyer matching and recommendation system works as follows: the case submitted by the user is analyzed to obtain the case category and case keywords, and suitable lawyers are then recommended by combining these with the information about each lawyer of a law firm.

However, the lawyer information used by conventional lawyer recommendation systems is limited and cannot fully reflect what users care about and what strongly influences how a case is handled, such as a lawyer's areas of expertise, overall track record, and service ratings. As a result, the most suitable lawyer cannot be reliably recommended, case outcomes may suffer, user satisfaction drops, and users may even incur unnecessary losses.

There is therefore a need to improve existing lawyer recommendation systems so that each candidate lawyer is evaluated more comprehensively and the most suitable lawyer is recommended to the user through an effective matching and recommendation strategy.

Disclosure of Invention

To address these shortcomings of conventional lawyer recommendation systems, the invention provides a lawyer information processing method and system, and further provides a lawyer recommendation method and system.

Specifically, the invention relates to a method for matching lawyers according to a case, comprising: obtaining a case classification and keywords from the case description; acquiring and classifying judgment documents, and extracting preset required information from each classified judgment document to obtain corresponding first lawyer information; acquiring second lawyer information, which may be lawyer registration information, lawyer question-and-answer information, and/or lawyer evaluation information; obtaining third lawyer information from the first and second lawyer information, where the third lawyer information may be comprehensive lawyer evaluation information; and matching suitable lawyers according to the third lawyer information, the case classification, and/or the keywords.

Further, obtaining the case classification and keywords from the case comprises: preprocessing a text to obtain feature words and/or phrases; classifying the declarative and/or interrogative sentences in the text according to the feature words and/or phrases; and obtaining the type of the text information, where the text is a case description and/or a question about the case and the type of the text information is the case type.

Further, acquiring the judgment document information comprises: building a full-text index of the judgment documents; classifying the judgment documents; presetting the information to be acquired for each type of judgment document; and extracting that preset information by a rule-based method.

Further, the invention relates to a lawyer recommendation method comprising: obtaining a candidate lawyer set; obtaining user-lawyer information, which includes user preference information, lawyer preference information, user ratings, user-to-user similarity information, and/or lawyer-to-lawyer similarity information; selecting a lawyer recommendation algorithm; and obtaining a recommended lawyer or a list of recommended lawyers from the user-lawyer information and the selected algorithm. The candidate lawyer set is obtained from the case category and lawyer information, where the lawyer information comprises the information contained in judgment documents associated with a lawyer, lawyer registration information, and/or lawyer question-and-answer information.

The invention further provides a corresponding system.

In summary, the method and system of the invention first analyze the case submitted by the user to obtain its category and keywords. On the lawyer side, detailed information about past cases is obtained from China Judgments Online, the documents are classified, and information is extracted from the document set of each category. Finally, a recommendation algorithm combines the comprehensive information about each lawyer to recommend suitable lawyers for the user's case, effectively protecting the user's legitimate interests and improving user satisfaction.

Drawings

FIG. 1 illustrates a lawyer matching method according to an embodiment of the present invention;

FIG. 2 illustrates a method for obtaining case classifications and keywords according to another embodiment of the present invention;

FIG. 3 illustrates a question preprocessing method according to another embodiment of the present invention;

FIG. 4 illustrates a method for obtaining judgment documents and the associated lawyer evaluations according to another embodiment of the present invention;

FIG. 5 illustrates a method for crawling judgment documents from the web according to another embodiment of the present invention;

FIG. 6 illustrates a method for template-based extraction of web page information according to another embodiment of the present invention;

FIG. 7 is a block diagram of a rule-based method for extracting information from written judgments according to another embodiment of the present invention;

FIG. 8 is a flowchart of an algorithm for extracting information from written judgments according to another embodiment of the present invention;

FIG. 9 is a topology diagram of the basic concepts of the lawyer recommendation evaluation logic according to another embodiment of the present invention;

FIG. 10 illustrates a lawyer matching system based on the lawyer matching method according to another embodiment of the present invention;

FIG. 11 is a block diagram of a question preprocessing module according to another embodiment of the present invention;

FIG. 12 is a block diagram of a web crawler module according to another embodiment of the present invention;

FIG. 13 is a block diagram of a judgment document information extraction module according to another embodiment of the present invention.

Detailed Description

To help those skilled in the art better understand the technical solution of the present invention, the solution is fully described below with reference to the accompanying drawings. It should be understood that the following detailed description covers only some embodiments of the present invention; other embodiments, or combinations thereof, that those skilled in the art can obtain from this description without inventive effort fall within the technical spirit and scope of the present invention.

As shown in FIG. 1, an embodiment of the present invention provides a lawyer matching method for matching lawyers according to a case, comprising the following steps:

S1. Obtain the case classification and keywords from the case description.

This step mainly concerns case analysis, that is, understanding the case description or related question text entered by the user. In practice, most case information entered by users is short text, so the analysis is mainly short-text or question analysis. Because this analysis only serves to match lawyers for the user, a relatively simple approach modeled on knowledge-base question-answering systems can be used to understand the user's input: the semantics of the question are understood, the question is converted from fuzzy natural language into a clear logical representation, and it is then processed as required. Question analysis mainly comprises question preprocessing, question classification, and question expansion.

As shown in FIG. 2, obtaining the case classification and keywords from the case comprises the following steps:

S101. Question preprocessing.

Question preprocessing covers Chinese word segmentation, named entity recognition, part-of-speech tagging, stop-word filtering, and similar steps performed before semantic analysis and classification of the question. Its purpose is to turn the user's input into simple, well-formed candidate feature phrases that still carry enough information. A feature phrase is a phrase that reflects the characteristics of the text and generally serves as a basic unit for representing it.

More specifically, FIG. 3 shows the steps of question preprocessing, which include:

S1011. Chinese word segmentation.

Chinese word segmentation splits the text into words. A segmentation algorithm can be chosen according to the requirements of the application, or an existing tool can be used. Common Chinese word segmentation and part-of-speech tagging tools include the Stanford Chinese word segmenter, ICTCLAS from the Chinese Academy of Sciences, LTP from Harbin Institute of Technology, and jieba.
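As a minimal sketch of what a segmentation algorithm does, the Python code below implements forward maximum matching, a classic dictionary-based method. The patent does not say which algorithm is used; `fmm_segment` and the toy lexicon are hypothetical, and in practice a tool such as jieba would typically be used instead.

```python
def fmm_segment(text: str, dictionary: set, max_len: int = 4) -> list:
    """Forward maximum matching segmentation.

    At each position, take the longest dictionary word (up to max_len
    characters) that matches; if none matches, emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon; a real system would load a large dictionary.
LEXICON = {"劳动", "合同", "劳动合同", "纠纷", "工资"}
print(fmm_segment("劳动合同纠纷", LEXICON))  # ['劳动合同', '纠纷']
```

Forward maximum matching is greedy, so it can mis-segment ambiguous strings; statistical segmenters such as those listed above handle such cases better.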

S1012. Named entity recognition.

Named entity recognition identifies entities, time expressions, numeric expressions, and similar items in the text to be processed.

S1013. Part-of-speech recognition and tagging.

Part-of-speech recognition matters for stop-word removal and for retrieval results. Based on part of speech, modal particles, auxiliary words, and other words that carry no meaning can be removed from the text, while the focus and core components of the question are marked and extracted. The main part-of-speech tags include adjectives, adverbs, conjunctions, verbs, quantifiers, and pronouns.

S1014. Stop-word filtering.

Stop-word filtering removes words that contribute little to expressing the question or that would interfere with lawyer matching, such as "yes", "so", "may I ask", "please", and "thank you". The words to be filtered can be taken from a preset stop-word list.
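The part-of-speech and stop-word filtering of S1013 and S1014 can be sketched as a single pass over tagged tokens. The stop-word list, the tag names (ICTCLAS-style abbreviations are assumed), and `filter_tokens` are illustrative assumptions rather than the patent's actual lists.

```python
# Hypothetical stop-word list echoing the examples in the text:
# "是" yes, "所以" so, "请问" may I ask, "麻烦" please, "谢谢" thank you.
STOP_WORDS = {"是", "所以", "请问", "麻烦", "谢谢"}
# Assumed ICTCLAS-style tags: "u" = auxiliary word, "y" = modal particle.
DROP_POS = {"u", "y"}


def filter_tokens(tagged):
    """Keep only words that are neither stop words nor of a dropped part of speech.

    tagged: list of (word, pos_tag) pairs produced by segmentation and tagging.
    """
    return [word for word, pos in tagged
            if word not in STOP_WORDS and pos not in DROP_POS]


sample = [("请问", "v"), ("公司", "n"), ("拖欠", "v"), ("工资", "n"),
          ("了", "u"), ("吗", "y"), ("谢谢", "v")]
print(filter_tokens(sample))  # ['公司', '拖欠', '工资']
```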

S1015. Feature extraction.

Feature phrases are extracted from the remaining words.

With continued reference to FIG. 2, obtaining the case classification and keywords from the case further comprises:

S102. Question classification.

Question classification assigns a question described in natural language to a category and gathers the information related to it, which improves the accuracy of later processing. Its main purpose is to label the question type from the user's description to facilitate information retrieval and lawyer matching. Question classification is a special form of text classification and is usually studied with text classification methods. The difference is that a question is a short text that carries little information and has no context, which makes it harder to classify; question classification therefore requires deeper sentence analysis, such as syntactic and semantic analysis. Question classification effectively reduces the search space of candidate lawyers and improves the accuracy with which the system returns correctly matched lawyers. In this embodiment, question classification mainly means classifying the case described by the user, so that a candidate lawyer set can be determined from the case type and lawyers can then be recommended by a recommendation algorithm.

As in text classification, question classification maps feature vectors to types, which can be written simply as f: A → B. Here A is the set of feature vectors formed from the questions to be classified, containing feature phrases, parts of speech, and so on; B is the set of types of the classification system and is determined by the classification scheme adopted. The mapping rules between A and B are set by the classification method, chiefly in the form of a classification model. Applicable classification models include:
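To make the set A concrete, the sketch below turns token lists into term-count vectors over a fixed vocabulary. This is an assumed representation (the part-of-speech features mentioned above are omitted), and the function names are hypothetical.

```python
from collections import Counter


def build_vocabulary(token_lists):
    """Map every distinct feature word to a column index (sorted for determinism)."""
    words = sorted({w for tokens in token_lists for w in tokens})
    return {word: index for index, word in enumerate(words)}


def to_vector(tokens, vocabulary):
    """Term-count vector for one question; out-of-vocabulary words are ignored."""
    vector = [0] * len(vocabulary)
    for word, count in Counter(tokens).items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector


vocab = build_vocabulary([["wage", "arrears"], ["divorce", "property"]])
print(to_vector(["wage", "wage", "divorce", "unknown"], vocab))  # [0, 1, 0, 2]
```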

(1) Support vector machine model

The support vector machine maps an input vector x into a high-dimensional space through a chosen nonlinear mapping, searches that space for the hyperplane that optimally separates two classes of data, and maximizes the margin between the two classes, so that both the empirical risk and the structural risk of the classifier are minimized. The hyperplane can be expressed by the classification function f(x) = wᵀx + b, where x is a feature vector from the training sample set, w is the weight vector, and b is the bias. Because a support vector machine is a binary classifier, multi-class recognition requires building several binary classifiers, and the result depends on the structure of the available training sample set. Support vector machines are hard to apply to large training sets and are costly for multi-class problems, so they are best suited to learning from small samples.
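The decision function f(x) = wᵀx + b, and the use of several binary classifiers for a multi-class problem, can be sketched as follows. Training (solving the margin-maximization problem) is omitted, the weights are hypothetical, and one-vs-rest is only one common way to combine binary classifiers (one-vs-one is another).

```python
def decision(w, b, x):
    """Linear SVM decision value f(x) = w^T x + b (its sign gives the side of the hyperplane)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def one_vs_rest(classifiers, x):
    """Multi-class prediction from several binary SVMs (one-vs-rest).

    classifiers: {label: (w, b)}, one trained binary classifier per case type.
    Returns the label whose classifier gives the largest decision value.
    """
    return max(classifiers, key=lambda label: decision(*classifiers[label], x))


# Hypothetical pre-trained weights over a 2-dimensional feature space.
CLASSIFIERS = {"labor": ([1.0, 0.0], -0.5), "marriage": ([0.0, 1.0], -0.5)}
print(one_vs_rest(CLASSIFIERS, [1, 0]))  # labor
```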

(2) Bayesian classification model

The Bayesian model uses Bayes' formula to compute, from the prior probability of each class and the distribution of feature terms in that class, the posterior probability that an object belongs to the class, and selects the class with the largest posterior probability as the text's class. Derived from probability theory, the Bayesian model is simple, can handle large-scale and multi-class samples, is insensitive to missing data, and classifies efficiently; its drawbacks are lower classification accuracy and its assumption that features are independent, which real text rarely satisfies.
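A minimal multinomial naive Bayes classifier, assuming token lists as input and add-one (Laplace) smoothing; neither choice is specified by the patent, and the class name is hypothetical. Scores are summed in log space to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class (priors)
        self.word_counts = defaultdict(Counter)      # word frequencies per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for tokens in docs for w in tokens}
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in tokens:
                if w in self.vocab:                  # ignore unseen words
                    score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage: `NaiveBayes().fit(token_lists, case_types).predict(new_tokens)` returns the case type with the largest posterior.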

(3) K-nearest neighbor model

The K-nearest neighbor (KNN) algorithm is a lazy learning algorithm: if most of the K samples most similar to a given sample in feature space (its nearest neighbors) belong to a certain class, the sample is assigned to that class as well. The classification decision depends only on the class of the nearest sample or samples, so only a small number of neighboring samples are involved, although in principle the method also relies on the limit theorem. Because KNN relies mainly on the limited set of surrounding samples, it is better suited than other methods to sample sets whose class regions intersect or overlap heavily. However, when the data are unbalanced, with one class containing many samples and the others few, samples from the large class may dominate the K nearest neighbors of a new sample even when they are not its closest matches. KNN is also computationally expensive, because the distance from each text to be classified to every known sample must be computed to find its K nearest neighbors. KNN is therefore often combined with other algorithms for classification.
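The KNN rule described above can be sketched with cosine similarity and a majority vote. Cosine similarity is an assumed choice of similarity measure (the text only speaks of distance in feature space), and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(train, x, k=3):
    """train: list of (vector, label).

    Compares x with every known sample (the costly step noted above), keeps the
    k most similar, and returns their majority label.
    """
    neighbours = sorted(train, key=lambda item: cosine(item[0], x), reverse=True)[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]
```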

(4) Maximum entropy model

The principle of the maximum entropy (ME) model is that a predicted probability distribution of a random event should satisfy all known constraints while making no subjective assumptions about what is unknown. Under this principle the distribution is as uniform as possible, the prediction risk is smallest, and the information entropy of the distribution is largest. The model performs well in natural language processing.
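The sketch below illustrates two points from the paragraph: the uniform distribution has the largest entropy, and a trained maximum entropy classifier takes the log-linear form p(y|x) ∝ exp(Σ λ_i f_i(x)). The weights would normally be learned (for example by iterative scaling); here they are hypothetical, as are the function names.

```python
import math


def entropy(p):
    """Shannon entropy H(p) = -sum(p_i * ln p_i), skipping zero probabilities."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def maxent_probs(weights, features):
    """Log-linear form of a maximum entropy classifier.

    p(y | x) = exp(sum_i lambda_{y,i} * f_i(x)) / Z(x)
    weights: {label: [lambda_1, ..., lambda_n]}; features: [f_1(x), ..., f_n(x)].
    """
    scores = {y: math.exp(sum(lam * f for lam, f in zip(lams, features)))
              for y, lams in weights.items()}
    z = sum(scores.values())  # normalizer Z(x)
    return {y: s / z for y, s in scores.items()}
```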

With continued reference to FIG. 1, the lawyer matching method further comprises the steps of:

S2. Acquire judgment documents, classify them, extract the preset required information from each classified judgment document, and obtain the first lawyer evaluation information from it.

FIG. 4 shows the detailed flow of step S2, which includes:

S201. Crawl judgment documents from the web with a web crawler.

A large number of judgment documents can be obtained from the Internet, for example from China Judgments Online and/or local court websites. Because these documents are updated daily and in large volumes, they cannot be collected efficiently by hand, and a web crawler is needed to acquire this large volume of judgment documents quickly and efficiently.

A web crawler, also called a web spider, traverses web page resources on the Internet, finds the resources it needs, and stores them in a local library for later study; it is an important component of a search engine. The crawler starts from one or more given initial URLs, retrieves the information on each page, saves the URLs found on the page into a URL queue, and stops when a preset termination condition is met. Common page search strategies are based on graph traversal algorithms, including depth-first, breadth-first, and best-first search.
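Breadth-first crawling, one of the strategies listed above, can be sketched without any network access by passing in a link-lookup function. `bfs_crawl` and the `fetch_links` callback are hypothetical; a real crawler would download and parse each page.

```python
from collections import deque


def bfs_crawl(seeds, fetch_links, max_pages=100):
    """Breadth-first traversal of a link graph.

    seeds: initial URLs; fetch_links(url) -> list of URLs found on that page.
    Stops when the URL queue is empty or max_pages pages have been visited.
    """
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()           # FIFO order gives breadth-first search
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:        # enqueue each URL only once
                seen.add(link)
                queue.append(link)
    return visited


# In-memory stand-in for a site: category pages link to document pages.
SITE = {"root": ["civil", "criminal"], "civil": ["doc1", "doc2"], "criminal": ["doc3", "civil"]}
print(bfs_crawl(["root"], lambda u: SITE.get(u, [])))
# ['root', 'civil', 'criminal', 'doc1', 'doc2', 'doc3']
```

Replacing `deque.popleft()` with a stack pop would turn this into depth-first crawling; best-first crawling would use a priority queue keyed by a page-relevance score.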

Specifically, referring to FIG. 5, crawling judgment documents from the web comprises the following steps:

S2011. Starting from the initial URL, parse the entry URLs of all categories from the web page;

S2012. Parse each entry URL, find and store the URLs of the next-level directory, and repeat until the last level is reached;

S2013. Parse the last-level URLs to obtain the URLs of the judgment document lists;

S2014. Parse the URL of each judgment document from the lists and store the parsed URLs in the crawler queue. In another embodiment, parsing and storing the URLs in the crawler queue further comprises:

S20141. URL deduplication.

Duplicate URLs may appear while judgment documents are being crawled. Visited URLs can be tracked either in memory or in a disk-based cache, and a Bloom filter is used to deduplicate the repeated URLs encountered during crawling.
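A minimal Bloom filter for URL deduplication, assuming SHA-256 with double hashing to derive the bit positions (the patent does not specify the hash functions or sizes). A Bloom filter never misses a URL it has stored but can occasionally report an unseen URL as seen, so a small fraction of new pages may be skipped.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter: no false negatives, a small false-positive rate."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions from one SHA-256 digest
        # using double hashing: h1 + i * h2 (mod size).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def dedupe(urls, seen):
    """Return the URLs not yet seen, in order, recording each one in the filter."""
    fresh = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```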

S2015. Parse the URLs of the judgment documents, and extract and store the judgment document information.

After the crawler has fetched the judgment document pages, the judgment document information is extracted by template-based web page information extraction. Because most resources on the Internet are HTML pages consisting of text and HTML tags, the required information can be extracted from such pages with static templates. If several different websites are crawled, for example the websites of various courts, crawling is multithreaded and a separate static template is set for each kind of page.

FIG. 6 illustrates method steps for template-based extraction of web page information, including:

S20151. Examine the structure of the different web pages and identify the relevant features needed.

By summarizing these patterns, a unique upper bound and a unique lower bound are found for each piece of content to be extracted, and the rules for identifying these bounds are stored in separate XML documents, which makes the program easier to extend later.

S20152. Load the pre-stored static template files into memory, parse them programmatically, and match the information to be extracted.

S20153. Format information that appears on the same line of the web page to obtain standardized information.

S20154. Remove the HTML tags from the extracted information and check its encoding; if it contains no Unicode-encoded content, jump to S20156.

Removing the HTML tags effectively reduces the influence of tags that carry no substantive value on the text analysis results.

S20155. If Unicode codes are present, read them and use an appropriate character conversion function to find the corresponding Chinese characters, completing the conversion.

S20156. Store the extracted information locally for subsequent analysis, processing, or queries.
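Steps S20151 to S20156 can be sketched as follows, assuming a hypothetical XML template that stores each field's upper-bound and lower-bound strings. The template layout, field names, and function names are illustrative, since the patent does not publish its template format. The "Unicode codes" of S20155 are assumed to be literal `\uXXXX` escapes, and the S20153 formatting is assumed to mean collapsing whitespace and line breaks.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical static template for one court website: each <field> stores the
# unique text that precedes (upper) and follows (lower) the wanted content.
TEMPLATE_XML = """<template site="example-court">
  <field name="title"><upper><![CDATA[<h1>]]></upper><lower><![CDATA[</h1>]]></lower></field>
  <field name="body"><upper><![CDATA[<div id="content">]]></upper><lower><![CDATA[</div>]]></lower></field>
</template>"""

TAG_RE = re.compile(r"<[^>]+>")
UNICODE_ESCAPE_RE = re.compile(r"\\u([0-9a-fA-F]{4})")


def load_template(xml_text):
    """S20152: load the template into memory as {field: (upper, lower)}."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): (f.find("upper").text, f.find("lower").text)
            for f in root.iter("field")}


def clean(text):
    text = TAG_RE.sub("", text)        # S20154: strip HTML tags
    text = " ".join(text.split())      # assumed S20153 formatting: collapse whitespace
    # S20155: convert literal \uXXXX escapes into the characters they encode
    return UNICODE_ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), text)


def extract(page, template):
    """Return {field: cleaned text} for every field whose bounds are found in page."""
    result = {}
    for name, (upper, lower) in template.items():
        start = page.find(upper)
        if start == -1:
            continue
        start += len(upper)
        end = page.find(lower, start)
        if end != -1:
            result[name] = clean(page[start:end])
    return result
```

Keeping the bounds in per-site template files, rather than in code, is what lets a new court website be supported by adding a template instead of changing the program.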

With continued reference to FIG. 4, after the judgment document information has been acquired and extracted by the web crawler, step S2 further includes:

S202. Build a full-text index of the judgment documents.

A full-text index of the judgment documents is built and then used for searching:

(1) Building the index: assemble the document collection, segment each document into words, apply language-dependent processing to the resulting terms, and build the index from the segmentation results.

(2) Searching with the index: parse the query, search the index, retrieve the matching documents and rank them against the query, and return the results.
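A toy inverted index illustrates the two stages. The function names are hypothetical, and ranking by summed term frequency is a simplifying assumption; production search engines usually use weighting schemes such as TF-IDF or BM25.

```python
from collections import defaultdict


def build_index(docs):
    """docs: {doc_id: tokens}. Returns an inverted index term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for term in tokens:
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query_terms):
    """AND query: documents containing every term, ranked by summed term frequency."""
    postings = [index.get(term, {}) for term in query_terms]
    if not postings:
        return []
    hits = set(postings[0]).intersection(*postings[1:])
    return sorted(hits, key=lambda d: (-sum(p[d] for p in postings), d))
```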

S203. Classify the judgment documents.

Judgment document classification is essentially short-text classification, and many algorithms can be used, including KNN, SVM, naive Bayes, decision trees, and word-graph models.

S204. Extract the preset required information from the judgment documents.

The case base of written judgments crawled from judgment document sources is huge, so extracting the information about relevant cases in each field is essential. Extracting information from the written judgments yields information about the participants in the legal proceedings, such as the court that tried the case, the identities of the plaintiff and the defendant, and the performance and role of the entrusted or appointed lawyer throughout the case.

Extracting the required information correctly and effectively from judgment document text requires information extraction techniques. Information extraction refers to extracting specific facts from structured, semi-structured, and unstructured text, and generally uses either rule-based or machine learning-based methods.

According to the area of law, the written judgments among the judgment documents fall into three categories: civil, criminal, and administrative judgments. The three types differ greatly in their drafting conventions and in the information to be extracted, but all contain basic case information, information on the legal roles, case details, and the judgment result. Rule-based information extraction is therefore well suited to written judgments, with the extraction rules built from the information to be extracted, its position in the document, the keywords that signal it, and so on.

FIG. 7 shows the rule-based method for extracting information from written judgments used in the step of extracting the preset required information from the judgment documents, which specifically includes:

S2041. Read the text of a written judgment fetched by the web crawler.

S2042. Select the extraction rule corresponding to the type of the written judgment.

S2043. Load the rule document and divide the written judgment into blocks.

S2044. Match the information to be extracted within each block according to the blocking result.

S2045. Correct the extracted results by removing incorrect matches to obtain the final extraction result.

Because the three types of written judgments require different information to be extracted, their extraction rules differ accordingly; therefore, in step S2042, the information to be extracted from each type of written judgment should be preset first. Tables 1-3 list the information to be extracted from the three types of written judgments.

TABLE 1 Information to be extracted from civil judgments

Field	Description
Name	Lawyer's name
Office	Law firm where the lawyer practices
Plainordenf	Whether the lawyer acts for the plaintiff or the defendant
Winorlose	Whether the case was won or lost
Proidentify	Agent's identity
Pronumber	Number of agents
Ratio	Ratio between the claimed and the awarded compensation amounts
Type	First-instance, second-instance, or retrial case

TABLE 2 Information to be extracted from criminal judgments

TABLE 3 Information to be extracted from administrative judgments

Field	Description
Name	Lawyer's name
Office	Law firm where the lawyer practices
Plainordenf	Whether the lawyer acts for the plaintiff or the defendant
Winorlose	Whether the case was won or lost
Proidentify	Agent's identity
Pronumber	Number of agents
Money	Amount of money involved in the case

Once the content to be extracted is defined, different extraction rules are designed for the different types of written judgments. Based on the format specifications and drafting conventions of judgment documents, the information extraction algorithm shown in FIG. 8 is used: discourse (chapter) analysis → sentence-level extraction → word-level extraction → annotation, where:

Discourse analysis: based on the structure and content distribution of the written judgment, the judgment is divided into parts using methods such as pattern recognition and regular expressions, and entities are further recognized and labeled with the help of natural language syntactic analysis. Specifically, the judgment document is divided into several large blocks according to its structural features, that is, according to where the information to be extracted appears in the document, for example the basic information of the document, the legal roles, and the judgment result.

Sentence-level extraction: the sentences in each block obtained by discourse analysis are analyzed section by section and their content is subdivided. Again using pattern matching, the content of each sentence is extracted to obtain raw information such as the plaintiff, the defendant, the agents' identities, the number of agents, the amount involved, and the judgment result.

Word-level extraction: the results of discourse analysis and sentence-level extraction are combined to extract specific information about entities, attributes, and the relations between entities.

Annotation: entities are labeled based on the extraction results to obtain the corresponding information.
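The discourse → sentence → word → annotation pipeline can be sketched with regular expressions for one hypothetical civil-judgment layout. The section markers and the agent pattern below are assumptions about typical Chinese judgment wording, not the patent's actual rule documents; real rules would be loaded per judgment type (S2042, S2043). The output uses field names from Table 1.

```python
import re

# Hypothetical rules for a civil judgment. Discourse level: the party section
# ends where the reasoning ("本院认为", "this court holds") begins, and the
# result follows "判决如下" ("the judgment is as follows").
SECTION_RULES = {
    "parties": re.compile(r"^(.*?)本院认为", re.S),
    "result": re.compile(r"判决如下[：:](.*)$", re.S),
}
# Sentence level: an entrusted agent ("委托诉讼代理人"), the agent's name,
# then a law firm ("律师事务所") followed by "律师" (lawyer).
AGENT_RE = re.compile(
    r"委托诉讼代理人[：:](?P<name>[^，,。]+)[，,](?P<office>[^，,。]*律师事务所)律师")


def split_sections(text):
    """Discourse-level analysis: cut the judgment into named blocks."""
    sections = {}
    for name, rule in SECTION_RULES.items():
        match = rule.search(text)
        if match:
            sections[name] = match.group(1)
    return sections


def extract_lawyers(text):
    """Sentence- and word-level extraction, then annotation as Table 1 fields."""
    records = []
    role = None
    for sentence in re.split(r"[。\n]", split_sections(text).get("parties", "")):
        # The most recent party sentence tells us which side an agent represents.
        if sentence.startswith("原告"):      # plaintiff
            role = "plaintiff"
        elif sentence.startswith("被告"):    # defendant
            role = "defendant"
        match = AGENT_RE.search(sentence)
        if match:
            records.append({"Name": match.group("name"),
                            "Office": match.group("office"),
                            "Plainordenf": role})
    return records
```

Step S2045 (correcting the results) would follow, for example by discarding records whose name field is implausibly long.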

In another embodiment, the invention further processes the extracted information.

In the information extraction step, the case information associated with each lawyer is obtained from the written judgments by the rule-based extraction algorithm. Because this information is textual, it can be converted into numbers so that it can be used effectively for lawyer matching: different weights are preset according to how important each item is to a lawyer's ranking score, and a total score is finally obtained for each lawyer. In some embodiments, different numerical rules may be preset for specific items according to the type of written judgment, and the text information is then converted into computable numerical values according to those rules. For example, the values for civil and criminal judgments can be set as needed by reference to the preset values for administrative judgments listed in Table 4, although the invention is not limited to this.

TABLE 4 Preset numerical values for information in administrative judgments

In addition, once the text information has been converted into numerical values, each category is assigned a weight according to its values and a weighted sum is computed. The score of a lawyer in a given field can be expressed as:

$$S = \sum_{k=1}^{M} \sum_{j=1}^{N} \delta_{jk}\,\omega_{jk}$$

where $M$ is the number of the lawyer's judgments in the field; $N$ is the number of information items (attributes) to be summed; $\delta_{jk}$ is the weight of the $j$-th attribute in the $k$-th document; and $\omega_{jk}$ is the value of the $j$-th attribute in the $k$-th document. This yields a score for each lawyer in each professional field, which serves as a source of ranking information for the subsequent recommendation algorithm.
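As a minimal sketch of the digitization and weighted summation described above, the Python below maps extracted text values to numbers (ω_jk) and sums weight × value over a lawyer's documents in one field. Table 4 is not reproduced in this text, so the attribute names, the mappings in `NUMERAL_RULES`, and the weights in `WEIGHTS` are hypothetical placeholders. For simplicity the sketch also gives an attribute the same weight in every document, whereas the weighting allows δ_jk to vary with k.

```python
# Hypothetical numeralization rules standing in for Table 4 (not reproduced here).
NUMERAL_RULES = {
    "judgment_result": {"won": 100.0, "partially_won": 60.0, "lost": 0.0},
    "agent_role": {"lead_counsel": 80.0, "co_counsel": 50.0},
}

# Hypothetical attribute weights (delta_jk), assumed identical across documents k.
WEIGHTS = {"judgment_result": 0.7, "agent_role": 0.3}


def numeralize(attribute: str, text_value: str) -> float:
    """Map one extracted text value to its preset number (omega_jk); unknown values score 0."""
    return NUMERAL_RULES.get(attribute, {}).get(text_value, 0.0)


def field_score(documents: list[dict[str, str]]) -> float:
    """Sum delta_jk * omega_jk over the lawyer's M documents and the weighted attributes."""
    total = 0.0
    for doc in documents:                          # k = 1..M
        for attribute, text_value in doc.items():  # j = 1..N; unweighted attributes are skipped
            if attribute in WEIGHTS:
                total += WEIGHTS[attribute] * numeralize(attribute, text_value)
    return total


civil_docs = [
    {"judgment_result": "won", "agent_role": "lead_counsel"},
    {"judgment_result": "lost", "agent_role": "co_counsel"},
]
print(field_score(civil_docs))  # 0.7*100 + 0.3*80 + 0.7*0 + 0.3*50 = 109.0
```

Running the same function once per field (civil, criminal, administrative) gives the per-field scores that the recommendation step consumes.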

Continuing with Fig. 1, after the referee documents have been acquired and the required information extracted, the embodiment of the present application may further include step S3:

S3: preset a lawyer recommendation algorithm, use it to obtain a set or list of recommended lawyers, and thereby complete lawyer matching.

The lawyer recommendation algorithm evaluates prospective lawyers according to associated reference information; the prospective lawyers form a candidate lawyer set. In some embodiments, the prospective lawyers may be lawyers within a specified range, or lawyers meeting some preset condition. The associated reference information includes the user's case type and keywords obtained in step S1 and the comprehensive lawyer information obtained from the referee documents in step S2. In some embodiments, the reference information for evaluating lawyers also includes the lawyer's registration information, the lawyer's answers to questions at registration, the geographic locations of the lawyer, the user, and the court having jurisdiction, user evaluations, and so on. In other embodiments, to simplify calculation, the reference information may be quantified using the method described above, and lawyers whose quantified scores exceed a preset threshold are recommended to the user as candidate lawyers. Other, non-numerical forms of evaluation may of course also be used to express this reference information.

In one embodiment, for example, lawyers' scores in the three legal fields of civil, criminal, and administrative law can be obtained separately from the information extracted from the referee documents. Further scores are then obtained from the lawyer's registration information (for example, 80 if the lawyer registered as good at civil litigation and 0 if not good at administrative litigation), from the registration questions and answers, from geographic location (for example, 100 if the lawyer, the user, and the court having jurisdiction are all in the same region), from user evaluations, and from any other information suitable for evaluating lawyers. Combining these yields each lawyer's final scores in the civil, criminal, and administrative fields. Then, according to the case classification obtained from the user's case, it is determined whether the lawyer's score in the corresponding field exceeds the threshold; if it does, the lawyer is recommended as a candidate. Fig. 9 shows the basic conceptual topology of the lawyer recommendation evaluation logic.
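The combination and threshold test described above can be sketched as follows, assuming the composite is a weighted sum of the sub-scores. The text lists the sub-score sources but not how they are combined, so the weights in `SUB_SCORE_WEIGHTS`, the sub-score names, and the data layout of `lawyers` are hypothetical.

```python
# Hypothetical weights for the sub-scores listed above; the text does not fix them.
SUB_SCORE_WEIGHTS = {
    "referee_documents": 0.4,  # field score derived from referee documents
    "registration": 0.2,       # e.g. 80 if registered as good at this field, else 0
    "registration_qa": 0.1,    # score from the registration questions and answers
    "geography": 0.1,          # e.g. 100 if lawyer, user and court share a region
    "user_rating": 0.2,        # score from user evaluations
}


def composite_score(sub_scores: dict[str, float]) -> float:
    """Weighted sum of one lawyer's sub-scores in one field; missing sub-scores count as 0."""
    return sum(w * sub_scores.get(name, 0.0) for name, w in SUB_SCORE_WEIGHTS.items())


def candidates_for_case(lawyers, case_field: str, threshold: float):
    """lawyers maps lawyer_id -> {field -> {sub-score name -> value}}.

    Returns (lawyer_id, score) pairs whose score in the case's field exceeds
    the threshold, highest score first.
    """
    selected = []
    for lawyer_id, by_field in lawyers.items():
        score = composite_score(by_field.get(case_field, {}))
        if score > threshold:
            selected.append((lawyer_id, score))
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


lawyers = {
    "A": {"civil": {"referee_documents": 90, "registration": 80, "registration_qa": 70,
                    "geography": 100, "user_rating": 85}},
    "B": {"civil": {"referee_documents": 40, "user_rating": 60},
          "criminal": {"referee_documents": 95, "registration": 80}},
}
print(candidates_for_case(lawyers, "civil", threshold=60))  # only "A" (about 86) passes
```

A lawyer with no data in the case's field simply scores 0 there and is never selected.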

In some embodiments, special evaluation information such as geographic location may be left out of the composite score and instead serve as a separate screening criterion. For example, lawyers who do not meet geographic requirements may be excluded by restricting the region in which the lawyer is located or the lawyer's distance from the user or the court having jurisdiction.

In some embodiments, the candidate lawyers may be presented to the user as a list ranked by score or rating, with brief lawyer information and the reason for each recommendation noted in the list.
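A minimal sketch of the two embodiments above: geography acts as a hard filter rather than a scored input, and the survivors are listed by score with a short reason. The `Candidate` record, its fields, the region-equality test (instead of a distance limit), and the reason wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record for illustration; the text does not define a schema.
    name: str
    region: str
    score: float
    strengths: list[str]


def recommend_list(candidates, allowed_regions, top_n=10):
    """Exclude lawyers outside the allowed regions, rank the rest by score,
    and attach brief lawyer information and a recommendation reason."""
    eligible = [c for c in candidates if c.region in allowed_regions]  # hard filter, not scored
    eligible.sort(key=lambda c: c.score, reverse=True)
    return [
        {
            "name": c.name,
            "region": c.region,
            "score": c.score,
            "reason": f"score {c.score:.1f}; experienced in {', '.join(c.strengths)}",
        }
        for c in eligible[:top_n]
    ]


pool = [
    Candidate("Li", "Chengdu", 88.0, ["contract disputes"]),
    Candidate("Wang", "Beijing", 95.0, ["labor disputes"]),
    Candidate("Zhang", "Chengdu", 91.5, ["divorce", "inheritance"]),
]
for row in recommend_list(pool, allowed_regions={"Chengdu"}):
    print(row["name"], "-", row["reason"])
```

Wang has the highest score but is excluded outright by the geographic criterion, which is the behavior the embodiment describes.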

In the embodiments of the present application, the lawyer recommendation algorithm may be based on, but is not limited to, the following principles:

(1) Collaborative filtering algorithm

The basic idea of collaborative filtering is to find other users whose preferences are similar to the target user's and to recommend to the target user the content those users are interested in. Sub-algorithms of collaborative filtering include:

Memory-based algorithms: user-lawyer rating data are used to estimate a user's rating of a particular lawyer or to generate a recommendation list for a target user. Their main advantage is that they are simple and easy to understand and implement. In practice, however, the user-lawyer rating matrix is usually very sparse, and these algorithms face the cold-start problem (new users and new lawyers). The similarity measure also has a weakness: if two users have rated only a few lawyers in common, their true similarity is difficult to compute accurately.
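A minimal memory-based sketch, assuming user-based neighbours, cosine similarity over co-rated lawyers, and a similarity-weighted average of the top-k neighbours' ratings. These are common textbook choices, not ones fixed by the text, and the ratings are invented.

```python
import math


def cosine_on_common(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity computed only over lawyers both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[l] * b[l] for l in common)
    norm_a = math.sqrt(sum(a[l] ** 2 for l in common))
    norm_b = math.sqrt(sum(b[l] ** 2 for l in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(ratings, user, lawyer, k=2):
    """Similarity-weighted average of the k most similar users' ratings of `lawyer`.

    Returns None when no similar user has rated the lawyer (the cold-start case).
    """
    own = ratings.get(user, {})
    neighbours = []
    for other, other_ratings in ratings.items():
        if other != user and lawyer in other_ratings:
            sim = cosine_on_common(own, other_ratings)
            if sim > 0:
                neighbours.append((sim, other_ratings[lawyer]))
    neighbours.sort(reverse=True)
    top = neighbours[:k]
    if not top:
        return None
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 5, "B": 3, "C": 4},
    "u3": {"A": 1, "C": 2},
}
print(predict_rating(ratings, "u1", "C"))
print(predict_rating(ratings, "new_user", "C"))  # None: no rating history (cold start)
```

Note that u3 shares only lawyer A with u1 yet receives a similarity of 1.0, which is exactly the small-overlap weakness of the similarity measure noted above.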

Model-based algorithms: model-based methods use statistics, machine learning, data mining, and similar techniques to build a model of each user from the user's historical data and generate recommendations from that model. This can alleviate the sparsity of the user-lawyer rating matrix to some extent.
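Matrix factorization trained by stochastic gradient descent is one common model-based method. The text does not name a specific model, so this choice, the global-mean baseline, and all hyperparameters below are assumptions. Each user and lawyer gets a short latent vector, and a rating is predicted as the global mean plus their dot product, so unrated pairs can still be scored.

```python
import random


def train_mf(ratings, n_factors=2, lr=0.05, reg=0.02, epochs=1000, seed=0):
    """Fit rating ~ mu + p_u . q_l to the observed user-lawyer ratings with SGD."""
    rng = random.Random(seed)
    triples = [(u, l, r) for u, row in ratings.items() for l, r in row.items()]
    mu = sum(r for _, _, r in triples) / len(triples)  # global mean rating
    users = sorted({u for u, _, _ in triples})
    lawyers = sorted({l for _, l, _ in triples})
    P = {u: [rng.gauss(0.0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {l: [rng.gauss(0.0, 0.1) for _ in range(n_factors)] for l in lawyers}
    for _ in range(epochs):
        rng.shuffle(triples)
        for u, l, r in triples:
            pu, ql = P[u], Q[l]
            err = r - (mu + sum(a * b for a, b in zip(pu, ql)))
            for f in range(n_factors):
                p_f, q_f = pu[f], ql[f]
                # L2-regularized gradient step on both factor vectors.
                pu[f] += lr * (err * q_f - reg * p_f)
                ql[f] += lr * (err * p_f - reg * q_f)
    return mu, P, Q


def predict(model, user, lawyer):
    """Predicted rating; falls back to the global mean for unseen users or lawyers."""
    mu, P, Q = model
    if user not in P or lawyer not in Q:
        return mu
    return mu + sum(a * b for a, b in zip(P[user], Q[lawyer]))


ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 5, "B": 3, "C": 4},
    "u3": {"A": 1, "C": 2},
}
model = train_mf(ratings)
print(round(predict(model, "u1", "C"), 2))  # estimate for a pair with no rating
print(predict(model, "new_user", "A"))      # global mean: a new user has no factors yet
```

The fallback to the mean shows that factorization eases sparsity but does not by itself solve cold start.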

(2) Content-based recommendation algorithm

The main problem a content-based recommendation algorithm must solve is how to make full and sensible use of the features of lawyers and users. Specific algorithms include:

Text-content-based recommendation: a user preference text is constructed from historical information (such as the user's browsing records), the similarity between each candidate lawyer and the user preference text is computed, and the most similar lawyers are recommended to the user. Both the user preference information and the candidate lawyer information are represented by keyword features, and the TF-IDF method is used to determine the weight of each feature.
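A minimal sketch of text-content-based recommendation: the user's keywords and each lawyer's keywords are turned into TF-IDF vectors over the same small corpus, and lawyers are ranked by cosine similarity. The smoothed IDF `log((1 + n) / (1 + df)) + 1` (which keeps terms shared by every document from dropping to zero weight) and the keyword lists are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF weights per document, using a smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend_by_text(user_keywords: list[str], lawyer_keywords: dict[str, list[str]]) -> list[str]:
    """Rank lawyers by cosine similarity between their TF-IDF vector and the user's."""
    names = list(lawyer_keywords)
    vectors = tfidf_vectors([user_keywords] + [lawyer_keywords[n] for n in names])
    user_vec, lawyer_vecs = vectors[0], vectors[1:]
    scores = {n: cosine(user_vec, v) for n, v in zip(names, lawyer_vecs)}
    return sorted(scores, key=scores.get, reverse=True)


profiles = {  # hypothetical keyword profiles
    "Li": ["labor", "wage", "arbitration"],
    "Wang": ["criminal", "fraud", "defense"],
    "Zhang": ["contract", "dispute"],
}
print(recommend_by_text(["labor", "dispute", "wage"], profiles))
```

Li shares two weighted keywords with the user, Zhang one, and Wang none, so they are ranked in that order.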

Latent semantic analysis (LSA)-based recommendation: singular value decomposition of the document-term matrix maps documents and terms into the same low-dimensional latent semantic space, in which similarities between documents, between terms, or between documents and terms can be computed flexibly. A query submitted by the user is mapped into the same space, the similarity between each document and the query is computed, and the most relevant documents are returned. LSA mainly addresses the inaccurate similarity calculations caused by synonymy and polysemy of keywords. Its drawbacks are that the dimensions of the latent space obtained by singular value decomposition have no clear physical meaning, and that the singular value decomposition of a large matrix is computationally expensive.
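A minimal LSA sketch with NumPy, assuming a term-by-document count matrix `A`, truncation to rank `k`, and cosine similarity between the projections `U_k^T q` and `U_k^T d`. The vocabulary and the three lawyer profiles are hypothetical.

```python
import numpy as np


def lsa_similarities(term_doc: np.ndarray, query: np.ndarray, k: int = 2) -> np.ndarray:
    """Cosine similarity between a query and each document in a rank-k latent space.

    term_doc is a terms x documents count matrix A, approximated by U_k S_k V_k^T.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]
    doc_latent = (s[:k, None] * Vt[:k, :]).T  # row d equals U_k^T A[:, d]
    q_latent = U_k.T @ query                  # query folded into the same space
    norms = np.linalg.norm(doc_latent, axis=1) * np.linalg.norm(q_latent)
    return (doc_latent @ q_latent) / np.maximum(norms, 1e-12)


terms = ["wage", "salary", "arrears", "divorce", "custody"]
# Columns are hypothetical lawyer profiles:
# d0 = wage arrears, d1 = salary arrears, d2 = divorce custody.
A = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)
query = np.array([1, 0, 0, 0, 0], dtype=float)  # the user only wrote "wage"
print(np.round(lsa_similarities(A, query), 3))
```

Profile d1 never contains "wage", yet it scores as high as d0 because "wage" and "salary" both co-occur with "arrears"; this is the synonymy effect LSA is meant to capture.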

Adaptive recommendation algorithm: because a user's needs may change over time, the preference document must be updated promptly so that accurate content is always recommended. Adaptive filtering recommends documents that are highly similar to the user preference document and, at the same time, uses those highly similar documents to update the weight of each component of the preference document, so that the preference document adjusts dynamically to the user's needs. A threshold can improve the efficiency of the recommendation system: the preference document is updated only when a document's similarity to it exceeds a set threshold. User interests can further be divided into long-term and short-term interests. Because short-term interests better reflect what the user is currently focused on, their keywords are given larger weights, which further improves the accuracy of user-interest modeling.
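A minimal sketch of threshold-gated adaptive filtering with separate long- and short-term profiles. The blending rule (`short_weight` > 0.5 favours short-term keywords), the exponential update, and every numeric parameter are assumptions; the text states only the principles.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def combined_profile(long_term, short_term, short_weight=0.7):
    """Blend the two interest profiles; short_weight > 0.5 favours short-term keywords."""
    terms = set(long_term) | set(short_term)
    return {t: short_weight * short_term.get(t, 0.0) + (1 - short_weight) * long_term.get(t, 0.0)
            for t in terms}


def filter_and_adapt(long_term, short_term, doc, threshold=0.3, rate_short=0.5, rate_long=0.1):
    """Recommend `doc` if it is similar enough to the user profile and, only then,
    move both profiles toward it (the short-term profile moves faster)."""
    sim = cosine(combined_profile(long_term, short_term), doc)
    if sim <= threshold:
        return False, sim  # below threshold: no recommendation and no update
    for profile, rate in ((short_term, rate_short), (long_term, rate_long)):
        for t in set(profile) | set(doc):
            profile[t] = (1 - rate) * profile.get(t, 0.0) + rate * doc.get(t, 0.0)
    return True, sim


long_term = {"contract": 1.0}
short_term = {"labor": 1.0}
print(filter_and_adapt(long_term, short_term, {"labor": 1.0, "wage": 1.0}))
print(filter_and_adapt(long_term, short_term, {"criminal": 1.0}))
print(short_term, long_term)
```

The first document passes the threshold and pulls "wage" into both profiles; the second is unrelated, so it is neither recommended nor allowed to change the profiles.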

(3) Recommendation algorithm based on graph structure

The user-lawyer rating matrix can be modeled as a bipartite graph in which nodes represent users and lawyers and edges represent users' evaluations of lawyers. Graph-based recommendation algorithms produce recommendations by analyzing the structure of this bipartite graph.
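One well-known graph-based method is mass diffusion (also called probabilistic spreading) on the user-lawyer bipartite graph. The text does not specify which graph algorithm is used, so this is an illustrative choice, and the edges are treated as unweighted for simplicity. Each lawyer the target user has evaluated starts with one unit of resource, which flows to users and back to lawyers; unseen lawyers are ranked by the resource they receive.

```python
from collections import defaultdict


def mass_diffusion(edges, target_user):
    """Rank lawyers the target user has not evaluated by two-step resource spreading."""
    user_to_lawyers = defaultdict(set)
    lawyer_to_users = defaultdict(set)
    for user, lawyer in edges:
        user_to_lawyers[user].add(lawyer)
        lawyer_to_users[lawyer].add(user)

    # Step 1: each lawyer evaluated by the target user spreads one unit of
    # resource evenly over all users who evaluated that lawyer.
    user_resource = defaultdict(float)
    for lawyer in user_to_lawyers[target_user]:
        share = 1.0 / len(lawyer_to_users[lawyer])
        for user in lawyer_to_users[lawyer]:
            user_resource[user] += share

    # Step 2: each user spreads their resource evenly back over their lawyers.
    lawyer_score = defaultdict(float)
    for user, resource in user_resource.items():
        share = resource / len(user_to_lawyers[user])
        for lawyer in user_to_lawyers[user]:
            lawyer_score[lawyer] += share

    seen = user_to_lawyers[target_user]
    ranked = sorted(((s, l) for l, s in lawyer_score.items() if l not in seen), reverse=True)
    return [(lawyer, score) for score, lawyer in ranked]


edges = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "C"),
         ("u3", "B"), ("u3", "C"), ("u3", "D")]
print(mass_diffusion(edges, "u1"))  # C (about 0.417) ranks above D (about 0.167)
```

Lawyer C is reached through two paths (via u2 and u3), so it accumulates more resource than D, which is reached only through u3.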

(4) Hybrid recommendation algorithm

Hybrid recommendation algorithms address problems inherent in collaborative filtering, content-based, and graph-based recommendation. For example, content-based recommendation can solve the "new lawyer" problem of collaborative filtering, and collaborative filtering can reduce the "overfitting" problem faced by content-based recommendation. A hybrid algorithm can run collaborative filtering, content-based, and graph-based recommenders independently, fuse the results they produce, and recommend the fused result to the user. The main hybridization strategies are:

1. Run the methods independently and fuse their results;

2. Incorporate content-based information into a collaborative filtering algorithm;

3. Incorporate collaborative filtering into a content-based algorithm; and

4. Combine the algorithms within a single framework to produce a new recommendation algorithm.
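Strategy 1 (run independently, then fuse) can be sketched as a weighted sum of min-max normalized scores. The normalization, the fusion weights, and the score values are assumptions made for illustration.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one recommender's scores to [0, 1] so different scales can be mixed."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse(score_lists: list[dict[str, float]], weights: list[float]) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores; a lawyer missing from a list gets 0 from it."""
    fused: dict[str, float] = {}
    for scores, weight in zip(score_lists, weights):
        for lawyer, s in min_max_normalize(scores).items():
            fused[lawyer] = fused.get(lawyer, 0.0) + weight * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


cf_scores = {"A": 4.5, "B": 3.0}                    # predicted ratings; no data yet for new lawyer C
content_scores = {"A": 0.2, "B": 0.9, "C": 0.8}     # text similarities
print(fuse([cf_scores, content_scores], weights=[0.6, 0.4]))
```

The new lawyer C still appears in the fused list through the content-based scores, illustrating how a hybrid covers the "new lawyer" gap of collaborative filtering.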

Referring to Fig. 10, corresponding to the lawyer matching method of steps S1-S3, the invention further provides a lawyer matching system (1) comprising a text information analysis and processing subsystem (100), a referee document information processing subsystem (200), and a lawyer recommendation subsystem (300).

The text information analysis and processing subsystem (100) analyzes and processes the user's case. Case analysis here means understanding the case description entered by the user or the corresponding question text. The subsystem comprises a question preprocessing module (101) and a question classification module (102).

Referring to Fig. 11, the question preprocessing module (101) preprocesses the user's input to obtain concise, normalized candidate feature phrases that carry sufficient information. The question preprocessing module (101) further comprises:

a Chinese word segmentation module (1011), which splits the text into words;

a named entity recognition module (1012), which identifies entities, times, numbers, and similar items in the text to be processed;

a part-of-speech recognition and tagging module (1013), which removes modal particles, auxiliary words, and other words that carry no content, and tags and extracts the focus and core components of the question;

a stop-word filtering module (1014), which filters out words and sentences that contribute little to expressing the query or that would interfere with lawyer matching, according to a preset stop-word list; and

a feature extraction module (1015), which extracts feature phrases.
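The filtering and feature extraction steps of the modules above can be sketched on tokens that a Chinese segmenter and tagger have already produced (those tokens are written out by hand here). The stop-word list, the ICTCLAS-style tags treated as content-free, and frequency-based feature selection are assumptions; named entity recognition is not shown.

```python
from collections import Counter

# Hypothetical preset stop-word list (the text says one is preset but does not give it).
STOP_WORDS = {"请问", "我", "的", "了", "吗", "怎么办"}
# Part-of-speech tags treated as content-free. ICTCLAS-style tags
# (u = auxiliary word, y = modal particle, e = interjection) are an assumption.
DROP_POS = {"u", "y", "e"}


def extract_feature_phrases(tagged_tokens, max_features=5):
    """Drop content-free words and stop words from segmented, POS-tagged tokens
    and return the most frequent remaining words as feature phrases."""
    kept = [word for word, pos in tagged_tokens
            if pos not in DROP_POS and word not in STOP_WORDS]
    # most_common keeps first-seen order among equal counts.
    return [word for word, _ in Counter(kept).most_common(max_features)]


# Segmented and tagged form of "请问公司拖欠我的工资怎么办"
# ("My company owes me wages, what can I do?"), written out by hand.
tokens = [("请问", "v"), ("公司", "n"), ("拖欠", "v"), ("我", "r"),
          ("的", "u"), ("工资", "n"), ("怎么办", "r")]
print(extract_feature_phrases(tokens))  # ['公司', '拖欠', '工资'] (company, owe, wages)
```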

The question classification module (102) classifies questions described in natural language and gathers the information related to each question, improving the accuracy of subsequent processing. The main purpose of question classification is to label the question type based on the user's description, which facilitates information retrieval and lawyer matching. Applicable classification models include support vector machines, Bayesian classifiers, k-nearest neighbors, and maximum entropy models.
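Of the listed models, a Bayesian classifier is the shortest to show in full. The sketch below is a multinomial naive Bayes with add-one smoothing. The training questions and labels are toy data, and English tokens stand in for the segmented Chinese words the module would actually receive.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesQuestionClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over segmented words."""

    def fit(self, docs: list[list[str]], labels: list[str]):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab = {w for doc in docs for w in doc}
        return self

    def predict(self, doc: list[str]) -> str:
        total = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n in self.class_counts.items():
            logp = math.log(n / total)  # log prior of the class
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in doc:
                if w in self.vocab:  # words never seen in training are ignored
                    logp += math.log((self.word_counts[label][w] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


train_docs = [["wage", "arrears", "company"], ["labor", "contract", "termination"],
              ["theft", "sentence"], ["fraud", "bail"],
              ["administrative", "penalty", "reconsideration"]]
train_labels = ["civil", "civil", "criminal", "criminal", "administrative"]
clf = NaiveBayesQuestionClassifier().fit(train_docs, train_labels)
print(clf.predict(["company", "owes", "wage"]))  # civil
print(clf.predict(["fraud", "sentence"]))        # criminal
```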

Continuing with Fig. 10, the referee document information processing subsystem (200) acquires referee documents, classifies them, extracts predetermined required information from each class of referee document, and derives a first lawyer evaluation from that information. It comprises a web crawler module (201), a full-text index module (202), a referee document classification module (203), and a referee document information extraction module (204).

Referring to Fig. 12, the web crawler module (201) parses and obtains the URLs of referee documents and, using those URLs, extracts and stores the required information from the documents. The web crawler module (201) further comprises:

a URL parsing submodule (2011), which starts from the initial URL of each category, parses level by level until it reaches the URLs of the referee document lists, and finally parses each list to obtain the URL of each individual referee document;

a URL storage submodule (2012), which stores the parsed URLs in a crawler queue; and

an information extraction and storage submodule (2013), which parses the referee document at each URL, extracts the referee document information using templates, and stores it.

In some embodiments, the URL parsing submodule (2011) further includes a URL deduplication submodule (2014), which removes duplicate URLs that may arise while crawling referee documents. The set of already-seen URLs used for duplicate checking may be kept in memory or cached on disk, and a Bloom filter may be used for deduplication.
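A minimal Bloom filter for URL deduplication, using double hashing over a SHA-256 digest to derive the bit positions. The bit-array size, the number of hash functions, and the example URLs are assumptions. A Bloom filter never reports a seen URL as new; it can occasionally report a new URL as seen (a false positive), which for a crawler means a page is skipped, not fetched twice.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str) -> list[int]:
        # Double hashing: position_i = (h1 + i * h2) mod size, with h2 forced odd.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def dedupe_urls(urls, bloom: BloomFilter) -> list[str]:
    """Append each URL to the crawl queue only the first time it is seen."""
    queue = []
    for url in urls:
        if url in bloom:
            continue
        bloom.add(url)
        queue.append(url)
    return queue


seen = BloomFilter()
crawled = ["https://example.com/doc/1", "https://example.com/doc/2",
           "https://example.com/doc/1", "https://example.com/doc/3"]
print(dedupe_urls(crawled, seen))
```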

Continuing with Fig. 10, the full-text index module (202) builds a full-text index of the referee documents. Index building proceeds in order: building the document collection, segmenting each document into words, applying language-dependent processing to the resulting terms, and building the index from the segmentation results. Searching proceeds in order: parsing the query, searching the index, retrieving and ranking documents according to the query, and returning the query results.
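The index-then-search flow above can be sketched with an inverted index. The sketch assumes the documents are already segmented, uses lower-casing as a stand-in for language-dependent processing, requires every query term to match, and ranks by summed term frequency. A production system would use a search engine library; these choices are simplifications.

```python
from collections import defaultdict


def build_index(docs: dict[str, list[str]]) -> dict[str, dict[str, int]]:
    """Inverted index: term -> {doc_id: term frequency}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for token in tokens:
            term = token.lower()  # stand-in for language-dependent normalization
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query_tokens: list[str]) -> list[str]:
    """Return ids of documents containing every query term, best-scoring first."""
    postings = [index.get(t.lower(), {}) for t in query_tokens]
    if not postings:
        return []
    matching = set(postings[0]).intersection(*postings[1:])
    scored = [(sum(p[d] for p in postings), d) for d in matching]
    return [d for _, d in sorted(scored, key=lambda x: (-x[0], x[1]))]


docs = {  # hypothetical, already-segmented referee documents
    "d1": ["Labor", "contract", "dispute", "contract"],
    "d2": ["contract", "fraud"],
    "d3": ["labor", "arbitration"],
}
index = build_index(docs)
print(search(index, ["contract"]))           # ['d1', 'd2']
print(search(index, ["labor", "contract"]))  # ['d1']
```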

The referee document classification module (203) classifies referee documents using algorithms such as KNN, SVM, naive Bayes, decision trees, and word and graph models.

The referee document information extraction module (204) extracts the information on relevant cases in each field; extracting information from the judgments yields information about the participants in the related legal proceedings. Referring to Fig. 13, the referee document information extraction module (204) further comprises a reading submodule (2041), a rule setting submodule (2042), a document partitioning submodule (2043), and an information extraction submodule (2044), wherein:

the reading submodule (2041) reads the judgment text after it has been crawled by the web crawler;

the rule setting submodule (2042) selects the extraction rules corresponding to the judgment's classification type;

the document partitioning submodule (2043) loads the rule document and partitions the judgment into blocks; and

the information extraction submodule (2044) matches the information to be extracted within each block according to the partitioning result, corrects the extracted results, and removes erroneous matches to obtain the final extraction result.

In some embodiments, the referee document information extraction module (204) may further include a list of information to be extracted (2045), which stores the information to be extracted for each predetermined type of referee document.
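A minimal sketch of rule-based partitioning and extraction as described for submodules (2042) to (2044). The actual rules are not published, so the block markers, field patterns, and the sample judgment text are hypothetical. Chinese markers are used because the referee documents are Chinese: 原告 (plaintiff), 被告 (defendant), 委托诉讼代理人 (entrusted litigation agent), 本院认为 ("the court holds"), 判决如下 ("the judgment is as follows").

```python
import re

# Hypothetical rules for one document type. A line matching a block marker starts that block.
BLOCK_RULES = [
    ("parties", re.compile(r"^(原告|被告)")),   # plaintiff / defendant lines
    ("reasoning", re.compile(r"^本院认为")),     # "the court holds"
    ("judgment", re.compile(r"判决如下")),       # "the judgment is as follows"
]
# Hypothetical field patterns, applied only inside their own block.
FIELD_RULES = {
    "parties": {
        "plaintiff": re.compile(r"原告[:：]?(?P<value>[^，,。]+)"),
        "defendant": re.compile(r"被告[:：]?(?P<value>[^，,。]+)"),
        "agent": re.compile(r"委托诉讼代理人[:：]?(?P<value>[^，,。]+)"),
    },
    "judgment": {
        "amount_yuan": re.compile(r"(?P<value>\d+(?:\.\d+)?)元"),
    },
}


def partition(text: str) -> dict[str, list[str]]:
    """Split a judgment into named blocks according to BLOCK_RULES."""
    blocks: dict[str, list[str]] = {}
    current = "header"
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        for name, marker in BLOCK_RULES:
            if marker.search(line):
                current = name
                break
        blocks.setdefault(current, []).append(line)
    return blocks


def extract(blocks: dict[str, list[str]]) -> dict[str, list[str]]:
    """Apply each block's field patterns; drop empty and duplicate matches as a simple correction step."""
    result: dict[str, list[str]] = {}
    for block_name, lines in blocks.items():
        for field, pattern in FIELD_RULES.get(block_name, {}).items():
            values: list[str] = []
            for line in lines:
                for match in pattern.finditer(line):
                    value = match.group("value").strip()
                    if value and value not in values:
                        values.append(value)
            if values:
                result[field] = values
    return result


SAMPLE = """原告张三，男，1980年出生。
委托诉讼代理人王五，某律师事务所律师。
被告李四，男，1975年出生。
本院认为，被告应当支付拖欠的工资。
判决如下：
一、被告李四于本判决生效之日起十日内支付原告张三工资5000元；
"""
print(extract(partition(SAMPLE)))
```

Restricting each pattern to its own block is what keeps the "被告" in the court's reasoning line from being mistaken for a defendant's name.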

Continuing with Fig. 10, the lawyer recommendation subsystem (300) presets lawyer recommendation algorithms and uses them to obtain a set or list of recommended lawyers, completing lawyer matching. More specifically, the lawyer recommendation subsystem (300) may include a preset algorithm submodule (301), an associated information submodule (302), an evaluation submodule (303), and a matching recommendation submodule (304), wherein:

the preset algorithm submodule (301) presets the lawyer recommendation algorithms, which include collaborative filtering, content-based recommendation, graph-based recommendation, and hybrid recommendation algorithms;

the associated information submodule (302) obtains information associated with the lawyers to be evaluated, including but not limited to the user's case type and keywords, the comprehensive lawyer information obtained from the referee documents, the geographic locations of the lawyer, the user, and the court having jurisdiction, and user evaluations;

the evaluation submodule (303) produces individual and comprehensive evaluations of lawyers from the associated information; the evaluations may be numerical or take another suitable form; and

the matching recommendation submodule (304) sets an evaluation threshold and recommends to the user, as candidate lawyers, those lawyers whose evaluation in the relevant field exceeds the threshold. The candidates may be presented as a list ranked by score or rating, with brief lawyer information and the reason for each recommendation noted in the list.

A person of ordinary skill in the art will understand that all or part of the above embodiments may be implemented in hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc.

The above are merely preferred embodiments of the invention and are not intended to limit it; those skilled in the art can make various modifications and changes to the invention. Any modification, equivalent replacement, improvement, and the like made within the basic idea of the invention shall fall within its scope of protection.

Claims

1. A lawyer matching method for matching lawyers according to a user's case, the method comprising:

acquiring a case classification and keywords from the user's case;

acquiring referee documents, classifying the referee documents into civil, criminal, and administrative categories, extracting predetermined required information from each class of referee documents, and obtaining a first lawyer evaluation from the information;

obtaining a second lawyer evaluation from lawyer-associated information other than the predetermined required information;

obtaining a third lawyer evaluation from the first lawyer evaluation and the second lawyer evaluation; and

matching corresponding lawyers according to the third lawyer evaluation, the case classification and/or the keywords, and preset lawyer evaluation criteria.

2. The lawyer matching method of claim 1, wherein acquiring the case classification and keywords from the case comprises:

the case being text;

preprocessing the case question; and

classifying the question based on the result of the preprocessing.

3. The lawyer matching method according to claim 2, wherein the preprocessing comprises:

extracting feature phrases.

4. The lawyer matching method according to claim 2, wherein classifying the question based on the result of the preprocessing comprises:

using a classification algorithm model comprising a support vector machine model, a Bayesian classification model, a k-nearest neighbor model, and/or a maximum entropy model.

5. The lawyer matching method according to claim 1, wherein acquiring referee documents, classifying the referee documents, and extracting predetermined required information from each class of referee documents comprises:

acquiring the referee documents through a web crawler;

building full-text indexes of the referee documents and classifying them;

reading and classifying the judgment texts; and

extracting information from each type of judgment text according to preset rules, the extracted information being the predetermined required information.

6. The lawyer matching method according to claim 1, wherein obtaining a first lawyer evaluation from the information comprises:

evaluating the information, assigning weights to the information, and obtaining the first lawyer evaluation from the evaluation results and their respective weights.

7. The lawyer matching method according to claim 1, wherein obtaining a second lawyer evaluation from lawyer-associated information other than the predetermined required information comprises:

evaluating the other lawyer-associated information, assigning weights to it, and obtaining the second lawyer evaluation from the evaluation results and their respective weights.

8. The lawyer matching method according to claim 1, wherein obtaining a third lawyer evaluation from the first lawyer evaluation and the second lawyer evaluation comprises:

the third lawyer evaluation being a comprehensive evaluation of the lawyer.

9. The lawyer matching method according to claim 1, wherein matching corresponding lawyers according to the third lawyer evaluation, the case classification and/or the keywords comprises:

presetting lawyer evaluation criteria for each case type and/or case keyword;

determining whether the third lawyer evaluation meets the lawyer evaluation criteria; and

matching lawyers who meet the lawyer evaluation criteria with the corresponding cases.

10. The lawyer matching method according to any one of claims 1-9, wherein the evaluations are numerical evaluations.

11. A lawyer matching system for matching lawyers according to a user's case, comprising a text information analysis and processing subsystem, a referee document information processing subsystem, and a lawyer recommendation subsystem, wherein:

the text information analysis and processing subsystem is connected to the referee document information processing subsystem and the lawyer recommendation subsystem, and is configured to acquire a case classification and keywords from the user's case;

the referee document information processing subsystem is further connected to the lawyer recommendation subsystem, and is configured to acquire referee documents, classify the referee documents into civil, criminal, and administrative categories, and extract predetermined required information from each class of referee documents; and

the lawyer recommendation subsystem is configured to obtain a first lawyer evaluation from the information, obtain a second lawyer evaluation from lawyer-associated information other than the predetermined required information, obtain a third lawyer evaluation from the first lawyer evaluation and the second lawyer evaluation, and match corresponding lawyers according to the third lawyer evaluation, the case classification and/or the keywords, and preset lawyer evaluation criteria.

12. The lawyer matching system of claim 11, wherein the text information analysis and processing subsystem further comprises a question preprocessing module and a question classification module, wherein:

the question preprocessing module is configured to preprocess the case question;

the question classification module classifies the question based on the result of the preprocessing;

wherein the case is in text format.

13. The lawyer matching system of claim 12, wherein the question preprocessing module is further configured to extract feature phrases.

14. The lawyer matching system of claim 12, wherein the question classification module employs a support vector machine model, a Bayesian classification model, a k-nearest neighbor model, and/or a maximum entropy model classification algorithm.

15. The lawyer matching system according to claim 11, wherein the referee document information processing subsystem comprises a web crawler module, a full-text index module, a referee document classification module, and a referee document information extraction module, wherein:

the web crawler module is configured to acquire the referee documents;

the full-text index module builds and classifies full-text indexes of the referee documents;

the referee document classification module reads and classifies the judgment texts;

the referee document information extraction module extracts information from each type of judgment text according to preset rules;

wherein the information of the judgment texts is the predetermined required information.

16. The lawyer matching system according to claim 11, wherein the lawyer recommendation subsystem comprises an associated information submodule and an evaluation submodule, wherein:

the associated information submodule is configured to acquire the information; and

the evaluation submodule is configured to evaluate the information, assign weights to the information, and obtain the first lawyer evaluation from the evaluation results and their respective weights.

17. The lawyer matching system according to claim 11, wherein the lawyer recommendation subsystem comprises an associated information submodule and an evaluation submodule, wherein:

the associated information submodule is configured to acquire lawyer-associated information other than the predetermined required information; and

the evaluation submodule is configured to evaluate the other lawyer-associated information, assign weights to it, and obtain the second lawyer evaluation from the evaluation results and their respective weights.

18. The lawyer matching system according to claim 11, wherein obtaining a third lawyer evaluation from the first lawyer evaluation and the second lawyer evaluation comprises:

the third lawyer evaluation being a comprehensive evaluation of the lawyer.

19. The lawyer matching system of claim 11, wherein the lawyer recommendation subsystem comprises a matching recommendation submodule configured to match recommended lawyers to a plurality of users.

20. A lawyer matching system according to any of claims 11-19, wherein the evaluations are numerical evaluations.