CN113407729A - Judicial-oriented personalized case recommendation method and system - Google Patents

Judicial-oriented personalized case recommendation method and system

Info

Publication number
CN113407729A
Authority
CN
China
Prior art keywords
case
list
user
recommendation
judicial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110508832.4A
Other languages
Chinese (zh)
Other versions
CN113407729B (en)
Inventor
丁锴
王腾
陈涛
王超群
蒋立靓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Enjoyor Co Ltd
Original Assignee
Enjoyor Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Enjoyor Co Ltd
Priority to CN202110508832.4A
Publication of CN113407729A
Application granted
Publication of CN113407729B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services; Handling legal documents

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Game Theory and Decision Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Technology Law (AREA)
  • Primary Health Care (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a judicial-oriented personalized case recommendation method and system. The method comprises the following steps: S1, acquiring a case database, structuring the case texts, constructing a case element knowledge graph, and extracting case elements from the case database according to the knowledge graph to form a case key feature table; S2, acquiring a user database and constructing a user portrait feature table; S3, preliminarily calculating a candidate recommendation list from the user portrait feature table in combination with the case element knowledge graph and the case key feature table; and S4, producing a final recommendation list from the candidate recommendation list and the user portrait feature table. The method suits settings with a small number of users, gives accurate personalized recommendations, can handle long time sequences, and requires little knowledge graph construction work.

Description

Judicial-oriented personalized case recommendation method and system
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a judicial-oriented personalized case recommendation method and system.
Background
Traditional search methods based on conditions such as the cause of action and keywords have low accuracy, so the returned results are poor and staff must screen them a second time, which greatly reduces working efficiency. Researchers have therefore proposed more refined personalized case recommendation methods, which predict the search results a user most likely expects from the user's interests and behavior records.
Personalized recommendation research currently focuses on scenarios such as search engines, e-commerce, and film and television entertainment. Examples include recommendation algorithms such as Google's wide & cross, Alibaba's DIN and DIEN, and Huawei's DeepFM. These algorithms, however, mainly study the click-through rate prediction problem, whereas personalized recommendation should be treated as an integrated system. The common schemes today are collaborative filtering and knowledge-graph-based recommendation. Collaborative filtering needs a large amount of user data to match users with similar characteristics; otherwise it suffers from the cold start problem. Knowledge-graph-based methods require the user to input key information and then search for related knowledge in a relatively complete knowledge graph, but constructing such a graph is laborious. Furthermore, recommendation systems usually consider only the influence of short time sequences and cannot mine long time sequences, so long-term periodic factors such as seasons are lost.
Disclosure of Invention
To address the problems described in the Background, the invention provides a judicial-oriented personalized case recommendation method and system that handles long time sequences and is based on a semi-automatically constructed knowledge graph, solving problems such as a small number of users.
The technical scheme adopted by the invention is as follows:
A judicial-oriented personalized case recommendation method comprises the following steps:
S1, acquiring a case database, structuring the case texts, constructing a case element knowledge graph, and extracting case elements from the case database according to the knowledge graph to form a case key feature table;
S2, acquiring a user database and constructing a user portrait feature table;
S3, preliminarily calculating a candidate recommendation list from the user portrait feature table in combination with the case element knowledge graph and the case key feature table;
and S4, producing a final recommendation list from the candidate recommendation list and the user portrait feature table.
Further, the case text is structured in step S1 as follows:
S1.1, segmenting the case text into text segments in units of paragraphs or sentences, each text segment corresponding to a subdivision label, to obtain sample data;
S1.2, inputting the sample data into a CNN text classification model and training the model, optimizing it by minimizing the difference between the predicted subdivision label and the true subdivision label;
S1.3, inputting the case text to be labeled into the trained CNN text classification model, which outputs the predicted subdivision labels of that case text.
Further, constructing the case element knowledge graph also comprises updating it with a semi-supervised method, specifically as follows:
structurally decomposing the case text into text segments, with the subdivision labels used as categories for collecting them;
cutting the text segments corresponding to a single label into sentences, and screening out a legal element set Q in sentence form;
screening a standard element sentence subset U from the legal element set Q according to the basic event graph K of the case element knowledge graph, and manually establishing a one-to-one element mapping from k_i to u_i, where k_i is an element of the graph, K is the whole graph, and the element sentences not mapped to the event graph form the set U'; the standard element sentence subset U and the unmapped subset U' together make up the legal element set Q;
calculating the similarity between each unmapped element sample u'_i and u_i with a text similarity measure, setting a similarity threshold empirically, fusing the sentence elements whose similarity exceeds the threshold into the basic event graph as lower-level samples of the original graph nodes, and assigning the samples below the threshold to an outlier sample set A;
clustering the outlier sample set A, and deciding on event graph expansion in order of cluster size.
Further, the case element knowledge graph is updated with a graph embedding algorithm, specifically as follows:
calculating the distances between event nodes in the basic case element knowledge graph;
calculating a distance-based dimension reduction mapping model with the UMAP method;
and calculating the distance between a newly added node and the standard nodes, and calculating its embedding position with the UMAP mapping model. Through graph embedding, the invention makes the graph nodes denser and improves the accuracy of node dimension reduction.
Further, the case key feature table in step S1 is formed as follows:
structuring the case with the structuring method and extracting text segments;
splitting the text segments into sentences and matching them against the standard sentence elements of the graph using Sentence-BERT sentence vectors; the matched graph nodes are the key features, which form the key feature table.
Further, the user portrait feature table in step S2 includes interest features, browsing history, and a list of favorite cases.
Further, the candidate recommendation list in step S3 is calculated as follows:
using the case element knowledge graph and the structuring to calculate knowledge features of each case on several dimensions, the knowledge features comprising the finest structured types, defined as Fs, and the subdivided graph knowledge nodes, defined as Fk;
converting the knowledge features into vectors with one-hot encoding;
calculating the weight W of each feature with a linear regression method;
the weight W is estimated by minimizing the variance, as shown in the following formula:
Figure BDA0003059439730000031
In the formula, i and j are sample indices, and ε indicates whether the two samples are similar, taking the value 0 or 1;
and using the weights to calculate the cases most relevant to the historically browsed cases and the cases in the favorites list, as the candidate recommendation result.
Further, the final recommendation list in step S4 is obtained as follows:
calculating the cases or legal provisions likely to be useful to the user from the user interest information, collaborative user information, and the user's long-time-sequence browsing history and favorite case list records, to obtain a click probability list;
ranking by the principle of linear weighted maximization, combining time distance, case quality features, and context relevance with the click probabilities in the list, where case quality is calculated from the number of clicks, the browsing duration, and the number of times the case has been added to favorites; more clicks, longer browsing, and more favorites indicate higher case quality.
Further, the click probability is calculated with a click-through rate prediction algorithm based on long-time-sequence user information, which uses a hierarchical model and combines sliding crossing with specific periods, specifically as follows:
embedding the case features h of the browsing history to generate a fixed-length vector e;
calculating, from short-time-sequence features, the click probability of the object to be predicted within a specific time window, and recording the result as b;
performing a sliding cross operation over the full-period sequence to find the short-time-sequence segments with high click probability, and concatenating them into a new sequence e';
and performing click prediction on the sequence e' together with the object r1 to be predicted, to obtain the final predicted click probability b'.
A system for the judicial-oriented personalized case recommendation method, characterized by comprising:
a user portrait feature module, used for constructing the interest features of users in the judicial field;
a standard knowledge graph module, used for constructing the case element knowledge graph;
a case feature module, used for extracting case elements according to the knowledge graph to form the case key feature table;
a recall module, used for preliminarily calculating a candidate recommendation list from the user portrait feature table in combination with the case element knowledge graph and the case key feature table;
a recommendation ranking module, used for producing a final recommendation list from the candidate recommendation list and the user portrait feature table;
and a storage module, used for storing a user list, a case list, and an element list.
Further, the recommendation ranking module comprises:
a click-through rate prediction module based on long-time-sequence user information, used for calculating the cases or legal provisions likely to be useful to the user from the user interest information, collaborative user information, and the user's long-time-sequence browsing history and favorite case list records, to obtain a click probability list;
and a ranking module, used for ranking by the principle of linear weighted maximization, combining time distance, case quality features, and context relevance with the click probabilities in the list.
Compared with the prior art, the invention has the following notable advantages: it suits settings with a small number of users, gives accurate personalized recommendations, can handle long time sequences, and requires little knowledge graph construction work.
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
FIG. 2 is an example of the knowledge graph and structuring of the present invention.
FIG. 3 is a schematic flow chart of updating the case element knowledge graph with the semi-supervised method.
FIG. 4 is a flow chart of the long-sequence-based click probability prediction of the present invention.
FIG. 5 is a schematic diagram of the system architecture of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are not intended to limit the invention to these embodiments. It will be appreciated by those skilled in the art that the present invention encompasses all alternatives, modifications and equivalents as may be included within the scope of the claims.
Example one
Referring to fig. 1 to 4, the present embodiment provides a judicial-oriented personalized case recommendation method, which includes:
S1, acquiring a case database, structuring the case texts, constructing a case element knowledge graph, and extracting case elements from the case database according to the knowledge graph to form a case key feature table;
The case text is structured as follows:
S1.1, segmenting the case text into text segments in units of paragraphs or sentences, each structured object being one segment and each text segment corresponding to a subdivision label, to obtain sample data. A segment may be a sentence or a paragraph: for example, the plaintiff's claims are labeled sentence by sentence, while evidence and statements are labeled paragraph by paragraph. Each text segment corresponds to a subdivision label consisting of a primary label and a secondary label, such as (Smn | Tn, Tagba | Taga), where Smn | Tn refers to the segment numbered mn in the text numbered n, and Tagba | Taga refers to the secondary label numbered ba under the primary label numbered a. A segment label may also consist of multi-level labels.
S1.2, inputting the sample data into a CNN text classification model and training the model, optimizing it by minimizing the difference between the predicted subdivision label and the true subdivision label; the sample data consist of N case texts and the subdivision label of each segment in those texts;
S1.3, inputting the case text to be labeled into the trained CNN text classification model, which outputs the predicted subdivision labels of that case text.
The invention takes a case as input and structures its text, dividing the case into the parties, the court debate process, and the judgment result, and then subdividing each of these by one more level, as shown in the structured part on the left of FIG. 2. The data are then labeled in units of paragraphs or sentences according to the structured subdivision labels, and a model is trained with a character-level CNN text classification method to structure large volumes of case data automatically. RNN, LSTM, or DPCNN models may be used in place of the CNN text classification model.
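As an illustrative sketch only, the sample data of step S1.1 could be assembled as below before being fed to a text classifier. The `LabeledSegment` class, the `build_samples` helper, and the example tags are hypothetical names introduced here; the classifier itself (CNN or otherwise) is not shown.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class LabeledSegment:
    text_id: int        # n in "Smn | Tn": which case text the segment comes from
    segment_id: int     # running number of the segment inside that text
    primary_tag: str    # first-level label (assumed example values below)
    secondary_tag: str  # second-level label under the primary label
    text: str

def split_sentences(paragraph):
    """Split on Chinese or Western sentence-ending punctuation."""
    parts = re.findall(r"[^。！？.!?]+[。！？.!?]?", paragraph)
    return [p.strip() for p in parts if p.strip()]

def build_samples(text_id, paragraphs):
    """paragraphs: (primary_tag, secondary_tag, paragraph, label_by_sentence)."""
    samples = []
    for primary, secondary, paragraph, by_sentence in paragraphs:
        # Claims are labeled per sentence, evidence per paragraph.
        units = split_sentences(paragraph) if by_sentence else [paragraph]
        for unit in units:
            samples.append(
                LabeledSegment(text_id, len(samples), primary, secondary, unit)
            )
    return samples

samples = build_samples(1, [
    ("court debate", "plaintiff claim",
     "The plaintiff requests a divorce. The plaintiff requests custody.", True),
    ("court debate", "plaintiff evidence",
     "Exhibit 1 is a marriage certificate. Exhibit 2 is a bank record.", False),
])
```

Each `LabeledSegment` pairs one text unit with its two-level label, which is the form a supervised text classifier would train on.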
In this embodiment, constructing the case element knowledge graph also comprises cutting the structured result into sentences and updating the case element knowledge graph with a semi-supervised method that combines threshold-based text similarity with cluster-based mining of new knowledge, as shown in FIG. 3. The specific steps are as follows:
The case text is structurally decomposed into text segments, with the structured subdivision labels used as categories for collecting them;
The text segments corresponding to a single label are cut into sentences, and a legal element set Q in sentence form is screened out. Legal standard elements are relatively standard sentences expressing legal events or attributes. One sentence is treated as one element. Take, for example, 'the plaintiff claims that the two defendants bought leather from the plaintiff and owe xxx yuan in payment' and 'plaintiff Liang xx, using the account "xx force", purchased from the defendant's online shop on January 9, 2015 a movable foldable hydraulic crane, color classification SC500A, order number: 9309978xxxx 289'. The former is concise and describes a single event, so it is a standard element; the latter contains redundant events and attributes, so it is a non-standard element.
A standard element sentence subset U is screened from the legal element set Q according to the basic event graph K of the case element knowledge graph, and a one-to-one element mapping from k_i to u_i is established manually, where k_i is an element of the graph, K is the whole graph, and the element sentences not mapped to the event graph form the set U'; the standard element sentence subset U and the unmapped subset U' together make up the legal element set Q. The basic event graph is a set of event graph nodes and node relations listed according to expert rules and the standard element set. For example, the basic event graph for marriage contains three levels of events. The first level is {marriage, divorce, child rearing, living together}; the second-level node {property division} under the first-level node {divorce} is further divided into third-level nodes such as {premarital property} and {common property}. Together these form a standard event graph {List, relationship}, where List = <node1, node2, …> and relationship = <father_node, node, son_node>. Each node holds one standard sentence element and several lower-level sentences.
Then, the similarity between each unmapped element sample u'_i and u_i is calculated with a text similarity measure based on Sentence-BERT, a similarity threshold is set empirically, sentence elements above the threshold are fused into the basic event graph as lower-level samples of the original graph nodes, and samples below the threshold are assigned to an outlier sample set A. Other text similarity measures, such as DSSM-LSTM or doc2vec, may also be used. Sentence-BERT converts text into vectors, after which the similarity between text vectors can be measured with the conventional cosine distance. The Sentence-BERT method modifies the pre-trained BERT model to generate semantically meaningful sentence embedding vectors for calculating text similarity.
Finally, the outlier sample set A is clustered, and event graph expansion is decided in order of cluster size.
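The threshold step above can be sketched as follows. This is a minimal illustration under stated assumptions: the three-dimensional vectors are toy stand-ins for Sentence-BERT embeddings, the threshold 0.8 is arbitrary, and `fuse_or_outlier` and `rank_clusters` are hypothetical helper names. The clustering of set A itself is not shown.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def fuse_or_outlier(graph, unmapped, threshold):
    """graph: {node: {"standard": vector, "lower": [sentences]}}.
    unmapped: {sentence: vector} for the set U'.
    Sentences above the threshold are attached to their closest node;
    the rest are returned as the outlier sample set A."""
    outliers = []
    for sentence, vec in unmapped.items():
        best_node, best_sim = None, -1.0
        for node, info in graph.items():
            sim = cosine(vec, info["standard"])
            if sim > best_sim:
                best_node, best_sim = node, sim
        if best_sim > threshold:
            graph[best_node]["lower"].append(sentence)
        else:
            outliers.append(sentence)
    return outliers

def rank_clusters(clusters):
    """Order outlier clusters by size, largest first, for manual review."""
    return sorted(clusters, key=len, reverse=True)

graph = {
    "premarital property": {"standard": [1.0, 0.0, 0.0], "lower": []},
    "common property": {"standard": [0.0, 1.0, 0.0], "lower": []},
}
unmapped = {
    "house bought before marriage": [0.9, 0.1, 0.0],
    "loan dispute with neighbor": [0.1, 0.1, 0.9],
}
outlier_set_a = fuse_or_outlier(graph, unmapped, threshold=0.8)
```

In this toy run the first sentence is fused under "premarital property" and the second falls into the outlier set, which would then be clustered and reviewed in order of cluster size.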
Suppose the outlier sample set A contains 8000 samples. The samples are converted to vectors with Sentence-BERT and then clustered into M outlier clusters. The clusters are prioritized by size; if the elements in the i-th outlier cluster Mi are consistent and their legal definition is clear, the cluster is embedded into the basic event graph. The case element knowledge graph is updated with a graph embedding algorithm, as follows:
First, the distances between event nodes in the basic case element knowledge graph are calculated. To do so, Sentence-BERT is used to calculate the text vectors of the standard element sentences belonging to each event node; since one event node corresponds to several element sentences, the node position can be represented by the mean of their text vectors.
Then, a distance-based dimension reduction mapping model is calculated with the UMAP method. UMAP is a nonlinear dimension reduction method whose result preserves the spatial relations of the high-dimensional data, and it is suitable for sparse, non-uniformly distributed data.
Finally, the distance between a newly added node and the standard nodes is calculated, and its embedding position is calculated with the UMAP mapping model. Through graph embedding, the invention makes the graph nodes denser and improves the accuracy of node dimension reduction.
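The first and last of these steps can be illustrated with a small sketch. It assumes two-dimensional toy vectors in place of Sentence-BERT embeddings and uses Euclidean distance; the UMAP projection is deliberately omitted, so the helper names and data here are hypothetical and show only how node positions and distances might be computed before a fitted reducer is applied.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def node_positions(node_sentences):
    """Represent each event node by the mean of its element-sentence vectors."""
    return {node: mean_vector(vecs) for node, vecs in node_sentences.items()}

def distances_to_nodes(new_vec, positions):
    """Distances from a newly added node to every standard node, nearest first."""
    return sorted((euclidean(new_vec, pos), node) for node, pos in positions.items())

node_sentences = {
    "divorce": [[1.0, 0.0], [0.8, 0.2]],
    "child support": [[0.0, 1.0], [0.2, 0.8]],
}
positions = node_positions(node_sentences)
nearest = distances_to_nodes([0.7, 0.3], positions)
```

The sorted distances give the neighbors of the new node; in a fuller pipeline these positions would be passed through the dimension reduction model to obtain the embedding position.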
In this embodiment, the case key feature table is stored together with the case id and text, and because the features are precomputed, searching is faster. The case key feature table is formed as follows. First, the case is structured with the structuring method, and segments such as the plaintiff's claims and the plaintiff's and defendant's evidence are extracted. Then the text segments are split into sentences, and Sentence-BERT sentence vectors are matched against the standard sentence elements of the graph; the matched graph nodes are the key features. Finally, the case is represented as a list of key features, which is used by the user portrait and case recall modules.
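A minimal sketch of building such a table is given below. To stay self-contained it swaps in token-overlap (Jaccard) similarity for the Sentence-BERT vector matching described above; the threshold 0.5, the helper names, and the example sentences are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Token-overlap similarity; a stand-in for sentence-vector similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def key_features(sentences, standard_elements, threshold=0.5, similarity=jaccard):
    """Return the graph nodes whose standard sentence best matches a sentence."""
    features = []
    for sentence in sentences:
        node, score = max(
            ((node, similarity(sentence, std)) for node, std in standard_elements.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold and node not in features:
            features.append(node)
    return features

def build_feature_table(cases, standard_elements):
    """Store id, text, and key features together so lookups need no recomputation."""
    return {
        case_id: {"text": sentences, "features": key_features(sentences, standard_elements)}
        for case_id, sentences in cases.items()
    }

standard_elements = {
    "premarital property": "the house was bought before marriage",
    "child custody": "the plaintiff requests custody of the child",
}
cases = {
    "c1": [
        "the house was bought before marriage by the defendant",
        "the weather was fine",
    ],
}
feature_table = build_feature_table(cases, standard_elements)
```

Because `similarity` is a parameter, an embedding-based function could be substituted without changing the table-building logic.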
S2, acquiring a user database for the judicial field and constructing a user portrait feature table; the user portrait feature table includes interest features, browsing history, a list of favorite cases, and the like.
Specifically, user interests are calculated as probabilities over a span of one month. User portraits differ greatly between application fields. E-commerce, for example, focuses on repeat purchases, so its portraits include the user's gender, age, location, preferred product categories, and so on. In the judicial field, users focus on the case categories they are good at or hope to expand into, so the portrait includes the cases the user has followed, the element features of those cases, the region (such as the county) where the user is located, and so on. The portrait is built as follows. First, the user's browsing data and favorites data are collected. The browsing data include the case ID, the browsing duration, whether the case was helpful, and so on. The favorites data include the case ID and the time the case was added. Then the cause of action, element features, and other attributes of the already processed cases are read from storage by case ID, and the user's interest types are counted in order of frequency. For example, the interest cause of action may be [divorce] and the interest element features [premarital property, real estate]. In storage, the case id, case text, corresponding case features, and so on are kept together, and the corresponding content can be read through the id index.
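The frequency counting described above might look like the following sketch. The record layout (`cause`, `features`), the `build_user_portrait` name, and the `top_k` cutoff are assumptions for illustration; the monthly probability weighting mentioned in the text is not modeled here.

```python
from collections import Counter

def build_user_portrait(browsed_ids, collected_ids, feature_table, top_k=2):
    """Count causes of action and element features over browsed and favorite cases."""
    causes, elements = Counter(), Counter()
    for case_id in list(browsed_ids) + list(collected_ids):
        record = feature_table.get(case_id)
        if record is None:
            continue  # case not yet processed into the feature table
        causes[record["cause"]] += 1
        elements.update(record["features"])
    return {
        "interest_causes": [c for c, _ in causes.most_common(top_k)],
        "interest_elements": [e for e, _ in elements.most_common(top_k)],
        "browsing_history": list(browsed_ids),
        "favorites": list(collected_ids),
    }

case_features = {
    "c1": {"cause": "divorce", "features": ["premarital property", "real estate"]},
    "c2": {"cause": "divorce", "features": ["premarital property"]},
    "c3": {"cause": "loan", "features": ["interest rate"]},
}
portrait = build_user_portrait(["c1", "c2", "c3"], ["c1"], case_features)
```

A favorite case that was also browsed is counted twice here, which is one simple way of giving favorites more weight; other weightings are equally possible.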
S3, preliminarily calculating a candidate recommendation list from the user portrait feature table in combination with the case element knowledge graph and the case key feature table;
Specifically, this step lists all related cases according to the user's interest types, browsing history, favorite case list, and so on. Product relevance can be calculated from natural attributes such as brand and category; case attributes, by contrast, must be generated by a model, after which the relevance of two cases' attribute features is measured. First, the event graph and the structuring are used to calculate knowledge features of each case on several dimensions, such as the parties, the court debate, and the judgment. The features are of two kinds: the finest structured types, such as "plaintiff's claim" and "plaintiff's evidence", defined as Fs; and the subdivided graph knowledge nodes, such as "divorce by agreement" and "litigation", defined as Fk. Second, the knowledge features are converted into vectors with one-hot encoding. Then the weight W of each feature is calculated with a linear regression method; the weight W is estimated by minimizing the variance, as shown in the following formula:
Figure BDA0003059439730000061
In the formula, i and j are sample indices, and ε indicates whether the two samples are similar, taking the value 0 or 1;
Finally, the weights are used to calculate the cases most relevant to the historically browsed cases and the cases in the favorites list, as the recall result.
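Because the formula is available only as an image reference, the sketch below assumes one plausible objective: a least-squares fit in which the weighted feature agreement of a case pair (i, j) should approximate the similarity label ε. The agreement encoding, the gradient-descent fit, and all names and data are assumptions and may differ from the patented formula.

```python
def one_hot(features, vocabulary):
    """Encode a set of Fs/Fk feature names as a 0/1 vector over the vocabulary."""
    return [1.0 if name in features else 0.0 for name in vocabulary]

def agreement(a, b):
    """1 where both cases carry the feature, 0 otherwise."""
    return [x * y for x, y in zip(a, b)]

def fit_weights(pairs, labels, lr=0.1, steps=500):
    """Gradient descent on the mean squared error between W . x_ij and epsilon_ij."""
    w = [0.0] * len(pairs[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, eps in zip(pairs, labels):
            err = sum(wi * xi for wi, xi in zip(w, x)) - eps
            for k, xk in enumerate(x):
                grad[k] += 2 * err * xk / len(pairs)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

def recall(query, candidates, w, top_n=2):
    """Rank candidate cases by weighted feature agreement with the query case."""
    scored = [
        (sum(wi * xi for wi, xi in zip(w, agreement(query, vec))), case_id)
        for case_id, vec in candidates.items()
    ]
    return [case_id for _, case_id in sorted(scored, reverse=True)[:top_n]]

vocabulary = ["Fk:divorce by agreement", "Fs:plaintiff evidence"]
# Agreement vectors of labeled case pairs and whether each pair is similar.
train_pairs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
train_labels = [1, 0, 1, 0]
weights = fit_weights(train_pairs, train_labels)

query = one_hot({"Fk:divorce by agreement", "Fs:plaintiff evidence"}, vocabulary)
candidates = {
    "c1": one_hot({"Fk:divorce by agreement"}, vocabulary),
    "c2": one_hot({"Fs:plaintiff evidence"}, vocabulary),
    "c3": one_hot(set(), vocabulary),
}
top_case = recall(query, candidates, weights, top_n=1)
```

In this toy data only the first feature predicts similarity, so the fitted weights approach (1, 0) and the case sharing that feature is recalled first.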
S4, producing a final recommendation list from the candidate recommendation list and the user portrait feature table.
In this embodiment, the final recommendation list is obtained as follows. The cases or legal provisions likely to be useful to the user are calculated from the user interest information, collaborative user information, and the user's long-time-sequence browsing history and favorite case list records, to obtain a click probability list. The list is then ranked by the principle of linear weighted maximization, combining time distance, case quality features, and context relevance with the click probabilities in the list. Case quality is calculated from the number of clicks, the browsing duration, and the number of times the case has been added to favorites; more clicks, longer browsing, and more favorites indicate higher case quality.
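A possible shape for this ranking step is sketched below. The log-damped quality score, the treatment of time distance as an age penalty in days, and the specific weights are assumptions chosen for illustration; the text specifies only that the combination is linear and that quality grows with clicks, browsing time, and favorites.

```python
import math

def case_quality(clicks, browse_seconds, favorites):
    """Monotone in all three inputs; log damping is an assumed design choice."""
    return math.log1p(clicks) + math.log1p(browse_seconds / 60) + math.log1p(favorites)

def rank(candidates, weights):
    """Sort candidates by a linear weighted score, highest first."""
    def score(c):
        return (
            weights["click"] * c["click_prob"]
            + weights["quality"] * c["quality"]
            + weights["context"] * c["context_relevance"]
            - weights["age"] * c["age_days"]  # larger time distance lowers the score
        )
    return sorted(candidates, key=score, reverse=True)

ranking_weights = {"click": 1.0, "quality": 0.2, "context": 0.5, "age": 0.001}
candidate_cases = [
    {"id": "c2", "click_prob": 0.4, "quality": 3.0, "context_relevance": 0.5, "age_days": 400},
    {"id": "c1", "click_prob": 0.9, "quality": 1.0, "context_relevance": 0.5, "age_days": 10},
]
ranked = rank(candidate_cases, ranking_weights)
```

Here the recent case with the higher click probability outranks the older, higher-quality case; different weights would shift that balance.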
In this embodiment, the click probability is calculated with a click-through rate prediction algorithm based on long-time-sequence user information. It uses a hierarchical model and combines sliding crossing with specific periods to improve efficiency and achieve prediction over very long time sequences, as shown in FIG. 4. First, case features h such as the browsing history are embedded to generate a fixed-length vector e. Then the click probability of the object to be predicted within a specific time window is calculated from short-time-sequence features, and the result is recorded as b. A short time sequence is a short excerpt of the browsing history, such as the browsing data of one or two days; the specific time window is a selected holiday or a specific period such as a month, a season, or a week. Next, to cover the whole sequence, a sliding cross operation is performed over the full-period sequence to find the short-time-sequence segments with high click probability, which are concatenated into a new sequence e'. Here a prior expert rule is used: the historical browsing records that influence a click are strongly correlated with one another. Finally, click prediction is performed on the sequence e' together with the object r1 to be predicted, to obtain the final predicted click probability b'.
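The hierarchical procedure can be illustrated with the sketch below. The short-sequence model is replaced by a deliberately simple stand-in (the sigmoid of the mean dot product between history vectors and the target), and the window size, stride, and `top_k` are arbitrary; none of these are taken from the patent, which does not disclose the model internals in the text.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def short_click_prob(window, target):
    """Stand-in short-sequence model: sigmoid of the mean dot product with the target."""
    if not window:
        return 0.5
    mean_dot = sum(sum(a * b for a, b in zip(vec, target)) for vec in window) / len(window)
    return sigmoid(mean_dot)

def long_sequence_click_prob(history, target, window=2, stride=2, top_k=2):
    """Slide over the full history, keep the highest-scoring windows as e',
    then predict once more on e' to obtain the final probability b'."""
    scored = []
    for start in range(0, max(len(history) - window, 0) + 1, stride):
        chunk = history[start:start + window]
        scored.append((short_click_prob(chunk, target), start, chunk))
    best = sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]
    best.sort(key=lambda item: item[1])  # restore chronological order
    e_prime = [vec for _, _, chunk in best for vec in chunk]
    return short_click_prob(e_prime, target), e_prime

# Embedded browsing history e (toy 2-d vectors) and the object r1 to be predicted.
history = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 0], [0, 1]]
target_r1 = [2, -2]
b_prime, e_prime = long_sequence_click_prob(history, target_r1)
```

Selecting only the informative windows lets the final prediction focus on the parts of a long history that relate to the target, instead of averaging over everything.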
Besides the click probability list, the influence of the order in which cases are arranged on clicks must also be considered in order to obtain the final recommended case list. For example, neighboring cases in the list usually need to differ noticeably, and users also differ in their preferences for the time and place at which a case occurred. The reference variables for sorting are composed of these influencing factors, and a linear preference weight is calculated for each user.
The method is suitable for situations with a small number of users, gives accurate personalized recommendations, can handle long time series, and requires little graph construction workload.
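One possible way to account for the difference between neighboring cases is a greedy re-ordering, sketched below. The greedy strategy, the Jaccard-based difference measure, and the parameters `w_click` and `w_diff` are assumptions for illustration; the text only states that such factors enter a linear weighting.

```python
def order_with_diversity(cases, w_click=1.0, w_diff=0.5):
    """cases: list of dicts with 'id', 'click_prob' and a 'features' set.
    Greedily picks the next case maximizing
    w_click * click_prob + w_diff * difference_from_previous_case."""
    def difference(a, b):
        union = a["features"] | b["features"]
        if not union:
            return 0.0
        return 1.0 - len(a["features"] & b["features"]) / len(union)

    remaining = list(cases)
    ordered = []
    while remaining:
        prev = ordered[-1] if ordered else None

        def gain(c):
            diff = difference(c, prev) if prev is not None else 0.0
            return w_click * c["click_prob"] + w_diff * diff

        best = max(remaining, key=gain)
        ordered.append(best)
        remaining.remove(best)
    return ordered
```

Setting `w_diff` to 0 reduces this to a plain sort by click probability, which makes the effect of the adjacency term easy to compare.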
Example two
Referring to fig. 5, this embodiment provides a system implementing the judicial-oriented personalized case recommendation method of the first embodiment, including: a user portrait feature module, used for constructing the interest features of users in the judicial field;
a standard knowledge graph module, used for constructing the case element knowledge graph;
a case feature module, used for extracting case elements according to the knowledge graph to form a case key feature table;
a recall module, used for preliminarily calculating a candidate recommendation list according to the user portrait feature table in combination with the case element knowledge graph and the case key feature table;
a recommendation sorting module, used for providing the final recommendation list according to the candidate recommendation list and the user portrait feature table;
and a storage module, used for storing a user list, a case list, and an element list, and working together with the algorithm modules to respond efficiently to user searches. The user list records the user ID, user browsing data, user portrait, the user's favorited cases, and so on, with the user ID as the index column. The case list comprises the case ID, the case elements, and the case's top-n associated case list, with the case ID as the index column. The top-n most similar cases of each case are recorded in the storage module to facilitate quick recall. When a new case is added, its correlation with the existing cases is calculated to find its related case list; at the same time, the related lists of the existing cases are traversed and updated in order of correlation. The element list comprises the element ID, the element standard type, and the element's top-n associated elements, with the element ID as the index column.
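The update of the top-n associated case lists when a new case arrives can be sketched as below. The in-memory dictionary `related` and the `similarity` callable are hypothetical stand-ins for the storage module and the correlation calculation, which the text does not specify.

```python
def add_case(related, similarity, new_id, n=3):
    """related: dict case_id -> list of (score, other_id), best first.
    similarity: callable (id_a, id_b) -> float (hypothetical)."""
    new_list = []
    for other_id, other_list in related.items():
        score = similarity(new_id, other_id)
        new_list.append((score, other_id))
        # Update the existing case's list, keeping it sorted and capped at n.
        other_list.append((score, new_id))
        other_list.sort(reverse=True)
        del other_list[n:]
    # Build the new case's own top-n list.
    new_list.sort(reverse=True)
    related[new_id] = new_list[:n]
    return related
```

Each insertion touches every existing list once, so the cost grows linearly with the number of stored cases; this matches the traversal described above but may need batching for a large case base.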
The recommendation sorting module in this embodiment includes:
a click-through-rate prediction module based on long-time-series user information, used for calculating the cases or statutes likely to be useful to the user from the user interest information, the collaborative user information, and the user's long-time-series browsing history and favorited case list, to obtain a click probability list;
and a sorting module, used for sorting by combining the time distance, the case quality features, and the context correlation with the click probability list according to the linear weighting maximization principle.
In this embodiment, marriage cases are taken as an example, and the implementation steps are as follows:
1. Data preparation
First, data such as cases are collected, mainly judgment document data. Then, the document data is structured into fields such as ID number, cause of action, place of judgment, parties, case facts, and judgment result.
2. Event graph
The event graph provides the basis for case feature extraction and is an important condition for accurate recommendation. The invention provides a semi-supervised graph construction method with the following steps. First, a base event graph is constructed. In the summary of the invention, the Marriage Law is taken as an example, and a base graph related to the Marriage Law is shown. Then, the standard events corresponding to the graph nodes are labeled manually. Next, based on the standard events, sentence-vector correlation is used to find the subordinate instances of each standard event, as shown in the following table. Finally, cluster analysis is performed on the events that cannot be matched to any graph node, to judge whether new graph nodes exist. If a new node is found, its embedding position is further calculated, expanding the spatial coverage or the density of the graph.
[Table image BDA0003059439730000081: standard events and their subordinate instances; not reproduced in this text]
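The matching step of the semi-supervised construction can be illustrated with a small sketch. It assumes the sentence vectors have already been produced by some sentence encoder; cosine similarity and the threshold value 0.8 are assumptions, since the text only says that sentence-vector correlation and an empirically set threshold are used.

```python
import numpy as np

def assign_to_nodes(node_vecs, sentence_vecs, threshold=0.8):
    """node_vecs: (K, d) sentence vectors of the labeled standard events.
    sentence_vecs: (N, d) vectors of unlabeled element sentences.
    Returns (assignments, outliers): the matched node index per sentence,
    and the indices of sentences left over for later cluster analysis."""
    a = node_vecs / np.linalg.norm(node_vecs, axis=1, keepdims=True)
    b = sentence_vecs / np.linalg.norm(sentence_vecs, axis=1, keepdims=True)
    sims = b @ a.T                      # cosine similarity, shape (N, K)
    best = sims.argmax(axis=1)
    best_sim = sims.max(axis=1)
    assignments = {i: int(best[i]) for i in range(len(b)) if best_sim[i] > threshold}
    outliers = [i for i in range(len(b)) if best_sim[i] <= threshold]
    return assignments, outliers
```

The sentences returned in `outliers` correspond to the events that cannot be matched to a graph node; clustering them to propose new nodes is a separate step not shown here.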
3. User portrait
The user is a person who uses case recommendation, and the user's portrait features include cause of action, case element features, location, browsing records, and so on. The structured cases and the user's browsing records are used to calculate the categories of cases the user is interested in; then, based on the similarity between users, users with similar interests, locations, and so on are found and placed in a list for use by the recall module and other modules. For example, if a legal professional is responsible for Marriage Law cases and, in handling them, also deals with sales contracts, private lending, financial loans, and the like, the cases of interest are mainly concentrated in [divorce, house sale, private lending, financial loan]. Beyond the cause of action, marriage cases also differ in their case features, so accurate recommendation requires feature-level statistics, such as [premarital property, common property, property division, custody], based on the specific features of the cases the user has browsed.
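A minimal sketch of the statistics described above, assuming each browsed case has already been structured into a `cause` string and a `features` list (both field names are hypothetical). The Jaccard measure used for user similarity is likewise an assumption; the text does not name a similarity measure.

```python
from collections import Counter

def build_user_portrait(browsed_cases, top_k=4):
    """Count causes of action and element features over the browsed cases."""
    cause_counts = Counter(c["cause"] for c in browsed_cases)
    feature_counts = Counter(f for c in browsed_cases for f in c["features"])
    return {
        "interest_causes": [k for k, _ in cause_counts.most_common(top_k)],
        "interest_features": [k for k, _ in feature_counts.most_common(top_k)],
    }

def user_similarity(p, q):
    """Jaccard overlap of two portraits (an assumed similarity measure)."""
    a = set(p["interest_causes"]) | set(p["interest_features"])
    b = set(q["interest_causes"]) | set(q["interest_features"])
    return len(a & b) / len(a | b) if a | b else 0.0
```

Location and other portrait fields mentioned in the text could be added to the same dictionary and folded into the similarity in the same way.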
4. Recall module
The recall module serves as a coarse screening step that limits the range of objects to be recommended. It is updated offline periodically: taking each case and each user in turn as the core, it finds the related cases of the case and the related users of the user, and stores them in the storage module in order of correlation. At recall time, the related case list of the case, the historically browsed cases of the related users, and the related case lists of those cases are read directly from the storage module and merged into a recall list after de-duplication and de-correlation. The total length of the recall list is typically set to a fixed value.
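The merge at recall time might look like the following sketch. The argument names and the dictionaries standing in for the storage module are hypothetical; only de-duplication and the fixed length are implemented, the de-correlation step is omitted, and excluding cases the user has already seen is an added assumption.

```python
def build_recall_list(seed_cases, related_cases, related_users, user_history,
                      max_len=50):
    """seed_cases: cases the user browsed or favorited.
    related_cases: dict case_id -> related case ids, most related first.
    related_users: users similar to the current user, most similar first.
    user_history: dict user_id -> case ids browsed by that user."""
    candidates = []
    for case_id in seed_cases:
        candidates.extend(related_cases.get(case_id, []))
    for user_id in related_users:
        for case_id in user_history.get(user_id, []):
            candidates.append(case_id)
            candidates.extend(related_cases.get(case_id, []))
    seen = set(seed_cases)  # assumption: skip cases the user already saw
    recall = []
    for case_id in candidates:
        if len(recall) >= max_len:
            break
        if case_id not in seen:
            seen.add(case_id)
            recall.append(case_id)
    return recall
```

Because the stored lists are already in order of correlation, reading them in sequence keeps the most related candidates at the front of the recall list.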
5. Recommendation module
The recommendation module sorts the recall list and comprises a click prediction module and a sorting module. The click prediction module takes a recalled case and the user's historically browsed cases as input and outputs the probability that the recalled case will be clicked, for example 85%. Traversing the recall list in this way yields a click probability list of the form [85%, 64%]. Then, the click probability is combined with the displayed difference between adjacent cases, the time, the preference for the place of occurrence, and so on, and the final case ordering is calculated through linear weights.
6. Storage module
The storage module stores the original case data, case-related intermediate data, and so on, and comprises case storage, user storage, and graph storage. The case and user storage hold relational data, and the graph storage is a graph database. The case table fields include 'case ID', 'case element features', 'case text', 'related case list ID', etc. The user table fields include 'user ID', 'user interest', 'related user', etc. The graph relation table fields include 'node name', 'related node', 'node position', etc.
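For the relational part of the storage, a minimal sketch using Python's built-in sqlite3 module is given below. The choice of SQLite, the column names, and the TEXT types are assumptions for illustration; the graph storage, described above as a graph database, is not covered.

```python
import sqlite3

def create_store(path=":memory:"):
    """Create hypothetical case and user tables mirroring the listed fields."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE cases (
            case_id          TEXT PRIMARY KEY,
            element_features TEXT,
            case_text        TEXT,
            related_list_id  TEXT
        );
        CREATE TABLE users (
            user_id       TEXT PRIMARY KEY,
            interests     TEXT,
            related_users TEXT
        );
    """)
    return conn
```

List-valued fields such as the related users would need a serialization convention (for example JSON text) or a separate link table; that choice is left open here.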

Claims (11)

1. A judicial-oriented personalized case recommendation method, comprising the following steps:
S1, acquiring a case database, structuring the case texts, constructing a case element knowledge graph, and extracting case elements from the case database according to the knowledge graph to form a case key feature table;
S2, acquiring a user database and constructing a user portrait feature table;
S3, preliminarily calculating a candidate recommendation list according to the user portrait feature table in combination with the case element knowledge graph and the case key feature table;
and S4, providing a final recommendation list according to the candidate recommendation list and the user portrait feature table.
2. The judicial-oriented personalized case recommendation method according to claim 1, wherein: the steps of structuring the case text in step S1 are as follows:
S1.1, segmenting the case text into text segments in units of paragraphs or sentences, each text segment corresponding to a subdivision label, to obtain sample data;
S1.2, inputting the sample data into a CNN text classification model and training the model, optimizing the model by minimizing the difference between the predicted subdivision labels and the true subdivision labels;
S1.3, inputting the case text to be labeled into the trained CNN text classification model, and outputting the predicted subdivision labels of the case text to be labeled.
3. The judicial-oriented personalized case recommendation method according to claim 2, wherein: constructing the case element knowledge graph further comprises updating the case element knowledge graph by a semi-supervised method, with the following specific steps:
structurally decomposing the case text into text segments and grouping them into categories by subdivision label;
splitting the text segments corresponding to a single label into sentences, and screening out a legal element set Q in sentence form;
screening a standard element sentence subset U from the legal element set Q according to the base event graph K of the case element knowledge graph, and manually establishing a one-to-one element mapping from ki to ui, wherein the subset of element sentences not mapped to the event graph is U', ki is an element of the graph, K is the whole graph, and the standard element sentence subset U and the unmapped element sentence subset U' together form the legal element set Q;
calculating the similarity between the element samples u'i not mapped to the graph and ui by a text similarity measurement method, setting a similarity threshold empirically, merging the sentence elements whose similarity is greater than the threshold into the base event graph as subordinate samples of the original graph nodes, and assigning the samples below the threshold to an outlier sample set A;
clustering the outlier sample set A, and judging whether to expand the event graph according to the clusters in order of size.
4. The judicial-oriented personalized case recommendation method according to claim 3, wherein: updating the case element knowledge graph uses a graph embedding algorithm, as follows:
calculating the distances between event nodes in the base case element knowledge graph;
fitting a distance-based dimension-reduction mapping model using the UMAP method;
and calculating the distances between the newly added node and the standard nodes, and calculating its embedding position using the UMAP mapping model.
5. The judicial-oriented personalized case recommendation method according to claim 1, wherein: the case key feature table in step S1 is formed as follows:
structuring the case by the structuring method, and extracting text segments;
and splitting the text segments into sentences and matching them against the standard sentence elements of the graph using Sentence-BERT sentence vectors, the matched graph nodes being the key features, thereby forming the key feature table.
6. The judicial-oriented personalized case recommendation method according to claim 1, wherein: in step S2, the user portrait feature table includes interest features, browsing history, and a favorited case list.
7. The judicial-oriented personalized case recommendation method according to claim 1, wherein: the candidate recommendation list in step S3 is calculated as follows:
calculating the knowledge features of each case in multiple dimensions using the case element knowledge graph and the structuring, wherein the knowledge features comprise the finest-grained structured type, denoted Fs, and the subdivided graph knowledge node, denoted Fk;
converting the knowledge features into vectors using one-hot encoding;
calculating the weight W of each feature using a linear regression method;
estimating the weight W by minimizing the variance, as shown in the following formula:
[Formula image FDA0003059439720000021; not reproduced in this text]
wherein i and j denote sample indices, and ε indicates whether the two samples are similar, taking the value 0 or 1;
and using the weights to calculate the cases most relevant to the historically browsed cases and the cases in the favorites list as the candidate recommendation result.
8. The judicial-oriented personalized case recommendation method according to claim 1, wherein: obtaining the final recommendation list in step S4 includes:
calculating the cases or statutes likely to be useful to the user from the user interest information, the collaborative user information, and the user's long-time-series browsing history and favorited case list, to obtain a click probability list;
sorting by combining the time distance, the case quality features, and the context correlation with the click probability list according to the linear weighting maximization principle, wherein the case quality is calculated with the number of clicks, the browsing duration, and the number of times favorited as variables, and the more clicks, the longer the browsing time, and the more favorites, the higher the case quality.
9. The judicial-oriented personalized case recommendation method according to claim 8, wherein: the click probability calculation adopts a click-through-rate prediction algorithm based on long-time-series user information, uses a hierarchical model, and combines sliding intersection with specific periods, specifically comprising the following steps:
embedding the case features h of the browsing history to generate a fixed-length vector e;
calculating the click probability of the object to be predicted from the short-time-series features under a specific time window, and recording the result as b;
performing a sliding cross operation on the full-period sequence to find the short-time-series segments with high click probability, and concatenating them into a new sequence e';
and performing click prediction on the sequence e' together with the object r1 to be predicted, to obtain the final click prediction probability b'.
10. A system for implementing the judicial-oriented personalized case recommendation method of claim 1, comprising:
a user portrait feature module, used for constructing the interest features of users in the judicial field;
a standard knowledge graph module, used for constructing the case element knowledge graph;
a case feature module, used for extracting case elements according to the knowledge graph to form a case key feature table;
a recall module, used for preliminarily calculating a candidate recommendation list according to the user portrait feature table in combination with the case element knowledge graph and the case key feature table;
a recommendation sorting module, used for providing the final recommendation list according to the candidate recommendation list and the user portrait feature table;
and a storage module, used for storing a user list, a case list, and an element list.
11. The system of claim 10, wherein: the recommendation sorting module comprises:
a click-through-rate prediction module based on long-time-series user information, used for calculating the cases or statutes likely to be useful to the user from the user interest information, the collaborative user information, and the user's long-time-series browsing history and favorited case list, to obtain a click probability list;
and a sorting module, used for sorting by combining the time distance, the case quality features, and the context correlation with the click probability list according to the linear weighting maximization principle.
CN202110508832.4A 2021-05-11 2021-05-11 Judicial-oriented personalized case recommendation method and system Active CN113407729B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110508832.4A CN113407729B (en) 2021-05-11 2021-05-11 Judicial-oriented personalized case recommendation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110508832.4A CN113407729B (en) 2021-05-11 2021-05-11 Judicial-oriented personalized case recommendation method and system

Publications (2)

Publication Number Publication Date
CN113407729A true CN113407729A (en) 2021-09-17
CN113407729B CN113407729B (en) 2022-06-24

Family

ID=77678173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110508832.4A Active CN113407729B (en) 2021-05-11 2021-05-11 Judicial-oriented personalized case recommendation method and system

Country Status (1)

Country Link
CN (1) CN113407729B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113961811A (en) * 2021-10-28 2022-01-21 平安科技(深圳)有限公司 Conversational recommendation method, device, equipment and medium based on event map
CN114547257A (en) * 2022-04-25 2022-05-27 湖南工商大学 Class matching method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145536A (en) * 2017-04-19 2017-09-08 畅捷通信息技术股份有限公司 User's portrait construction method and device and recommendation method and apparatus
CN110347814A (en) * 2019-06-28 2019-10-18 银江股份有限公司 A kind of lawyer's accurate recommendation method and system
CN110458641A (en) * 2019-06-28 2019-11-15 苏宁云计算有限公司 A kind of electric business recommended method and system
CN111274413A (en) * 2020-02-14 2020-06-12 迈拓仪表股份有限公司 Intelligent heat supply service recommendation method based on knowledge graph
CN112395506A (en) * 2020-12-04 2021-02-23 上海帜讯信息技术股份有限公司 Information recommendation method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张德: "自然语言处理技术在司法过程中的应用研究", 《信息与电脑》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113961811A (en) * 2021-10-28 2022-01-21 平安科技(深圳)有限公司 Conversational recommendation method, device, equipment and medium based on event map
CN113961811B (en) * 2021-10-28 2024-04-05 平安科技(深圳)有限公司 Event map-based conversation recommendation method, device, equipment and medium
CN114547257A (en) * 2022-04-25 2022-05-27 湖南工商大学 Class matching method and device, computer equipment and storage medium
CN114547257B (en) * 2022-04-25 2022-07-19 湖南工商大学 Class matching method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113407729B (en) 2022-06-24

Similar Documents

Publication Publication Date Title
CN110263265B (en) User tag generation method, device, storage medium and computer equipment
CN104834686B (en) A kind of video recommendation method based on mixing semantic matrix
CN107220365B (en) Accurate recommendation system and method based on collaborative filtering and association rule parallel processing
CN111191092B (en) Label determining method and label determining model training method
CN102567464B (en) Based on the knowledge resource method for organizing of expansion thematic map
CN105095187A (en) Search intention identification method and device
CN109918563B (en) Book recommendation method based on public data
CN110851718B (en) Movie recommendation method based on long and short term memory network and user comments
CN107357793A (en) Information recommendation method and device
CN105426514A (en) Personalized mobile APP recommendation method
CN111309936A (en) Method for constructing portrait of movie user
CN107315738A (en) A kind of innovation degree appraisal procedure of text message
CN113407729B (en) Judicial-oriented personalized case recommendation method and system
Dang et al. Framework for retrieving relevant contents related to fashion from online social network data
CN114254201A (en) Recommendation method for science and technology project review experts
CN105677828A (en) User information processing method based on big data
CN110310012B (en) Data analysis method, device, equipment and computer readable storage medium
CN116975615A (en) Task prediction method and device based on video multi-mode information
CN115712780A (en) Information pushing method and device based on cloud computing and big data
CN105677825A (en) Analysis method for client browsing operation
CN113239159B (en) Cross-modal retrieval method for video and text based on relational inference network
CN114168790A (en) Personalized video recommendation method and system based on automatic feature combination
CN116629258B (en) Structured analysis method and system for judicial document based on complex information item data
Zhang et al. A recommender system for cold-start items: a case study in the real estate industry
CN113688281B (en) Video recommendation method and system based on deep learning behavior sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province

Applicant after: Yinjiang Technology Co.,Ltd.

Address before: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province

Applicant before: ENJOYOR Co.,Ltd.

GR01 Patent grant