CN111309900B - Legal class similarity judging and pushing method - Google Patents
- Publication number
- CN111309900B CN202010055473.7A
- Authority
- CN
- China
- Prior art keywords
- case
- distance
- legal
- event sequence
- historical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Marketing (AREA)
- Human Resources & Organizations (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Technology Law (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for determining the similarity of legal cases and a method for pushing similar cases. The similarity determination method comprises the following steps: classifying the target legal case and, according to the resulting case category, extracting historical cases of the same category from a historical case database to form a candidate set; representing the target legal case and each historical case in the candidate set as an event sequence; calculating, according to an event sequence measurement model, the distance between the event sequence of the target legal case and the event sequence of each historical case in the candidate set; and calculating the similarity between the target legal case and each historical case in the candidate set based on the event sequence distance and a scoring function. The method achieves more comprehensive and accurate similar-case identification. In addition, because legal documents are represented as time-ordered event sequences, similarity is computed in an unsupervised manner, and the highest-scoring historical cases are selected for pushing, manual effort is greatly reduced and pushing can be made more intelligent.
Description
Technical Field
The invention relates to the field of legal intelligence, and in particular to a method for determining the similarity of legal cases and pushing similar cases.
Background
At present, artificial intelligence theory and technology are maturing steadily, and their range of application keeps expanding. In 2017, the national artificial intelligence strategy, the New Generation Artificial Intelligence Development Plan, called for building intelligent courts and promoting the application of artificial intelligence in evidence collection, case analysis, and the reading and analysis of legal documents, so as to make the court trial system and trial capability intelligent. Using artificial intelligence to identify similar cases has become an important research topic that closely matches the needs of judges.
Similar-case retrieval serves as an auxiliary tool: it searches for cases similar or even identical to the case a judge is handling, so as to inspire and broaden the judge's reasoning, help the judge decide correctly, and reduce the deviation between the outcomes of identical or similar cases. However, existing similar-case retrieval systems push cases inaccurately and do not really meet the needs of judges. For example, the pushed cases are not of the same type as the case at hand, let alone identical to it; or too many cases are pushed, so the judge's time is not actually saved and a large amount of manual screening is still required.
Most legal case records are electronic documents written as natural language text, so similar-case identification can be treated as an application of text similarity measurement. Existing natural language processing methods can identify similar cases to some extent, but they have difficulty accurately distinguishing the core differences between case elements. The main problems are as follows:
1) Methods based on keyword matching are not accurate enough. Keyword retrieval is in effect "sample verification", and conclusions drawn from a small number of samples are incomplete. Moreover, it returns too many cases, so a judge can hardly identify the ones with significant reference value.
2) Methods that represent words as vectors with word2vec and build a neural network on top require a large amount of labeled, structured training corpora, whereas the legal field currently lacks both large volumes of labeled legal data and professionals who understand both law and technology.
3) The main reference value of a similar case lies in pushing, for specific legal details or difficulties in the current case, the reasoning and practice of judges in similar historical cases. However, no legal document similarity measurement model designed for the characteristics of the legal industry exists at present.
Disclosure of Invention
The invention aims to provide a method for determining the similarity of legal cases and pushing similar cases, which overcomes the drawbacks of existing methods: the need for a large amount of manual labeling, inaccurate similar-case pushing, cluttered information, and a lack of focus on the legal issue at hand.
The purpose of the invention is achieved by the following technical solution:
A method for determining the similarity of legal cases comprises the following steps:
classifying the target legal case and, according to the resulting case category, extracting historical cases of the same category from a historical case database to form a candidate set;
representing the target legal case and each historical case in the candidate set as an event sequence;
calculating, according to an event sequence measurement model, the distance between the event sequence of the target legal case and the event sequence of each historical case in the candidate set;
and calculating the similarity between the target legal case and each historical case in the candidate set based on the event sequence distance and a scoring function.
A method for pushing similar legal cases comprises: calculating the similarity between the target legal case and the historical cases in the candidate set by using the above method, sorting the historical cases in the candidate set by similarity score from high to low, and extracting the top M historical cases for pushing.
According to the technical solution provided by the invention, by classifying legal documents by case type and analyzing the topic distribution of each case, the historical cases whose semantic content is closest to that of the target case can be selected while guaranteeing that they are of the same case type, which achieves more comprehensive and accurate similar-case identification. In addition, because legal documents are represented as time-ordered event sequences, similarity is computed in an unsupervised manner, and the highest-scoring historical cases are selected for pushing, manual effort is greatly reduced and pushing can be made more intelligent.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a flowchart of a legal case similarity determination method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an event time chain obtained by extracting the event corresponding to each time segment according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an event time chain extracted from a case according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of aligning the time points of event sequences according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a method for determining the similarity of legal cases which, as shown in Fig. 1, comprises the following steps:
Step 1: classify the target legal case and, according to the resulting case category, extract historical cases of the same category from a historical case database to form a candidate set.
Similar-case identification aims to find, among historical cases of the same type, cases similar or even identical to the target case, so case classification is carried out first.
Each case in the existing historical case database has a corresponding category label, and a data set can be constructed by selecting a certain number of historical cases for each case category according to the distribution of case categories. A relatively mature deep learning text classification algorithm from natural language processing is used as the classifier to obtain the case category of the target case. Historical cases of the same category are then extracted from the historical case database to form the candidate set.
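The candidate-set construction in step 1 can be sketched as follows. This is a minimal illustration, assuming each historical case is stored together with its category label and that the target category has already been predicted by some classifier. The `HistoricalCase` type, its field names, and the sample records are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoricalCase:
    case_id: str   # hypothetical identifier
    category: str  # category label stored in the historical case database
    text: str      # full text of the legal document


def build_candidate_set(target_category, historical_cases):
    """Keep only the historical cases whose label equals the target's category."""
    return [c for c in historical_cases if c.category == target_category]


# Hypothetical records, for illustration only.
database = [
    HistoricalCase("h1", "misappropriation of funds", "..."),
    HistoricalCase("h2", "theft", "..."),
    HistoricalCase("h3", "misappropriation of funds", "..."),
]
candidates = build_candidate_set("misappropriation of funds", database)
```

Filtering by category before any distance computation keeps the later, more expensive sequence comparison restricted to cases of the same type.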
Step 2: represent the target legal case and each historical case in the candidate set as an event sequence.
The text representation model is designed for the characteristics of legal cases, so it can effectively represent the relations among events and present how a case evolves over time, which gives it high practical value. The preferred embodiment of this step is as follows:
1) For any legal case, extract the case elements from the legal case document by using an information extraction technique. The case elements comprise at least: the defendant's position, the defendant's post-case attitude, the time, the core words, and the amount involved.
The core words are extracted by dependency parsing. In brief, the legal case document is split into sentences at periods, exclamation marks, and question marks; if this yields n sentences, dependency analysis is performed on the n sentences to obtain n core words.
Dependency parsing is currently a mature technique. Its principle is as follows: the structure of a whole sentence is expressed by the dependency relations among its words, which capture the semantic dependencies among the components of the sentence. The dependency relations among all the words form a syntax tree whose root node is the core predicate of the sentence and expresses the core content of the whole sentence. This core predicate is the core word of the sentence.
2) The two case elements "defendant's position" and "defendant's post-case attitude" are kept separately.
3) For the remaining case elements, qualitatively represent the temporal relations among events according to their time nodes: events whose occurrence times do not cross or overlap are represented as separate independent events, and events in the remaining situations are merged, which forms the event chain of the case.
Those skilled in the art will appreciate that legal documents generally describe facts in chronological order and that, as in the two cases presented below, non-overlapping time periods correspond to different events. An event is first located by its time, and the statements describing the event within that time are then found. The unstructured event description statements are then structured into combinations of case elements by an event extraction technique (extracting the individual case elements).
4) Express the case elements of the event occurring at time $i$ numerically to obtain $\mathrm{event}_i = (e_{i1}, e_{i2}, \dots, e_{in})$, where $n$ is the number of case elements of the event occurring at time $i$.
5) Initialize a weight vector $\mathrm{weight} = (w_1, w_2, \dots, w_n)$, where [formula missing in source], and multiply each element of $\mathrm{event}_i$ by the corresponding weight to obtain the final representation vector of the event at time $i$: $\mathrm{vector}_i = (w_1 e_{i1}, w_2 e_{i2}, \dots, w_n e_{in})$.
In the embodiment of the invention, the weight vector is prior knowledge and is initialized in advance from experience. In the subsequent learning process it is updated according to the observed results, and the specific update method can be chosen by the user according to the situation or experience.
6) Concatenate the event representations of all times to obtain the vectorized time-ordered event chain $(\mathrm{vector}_1, \mathrm{vector}_2, \dots, \mathrm{vector}_m)$, where $\mathrm{vector}_i$ is the representation of the event at time $i$ and $m$ is the total number of independent events and merged events extracted from the legal case document.
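Two parts of the representation above lend themselves to a short sketch: the sentence splitting that precedes dependency parsing, and the weighting of case elements into an event chain. The code below is a minimal illustration, not the patent's implementation. It assumes each case element $e_{ij}$ is already an embedding (a list of floats) and that the weighted elements of one event are concatenated into a single vector; the function names and the sample numbers are hypothetical. Dependency parsing itself is left out because it needs an external parser.

```python
import re


def split_sentences(text):
    """Split at periods, exclamation marks and question marks (ASCII or full-width).

    An ASCII period with a digit on both sides (as in "9.6") is treated as a
    decimal point rather than a sentence boundary.
    """
    parts = re.split(r"[。！？!?]|(?<!\d)\.|\.(?!\d)", text)
    return [p.strip() for p in parts if p.strip()]


def weight_event(event, weights):
    """Scale each case-element embedding e_ij by w_j and concatenate the results."""
    if len(event) != len(weights):
        raise ValueError("one weight is required per case element")
    return [w * x for element, w in zip(event, weights) for x in element]


def build_event_chain(events, weights):
    """Represent a case as the time-ordered list (vector_1, ..., vector_m)."""
    return [weight_event(event, weights) for event in events]


sentences = split_sentences("He kept 3.1 in 2017. Was it returned? No!")

# Two events, each with two case elements embedded in 2 dimensions (made-up values).
events = [
    [[1.0, 2.0], [10.0, 0.0]],
    [[0.0, 4.0], [2.0, 2.0]],
]
chain = build_event_chain(events, weights=(0.5, 0.25))
```

Each sentence returned by `split_sentences` would then be passed to a dependency parser, and the root predicate of each parse taken as that sentence's core word.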
Step 3: calculate, according to the event sequence measurement model, the distance between the event sequence of the target legal case and the event sequence of each historical case in the candidate set.
The sequence distance is measured on the event sequences obtained in step 2. When measuring the similarity of event sequences, the sequences to be compared may differ in length, so they need to be aligned. Using the Dynamic Time Warping (DTW) method, the event sequence of the target legal case is matched against the event sequence of each historical case starting from the first point. Each time a pair of points is reached, the distance between the two corresponding points is calculated and added to the accumulated distance of all points passed so far, and the minimum accumulated distance is finally taken as the event sequence distance $\mathrm{Distance}_{EventSequence}$. By searching for the correspondence between points, the method finds the point-to-point matching that minimizes the distance between the two sequences.
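The accumulation described above corresponds to the standard DTW dynamic program, which can be sketched as follows. This is a minimal illustration under two assumptions that the text does not fix: the local distance between two event vectors is Euclidean, and all event vectors have the same dimension. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Local distance between two event vectors of equal dimension (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Minimum accumulated distance over all monotone alignments of the two sequences."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both event sequences must be non-empty")
    # cost[i][j] = best accumulated distance aligning seq_a[:i] with seq_b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # seq_b[j-1] is matched again
                cost[i][j - 1],      # seq_a[i-1] is matched again
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Sequences of different lengths can still align perfectly.
same_shape = dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]])
```

Because one point may be matched to several points of the other sequence, two cases with the same course of events but a different number of recorded time nodes can still obtain a small distance.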
Step 4: calculate the similarity between the target legal case and each historical case in the candidate set based on the event sequence distance and a scoring function.
In the embodiment of the present invention, the score takes into account the topic similarity, the event sequence distance obtained in step 3, and the similarity of the defendant's position and of the defendant's post-case attitude obtained in step 2, as follows:
1) Perform topic analysis on the target legal case and the historical case in the candidate set with a topic analysis model to obtain their topic probability distributions, and calculate the semantic distance between the two cases from these distributions as the topic similarity $\mathrm{Distance}_{topic}$.
2) When the event sequences are constructed, the two case elements "defendant's position" and "defendant's post-case attitude" are extracted from the target legal case and from the historical case in the candidate set. The similarity of the defendant's position and of the defendant's post-case attitude between the two cases is calculated with the cosine distance and denoted $\mathrm{Distance}_{position}$ and $\mathrm{Distance}_{attitude}$, respectively.
3) The event sequence distance between the target legal case and the historical case in the candidate set is denoted $\mathrm{Distance}_{EventSequence}$.
4) The similarity between the target legal case and the historical case in the candidate set is calculated with the following formula:
$\mathrm{score} = \alpha_1 \mathrm{Distance}_{topic} + \alpha_2 \mathrm{Distance}_{EventSequence} + \alpha_3 \mathrm{Distance}_{position} + \alpha_4 \mathrm{Distance}_{attitude}$
where $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ are weights with $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 = 1$.
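The scoring formula and the cosine term can be sketched as follows. This is a minimal illustration that assumes the cosine distance is computed as one minus the cosine similarity of two embedding vectors, which the text does not spell out. The function names and the sample values are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors (assumed definition)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def combined_score(d_topic, d_sequence, d_position, d_attitude, alphas):
    """score = a1*Distance_topic + a2*Distance_EventSequence
             + a3*Distance_position + a4*Distance_attitude, with sum(alphas) == 1."""
    if len(alphas) != 4 or not math.isclose(sum(alphas), 1.0):
        raise ValueError("four weights summing to 1 are required")
    a1, a2, a3, a4 = alphas
    return a1 * d_topic + a2 * d_sequence + a3 * d_position + a4 * d_attitude


# Made-up component values, equal weights.
score = combined_score(1.0, 2.0, 0.0, 0.0, alphas=(0.25, 0.25, 0.25, 0.25))
```

The four components live on different scales (a DTW distance is unbounded, a cosine distance is not), so in practice the weights or a normalization step would have to account for that; the text leaves this choice open.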
For ease of understanding, the method is described below with reference to specific examples; the case types, case information, and the like in these examples are purely illustrative.
Case one: while serving as a salesman of company A, the defendant took advantage of the position to misappropriate, on multiple occasions, project payments receivable by company A. The specific facts are as follows: 1. Between April and July 2017, the defendant received 96,000 yuan in project money for the B1 residential community paid by Wu, handed 65,000 yuan over to company A, and kept the remaining 31,000 yuan for personal use. 2. In October 2017, the defendant received 58,000 yuan in project money for the B2 residential community paid by Wang, did not hand it over to company A, and kept it for personal use. 3. In December 2017, the defendant received 113,000 yuan in project money for the B3 residential community paid by Zhou, did not hand it over to company A, and kept it for personal use. After the case, the defendant truthfully confessed the facts of the crime, returned RMB 100,000 to company A, and obtained its forgiveness.
Case two: while serving as a sales consultant of an automobile sales service company (hereinafter, company C), the defendant took advantage of the position to privately sell company vehicles to customers on multiple occasions and kept part of the vehicle payments for personal use instead of handing them over to company C. The specific facts of the crime are as follows: 1. On November 15, 2017, the defendant collected RMB 130,500 in vehicle purchase money from a customer and kept it for personal use. 2. On July 31, 2018, the defendant collected RMB 52,678 in vehicle purchase money from a customer and kept it for personal use. 3. On August 4, 2018, the defendant collected RMB 20,000 in vehicle purchase money from a customer and kept it for personal use. After the case, the defendant compensated company C for all of its economic losses and obtained the company's written forgiveness.
Applying the method described above:
Step 1: classify the case. A deep learning text classification algorithm such as BERT, FastText, or DPCNN is selected as the classifier, and case one is classified, giving the category "crime of misappropriating funds". All cases of the "crime of misappropriating funds" are then selected from the historical case database to form the candidate set, and the cases in the candidate set are compared with case one one by one; case two in the candidate set is taken as the example here.
Step 2: represent the event sequences as follows:
1) Extract the case elements in structured form with an information extraction method. For case one, the defendant's position is "salesman" and the post-case attitude is "returned"; for case two, the defendant's position is "sales consultant" and the post-case attitude is "reimbursed". The time, core word, item, amount involved, and purpose for each event are as follows. Case one: April to July 2017: collect, project money, 31,000 yuan, personal use; October 2017: collect, project money, 58,000 yuan, personal use; December 2017: collect, project money, 113,000 yuan, personal use. Case two: November 15, 2017: collect, vehicle purchase money, 130,500 yuan, personal use; July 31, 2018: collect, vehicle purchase money, 52,678 yuan, personal use; August 4, 2018: collect, vehicle purchase money, 20,000 yuan, personal use.
2) According to the extraction result, two elements are kept separately for each case: the defendant's position "salesman" and post-case attitude "returned" for case one, and the defendant's position "sales consultant" and post-case attitude "reimbursed" for case two.
3) As shown in Fig. 2, the time-stamped case elements are organized according to their qualitative temporal relations. Events whose occurrence times do not cross or overlap are represented as separate independent events, and events in the remaining situations are merged, which gives the event chain shown in Fig. 3, comprising case elements such as the time, trigger word, motive, and amount.
4) Convert the case elements in the event chain into numerical form with the word2vec word-vector training method in gensim.
5) Compute the vectors according to the weights. Taking the event of October 2017, $\mathrm{event}_{2017.10}$, as an example and assuming the initial weight is $\mathrm{weight} = (0.3, 0.2, 0.2, 0.3)$, the event is represented as $\mathrm{vector}_{2017.10} = (0.3\,e_{\text{collect}}, 0.2\,e_{\text{project money}}, 0.2\,e_{58000}, 0.3\,e_{\text{personal use}})$, where $e_{\text{element}}$ denotes the vector value of the given case element.
6) Concatenate the event representations of all times to obtain the vectorized time-ordered event chain.
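The structured result of the extraction for case one can be written down as plain data, as in the sketch below. The values are those listed in the example above; the `Event` and `CaseRecord` types and their field names are hypothetical, chosen only to show how the separately kept elements and the event chain might be stored side by side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: str       # time node as stated in the document
    core_word: str  # core predicate obtained by dependency parsing
    item: str       # what the money relates to
    amount: int     # amount involved, in yuan
    purpose: str    # what the money was used for


@dataclass(frozen=True)
class CaseRecord:
    position: str   # defendant's position, kept outside the event chain
    attitude: str   # defendant's post-case attitude, kept outside the event chain
    events: tuple   # time-ordered chain of Event objects


case_one = CaseRecord(
    position="salesman",
    attitude="returned",
    events=(
        Event("2017-04 to 2017-07", "collect", "project money", 31000, "personal use"),
        Event("2017-10", "collect", "project money", 58000, "personal use"),
        Event("2017-12", "collect", "project money", 113000, "personal use"),
    ),
)

total_amount = sum(e.amount for e in case_one.events)
```

Each `Event` would then be embedded element by element and weighted as in item 5), giving one vector per time node.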
Step 3: measure the sequence distance on the event sequences obtained in step 2. Using the dynamic time warping method shown in Fig. 4, the event sequence of case one, $\mathrm{EventSequence}_{\text{case one}} = (\mathrm{vector}_{2017.04\text{-}07}, \mathrm{vector}_{2017.10}, \mathrm{vector}_{2017.12})$, and the event sequence of case two, $\mathrm{EventSequence}_{\text{case two}} = (\mathrm{vector}_{2017.11.15}, \mathrm{vector}_{2018.07.31}, \mathrm{vector}_{2018.08.04})$, are aligned by time point and their distance is computed: $\mathrm{vector}_{2017.04\text{-}07}$ in case one corresponds to $\mathrm{vector}_{2017.11.15}$ in case two, and so on. This gives the event sequence distance $\mathrm{Distance}_{EventSequence}$.
Step 4: score the overall similarity of the two cases as follows:
1) Calculate the topic similarity $\mathrm{Distance}_{topic}$ of the two cases. The legal documents of case one and case two are fed to the LDA topic analysis model in gensim to obtain their topic distributions. The semantic distance between the two legal documents, i.e. the topic similarity $\mathrm{Distance}_{topic}$, is then calculated from the topic probability distributions with the KL distance formula;
2) Use the cosine distance to calculate the similarity $\mathrm{Distance}_{position}$ between the positions "salesman" and "sales consultant" of the two cases, and the similarity $\mathrm{Distance}_{attitude}$ between the post-case attitudes "returned" and "reimbursed";
3) Combine the topic similarity, the event sequence distance, and the position and post-case attitude similarities of the two cases into the score
$\mathrm{score} = \alpha_1 \mathrm{Distance}_{topic} + \alpha_2 \mathrm{Distance}_{EventSequence} + \alpha_3 \mathrm{Distance}_{position} + \alpha_4 \mathrm{Distance}_{attitude}$.
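The KL-based topic distance in item 1) can be sketched as follows. This is a minimal illustration assuming the two topic distributions are already available as lists of probabilities over the same topics. The small smoothing constant `eps` is an added assumption to avoid division by zero, and the sample distributions are made up.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("both distributions must cover the same topics")
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0  # terms with p_i = 0 contribute nothing
    )


# Hypothetical topic distributions of two documents over two topics.
topics_case_one = [0.5, 0.5]
topics_case_two = [0.9, 0.1]
distance_topic = kl_divergence(topics_case_one, topics_case_two)
```

KL divergence is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; whether the method uses one direction or a symmetrized variant is not stated in the text.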
Another embodiment of the invention provides a method for pushing similar legal cases: the similarity between the target legal case and the historical cases in the candidate set is calculated with the similarity determination method of the above embodiment, the historical cases in the candidate set are sorted by similarity score from high to low, and the top M historical cases are extracted and pushed. The value of M may be set according to actual conditions, for example M = 10.
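The pushing step amounts to a sort followed by a cut-off, as in this minimal sketch. It assumes the scores have already been computed and are given as `(case_id, score)` pairs; the function name and the sample pairs are hypothetical.

```python
def top_m(scored_cases, m=10):
    """Sort (case_id, score) pairs by score from high to low and keep the first m."""
    if m < 0:
        raise ValueError("m must be non-negative")
    return sorted(scored_cases, key=lambda pair: pair[1], reverse=True)[:m]


# Made-up scores for three candidate cases.
ranked = top_m([("h1", 0.2), ("h2", 0.9), ("h3", 0.5)], m=2)
```

Python's `sorted` is stable, so candidates with equal scores keep their original relative order.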
From the description of the above embodiments, it is clear to those skilled in the art that the embodiments may be implemented by software, or by software plus a necessary general-purpose hardware platform. On this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and which includes several instructions for causing a computer device (such as a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (5)
1. A method for determining the similarity of legal cases, characterized by comprising the following steps:
classifying the target legal case and, according to the resulting case category, extracting historical cases of the same category from a historical case database to form a candidate set;
representing the target legal case and each historical case in the candidate set as an event sequence;
calculating, according to an event sequence measurement model, the distance between the event sequence of the target legal case and the event sequence of each historical case in the candidate set;
calculating the similarity between the target legal case and each historical case in the candidate set based on the event sequence distance and a scoring function;
wherein representing the target legal case and each historical case in the candidate set as an event sequence comprises:
for any legal case, extracting the case elements from the legal case document by using an information extraction technique, the case elements comprising at least: the defendant's position, the defendant's post-case attitude, the time, the core words, and the amount involved; the core words are extracted by dependency parsing, namely, the legal case document is split into sentences at periods, exclamation marks and question marks, and dependency analysis is performed on the resulting sentences to obtain the corresponding core words;
keeping the two case elements "defendant's position" and "defendant's post-case attitude" separately;
for the remaining case elements, representing the temporal relations among events according to their time nodes, representing events whose occurrence times do not cross or overlap as separate independent events, and merging events in the remaining situations, to form the event chain of the case;
expressing the case elements of the event occurring at time $i$ numerically to obtain $\mathrm{event}_i = (e_{i1}, e_{i2}, \dots, e_{in})$, wherein $n$ is the number of case elements of the event occurring at time $i$;
initializing a weight vector $\mathrm{weight} = (w_1, w_2, \dots, w_n)$, wherein [formula missing in source], and multiplying each element of $\mathrm{event}_i$ by the corresponding weight to obtain the final representation vector of the event at time $i$: $\mathrm{vector}_i = (w_1 e_{i1}, w_2 e_{i2}, \dots, w_n e_{in})$;
concatenating the event representations of all times to obtain the vectorized time-ordered event chain $(\mathrm{vector}_1, \mathrm{vector}_2, \dots, \mathrm{vector}_m)$, wherein $\mathrm{vector}_i$ is the representation of the event at time $i$ and $m$ is the total number of independent events and merged events extracted from the legal case document.
2. The method for determining the similarity of legal cases according to claim 1, wherein classifying the target legal case comprises: classifying the target legal case by using a deep learning text classification algorithm from natural language processing as the classifier, to obtain the case category of the target legal case.
3. The method according to claim 1, wherein calculating, according to the event sequence measurement model, the distance between the event sequence of the target legal case and the event sequence of each historical case in the candidate set comprises:
matching, by the Dynamic Time Warping (DTW) method and starting from the first point, the event sequence of the target legal case against the event sequence of each historical case; each time a pair of points is reached, calculating the distance between the two corresponding points and adding it to the accumulated distance of all points passed so far; and finally taking the minimum accumulated distance as the event sequence distance $\mathrm{Distance}_{EventSequence}$.
4. The legal class similarity judging method of claim 1, wherein calculating the similarity between the target legal case and the historical cases in the candidate set based on the event sequence distance and the scoring function comprises:
performing topic analysis on the target legal case and on the historical cases in the candidate set with a topic analysis model to obtain the corresponding topic probability distributions, and calculating, from the topic probability distributions, the semantic distance between the target legal case and each historical case in the candidate set as the topic similarity $Distance_{topic}$;
extracting, during event sequence representation, two case elements, namely the defendant's position and the defendant's post-incident attitude, from the target legal case and from the historical cases in the candidate set, and calculating with the cosine distance the similarity of the defendant's position and of the defendant's post-incident attitude between the target legal case and each historical case in the candidate set, denoted $Distance_{position}$ and $Distance_{attitude}$ respectively;
denoting the event sequence distance between the target legal case and each historical case in the candidate set as $Distance_{EventSequence}$;
calculating the similarity between the target legal case and each historical case in the candidate set by the following formula:
$$score = \alpha_1 Distance_{topic} + \alpha_2 Distance_{EventSequence} + \alpha_3 Distance_{position} + \alpha_4 Distance_{attitude}$$
wherein $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ are all weights, and $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 = 1$.
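The scoring formula of this claim can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the names `cosine_similarity` and `similarity_score` are hypothetical, the claim's "cosine distance" is read here as the usual cosine similarity, and the four input terms are assumed to have already been put on a comparable scale, which the claim does not spell out.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two case-element vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_score(d_topic: float, d_event_sequence: float,
                     d_position: float, d_attitude: float,
                     alphas: Sequence[float]) -> float:
    """Weighted sum of the four terms; the weights alpha_1..alpha_4 must sum to 1."""
    if len(alphas) != 4:
        raise ValueError("exactly four weights are required")
    if not math.isclose(sum(alphas), 1.0):
        raise ValueError("weights must sum to 1")
    terms = (d_topic, d_event_sequence, d_position, d_attitude)
    return sum(alpha * term for alpha, term in zip(alphas, terms))


# Hypothetical inputs: encoded position vectors and illustrative weights.
d_position = cosine_similarity([1.0, 0.0], [1.0, 0.0])
score = similarity_score(0.8, 0.5, d_position, 0.6, (0.4, 0.3, 0.2, 0.1))
```

Checking that the weights sum to 1 keeps the score within the range of its input terms, which makes scores comparable across candidate cases.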
5. A legal class pushing method, comprising: calculating the similarity between the target legal case and the historical cases in the candidate set by the method of any one of claims 1 to 4; ranking the historical cases in the candidate set by similarity score from high to low; and extracting the top M historical cases for pushing.
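The ranking and pushing step can be sketched as a sort followed by a cut-off. The function name `top_m_cases`, the case identifiers, and the score values below are hypothetical and serve only to illustrate the step.

```python
from typing import Dict, List, Tuple


def top_m_cases(scores: Dict[str, float], m: int) -> List[Tuple[str, float]]:
    """Rank historical cases by similarity score, high to low, and keep the first m."""
    if m < 0:
        raise ValueError("m must be non-negative")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:m]


# Hypothetical similarity scores for three historical cases in the candidate set.
scores = {"case_a": 0.62, "case_b": 0.91, "case_c": 0.47}
pushed = top_m_cases(scores, 2)
```

Python's `sorted` is stable, so historical cases with equal scores keep their original candidate-set order.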
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010055473.7A CN111309900B (en) | 2020-01-17 | 2020-01-17 | Legal class similarity judging and pushing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111309900A CN111309900A (en) | 2020-06-19 |
CN111309900B true CN111309900B (en) | 2022-09-06 |
Family
ID=71159856
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010055473.7A Active CN111309900B (en) | 2020-01-17 | 2020-01-17 | Legal class similarity judging and pushing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111309900B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019655A (en) * | 2017-07-21 | 2019-07-16 | 北京国双科技有限公司 | Precedent case acquisition methods and device |
CN111797247B (en) * | 2020-09-10 | 2020-12-22 | 平安国际智慧城市科技股份有限公司 | Case pushing method and device based on artificial intelligence, electronic equipment and medium |
CN113806590A (en) * | 2021-09-27 | 2021-12-17 | 北京市律典通科技有限公司 | Intelligent criminal case data pushing method and system |
CN115146065A (en) * | 2022-09-02 | 2022-10-04 | 安徽商信政通信息技术股份有限公司 | Intelligent information reporting similar content merging method and system |
CN115878815B (en) * | 2022-11-29 | 2023-07-18 | 深圳擎盾信息科技有限公司 | Legal document judgment result prediction method, legal document judgment result prediction device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102012918A (en) * | 2010-11-26 | 2011-04-13 | 中金金融认证中心有限公司 | System and method for excavating and executing rule |
CN106126695A (en) * | 2016-06-30 | 2016-11-16 | 张春生 | A kind of similar case search method and device |
CN106503470A (en) * | 2016-11-04 | 2017-03-15 | 中国科学技术大学 | A kind of time serieses distance metric method compared based on status switch |
CN108665182A (en) * | 2018-05-18 | 2018-10-16 | 中国科学技术大学 | A kind of patent action Risk Forecast Method |
CN109213864A (en) * | 2018-08-30 | 2019-01-15 | 广州慧睿思通信息科技有限公司 | Criminal case anticipation system and its building and pre-judging method based on deep learning |
CN109948646A (en) * | 2019-01-24 | 2019-06-28 | 西安交通大学 | A kind of time series data method for measuring similarity and gauging system |
Non-Patent Citations (2)
Title |
---|
Legal Information Retrieval: Evaluating Case-Based Reasoning;Symball Rufino de Oliveira,等;《2009 Seventh Brazilian Symposium in Information and Human Language Technology》;20100729;第167-170页 * |
面向刑事案件的精细分类与串并案分析技术研究;夏明;《中国优秀硕士学位论文全文数据库社会科学I辑》;20171115;第2-45页 * |
Also Published As
Publication number | Publication date |
---|---|
CN111309900A (en) | 2020-06-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||