CN111309900B - A method for judging and pushing the similarity of similar legal cases - Google Patents
- Publication number
- CN111309900B (application number CN202010055473.7A)
- Authority
- CN
- China
- Prior art keywords
- case
- distance
- historical
- legal
- event sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
Abstract
Description
Technical Field
The invention relates to the field of legal intelligence, and in particular to a method for judging the similarity of similar legal cases and pushing them.
Background
Artificial intelligence theory and technology are maturing, and their range of application keeps expanding. In 2017, the national artificial intelligence strategy, the "New Generation Artificial Intelligence Development Plan", proposed building smart courts, promoting the application of artificial intelligence in evidence collection, case analysis, and the reading and analysis of legal documents, and making the court trial system and trial capability intelligent. Within this effort, using artificial intelligence to achieve similar judgments for similar cases has become an important research topic that closely matches the needs of judges.
Similar-case judgment is an auxiliary tool. Its purpose is to find cases similar or even identical to the one a judge is handling, so as to inspire and broaden the judge's reasoning, help the judge rule correctly, and keep the outcomes of identical or similar cases from deviating much. Existing similar-case retrieval systems, however, push cases imprecisely and do not meet judges' actual needs. For example, the pushed cases are often not the "same case", or not even the "same type", and so many cases are pushed that the judge's time is not actually saved and heavy manual screening is still required.
Legal case records are mostly electronic documents in the form of natural-language text, so similar-case identification can be treated as an application of text similarity measurement. Existing natural language processing methods can identify similar cases to some extent, but they still struggle to pinpoint the core differences between case elements. The main problems are as follows:
1) Keyword matching is not accurate enough. Keyword retrieval is in effect "sampling verification", and conclusions drawn from a small number of samples are incomplete. The method also returns too many similar cases, making it hard for judges to pick out the ones with real reference value.
2) Methods that represent words as vectors with word2vec and build a neural network on top require large amounts of labeled, structured training corpora, whereas the legal field currently lacks massive, detailed labeled legal data and also lacks people who understand both law and technology.
3) The main reference value of similar cases lies in surfacing how judges in similar historical cases reasoned about and handled particular legal details or difficulties. However, no similarity measurement model for legal documents has yet been designed around the characteristics of the legal domain.
Summary of the Invention
The purpose of the invention is to provide a method for judging the similarity of similar legal cases and pushing them, which overcomes the shortcomings of existing methods: the need for extensive manual annotation, inaccurate pushing of similar cases, redundant information, and a lack of focus on specific legal issues.
The purpose of the invention is achieved through the following technical solutions:
A method for judging the similarity of similar legal cases, comprising:
classifying the target legal case and, according to the resulting case category, extracting historical cases of the same category from a historical case database to form a candidate set;
representing the target legal case and each same-category historical case in the candidate set as an event sequence;
calculating, according to an event sequence measurement model, the distance between the event sequence of the target legal case and the event sequence of each historical case in the candidate set;
calculating the similarity between the target legal case and the historical cases in the candidate set based on the event sequence distance combined with a scoring function.
A method for pushing similar legal cases, comprising: calculating the similarity between the target legal case and the historical cases in the candidate set using the method above, sorting the historical cases in the candidate set by similarity score from high to low, and extracting the top M historical cases for pushing.
As can be seen from the technical solution provided above, classifying the legal documents by case type and analyzing the topic distribution of the cases makes it possible to select, among cases of the same category, the historical cases whose semantic content is most similar to the target case, giving more comprehensive and accurate similar-case identification. In addition, representing legal documents as time-ordered event sequences, computing similarity in an unsupervised manner, and pushing the historical cases with the highest scores greatly reduces manual effort and makes pushing more intelligent.
Description of the Drawings
To explain the technical solutions of the embodiments more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for judging the similarity of similar legal cases provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of the event time chain formed by extracting the event corresponding to each time period, provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of the event time chains extracted from specific cases, provided by an embodiment of the invention;
FIG. 4 is a schematic diagram of aligning the time points of event sequences, provided by an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the invention.
An embodiment of the invention provides a method for judging the similarity of similar legal cases which, as shown in FIG. 1, comprises:
Step 1. Classify the target legal case and, according to the resulting case category, extract historical cases of the same category from the historical case database to form a candidate set.
Similar-case identification aims to find, among historical cases of the same type, cases that are similar or even identical to the target case, so the case must first be classified.
Every case in the existing historical case database has a category label. Based on the distribution of case categories, a certain number of historical cases can be selected for each category to build a data set. A relatively mature deep learning text classification algorithm from natural language processing is used as the classifier to obtain the category of the target case. Historical cases of that category are then extracted from the historical case database to form the candidate set.
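As a rough illustration of step 1, the sketch below filters a historical case database by the category predicted for the target case. The keyword rule `toy_classifier`, the record layout, and the function name `build_candidate_set` are hypothetical stand-ins chosen for this example; the embodiment itself relies on a trained deep learning text classifier.

```python
def build_candidate_set(target_text, history, classify):
    """Predict the target's category and keep historical cases with that label."""
    category = classify(target_text)
    candidates = [case for case in history if case["category"] == category]
    return category, candidates


def toy_classifier(text):
    # Hypothetical stand-in for a trained text classifier: a single keyword rule.
    if "misappropriat" in text.lower():
        return "misappropriation of funds"
    return "other"


# Hypothetical historical case records, each carrying its category label.
history = [
    {"id": "H1", "category": "misappropriation of funds"},
    {"id": "H2", "category": "theft"},
    {"id": "H3", "category": "misappropriation of funds"},
]

category, candidates = build_candidate_set(
    "The defendant misappropriated project payments owed to the company.",
    history,
    toy_classifier,
)
```

Passing the classifier in as a function keeps the filtering logic independent of whichever model is actually used.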
Step 2. Represent the target legal case and each same-category historical case in the candidate set as an event sequence.
This step designs a text representation model tailored to legal cases. It effectively represents the connections between events and shows how a case evolves over time, which gives it greater practical value. A preferred implementation of this step is as follows:
1) For any legal case, use information extraction to extract the case elements from the case document. The case elements include at least: the defendant's position, the defendant's attitude after the offense, time, core words, and the amount involved.
The core words are extracted by dependency parsing. In short, the text of the case document is split into sentences at periods, exclamation marks, and question marks; if this yields n sentences, dependency parsing is run on each of the n sentences to obtain n core words.
Dependency parsing is a relatively mature technique. It expresses the structure of a whole sentence through dependency relations between words; these relations capture the semantic dependencies among the sentence's components, and together they form a syntax tree whose root node is the core predicate of the sentence, expressing its core content. This core predicate is the core word of the sentence.
2) Keep two case elements, the defendant's position and the defendant's attitude after the offense, separately.
3) Arrange the remaining case elements by time node to express the qualitative temporal relations between events. Events whose occurrence times neither cross nor overlap are treated as independent events and represented separately; in all other cases the events are merged. The result is the event chain of the case facts.
Those skilled in the art will understand that legal case documents usually describe facts in chronological order, as in the two cases given later, where non-overlapping time periods correspond to different events. An event is first located by its time, and the sentences describing what happened in that period are then found. Event extraction (extracting each case element) then turns the unstructured descriptive sentences into a structured combination of case elements.
4) Represent numerically the case elements of the event occurring at time i, obtaining $event_i = (e_{i1}, e_{i2}, \ldots, e_{in})$, where n is the number of case elements of the event occurring at time i.
5) Initialize the weight vector $weight = (w_1, w_2, \ldots, w_n)$. Multiply each weight by the corresponding element of $event_i$ to obtain the final representation of the event at time i: $vector_i = (w_1 e_{i1}, w_2 e_{i2}, \ldots, w_n e_{in})$.
In this embodiment the weight vector is prior knowledge, initialized in advance from experience. It may be updated during subsequent learning depending on the results; the specific update method can be chosen by the user according to circumstances or experience.
6) Concatenate the event representations for all times to obtain the vectorized time-ordered event chain $EventSequence = (vector_1, vector_2, \ldots, vector_m)$, where $vector_i$ is the vector representation of the event occurring at time i and m is the total number of independent and merged events extracted from the case document.
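The merging rule in item 3) can be sketched as an interval merge. This is only an illustration under stated assumptions: each event is a hypothetical `(start, end, elements)` tuple, month-level periods are expanded to their first and last day, and events whose periods merely touch are treated as overlapping. None of these choices is specified in the source.

```python
from datetime import date


def merge_overlapping(events):
    """Merge events whose time spans overlap; keep disjoint events separate.

    Each event is a (start, end, elements) tuple. The result is ordered by
    start time, which gives the time-ordered event chain.
    """
    merged = []
    for start, end, elements in sorted(events, key=lambda e: e[0]):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous event: extend its span and pool the elements.
            prev_start, prev_end, prev_elements = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_elements + list(elements))
        else:
            merged.append((start, end, list(elements)))
    return merged


# Case 1 from the worked example: three periods that do not overlap.
case_one = [
    (date(2017, 4, 1), date(2017, 7, 31), ["collect", "project payment", 31000]),
    (date(2017, 10, 1), date(2017, 10, 31), ["collect", "project payment", 58000]),
    (date(2017, 12, 1), date(2017, 12, 31), ["collect", "project payment", 113000]),
]
chain = merge_overlapping(case_one)  # stays as three independent events
```

Sorting by start time before merging means one pass over the events is enough to produce the chain.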
Step 3. Calculate, according to the event sequence measurement model, the distance between the event sequence of the target legal case and the event sequence of each historical case in the candidate set.
Sequence distance is measured on the event sequences obtained in step 2. The sequences being compared may differ in length, so they must be aligned. Dynamic time warping (DTW) is used: the event sequence of the target legal case is matched against the event sequence of each historical case starting from the first point; at each point reached, the distance between the two matched points is computed and added to the accumulated distance of all points passed so far; finally, the minimum accumulated distance is taken as the event sequence distance $Distance_{EventSequence}$. By searching over point-to-point correspondences, the method finds the matching that minimizes the distance between the two sequences.
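A minimal sketch of the DTW computation described above, using the standard dynamic-programming recurrence. The source does not say which point-to-point distance is used, so the Euclidean distance here is an assumption, and the function names are chosen only for this example.

```python
import math


def euclidean(u, v):
    """Point-to-point distance between two event vectors (an assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Minimum accumulated distance over all monotonic alignments of two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two toy sequences of different length; the repeated point in the second
# sequence is absorbed by the warping, so the distance is zero.
short_seq = [[0.0], [1.0], [2.0]]
long_seq = [[0.0], [1.0], [1.0], [2.0]]
distance = dtw_distance(short_seq, long_seq)
```

The table has (n+1) by (m+1) entries, so the cost is quadratic in the sequence lengths, which is modest for event chains with only a few events each.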
Step 4. Calculate the similarity between the target legal case and the historical cases in the candidate set based on the event sequence distance combined with a scoring function.
In this embodiment, the score takes into account the topic similarity, the event sequence distance obtained in step 3, and the similarity of the defendant's position and of the defendant's attitude after the offense obtained in step 2, as follows:
1) Run a topic analysis model on the target legal case and the historical cases in the candidate set to obtain their topic probability distributions. From these distributions, calculate the semantic distance between the target legal case and each historical case in the candidate set and use it as the topic similarity $Distance_{topic}$.
2) During event sequence representation, two case elements, the defendant's position and the defendant's attitude after the offense, were extracted from the target legal case and from each historical case in the candidate set. Use the cosine distance to calculate the similarity of the defendant's position and of the defendant's attitude after the offense between the target legal case and each historical case, denoted $Distance_{position}$ and $Distance_{attitude}$ respectively.
3) The event sequence distance between the target legal case and each historical case in the candidate set is denoted $Distance_{EventSequence}$.
4) Calculate the similarity between the target legal case and each historical case in the candidate set with the following formula:
$score = \alpha_1 Distance_{topic} + \alpha_2 Distance_{EventSequence} + \alpha_3 Distance_{position} + \alpha_4 Distance_{attitude}$
where $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ are weights and $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 = 1$.
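The scoring step above can be sketched as follows. The KL divergence is taken from the worked example later in the description; the equal weights, the smoothing constant `eps`, and all input numbers are illustrative assumptions, not values from the source.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete topic distributions of equal length."""
    # eps guards against log(0); it is an implementation choice made here.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def cosine_distance(u, v):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def combined_score(d_topic, d_sequence, d_position, d_attitude,
                   alphas=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four terms; the weights must sum to 1."""
    if abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    a1, a2, a3, a4 = alphas
    return a1 * d_topic + a2 * d_sequence + a3 * d_position + a4 * d_attitude


# Made-up inputs, for illustration only.
d_topic = kl_divergence([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
d_position = cosine_distance([1.0, 0.0], [0.8, 0.6])
score = combined_score(d_topic, 1.5, d_position, 0.1)
```

Note that KL divergence is not symmetric, so the order of the two topic distributions matters; the source does not state which order it uses.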
For ease of understanding, the method is illustrated below with a concrete example. The case types and case information in the example are given for illustration only.
Case 1: While serving as a salesperson of Company A, the defendant Sun repeatedly misappropriated project payments receivable by Company A for personal use. The specific facts are as follows: 1. Between April and July 2017, the defendant Sun collected 96,000 yuan in project payments for the B1 residential community from Wu, handed 65,000 yuan over to Company A, and kept the remaining 31,000 yuan for personal use. 2. In October 2017, the defendant Sun collected 58,000 yuan in project payments for the B2 residential community from Wang, did not hand the money over to Company A, and kept it for personal use. 3. In December 2017, the defendant Sun collected 113,000 yuan in project payments for the B3 residential community from Zhou, did not hand the money over to Company A, and kept it for personal use. After the offense, the defendant Sun truthfully confessed to the crime, returned RMB 100,000 to the victim (Company A), and obtained its forgiveness.
Case 2: While serving as a sales consultant of an automobile sales and service company (hereinafter Company C), the defendant Ye took advantage of the position to sell company vehicles to customers privately on multiple occasions and kept part of the payments for personal use instead of handing them over to Company C. The specific criminal facts are as follows: 1. On November 15, 2017, the defendant Ye collected a car purchase payment of RMB 130,500 from the customer Li and kept it for personal use. 2. On July 31, 2018, the defendant Ye collected a car purchase payment of RMB 52,678 from the customer Zhou 1 and kept it for personal use. 3. On August 4, 2018, the defendant Ye collected a car purchase deposit of RMB 20,000 from the customer Dai and kept it for personal use. After the offense, the defendant Ye refunded and compensated all of Company C's economic losses and obtained the company's written forgiveness.
Following the method described above:
Step 1. First classify the case. A deep learning text classification algorithm such as BERT, FastText, or DPCNN is chosen as the classifier and applied to Case 1, giving the category "crime of misappropriation of funds". All "crime of misappropriation of funds" cases are then selected from the historical case database to form the candidate set, and each case in the candidate set is compared with Case 1 in turn. Case 2 from the candidate set is used as the example here.
Step 2. Build the event sequence representation as follows:
1) Use information extraction to extract the case elements in structured form. In Case 1 the defendant's position is "salesperson" and the attitude after the offense is "returned"; in Case 2 the defendant's position is "sales consultant" and the attitude after the offense is "refunded and compensated". The time, core word, amount involved, and purpose for each event are as follows. Case 1: April to July 2017: collect, project payment, 31,000 yuan, personal use; October 2017: collect, project payment, 58,000 yuan, personal use; December 2017: collect, project payment, 113,000 yuan, personal use. Case 2: November 15, 2017: collect, car purchase payment, 130,500 yuan, own use; July 31, 2018: collect, car purchase payment, 52,678 yuan, own use; August 4, 2018: collect, car purchase deposit, 20,000 yuan, own use.
2) Based on the extraction results, keep the two elements separately for each case: for Case 1, the defendant's position "salesperson" and the attitude after the offense "returned"; for Case 2, the defendant's position "sales consultant" and the attitude after the offense "refunded and compensated".
3) As shown in FIG. 2, organize the time-ordered case elements according to their qualitative temporal relations. Events whose occurrence times neither cross nor overlap are treated as independent events and represented separately; in all other cases the events are merged. This gives the event chains shown in FIG. 3, which contain case elements such as the time, the trigger word, the motive, and the amount.
4) Convert the case elements in the event chain to numerical form using word vectors trained with word2vec in gensim.
5) Compute the weighted vectors. For example, the October 2017 event of Case 1 is $event_{\text{October 2017}} = (\text{collect}, \text{project payment}, 58000, \text{own use})$. Assuming the initial weights are $weight = (0.3, 0.2, 0.2, 0.3)$, the vectorized representation of this event is $vector_{\text{October 2017}} = (0.3\,e_{\text{collect}}, 0.2\,e_{\text{project payment}}, 0.2\,e_{58000}, 0.3\,e_{\text{own use}})$, where $e_{\text{element}}$ denotes the vector of the given case element.
6) Concatenate the event representations for all times to obtain the vectorized time-ordered event chain.
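The weighting in item 5) can be illustrated with toy numbers. The two-dimensional element vectors below are made up for this sketch and stand in for word2vec embeddings; only the weights (0.3, 0.2, 0.2, 0.3) come from the example, and the function name is hypothetical.

```python
def weighted_event_vector(element_vectors, weights):
    """Scale each case-element vector by its weight and concatenate the results."""
    if len(element_vectors) != len(weights):
        raise ValueError("need exactly one weight per case element")
    out = []
    for w, vec in zip(weights, element_vectors):
        out.extend(w * x for x in vec)
    return out


# Hypothetical 2-dimensional embeddings; real ones would come from a trained model.
toy_embeddings = {
    "collect": [1.0, 0.0],
    "project payment": [0.0, 1.0],
    "58000": [0.5, 0.5],
    "own use": [1.0, 1.0],
}
event_oct_2017 = ["collect", "project payment", "58000", "own use"]
weights = (0.3, 0.2, 0.2, 0.3)

vector_oct_2017 = weighted_event_vector(
    [toy_embeddings[name] for name in event_oct_2017], weights
)
```

Each event becomes one flat vector of fixed length as long as every event has the same number of elements, which lets the point-to-point distance in step 3 compare events from different cases directly.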
Step 3. Measure the sequence distance on the event sequences obtained in step 2. Using the dynamic time warping method shown in FIG. 4, time-point alignment and distance calculation are performed on the event sequence of Case 1, $EventSequence_{\text{Case 1}} = (vector_{\text{April to July 2017}}, vector_{\text{October 2017}}, vector_{\text{December 2017}})$, and the event sequence of Case 2, $EventSequence_{\text{Case 2}} = (vector_{\text{November 15, 2017}}, vector_{\text{July 31, 2018}}, vector_{\text{August 4, 2018}})$. In the alignment, $vector_{\text{April to July 2017}}$ of Case 1 corresponds to $vector_{\text{November 15, 2017}}$ of Case 2, and so on. This yields the event sequence distance $Distance_{EventSequence}$.
Step 4. Score the overall similarity of the sequences as follows:
1) Calculate the topic similarity $Distance_{topic}$ of the two cases. Feed the legal documents of Case 1 and Case 2 into the LDA topic analysis model in gensim to obtain their topic distributions. From the topic probability distributions, compute the semantic distance between the two documents with the KL distance (Kullback-Leibler divergence); this is the topic similarity $Distance_{topic}$.
2) Use the cosine distance to calculate the similarity $Distance_{position}$ between the positions "salesperson" and "sales consultant" and the similarity $Distance_{attitude}$ between the attitudes after the offense "returned" and "refunded and compensated".
3) Combine the topic similarity, the event sequence distance, and the position and attitude similarities into a score:
$score = \alpha_1 Distance_{topic} + \alpha_2 Distance_{EventSequence} + \alpha_3 Distance_{position} + \alpha_4 Distance_{attitude}$.
Another embodiment of the invention provides a method for pushing similar legal cases. The method uses the similarity judgment method of the preceding embodiment to calculate the similarity between the target legal case and the historical cases in the candidate set, sorts the historical cases in the candidate set by similarity score from high to low, and extracts the top M historical cases for pushing. The value of M can be set according to actual needs, for example M = 10.
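A minimal sketch of the ranking and pushing step, assuming each candidate already has a similarity value for which higher means more similar. If the raw score is a weighted sum of distances, it would first need to be converted or sorted the other way; the source does not spell this out. The function name and the sample scores are hypothetical.

```python
def top_m(similarities, m=10):
    """Return the ids of the M candidates with the highest similarity."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [case_id for case_id, _ in ranked[:m]]


# Hypothetical similarity values for three candidate cases.
similarities = {"H1": 0.62, "H2": 0.91, "H3": 0.47}
pushed = top_m(similarities, m=2)
```

Slicing after the sort also handles candidate sets with fewer than M cases, in which case every candidate is returned.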
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例可以通过软件实现,也可以借助软件加必要的通用硬件平台的方式来实现。基于这样的理解,上述实施例的技术方案可以以软件产品的形式体现出来,该软件产品可以存储在一个非易失性存储介质(可以是CD-ROM,U盘,移动硬盘等)中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that the above embodiments can be implemented by software or by means of software plus a necessary general hardware platform. Based on this understanding, the technical solutions of the above embodiments may be embodied in the form of software products, and the software products may be stored in a non-volatile storage medium (which may be CD-ROM, U disk, mobile hard disk, etc.), including Several instructions are used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in various embodiments of the present invention.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010055473.7A CN111309900B (en) | 2020-01-17 | 2020-01-17 | A method for judging and pushing the similarity of similar legal cases |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111309900A CN111309900A (en) | 2020-06-19 |
CN111309900B CN111309900B (en) | 2022-09-06 |
Family
ID=71159856
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010055473.7A Active CN111309900B (en) | 2020-01-17 | 2020-01-17 | A method for judging and pushing the similarity of similar legal cases |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111309900B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019655A (en) * | 2017-07-21 | 2019-07-16 | 北京国双科技有限公司 | Precedent case acquisition methods and device |
CN111797247B (en) * | 2020-09-10 | 2020-12-22 | 平安国际智慧城市科技股份有限公司 | Case pushing method and device based on artificial intelligence, electronic equipment and medium |
CN113806590A (en) * | 2021-09-27 | 2021-12-17 | 北京市律典通科技有限公司 | Intelligent criminal case data pushing method and system |
CN115146065A (en) * | 2022-09-02 | 2022-10-04 | 安徽商信政通信息技术股份有限公司 | Intelligent information reporting similar content merging method and system |
CN115878815B (en) * | 2022-11-29 | 2023-07-18 | 深圳擎盾信息科技有限公司 | Legal document judgment result prediction method, legal document judgment result prediction device and storage medium |
CN118839995A (en) * | 2024-04-17 | 2024-10-25 | 大律云(北京)科技有限公司 | Legal case processing method and system based on big data and deep learning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102012918A (en) * | 2010-11-26 | 2011-04-13 | 中金金融认证中心有限公司 | System and method for excavating and executing rule |
CN106126695A (en) * | 2016-06-30 | 2016-11-16 | 张春生 | A kind of similar case search method and device |
CN106503470A (en) * | 2016-11-04 | 2017-03-15 | 中国科学技术大学 | A kind of time serieses distance metric method compared based on status switch |
CN108665182A (en) * | 2018-05-18 | 2018-10-16 | 中国科学技术大学 | A kind of patent action Risk Forecast Method |
CN109213864A (en) * | 2018-08-30 | 2019-01-15 | 广州慧睿思通信息科技有限公司 | Criminal case anticipation system and its building and pre-judging method based on deep learning |
CN109948646A (en) * | 2019-01-24 | 2019-06-28 | 西安交通大学 | A time series data similarity measurement method and measurement system |
Non-Patent Citations (2)
Title |
---|
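Legal Information Retrieval: Evaluating Case-Based Reasoning; Symball Rufino de Oliveira, et al.; 2009 Seventh Brazilian Symposium in Information and Human Language Technology; 20100729; pp. 167-170 * |
Research on Fine-Grained Classification and Serial Case Analysis Techniques for Criminal Cases (面向刑事案件的精细分类与串并案分析技术研究); Xia Ming (夏明); China Master's Theses Full-text Database, Social Sciences I; 20171115; pp. 2-45 * |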
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||