CN111753059A - Neural Embedding-based intelligent analysis method for judicial cases - Google Patents
- Publication number
- CN111753059A (application CN202010626120.8A)
- Authority
- CN
- China
- Prior art keywords
- entity
- case
- document
- criminal
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Tourism & Hospitality (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Technology Law (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Alarm Systems (AREA)
Abstract
A Neural Embedding-based intelligent analysis method for judicial cases comprises: collecting criminal legal documents and preprocessing the case fact descriptions; taking each preprocessed case fact description as a document entity, taking the law articles relevant to the case, the charge on which the defendant was convicted and the length of the prison term as label entities corresponding to the document entity, and training Embedding vectors for the document entities and label entities by stochastic gradient descent; and, for new data, computing the similarity between the document entity and each label entity with the trained Embedding vectors, thereby supporting a judgment on the case. The invention assists judges, lawyers and other legal professionals in determining the charge, recommending relevant law articles and determining the prison term, thereby saving the time and labor cost of case review, improving the accuracy of judicial decisions and enabling legal professionals to complete them more efficiently.
Description
Technical Field
The invention relates to the technical field of intelligent judicial judgment assistance, in particular to a Neural Embedding-based intelligent analysis method for judicial cases.
Background
Traditional case review and judicial decision-making rely on the professional interpretation and argument of judges, lawyers, and public security, procuratorate and court personnel. Complex criminal cases require a large amount of manpower and material resources to review. Moreover, for the same case, different legal practitioners may differ greatly and disagree strongly about the sentencing outcome, and the professionals handling a case bring strong subjectivity to it. An embedding-based intelligent judicial decision method is therefore designed to assist judges, lawyers and other legal professionals in determining the charge, recommending relevant law articles and determining the prison term, thereby saving the time and labor cost of case review, improving the accuracy of judicial decisions and enabling legal professionals to complete them more efficiently.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a Neural Embedding-based intelligent analysis method for judicial cases. The method collects criminal legal documents and preprocesses the case fact descriptions; takes each preprocessed case fact description as a document entity and takes the law articles relevant to the case, the charge on which the defendant was convicted and the length of the prison term as label entities corresponding to that document entity; trains Embedding vectors for the document entities and label entities by stochastic gradient descent; and, for new data, computes the similarity between the document entity and the label entities with the trained Embedding vectors, thereby supporting a judgment on the case. Judges, lawyers and other legal professionals are thus assisted in determining the charge, recommending relevant law articles and determining the prison term, which saves the time and labor cost of case review, improves the accuracy of judicial decisions and enables legal professionals to complete them more efficiently.
The technical scheme adopted by the invention for solving the problems is as follows:
A Neural Embedding-based intelligent analysis method for judicial cases comprises the following steps:
S1: collecting criminal legal documents and preprocessing the case fact descriptions;
S2: taking each preprocessed case fact description as a document entity, taking the law articles relevant to the case, the charge on which the defendant was convicted and the length of the prison term as label entities corresponding to the document entity, and training Embedding vectors for the document entities and label entities by stochastic gradient descent;
S3: for new data, computing the similarity between the document entity and the label entities with the Embedding vectors trained in S2, so that judges, lawyers and other professionals can make a judgment on the case according to the similarity.
Further, the preprocessing of the case fact description in step S1 comprises the following sub-steps:
S101: filtering stop words and special symbols out of the case fact description, retaining Chinese characters, numbers and punctuation marks, and segmenting the text into words;
S102: discretizing the sentencing-sensitive numerical values in the case fact description.
Further, the sentencing-sensitive numerical values in step S102 fall into three types: monetary amount, weight and blood alcohol concentration.
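As an illustration of sub-step S102, the following is a minimal Python sketch that replaces a numeric amount with a coarse bucket token. The source names only the three categories (money, weight, blood alcohol concentration); the regular expressions, units, bucket boundaries and token names below are all assumptions chosen for illustration.

```python
import bisect
import re

# Hypothetical bucket boundaries; the source does not specify any thresholds.
BUCKETS = {
    "MONEY": [1000, 3000, 30000, 300000],   # yuan
    "WEIGHT": [10, 50, 1000],               # grams
    "BAC": [20, 80, 200],                   # mg per 100 ml
}

# Hypothetical patterns that tie a unit suffix to a category.
PATTERNS = [
    ("BAC", re.compile(r"(\d+(?:\.\d+)?)\s*mg/100ml")),
    ("MONEY", re.compile(r"(\d+(?:\.\d+)?)\s*元")),
    ("WEIGHT", re.compile(r"(\d+(?:\.\d+)?)\s*克")),
]


def bucket_token(category: str, value: float) -> str:
    """Return a discrete token such as MONEY_2 for a raw numeric value."""
    index = bisect.bisect_right(BUCKETS[category], value)
    return f"{category}_{index}"


def discretize(text: str) -> str:
    """Replace each matched amount in the text with its bucket token."""
    for category, pattern in PATTERNS:
        text = pattern.sub(
            lambda m, c=category: " " + bucket_token(c, float(m.group(1))) + " ",
            text,
        )
    return text
```

Mapping raw amounts to a small set of tokens lets values of similar legal significance share one feature, instead of treating every distinct number as a separate word.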
Further, the specific method of obtaining the Embedding vectors in step S2 comprises the following steps:
S201: first, obtaining the vector of document entity a by feature summation;
S202: then, mapping each label entity b to a vector of the same dimension as the document entity vector;
S203: finally, establishing a loss function over the vector of document entity a and the vector of label entity b to obtain the Embedding vectors.
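Steps S201 and S202 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the dictionary size, the embedding dimension, the use of a CRC32 hash to map n-grams onto a fixed-length dictionary, the random initialization and the label names are all hypothetical and are not specified by the source.

```python
import random
import zlib

DICT_SIZE = 1000   # fixed dictionary length (assumed value)
DIM = 8            # embedding dimension (assumed value)


def ngram_features(tokens, n_max=2):
    """Return the bag of 1..n_max-gram strings for a token list."""
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


def feature_index(feature):
    """Map an n-gram to a dictionary row by hashing (an assumed scheme)."""
    return zlib.crc32(feature.encode("utf-8")) % DICT_SIZE


def init_matrix(rows, dim, seed=0):
    """Create a rows x dim matrix of small random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def embed_document(tokens, F):
    """S201: sum the feature rows of F over all features of the document."""
    vec = [0.0] * len(F[0])
    for feat in ngram_features(tokens):
        row = F[feature_index(feat)]
        vec = [v + r for v, r in zip(vec, row)]
    return vec


# One row per dictionary entry, each row an embedding of length DIM.
F = init_matrix(DICT_SIZE, DIM)

# S202: one vector of the same length per label entity (hypothetical labels).
labels = ["article_264", "charge_theft", "term_0_to_3_years"]
label_vectors = dict(zip(labels, init_matrix(len(labels), DIM, seed=1)))

doc_vec = embed_document(["被告人", "盗窃", "MONEY_2"], F)
```

Because document vectors and label vectors have the same length, they can be compared directly with a similarity function, which is what the loss in S203 relies on.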
S201 specifically comprises: using bag-of-words N-Grams as the features of a case description document, wherein each document is described by a set of discrete features drawn from a fixed-length dictionary; assigning a $d$-dimensional vector to each feature, and then summing the feature vectors to obtain the vector of the document entity; assuming the $D$ features of the dictionary are defined as a $D \times d$ matrix $F$, where $F_i$ indexes the $i$-th feature, $\sum_{i \in a} F_i$ is taken as the vector of document entity $a$;
S202 specifically comprises: taking every discrete value of the law articles, the charge on which the defendant was convicted and the length of the prison term as a distinct label entity b, and mapping each label entity to a distinct vector of the same dimension as the document entity vectors;
S203 specifically comprises: establishing a loss function for document entity $a$ and label entity $b$, see formula (1):
This formula consists of the following parts:
1) generating positive entity pairs $(a, b)$ from the data, wherein $a$ is the document entity of a case fact description and $b$ is a label entity corresponding to document entity $a$, such as a law article, the charge on which the defendant was convicted or the prison-term label corresponding to document entity $a$;
2) obtaining a negative entity $b^-$ by random sampling from the labels that do not belong to document entity $a$, forming a negative entity pair $(a, b^-)$;
3) computing the similarity of the different entity pairs with cosine similarity;
4) the loss function $L_{batch}$ computes the loss by comparing the difference between the positive and negative entity pairs; here, margin ranking loss is used as the loss function, see formula (2):
wherein $B$ is the batch size, $N$ is the number of negative examples, $p_b$ denotes a positive entity pair, $p_n$ denotes a negative entity pair, and the score function is the similarity function of an entity pair; the loss function is optimized by stochastic gradient descent so that correlated entity vectors have higher similarity.
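Formulas (1) and (2) can be sketched from the four parts listed above. The following is a hedged reconstruction rather than the patent's own notation: only the symbols $B$, $N$, $p_b$, $p_n$, $L_{batch}$ and score come from the text, while the set $E^{+}$ of positive pairs, the margin $\mu$, the averaging over the batch and the symbol sim for cosine similarity are assumptions introduced for illustration. Here $p_n$ stands for the $n$-th negative pair sampled for the positive pair $p_b$.

```latex
% Formula (1), assumed form: total loss over positive pairs and their sampled negatives
\sum_{(a,b) \in E^{+}} L_{batch}\bigl(\mathrm{sim}(a,b),\ \mathrm{sim}(a,b_{1}^{-}),\ \dots,\ \mathrm{sim}(a,b_{N}^{-})\bigr) \tag{1}

% Cosine similarity used for sim and score
\mathrm{sim}(a,b) = \frac{a \cdot b}{\lVert a \rVert\, \lVert b \rVert}

% Formula (2), assumed form: margin ranking loss over a batch
L_{batch} = \frac{1}{B} \sum_{b=1}^{B} \sum_{n=1}^{N} \max\bigl(0,\ \mu - \mathrm{score}(p_{b}) + \mathrm{score}(p_{n})\bigr) \tag{2}
```

Under this form a negative pair contributes to the loss only when its score comes within the margin $\mu$ of the positive pair's score, so training pushes correlated entities together and uncorrelated ones apart.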
In conclusion, the beneficial effects of the invention are as follows:
The method collects criminal legal documents and preprocesses the case fact descriptions; takes each preprocessed case fact description as a document entity and takes the law articles relevant to the case, the charge on which the defendant was convicted and the length of the prison term as label entities corresponding to that document entity; trains Embedding vectors for the document entities and label entities by stochastic gradient descent; and, for new data, computes the similarity between the document entity and the label entities with the trained Embedding vectors, thereby supporting a judgment on the case. Judges, lawyers and other legal professionals are thus assisted in determining the charge, recommending relevant law articles and determining the prison term, which saves the time and labor cost of case review, improves the accuracy of judicial decisions and enables legal professionals to complete them more efficiently.
Drawings
FIG. 1 is a schematic flow chart of the steps of the present invention.
Detailed Description
The invention addresses the following problems of the prior art: traditional case review and judicial decision-making rely on the professional interpretation and argument of judges, lawyers, and public security, procuratorate and court personnel; complex criminal cases require a large amount of manpower and material resources to review; for the same case, different legal practitioners may differ greatly and disagree strongly about the sentencing outcome; and the professionals handling a case bring strong subjectivity to it. To this end, the method collects criminal legal documents and preprocesses the case fact descriptions; takes each preprocessed case fact description as a document entity and takes the law articles relevant to the case, the charge on which the defendant was convicted and the length of the prison term as label entities corresponding to that document entity; trains Embedding vectors for the document entities and label entities by stochastic gradient descent; and, for new data, computes the similarity between the document entity and the label entities with the trained Embedding vectors, thereby supporting a judgment on the case. Judges, lawyers and other legal professionals are thus assisted in determining the charge, recommending relevant law articles and determining the prison term, which saves the time and labor cost of case review, improves the accuracy of judicial decisions and enables legal professionals to complete them more efficiently. The present invention will be described in further detail with reference to the following example and the accompanying drawing, but the embodiments of the present invention are not limited thereto; the drawing is only an example of the application of the present invention and does not restrict its principle.
Example:
As shown in FIG. 1, a Neural Embedding-based intelligent analysis method for judicial cases comprises the following steps:
S1: collecting criminal legal documents and preprocessing the case fact descriptions; each case consists of the case description and fact portions of a legal document, together with elements such as the law articles relevant to the case, the charge on which the defendant was convicted and the length of the prison term;
S2: taking each preprocessed case fact description as a document entity, taking the law articles relevant to the case, the charge on which the defendant was convicted and the length of the prison term as label entities corresponding to the document entity, and training Embedding vectors for the document entities and label entities by stochastic gradient descent; because the law articles, the charge and the prison term are correlated with one another, the three elements are trained jointly with the case fact description to obtain a better result, that is, all document entities and label entities are embedded into the same space so that correlated entity vectors lie closer together;
S3: for new data, computing the similarity between the document entity and the label entities with the Embedding vectors trained in S2, so that judges, lawyers and other professionals can make a judgment on the case according to the similarity.
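Step S3 amounts to ranking the label entities by their similarity to the vector of a new document. The sketch below assumes cosine similarity, as used in training; the toy two-dimensional vectors, the label names and the `top_k` parameter are made up for illustration and stand in for trained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_labels(doc_vec, label_vectors, top_k=3):
    """Return the top_k (label, similarity) pairs, most similar first."""
    scored = [(name, cosine(doc_vec, vec)) for name, vec in label_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for trained embeddings (made-up values).
label_vectors = {
    "charge_theft": [1.0, 0.0],
    "charge_dangerous_driving": [0.0, 1.0],
    "term_under_3_years": [0.8, 0.6],
}
ranking = rank_labels([0.9, 0.1], label_vectors)
```

In practice the ranking would presumably be computed separately for each label type (law article, charge, prison term) so that the professional sees the best candidates of each kind.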
Further in this embodiment, the preprocessing of the case fact description in step S1 comprises the following sub-steps:
S101: filtering stop words and special symbols out of the case fact description, retaining Chinese characters, numbers and punctuation marks, and segmenting the text into words;
S102: discretizing the sentencing-sensitive numerical values in the case fact description.
Further in this embodiment, the sentencing-sensitive numerical values in step S102 fall into three types: monetary amount, weight and blood alcohol concentration.
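Sub-step S101 can be sketched as a small cleaning pipeline. Everything specific here is an assumption: the four-word stop list, the set of punctuation marks kept, and above all the character-level stand-in for word segmentation, which a real pipeline would replace with a proper Chinese word segmenter. The sketch removes stop words after segmentation because stop words are matched as whole tokens.

```python
import re

# Hypothetical stop-word list; a real system would load a full Chinese list.
STOP_WORDS = {"的", "了", "在", "于"}

# Keep CJK ideographs, ASCII digits and common Chinese/ASCII punctuation;
# everything else is treated as a "special symbol" (an assumed definition).
KEEP = re.compile(r"[\u4e00-\u9fff0-9，。；：、,.;:]")


def clean(text: str) -> str:
    """Drop characters outside the kept set."""
    return "".join(ch for ch in text if KEEP.match(ch))


def segment(text: str) -> list:
    """Stand-in segmenter: one token per character, digit runs kept whole.

    A real pipeline would call a Chinese word segmenter here instead.
    """
    return re.findall(r"[0-9]+|[^0-9]", text)


def preprocess(text: str) -> list:
    """Clean the text, segment it, then remove stop words."""
    return [tok for tok in segment(clean(text)) if tok not in STOP_WORDS]
```

For example, `preprocess("被告人在商场盗窃了5000元★")` drops the star symbol and the two stop words and keeps `5000` as a single token.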
Further in this embodiment, step S2 comprises the following sub-steps:
S201: using bag-of-words N-Grams as the features of a case description document, wherein each document is described by a set of discrete features drawn from a fixed-length dictionary; assigning a $d$-dimensional vector to each feature, and then summing the feature vectors to obtain the vector of the document entity; assuming the $D$ features of the dictionary are defined as a $D \times d$ matrix $F$, where $F_i$ indexes the $i$-th feature, $\sum_{i \in a} F_i$ is taken as the vector of document entity $a$;
S202: taking every discrete value of the law articles, the charge on which the defendant was convicted and the length of the prison term as a distinct label entity b, and mapping each label entity to a distinct vector of the same dimension as the document entity vectors;
S203: establishing a loss function for document entity $a$ and label entity $b$, see formula (1):
This formula consists of the following parts:
1) generating positive entity pairs $(a, b)$ from the data, wherein $a$ is the document entity of a case fact description and $b$ is a label entity corresponding to document entity $a$, such as a law article, the charge on which the defendant was convicted or the prison-term label corresponding to document entity $a$;
2) obtaining a negative entity $b^-$ by random sampling from the labels that do not belong to document entity $a$, forming a negative entity pair $(a, b^-)$;
3) computing the similarity of the different entity pairs with cosine similarity;
4) the loss function $L_{batch}$ computes the loss by comparing the difference between the positive and negative entity pairs; here, margin ranking loss is used as the loss function, see formula (2):
wherein $B$ is the batch size, $N$ is the number of negative examples, $p_b$ denotes a positive entity pair, $p_n$ denotes a negative entity pair, and the score function is the similarity function of an entity pair; the loss function is optimized by stochastic gradient descent so that correlated entity vectors have higher similarity.
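The negative sampling of part 2) and the margin ranking loss of part 4) can be sketched in Python as below. The margin value, the averaging over the batch and the sampling with replacement are assumptions, since the text does not give them; the gradient update itself is omitted.

```python
import random


def sample_negatives(all_labels, positive_labels, n, rng):
    """Sample n labels that do not belong to the document (with replacement)."""
    candidates = [lab for lab in all_labels if lab not in positive_labels]
    return [rng.choice(candidates) for _ in range(n)]


def margin_ranking_loss(pos_scores, neg_scores, margin=0.05):
    """Hinge loss summed over negatives and averaged over the batch.

    pos_scores[b] is the similarity of the b-th positive pair and
    neg_scores[b] is the list of similarities of its N negative pairs.
    The margin of 0.05 is an assumed value.
    """
    total = 0.0
    for pos, negs in zip(pos_scores, neg_scores):
        for neg in negs:
            total += max(0.0, margin - pos + neg)
    return total / len(pos_scores)
```

A negative pair whose score is already lower than the positive score by at least the margin contributes nothing, so stochastic gradient descent only adjusts the vectors involved in pairs that are still ranked too close together.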
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.
Claims (7)
1. A Neural Embedding-based intelligent analysis method for judicial cases, characterized by comprising the following steps:
S1: collecting criminal legal documents and preprocessing the case fact descriptions;
S2: taking each preprocessed case fact description as a document entity, taking the law articles relevant to the case, the charge on which the defendant was convicted and the length of the prison term as label entities corresponding to the document entity, and training Embedding vectors for the document entities and label entities by stochastic gradient descent;
S3: for new data, computing the similarity between the document entity and the label entities with the Embedding vectors trained in S2.
2. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 1, wherein the preprocessing of the case fact description in step S1 comprises the following sub-steps:
S101: filtering stop words and special symbols out of the case fact description, retaining Chinese characters, numbers and punctuation marks, and segmenting the text into words;
S102: discretizing the sentencing-sensitive numerical values in the case fact description.
3. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 2, wherein the sentencing-sensitive numerical values in step S102 fall into three types: monetary amount, weight and blood alcohol concentration.
4. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 1, wherein the specific method of obtaining the Embedding vectors in step S2 comprises the following steps:
S201: first, obtaining the vector of document entity a by feature summation;
S202: then, mapping each label entity b to a vector of the same dimension as the document entity vector;
S203: finally, establishing a loss function over the vector of document entity a and the vector of label entity b to obtain the Embedding vectors.
5. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 4, wherein S201 specifically comprises:
using bag-of-words N-Grams as the features of a case description document, wherein each document is described by a set of discrete features drawn from a fixed-length dictionary; assigning a $d$-dimensional vector to each feature, and then summing the feature vectors to obtain the vector of the document entity; assuming the $D$ features of the dictionary are defined as a $D \times d$ matrix $F$, where $F_i$ indexes the $i$-th feature, $\sum_{i \in a} F_i$ is taken as the vector of document entity $a$.
6. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 4, wherein S202 specifically comprises:
taking every discrete value of the law articles, the charge on which the defendant was convicted and the length of the prison term as a distinct label entity b, and mapping each label entity to a distinct vector of the same dimension as the document entity vectors.
7. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 4, wherein S203 specifically comprises:
establishing a loss function for document entity $a$ and label entity $b$, see formula (1):
This formula consists of the following parts:
1) generating positive entity pairs $(a, b)$ from the data, wherein $a$ is the document entity of a case fact description and $b$ is a label entity corresponding to document entity $a$, such as a law article, the charge on which the defendant was convicted or the prison-term label corresponding to document entity $a$;
2) obtaining a negative entity $b^-$ by random sampling from the labels that do not belong to document entity $a$, forming a negative entity pair $(a, b^-)$;
3) computing the similarity of the different entity pairs with cosine similarity;
4) the loss function $L_{batch}$ computes the loss by comparing the difference between the positive and negative entity pairs; here, margin ranking loss is used as the loss function, see formula (2):
wherein $B$ is the batch size, $N$ is the number of negative examples, $p_b$ denotes a positive entity pair, $p_n$ denotes a negative entity pair, and the score function is the similarity function of an entity pair; the loss function is optimized by stochastic gradient descent so that correlated entity vectors have higher similarity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010626120.8A CN111753059A (en) | 2020-07-02 | 2020-07-02 | Neural Embedding-based intelligent analysis method for judicial cases |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010626120.8A CN111753059A (en) | 2020-07-02 | 2020-07-02 | Neural Embedding-based intelligent analysis method for judicial cases |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111753059A true CN111753059A (en) | 2020-10-09 |
Family
ID=72678586
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010626120.8A Pending CN111753059A (en) | 2020-07-02 | 2020-07-02 | Neural Embedding-based intelligent analysis method for judicial cases |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111753059A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113033176A (en) * | 2021-05-19 | 2021-06-25 | 苏州黑云智能科技有限公司 | Court case judgment prediction method |
CN113204567A (en) * | 2021-05-31 | 2021-08-03 | 山东政法学院司法鉴定中心 | Big data judicial case analysis and processing system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109241285A (en) * | 2018-08-29 | 2019-01-18 | 东南大学 | A kind of device of the judicial decision in a case of auxiliary based on machine learning |
CN110858269A (en) * | 2018-08-09 | 2020-03-03 | 清华大学 | Criminal name prediction method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109033307B (en) | CRP clustering-based word multi-prototype vector representation and word sense disambiguation method | |
CN111966917B (en) | Event detection and summarization method based on pre-training language model | |
CN112231447B (en) | Method and system for extracting Chinese document events | |
CN108536870A (en) | A kind of text sentiment classification method of fusion affective characteristics and semantic feature | |
CN110717324B (en) | Judgment document answer information extraction method, device, extractor, medium and equipment | |
CN113505586A (en) | Seat-assisted question-answering method and system integrating semantic classification and knowledge graph | |
CN109918556B (en) | Method for identifying depressed mood by integrating social relationship and text features of microblog users | |
Huang et al. | Expert as a service: Software expert recommendation via knowledge domain embeddings in stack overflow | |
CN109446423B (en) | System and method for judging sentiment of news and texts | |
CN112101027A (en) | Chinese named entity recognition method based on reading understanding | |
CN113742733B (en) | Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type | |
CN111753059A (en) | Neural Embedding-based intelligent analysis method for judicial cases | |
CN116304035B (en) | Multi-notice multi-crime name relation extraction method and device in complex case | |
CN113112239A (en) | Portable post talent screening method | |
CN113946677A (en) | Event identification and classification method based on bidirectional cyclic neural network and attention mechanism | |
CN112434533A (en) | Entity disambiguation method, apparatus, electronic device, and computer-readable storage medium | |
CN115221864A (en) | Multi-mode false news detection method and system | |
CN112329442A (en) | Multi-task reading system and method for heterogeneous legal data | |
CN117115505A (en) | Emotion enhancement continuous training method combining knowledge distillation and contrast learning | |
CN113836941B (en) | Contract navigation method and device | |
CN114298047A (en) | Chinese named entity recognition method and system based on stroke volume and word vector | |
CN112711700A (en) | Method and system for recommending case for fair litigation | |
CN113590908A (en) | Information recommendation method based on attention mechanism | |
CN113220850A (en) | Case portrait mining method for court trial scoring | |
Liu | IntelliExtract: An End-to-End Framework for Chinese Resume Information Extraction from Document Images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||