CN111753059A - Neural Embedding-based intelligent analysis method for judicial cases - Google Patents

Publication number
CN111753059A
Authority
CN
China
Prior art keywords
entity
case
document
criminal
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010626120.8A
Other languages
Chinese (zh)
Inventor
肖利
蒋欣辰
曾芳伊雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Shufeng Technology Co ltd
Chengdu Ruima Technology Co ltd
Original Assignee
Hangzhou Shufeng Technology Co ltd
Chengdu Ruima Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-07-02
Filing date
2020-07-02
Publication date
2020-10-09
Application filed by Hangzhou Shufeng Technology Co ltd and Chengdu Ruima Technology Co ltd
Priority to CN202010626120.8A
Publication of CN111753059A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Abstract

A Neural Embedding-based intelligent analysis method for judicial cases comprises: collecting criminal legal documents and preprocessing their case fact descriptions; taking each preprocessed case fact description text as a document entity, taking the law articles cited in the case, the charge on which the defendant was convicted, and the sentence length as the label entities corresponding to that document entity, and training the Embedding vectors of the document entities and label entities by stochastic gradient descent; and, for new data, computing the similarity between the document entity and the label entities with the trained Embedding vectors in order to support a judgment on the case. The invention assists judges, lawyers, and other legal professionals in determining the charge, recommending relevant law articles, and deciding the sentence length, thereby saving the time and labor cost of case review, improving the accuracy of judicial decisions, and allowing these professionals to reach decisions more efficiently.

Description

Neural Embedding-based intelligent analysis method for judicial cases
Technical Field
The invention relates to the technical field of intelligent judicial judgment assistance, in particular to a Neural Embedding-based intelligent analysis method for judicial cases.
Background
Traditional case review and judicial decision-making rely on the professional interpretation and debate of judges, lawyers, and personnel of the public security, procuratorial, and court organs. Complicated criminal cases require a large amount of manpower and material resources to review. Moreover, for the same case, different legal practitioners may disagree widely and argue strongly over the sentencing outcome, and the professionals handling a case bring considerable subjectivity to it. An embedding-based intelligent judicial judgment method is therefore designed to assist judges, lawyers, and other legal professionals in determining the charge, recommending relevant law articles, and deciding the sentence length, so that the time and labor cost of case review are reduced, the accuracy of judicial decisions is improved, and these professionals can reach decisions more efficiently.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a Neural Embedding-based intelligent analysis method for judicial cases. The method collects criminal legal documents and preprocesses their case fact descriptions; takes each preprocessed case fact description text as a document entity, takes the law articles cited in the case, the charge on which the defendant was convicted, and the sentence length as the label entities corresponding to that document entity, and trains the Embedding vectors of the document entities and label entities by stochastic gradient descent; and, for new data, computes the similarity between the document entity and the label entities with the trained Embedding vectors in order to support a judgment on the case. Judges, lawyers, and other legal professionals are thereby assisted in determining the charge, recommending relevant law articles, and deciding the sentence length, so that the time and labor cost of case review are reduced, the accuracy of judicial decisions is improved, and these professionals can reach decisions more efficiently.
The technical scheme adopted by the invention for solving the problems is as follows:
A Neural Embedding-based intelligent analysis method for judicial cases comprises the following steps:
S1: collecting criminal legal documents and preprocessing their case fact descriptions;
S2: taking each preprocessed case fact description text as a document entity, taking the law articles cited in the case, the charge on which the defendant was convicted, and the sentence length as the label entities corresponding to that document entity, and training the Embedding vectors of the document entities and label entities by stochastic gradient descent;
S3: for new data, calculating the similarity between the document entity and the label entities using the Embedding vectors trained in S2, on the basis of which professionals such as judges and lawyers make a judgment on the case.
Further, the preprocessing of the case fact descriptions in step S1 includes the following sub-steps:
S101: filtering out stop words and special symbols in the case fact description, retaining Chinese characters, numerals, and punctuation marks, and segmenting the text into words;
S102: discretizing the sentencing-sensitive numerical values in the case fact description text.
Further, the sentencing-sensitive values in step S102 fall into three types: monetary amounts, weights, and blood alcohol concentrations.
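The discretization in S102 can be illustrated with a minimal Python sketch. The bucket boundaries, the unit patterns, and the token names such as `MONEY_1` are hypothetical, since the text does not state them; the idea shown is only that each monetary amount, weight, or blood alcohol concentration is replaced by a coarse bucket token so that similar magnitudes share one dictionary feature.

```python
import bisect
import re

# Hypothetical bucket boundaries; the source text does not give thresholds.
BUCKETS = {
    "MONEY": [1_000, 10_000, 100_000, 1_000_000],  # yuan
    "WEIGHT": [1, 10, 50, 1_000],                   # grams
    "BAC": [20, 80, 200],                           # mg per 100 ml
}

# Each pattern captures a number followed by a unit that identifies its type.
PATTERNS = {
    "MONEY": re.compile(r"(\d+(?:\.\d+)?)\s*元"),
    "WEIGHT": re.compile(r"(\d+(?:\.\d+)?)\s*克"),
    "BAC": re.compile(r"(\d+(?:\.\d+)?)\s*(?:mg/100ml|毫克/100毫升)"),
}


def bucket_token(kind: str, value: float) -> str:
    """Map a numeric value to a discrete token such as MONEY_1."""
    index = bisect.bisect_right(BUCKETS[kind], value)
    return f"{kind}_{index}"


def discretize(text: str) -> str:
    """Replace every recognized quantity in text with its bucket token."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(
            lambda m, k=kind: f" {bucket_token(k, float(m.group(1)))} ", text
        )
    return text
```

Calling `discretize` on a fact description before word segmentation would turn exact figures into a small, fixed set of tokens; a real system would need broader unit coverage (for example kilograms or Chinese numerals), which this sketch leaves out.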
Further, the specific method of obtaining the Embedding vectors in step S2 includes the following steps:
S201: first, obtaining the vector of document entity a by summing its feature vectors;
S202: then, mapping each label entity b to a vector with the same dimension as the document entity vector;
S203: finally, establishing a loss function over the vector of document entity a and the vector of label entity b to obtain the Embedding vectors.
The specific steps of S201 are as follows: bag-of-words N-Grams are used as the features of a case description document, so that each document is described by a set of discrete features drawn from a fixed-length dictionary; a d-dimensional vector is assigned to each feature, and the feature vectors are summed to give the vector of the document entity. Assuming the dictionary contains D features, the features are defined as a D × d matrix F, where $F_i$ denotes the i-th feature (the i-th row of F), and $\sum_{i \in a} F_i$ is the vector of document entity a;
The specific steps of S202 are as follows: every discrete value of the law articles, of the charges on which defendants were convicted, and of the sentence lengths is taken as a distinct label entity b, and each label entity is mapped to its own vector with the same dimension as the document entity vector;
The specific steps of S203 are as follows: a loss function is established for document entity a and label entity b, see formula (1):
[Formula (1): equation image not reproduced in this text]
This formula consists of the following parts:
1) positive entity pairs (a, b) are generated from the data, where a is the document entity of a case fact description and b is a label entity corresponding to document entity a, such as a law article cited in the case, the charge on which the defendant was convicted, or the sentence-length label;
2) negative entities $b^{-}$ are obtained by random sampling from the labels that do not belong to document entity a, forming negative entity pairs $(a, b^{-})$;
3) the similarity of each entity pair is calculated using cosine similarity;
4) the loss function $L^{batch}$ calculates the loss by comparing the positive and negative entity pairs; here the margin ranking loss is used as the loss function, see formula (2):
[Formula (2): equation image not reproduced in this text]
where B is the batch size, N is the number of negative examples, $p_b$ denotes a positive entity pair, $p_n$ denotes a negative entity pair, and the score function is the similarity function of an entity pair. The loss function is optimized by stochastic gradient descent so that correlated entity vectors have higher similarity.
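The equation images for formulas (1) and (2) are not reproduced in this text. A form consistent with the four components and the symbol definitions above might read as follows. This is a reconstruction under assumptions, not the published equations: the sets $E^{+}$ and $E^{-}$ of positive and negative pairs, the margin symbol $\mu$, and the $1/B$ normalization are all assumed, and the subscript $b$ in $p_b$ indexes the positive pairs in a batch rather than denoting a label entity.

```latex
% (1) Objective: sum over positive pairs and their N sampled negatives
\sum_{(a,b)\in E^{+},\; b^{-}\in E^{-}}
  L^{batch}\bigl(\mathrm{sim}(a,b),\ \mathrm{sim}(a,b^{-}_{1}),\ \dots,\ \mathrm{sim}(a,b^{-}_{N})\bigr)
\tag{1}

% (2) Margin ranking loss over a batch of B positive pairs
L^{batch} = \frac{1}{B}\sum_{b=1}^{B}\sum_{n=1}^{N}
  \max\bigl(0,\ \mu - \mathrm{score}(p_{b}) + \mathrm{score}(p_{n})\bigr)
\tag{2}
```

Under this form, a positive pair stops contributing to the loss once its score exceeds the score of every sampled negative pair by at least the margin.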
In conclusion, the beneficial effects of the invention are as follows:
The method collects criminal legal documents and preprocesses their case fact descriptions; takes each preprocessed case fact description text as a document entity, takes the law articles cited in the case, the charge on which the defendant was convicted, and the sentence length as the label entities corresponding to that document entity, and trains the Embedding vectors of the document entities and label entities by stochastic gradient descent; and, for new data, computes the similarity between the document entity and the label entities with the trained Embedding vectors in order to support a judgment on the case. Judges, lawyers, and other legal professionals are thereby assisted in determining the charge, recommending relevant law articles, and deciding the sentence length, so that the time and labor cost of case review are reduced, the accuracy of judicial decisions is improved, and these professionals can reach decisions more efficiently.
Drawings
FIG. 1 is a schematic flow chart of the steps of the present invention.
Detailed Description
The prior art has the following problems: traditional case review and judicial decision-making depend on the professional interpretation and debate of judges, lawyers, and personnel of the public security, procuratorial, and court organs; complicated criminal cases require a large amount of manpower and material resources to review; for the same case, different legal practitioners may disagree widely and argue strongly over the sentencing outcome; and the professionals handling a case bring considerable subjectivity to it. To solve these problems, the method collects criminal legal documents and preprocesses their case fact descriptions; takes each preprocessed case fact description text as a document entity, takes the law articles cited in the case, the charge on which the defendant was convicted, and the sentence length as the label entities corresponding to that document entity, and trains the Embedding vectors of the document entities and label entities by stochastic gradient descent; and, for new data, computes the similarity between the document entity and the label entities with the trained Embedding vectors in order to support a judgment on the case. Judges, lawyers, and other legal professionals are thereby assisted in determining the charge, recommending relevant law articles, and deciding the sentence length, so that the time and labor cost of case review are reduced, the accuracy of judicial decisions is improved, and these professionals can reach decisions more efficiently. The present invention is described in further detail below with reference to an example and the accompanying drawing, but the embodiments of the invention are not limited thereto; the drawing is only one example of applying the invention and does not restrict its principle.
Example:
As shown in FIG. 1, a Neural Embedding-based intelligent analysis method for judicial cases includes the following steps:
S1: collecting criminal legal documents and preprocessing their case fact descriptions; each case consists of the case description and fact sections of a legal document, together with elements such as the law articles cited in the case, the charge on which the defendant was convicted, and the sentence length;
S2: taking each preprocessed case fact description text as a document entity, taking the law articles cited in the case, the charge on which the defendant was convicted, and the sentence length as the label entities corresponding to that document entity, and training the Embedding vectors of the document entities and label entities by stochastic gradient descent; because the law articles, the charge, and the sentence length are correlated with one another, training these three elements jointly with the case fact description text gives better results; that is, all document entities and label entities are embedded in the same space, so that correlated entity vectors lie closer together;
S3: for new data, calculating the similarity between the document entity and the label entities using the Embedding vectors trained in S2, on the basis of which professionals such as judges and lawyers make a judgment on the case.
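Step S3 can be sketched as a ranking of label entities by cosine similarity to the document vector. This is a minimal illustration, not the patented implementation: the function names, the label strings, and the two-dimensional toy vectors are made up and merely stand in for trained Embedding vectors.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_labels(doc_vec, label_vecs, top_k=3):
    """Return the top_k (label, similarity) pairs, most similar first."""
    scored = [(name, cosine(doc_vec, vec)) for name, vec in label_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy, made-up vectors standing in for trained label embeddings.
label_vecs = {
    "charge:theft": [1.0, 0.0],
    "charge:dangerous_driving": [0.0, 1.0],
    "term:under_1_year": [0.7, 0.7],
}
ranking = rank_labels([0.9, 0.1], label_vecs, top_k=2)
```

Because law articles, charges, and sentence lengths share one embedding space, a single ranking like this could be filtered by label type to produce one recommendation list per element; the final decision remains with the legal professional.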
Further, in this embodiment, the preprocessing of the case fact descriptions in step S1 includes the following sub-steps:
S101: filtering out stop words and special symbols in the case fact description, retaining Chinese characters, numerals, and punctuation marks, and segmenting the text into words;
S102: discretizing the sentencing-sensitive numerical values in the case fact description text.
Further, in this embodiment, the sentencing-sensitive values in step S102 fall into three types: monetary amounts, weights, and blood alcohol concentrations.
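The filtering in S101 might look like the following Python sketch. The stop-word list and the set of retained punctuation marks are hypothetical, and the character-level `segment` function is only a placeholder assumption: a real system would plug in a proper Chinese word segmenter, which the text does not name. Stop words are removed after segmentation here because they can only be matched once tokens exist.

```python
import re

# Hypothetical stop-word list; the source text does not enumerate one.
STOP_WORDS = {"的", "了", "在"}

# Keep CJK ideographs, ASCII digits, and common Chinese/ASCII punctuation.
KEEP = re.compile(r"[\u4e00-\u9fff0-9，。、；：？！,.;:?!]")


def clean(text: str) -> str:
    """Drop special symbols, keeping Chinese characters, digits, punctuation."""
    return "".join(KEEP.findall(text))


def segment(text: str) -> list[str]:
    """Placeholder segmenter: one token per character (an assumption)."""
    return list(text)


def preprocess(text: str, segmenter=segment) -> list[str]:
    """Clean the text, segment it, then remove stop words."""
    return [tok for tok in segmenter(clean(text)) if tok not in STOP_WORDS]
```

The `segmenter` parameter keeps the sketch independent of any particular tokenizer, so a word-level segmenter can be swapped in without changing the rest of the pipeline.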
Further, in this embodiment, step S2 includes the following sub-steps:
S201: bag-of-words N-Grams are used as the features of a document, so that each document is described by a set of discrete features drawn from a fixed-length dictionary; a d-dimensional vector is assigned to each feature, and the feature vectors are summed to give the vector of the document entity. Assuming the dictionary contains D features, the features are defined as a D × d matrix F, where $F_i$ denotes the i-th feature (the i-th row of F), and $\sum_{i \in a} F_i$ is the vector of document entity a;
S202: every discrete value of the law articles, of the charges on which defendants were convicted, and of the sentence lengths is taken as a distinct label entity b, and each label entity is mapped to its own vector with the same dimension as the document entity vector;
S203: a loss function is established for document entity a and label entity b, see formula (1):
[Formula (1): equation image not reproduced in this text]
This formula consists of the following parts:
1) positive entity pairs (a, b) are generated from the data, where a is the document entity of a case fact description and b is a label entity corresponding to document entity a, such as a law article cited in the case, the charge on which the defendant was convicted, or the sentence-length label;
2) negative entities $b^{-}$ are obtained by random sampling from the labels that do not belong to document entity a, forming negative entity pairs $(a, b^{-})$;
3) the similarity of each entity pair is calculated using cosine similarity;
4) the loss function $L^{batch}$ calculates the loss by comparing the positive and negative entity pairs; here the margin ranking loss is used as the loss function, see formula (2):
[Formula (2): equation image not reproduced in this text]
where B is the batch size, N is the number of negative examples, $p_b$ denotes a positive entity pair, $p_n$ denotes a negative entity pair, and the score function is the similarity function of an entity pair. The loss function is optimized by stochastic gradient descent so that correlated entity vectors have higher similarity.
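Sub-steps S201 to S203 can be sketched end to end in plain Python. This is a toy illustration under stated assumptions, not the patented implementation: the class and function names, the margin of 0.5, the learning rate, the embedding dimension, the hinge form of the loss, and the two-document dataset with labels such as `article:A` are all made up for the example.

```python
import math
import random


def ngrams(tokens, n=2):
    """Bag of 1..n-grams built from a list of tokens."""
    return [" ".join(tokens[i:i + k])
            for k in range(1, n + 1) for i in range(len(tokens) - k + 1)]


def add(u, v, scale=1.0):
    """Return u + scale * v element-wise."""
    return [x + scale * y for x, y in zip(u, v)]


def cos_and_grads(u, v):
    """Cosine similarity of u and v plus its gradients w.r.t. u and v."""
    nu = math.sqrt(sum(x * x for x in u)) or 1e-12
    nv = math.sqrt(sum(y * y for y in v)) or 1e-12
    c = sum(x * y for x, y in zip(u, v)) / (nu * nv)
    gu = [y / (nu * nv) - c * x / (nu * nu) for x, y in zip(u, v)]
    gv = [x / (nu * nv) - c * y / (nv * nv) for x, y in zip(u, v)]
    return c, gu, gv


class EmbeddingModel:
    def __init__(self, dim=8, seed=0):
        self.dim, self.rng = dim, random.Random(seed)
        self.F, self.L = {}, {}  # feature table and label table

    def _vec(self, table, key):
        if key not in table:  # lazily create a random d-dimensional vector
            table[key] = [self.rng.uniform(-0.5, 0.5) for _ in range(self.dim)]
        return table[key]

    def doc_vec(self, feats):
        """S201: document entity vector = sum of its feature vectors."""
        total = [0.0] * self.dim
        for f in feats:
            total = add(total, self._vec(self.F, f))
        return total

    def score(self, feats, label):
        """Cosine similarity between a document and a label entity (S202)."""
        return cos_and_grads(self.doc_vec(feats), self._vec(self.L, label))[0]

    def sgd_step(self, feats, pos, negatives, margin=0.5, lr=0.1):
        """S203: one margin-ranking SGD update; returns the hinge loss."""
        a = self.doc_vec(feats)
        s_pos, ga_pos, gb_pos = cos_and_grads(a, self._vec(self.L, pos))
        loss, grad_a, active = 0.0, [0.0] * self.dim, 0
        for neg in negatives:
            s_neg, ga_neg, gb_neg = cos_and_grads(a, self._vec(self.L, neg))
            hinge = margin - s_pos + s_neg
            if hinge > 0:  # only margin-violating pairs contribute
                loss += hinge
                active += 1
                grad_a = add(add(grad_a, ga_neg), ga_pos, -1.0)
                self.L[neg] = add(self.L[neg], gb_neg, -lr)
        self.L[pos] = add(self.L[pos], gb_pos, lr * active)
        for f in feats:  # every feature of the document shares grad_a
            self.F[f] = add(self.F[f], grad_a, -lr)
        return loss


def train(model, data, epochs=200, n_neg=2):
    """Sample negatives from labels that do not belong to the document."""
    all_labels = sorted({label for _, labels in data for label in labels})
    for _ in range(epochs):
        for feats, labels in data:
            pool = [label for label in all_labels if label not in labels]
            for pos in sorted(labels):
                negs = model.rng.sample(pool, min(n_neg, len(pool)))
                model.sgd_step(feats, pos, negs)


# Toy, made-up training data: (features, label entities of the document).
data = [
    (ngrams(["盗窃", "手机"]), {"charge:theft", "article:A"}),
    (ngrams(["醉酒", "驾驶"]), {"charge:dangerous_driving", "article:B"}),
]
model = EmbeddingModel()
train(model, data)
```

The gradients are written out by hand so the sketch needs no third-party library; in practice an automatic-differentiation framework would replace `cos_and_grads` and `sgd_step`. After training, `model.score(feats, label)` gives the similarity used for ranking in step S3.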
The above is only a preferred embodiment of the present invention. The protection scope of the invention is not limited to this embodiment, and all technical solutions that follow the idea of the invention fall within its protection scope. It should be noted that those skilled in the art may make modifications and refinements without departing from the principle of the invention, and such modifications and refinements also fall within its protection scope.

Claims (7)

1. A Neural Embedding-based intelligent analysis method for judicial cases, characterized by comprising the following steps:
S1: collecting criminal legal documents and preprocessing their case fact descriptions;
S2: taking each preprocessed case fact description text as a document entity, taking the law articles cited in the case, the charge on which the defendant was convicted, and the sentence length as the label entities corresponding to that document entity, and training the Embedding vectors of the document entities and label entities by stochastic gradient descent;
S3: for new data, calculating the similarity between the document entity and the label entities using the Embedding vectors trained in S2.
2. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 1, wherein the preprocessing of the case fact descriptions in step S1 includes the following sub-steps:
S101: filtering out stop words and special symbols in the case fact description, retaining Chinese characters, numerals, and punctuation marks, and segmenting the text into words;
S102: discretizing the sentencing-sensitive numerical values in the case fact description text.
3. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 2, wherein the sentencing-sensitive values in step S102 fall into three types: monetary amounts, weights, and blood alcohol concentrations.
4. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 1, wherein the specific method of obtaining the Embedding vectors in step S2 comprises the following steps:
S201: first, obtaining the vector of document entity a by summing its feature vectors;
S202: then, mapping each label entity b to a vector with the same dimension as the document entity vector;
S203: finally, establishing a loss function over the vector of document entity a and the vector of label entity b to obtain the Embedding vectors.
5. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 4, wherein the specific steps of S201 are as follows:
bag-of-words N-Grams are used as the features of a case description document, each document being described by a set of discrete features drawn from a fixed-length dictionary; a d-dimensional vector is assigned to each feature, and the feature vectors are summed to give the vector of the document entity; assuming the dictionary contains D features, the features are defined as a D × d matrix F, where $F_i$ denotes the i-th feature (the i-th row of F), and $\sum_{i \in a} F_i$ is the vector of document entity a.
6. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 4, wherein the specific steps of S202 are as follows:
every discrete value of the law articles, of the charges on which defendants were convicted, and of the sentence lengths is taken as a distinct label entity b, and each label entity is mapped to its own vector with the same dimension as the document entity vector.
7. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 4, wherein the specific steps of S203 are as follows:
a loss function is established for document entity a and label entity b, see formula (1):
[Formula (1): equation image not reproduced in this text]
This formula consists of the following parts:
1) positive entity pairs (a, b) are generated from the data, where a is the document entity of a case fact description and b is a label entity corresponding to document entity a, such as a law article cited in the case, the charge on which the defendant was convicted, or the sentence-length label;
2) negative entities $b^{-}$ are obtained by random sampling from the labels that do not belong to document entity a, forming negative entity pairs $(a, b^{-})$;
3) the similarity of each entity pair is calculated using cosine similarity;
4) the loss function $L^{batch}$ calculates the loss by comparing the positive and negative entity pairs; here the margin ranking loss is used as the loss function, see formula (2):
[Formula (2): equation image not reproduced in this text]
where B is the batch size, N is the number of negative examples, $p_b$ denotes a positive entity pair, $p_n$ denotes a negative entity pair, and the score function is the similarity function of an entity pair; the loss function is optimized by stochastic gradient descent so that correlated entity vectors have higher similarity.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010626120.8A | 2020-07-02 | 2020-07-02 | Neural Embedding-based intelligent analysis method for judicial cases

Publications (1)

Publication Number | Publication Date
CN111753059A | 2020-10-09

Family

ID=72678586

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202010626120.8A | Pending | CN111753059A (en) | 2020-07-02 | 2020-07-02 | Neural Embedding-based intelligent analysis method for judicial cases

Country Status (1)

Country | Link
CN (1) | CN111753059A (en)

Cited By (2)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN113033176A (en) * | 2021-05-19 | 2021-06-25 | 苏州黑云智能科技有限公司 | Court case judgment prediction method
CN113204567A (en) * | 2021-05-31 | 2021-08-03 | 山东政法学院司法鉴定中心 | Big data judicial case analysis and processing system

Citations (2)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN109241285A (en) * | 2018-08-29 | 2019-01-18 | 东南大学 | Machine learning-based device for assisting judicial case decisions
CN110858269A (en) * | 2018-08-09 | 2020-03-03 | 清华大学 | Criminal name prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination