CN117708644A - Method and system for generating judicial judge document abstract - Google Patents

Method and system for generating judicial judge document abstract

Info

Publication number
CN117708644A
Authority
CN
China
Prior art keywords
abstract
model
judicial
semantic
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311592898.1A
Other languages
Chinese (zh)
Inventor
高琰
吴杰
刘正涛
黎娟
许思琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202311592898.1A priority Critical patent/CN117708644A/en
Publication of CN117708644A publication Critical patent/CN117708644A/en
Pending legal-status Critical Current

Classifications

    • G06F 18/24 — Pattern recognition; classification techniques
    • G06F 18/25 — Pattern recognition; fusion techniques
    • G06F 40/30 — Handling natural language data; semantic analysis
    • G06N 3/0442 — Neural networks; recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N 3/045 — Neural networks; combinations of networks
    • G06N 3/0464 — Neural networks; convolutional networks [CNN, ConvNet]
    • G06N 3/048 — Neural networks; activation functions
    • G06N 3/08 — Neural networks; learning methods
    • G06Q 50/18 — ICT specially adapted for specific business sectors; legal services
    • Y02D 10/00 — Climate change mitigation in ICT; energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method and a system for generating judicial referee document abstracts, which specifically comprise the following steps: acquiring sentence semantic function labels; acquiring sentence criticality vectors containing semantic function information; generating an extractive abstract; training a long-text abstract model; and generating a generative abstract. The invention fuses the legal semantic function information of sentences into key-sentence classification by using conditional normalization, so that the selection of key sentences is more logical. The method constructs the generative abstract model in the RoFormer+UniLM manner, which is better suited to the long texts common in the legal field. Meanwhile, a legal dictionary is fused into the generation of the abstract, so that the final generated abstract is more professional.

Description

Method and system for generating judicial judge document abstract
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method and a system for generating a judicial referee document abstract.
Background
The judicial referee document is characterized by excessive length, legal terminology that is hard for lay readers to understand, and complex sentence structures. Meanwhile, as the volume of publicly disclosed referee documents grows, the difficulty for lawyers and judges in retrieving, reviewing and analyzing them increases. The judicial referee document abstract was developed to solve this problem. By compressing and summarizing the content of the document, it reflects the cause of the case, the judgment basis, the judgment result and the like in concise, plain and easy-to-understand language, thereby improving the efficiency of judicial personnel in case-handling links and aiding the popularization of law.
Current automatic summarization technology mainly comprises extractive summarization and generative summarization: the extractive method screens out of the text the key sentences that best represent the core content of the article to form the abstract, while the generative method extracts key information from the input text and reorganizes it to generate an abstract with strong professionalism and readability.
Traditional extractive summarization methods can be divided into two types: unsupervised and supervised. Unsupervised extractive algorithms mainly comprise topic-model-based extraction and graph-ranking-based extraction. Gong et al. first applied Latent Semantic Analysis (LSA) to the summarization task: the weighted word frequencies of each sentence are converted into sentence vectors, the sentence-vector matrix is decomposed by singular value decomposition, and sentences are selected from the result to compose the abstract. The TextRank algorithm constructs a graph from text units and inter-sentence similarity, and selects suitable nodes as the abstract by computing importance scores for the nodes. Supervised extractive algorithms convert the extraction task into a sentence classification task. The FastText algorithm uses N-gram features and a hierarchical softmax loss to reduce computational complexity, achieving simple and efficient text classification. The SummaRuNNer model proposed by Nallapati et al. uses long short-term memory networks (LSTM) to build a two-layer encoder that produces document-level encoded representations of sentences, and introduces the relative and absolute position features of each sentence, so that the importance of text units can be expressed clearly, giving the method better interpretability.
With the continuous development of neural network technology, generative abstract models based on neural networks have gradually drawn the attention of the academic community. Sutskever et al. proposed the Seq2Seq (Sequence to Sequence) model for machine translation, which adopts an end-to-end sequence-to-sequence training strategy based on an Encoder-Decoder structure. However, the performance of the Seq2Seq model typically decreases as text length increases. To address this problem, Bahdanau et al. introduced an attention mechanism into the model, allowing the decoder to attend to more comprehensive text information during decoding. Rush et al. first applied the attention mechanism to the abstract generation task, achieving good results on sentence-level summarization datasets.
With the continued advancement of pre-training technology, pre-training models designed specifically for text generation have become a hotspot of current text generation research, including MASS, UniLM, BART, GPT, T5 and PEGASUS, among others. On the basis of pre-training models, researchers keep improving abstract generation methods. Zhu et al. proposed the FASum model, which fuses knowledge graphs with abstract generation to identify and correct possible factual errors in the abstract. Xu et al. use contrastive learning to improve abstract quality by minimizing the similarity distance among three sequences: the original text, the manual abstract, and the model-generated abstract. In general, pre-training models provide a more efficient and reliable solution to the abstract generation task.
Existing automatic summarization methods, when processing referee documents, often suffer from missing semantic information, incoherence between adjacent sentences, logical errors, poor readability, insufficient professionalism, and text-length limitations.
The traditional extractive method only encodes the semantics of each sentence and classifies sentences according to that encoding; it does not consider the structural characteristics of the judicial referee document, so the extracted key sentences have no logical relation to one another. Because only the common features among key sentences are considered, key sentences lacking those features are hard for the model to identify, and the extracted abstract then misses part of the key information, hurting the final generated abstract. To address this problem, the invention fuses legal semantic function information into the extraction model, using conditional normalization in place of the original layer normalization of the WoBERT model, so that the extraction model fully considers the legal semantic function of each sentence and the logical relations among sentences with different functions when identifying key sentences. Meanwhile, pre-training models are usually trained on general corpora, and even when fine-tuned on legal-domain data, the abstracts they generate often lack professionalism.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the defects of the prior art, a method and a system for generating judicial referee document abstracts in which the length of the input text is not limited.
In order to solve the above technical problem, the invention adopts the following technical scheme: a judicial referee document abstract generating method comprises the following steps:
S1, inputting the sentences of the judicial referee document into a semantic function classification module to obtain the semantic function label R-Label of each sentence;
S2, inputting the sentences of the judicial referee document and the corresponding semantic function labels into a semantic information fusion module to obtain sentence criticality vectors;
S3, mapping the criticality vectors to probabilities, identifying the key sentences in the referee document, and splicing all the key sentences to obtain an extractive abstract;
and S4, taking the extractive abstract as the input of a long-text abstract model, and training the long-text abstract model to obtain the abstract generation model.
The method of the invention further comprises:
S5, taking the extractive abstract and the generated abstract sequence as the input of the abstract generation model, fusing the legal dictionary into the probability of the predicted next word, and using beam search to find the sequence with the largest cumulative probability, which is the final generated abstract.
The specific implementation process of the step S5 comprises the following steps:
1) Segmenting the legal-field dictionary and all words of the extractive abstract of the input judicial referee document into N-grams, generating the N-gram set D of the legal dictionary and the N-gram set S of the input text;
2) Calculating the weight of each N-gram according to its frequency of occurrence in the sets D and S;
3) Letting Y_{1:t-1} be a candidate abstract sequence, with the word y being decoded as the t-th word of the abstract sequence; the extractive abstract and the abstract sequence Y_{1:t-1} are input into the long-text abstract model, and the model outputs, for each candidate word in the candidate word list, the probability of it being the word y;
4) Calculating the comprehensive legal dictionary score K(y) of the word y using the formula K(y) = β×D(y) + γ×S(y), wherein D(y) and S(y) respectively denote the weights of y in the sets D and S, obtained from the weights (computed in step 2)) of the N-grams to which y belongs. For example, the word at position t-1 of the candidate sequence and the candidate word at position t can be joined into a 2-gram, whose weights are then looked up in D and S as the corresponding D(·) and S(·) entries. β and γ are tuning parameters for balancing the weights of professional legal-field vocabulary and of the extractive-abstract vocabulary of the input judicial referee document;
5) Calculating the comprehensive score F(y) of the given word y: F(y) = P(y) + λ×K(y); wherein P(y) is the probability generated by the model, and λ is a tuning parameter;
selecting the Beam-Width words (Beam-Width can be set as needed) with the highest comprehensive scores, each combined with the first t-1 words of the sequence, to form Beam-Width candidate sequences;
6) Repeating the steps 3) to 5) until the current word y is the ending symbol or the length of the abstract reaches the maximum value, and selecting the sequence with the highest score in the candidate sequences as the final generated abstract.
The semantic function classification module comprises a semantic information encoder and a semantic function classifier;
the semantic information encoder adopts the WoBERT model and obtains the sentence semantic information vector R-Embedding by average pooling: R-Embedding = (1/n)·Σ_{i=1}^{n} w_i, wherein n denotes the number of words in the sentence and w_i denotes the word vector obtained by WoBERT encoding the i-th word;
the semantic function classifier comprises a BiLSTM model, a linear layer and a CRF layer connected in sequence; the input of the BiLSTM model is the semantic information vectors R-Embedding, and the output of the CRF layer is the semantic function label R-Label.
In step S2, the semantic information fusion module comprises multi-layer Transformer blocks, and the conditional normalization through which a Transformer block fuses the semantic function label R-Label is:

a′ = g_c ⊙ (a^l − μ^l)/σ^l + b_c;
g_c = g + W_g·R-Label;
b_c = b + W_b·R-Label;

wherein a′ denotes the normalized value, R-Label is the sentence semantic function label, and W_g and W_b are transformation matrices; l indexes the hidden layer, and μ^l and σ^l denote the mean and standard deviation of the l-th layer activations; g and b respectively denote the gain and bias parameters before the legal semantic function label condition vector is added, and g_c and b_c respectively denote the gain and bias parameters after it is added.
In step S3, the criticality vector is mapped to a probability through a fully connected layer and an activation function, computed as: p_i = σ(W·M_i + b); wherein σ denotes the sigmoid activation function, W is the weight matrix of the fully connected layer, b is the bias vector, M_i is the criticality vector of the i-th sentence, and p_i is the output probability of the classifier.
In step S4, the long-text abstract model combines the RoFormer pre-training model with UniLM to build the generative model, whose attention is calculated as:

Attention(Q, K, V) = softmax( ((R_m·Q_m)^T (R_n·K_n)) / √d_k + M ) · V

wherein M is the mask matrix whose element M_{i,j} in row i, column j is 0 if the i-th token may obtain the information of the j-th token and −∞ otherwise; Q, K, V are respectively the query, key and value vectors; Q_m and K_n denote the vectors after position information is added; d_k denotes the dimension of the K vector; R_m and R_n are the rotation matrices of positions m and n; and Attention is the attention result after the mask matrix M is added.

Specifically, the RoFormer model employs rotary position embedding (RoPE): absolute positions are encoded with rotation matrices, which realizes relative position encoding. Defining the rotation matrix of position m as R_m and that of position n as R_n, the attention containing relative position information is:

Attention(Q, K, V) = softmax( ((R_m·Q_m)^T (R_n·K_n)) / √d_k ) · V

wherein Q_m and K_n denote the vectors after position information is added, d_k denotes the dimension of the K vector, and Q, K, V are the query, key and value vectors of the attention mechanism.

UniLM, in turn, adds the mask matrix M to the original attention mechanism, so that a token can only obtain the information of its predecessors when text is generated, enabling the model to emulate a unidirectional language model for the text generation task. The attention at this point is:

Attention(Q, K, V) = softmax( Q·K^T / √d_k + M ) · V

wherein M_{i,j} indicates whether the i-th token may obtain the information of the j-th token.
The invention also provides a judicial referee document abstract generation system comprising a memory and at least one processor; the memory stores one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the steps of the above method of the invention.
As an inventive concept, the present invention also provides a computer storage medium having stored thereon a computer program/instruction which, when executed by a processor, implements the steps of the above-described method of the present invention.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention fuses the different semantic functions of sentences in the judicial referee document into the extraction model by using conditional normalization, so that the extractive abstract better follows the logic of the referee document and more accurately captures the key information in the document.
2. The method combines the RoFormer pre-training model with UniLM to build the generative model, making the model better suited to processing the long texts of the judicial field without the forgetting caused by excessive text length.
3. The invention improves the beam search algorithm by fusing a legal-field dictionary into it, which greatly enhances the professionalism and fluency of the abstracts generated by the model.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
FIG. 2 is a structural diagram of the semantic function classifier according to an embodiment of the present invention;
FIG. 3 is a structural diagram of the beam search module fused with prior knowledge according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
The embodiment 1 of the invention provides a judicial referee document abstract generating method, which comprises the following specific implementation steps:
step one: and inputting sentences in the judicial referee document into a semantic function classification module to obtain semantic function labels R-Label of each sentence. The method comprises the following specific steps:
the text of the judicial referee document comprises a head part, a fact, a reason, a referee basis, a referee main document and a tail part, and the semantic function classification module is used for identifying different semantic functions of sentences in the judicial referee document.
The semantic function classification module consists of a semantic information encoder and a semantic function classifier. The semantic information encoder is implemented on the basis of WoBERT: a classification layer is appended to the WoBERT model and fine-tuned on data annotated with semantic function labels; after training, the classification layer is removed, each word in a sentence is encoded with the weights of the trained pre-training model, and the sentence semantic information vector (R-Embedding) is obtained by average pooling:
R-Embedding = (1/n)·Σ_{i=1}^{n} w_i, where n denotes the number of words in the sentence and w_i denotes the word vector obtained by WoBERT encoding the i-th word.
The semantic function classifier is implemented on the basis of a BiLSTM+CRF model. First, the obtained semantic information vectors are input into a bidirectional long short-term memory network (BiLSTM) to obtain new semantic vectors (H-Embedding) that fuse the contextual semantic information between sentences; this process can be described as:
→h_i = LSTM_fwd(R-Embedding_i), ←h_i = LSTM_bwd(R-Embedding_i), H-Embedding_i = [→h_i ; ←h_i], wherein →h_i denotes the forward hidden vector and ←h_i the backward hidden vector obtained by passing R-Embedding through the BiLSTM network, and H-Embedding_i is their concatenation.
Then a linear layer converts H-Embedding into a vector s whose dimension equals the number of semantic function labels, wherein s_(i,j) denotes the score of assigning the i-th sentence to the j-th semantic function label; finally, the CRF layer jointly considers the score of the current sentence and the transition probabilities between adjacent labels to predict the semantic function label R-Label of each sentence.
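A minimal PyTorch sketch of this semantic function classifier (a sketch under stated assumptions: the CRF layer comes from the pytorch-crf package, and the dimensions and names are illustrative, not from the patent):

    import torch
    import torch.nn as nn
    from torchcrf import CRF  # assumption: the pytorch-crf package

    class SemanticFunctionClassifier(nn.Module):
        # BiLSTM + linear layer + CRF over the sentence vectors (R-Embedding).
        def __init__(self, emb_dim=768, hidden=256, num_labels=6):
            super().__init__()
            # the BiLSTM fuses contextual information across the document's sentences
            self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.linear = nn.Linear(2 * hidden, num_labels)  # H-Embedding -> scores s
            self.crf = CRF(num_labels, batch_first=True)     # transitions between labels

        def forward(self, r_embedding, labels=None):
            # r_embedding: (batch, num_sentences, emb_dim), one WoBERT vector per sentence
            h, _ = self.bilstm(r_embedding)  # H-Embedding: forward/backward states concatenated
            scores = self.linear(h)          # s_(i,j): score of sentence i for label j
            if labels is not None:           # training: negative CRF log-likelihood
                return -self.crf(scores, labels)
            return self.crf.decode(scores)   # inference: best label sequence (R-Label)

num_labels=6 matches the six semantic functions of the document body described below (head, fact, reason, referee basis, referee main text, tail).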
Step two: input the sentences of the judicial referee document and the corresponding semantic function labels into the semantic information fusion module to obtain the sentence criticality vectors. The extractive abstract model is also built on the basis of the WoBERT model and comprises 12 Transformer layers; in order to fuse the semantic function labels into the WoBERT model, a conditional normalization method replaces the original layer normalization of the Transformer. Conditional normalization adds a condition vector that controls the result of layer normalization; the calculation can be described as:
a′ = g_c ⊙ (a^l − μ^l)/σ^l + b_c;
g_c = g + W_g·R-Label;
b_c = b + W_b·R-Label;

wherein a′ denotes the normalized value, R-Label is the sentence semantic function label, and W_g and W_b are transformation matrices; l indexes the hidden layer, and μ^l and σ^l denote the mean and standard deviation of the l-th layer activations. g and b denote the gain and bias parameters before the legal semantic function label condition vector is added, and g_c and b_c denote the gain and bias parameters after it is added.
The WoBERT model fused with legal semantic information converts each sentence of the judicial referee document into a criticality vector (K-Embedding).
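A minimal PyTorch sketch of this conditional layer normalization (the label-embedding dimension, the zero initialization and the names are illustrative assumptions):

    import torch
    import torch.nn as nn

    class ConditionalLayerNorm(nn.Module):
        # LayerNorm whose gain/bias are shifted by the R-Label condition vector.
        def __init__(self, hidden_size=768, cond_size=32, eps=1e-12):
            super().__init__()
            self.eps = eps
            self.g = nn.Parameter(torch.ones(hidden_size))   # gain g before conditioning
            self.b = nn.Parameter(torch.zeros(hidden_size))  # bias b before conditioning
            # W_g and W_b map the label condition into the hidden space; zero-initialized
            # so that training starts from plain layer normalization
            self.W_g = nn.Linear(cond_size, hidden_size, bias=False)
            self.W_b = nn.Linear(cond_size, hidden_size, bias=False)
            nn.init.zeros_(self.W_g.weight)
            nn.init.zeros_(self.W_b.weight)

        def forward(self, a, r_label):
            # a: (batch, seq, hidden) activations; r_label: (batch, cond_size) label embedding
            mu = a.mean(-1, keepdim=True)                   # mu^l
            sigma = a.std(-1, keepdim=True)                 # sigma^l
            g_c = self.g + self.W_g(r_label).unsqueeze(1)   # g_c = g + W_g . R-Label
            b_c = self.b + self.W_b(r_label).unsqueeze(1)   # b_c = b + W_b . R-Label
            return g_c * (a - mu) / (sigma + self.eps) + b_c

Replacing every layer normalization of the 12 Transformer layers with such a module lets the label condition influence the whole encoder, which then produces the criticality vectors.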
Step three: after the semantic information fusion module, K-Embedding is mapped to a probability through a fully connected layer and a sigmoid activation function, computed as:

p_i = σ(W·M_i + b)

wherein σ denotes the sigmoid activation function, W is the weight matrix of the fully connected layer, b is the bias vector, M_i is the criticality vector (K-Embedding) of the i-th sentence, and p_i is the output probability of the classifier.
The model judges whether each sentence is a key sentence according to the output probability, and all the key sentences of the referee document are spliced together to form the extractive abstract.
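A tiny sketch of this mapping (the 0.5 decision threshold is an assumption; the text only says sentences are judged by their output probability):

    import torch
    import torch.nn as nn

    fc = nn.Linear(768, 1)                          # weight matrix W and bias vector b
    k_embedding = torch.randn(40, 768)              # one K-Embedding per sentence (illustrative)
    p = torch.sigmoid(fc(k_embedding)).squeeze(-1)  # p_i = sigma(W M_i + b)
    key_idx = (p > 0.5).nonzero().squeeze(-1)       # indices of sentences judged "key"
    # the extractive abstract splices these key sentences in document order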
Step four: the extractive abstract is input into the fine-tuned long-text abstract model, and the final generated abstract is obtained by the beam search module fused with the legal dictionary according to the probabilities output by the model. The specific steps are as follows:
First, the long-text abstract model is fine-tuned. The model is designed on the basis of the pre-training model RoFormer and UniLM; since RoFormer is pre-trained on a general corpus, the model must be fine-tuned to learn the professional knowledge of the legal field.
the long text abstract model takes the extracted abstract and the generated abstract sequence as input, a candidate word list is built according to probability, a cluster search module fused with a legal dictionary selects a plurality of words with highest probability from the candidate word list and adds the words into the abstract sequence, and the process is repeated until the final generated abstract is generated. The method comprises the following specific steps:
s501: a professional dictionary of legal fields is collected and contains 1124 words commonly used in legal fields.
S502: the legal-field dictionary and all words of the extractive abstract of the input judicial referee document are segmented into N-grams, generating the N-gram set D of the legal dictionary and the N-gram set S of the input text.
S503: the weight of each N-gram is calculated according to its frequency of occurrence in the sets D and S.
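A minimal Python sketch of S502–S503 (the exact weighting scheme is not given in the text; frequency-proportional weights and the stand-in token lists are assumptions):

    from collections import Counter

    def ngram_set(tokens, n_max=3):
        # Split a token sequence into all 1..n_max-grams, weighted by frequency.
        grams = Counter()
        for n in range(1, n_max + 1):
            for i in range(len(tokens) - n + 1):
                grams["".join(tokens[i:i + n])] += 1
        total = sum(grams.values())
        return {g: c / total for g, c in grams.items()}  # weight proportional to frequency

    legal_dictionary_tokens = ["litigation", "request", "judgment"]   # stand-in tokens
    extractive_abstract_tokens = ["the", "litigation", "request", "holds"]
    D = ngram_set(legal_dictionary_tokens)      # N-gram set D of the legal dictionary
    S = ngram_set(extractive_abstract_tokens)   # N-gram set S of the input text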
S504: let Y_{1:t-1} be a candidate abstract sequence, with the word y being decoded as the t-th word of the abstract sequence; the extractive abstract and the abstract sequence Y_{1:t-1} are input into the long-text abstract model.
S505: the comprehensive legal dictionary score for word y is calculated as follows:
K(y)=β×D(y)+γ×S(y)
wherein D(y) and S(y) respectively denote the weights of y in the sets D and S, and β and γ are tuning parameters for balancing the weights of professional legal-field vocabulary and of the extractive-abstract vocabulary of the input judicial referee document.
S506: the long-text abstract model generates a set of candidate words and their probabilities for the current sequence, and the comprehensive score of the word y is calculated as follows:
F(y)=P(y)+λ×K(y)
wherein F(y) is the comprehensive score of the given word y, P(y) is the probability generated by the model, K(y) is the comprehensive legal dictionary score of y, and λ is a tuning parameter.
The Beam-Width words with the highest comprehensive scores are selected, each combined with the first t-1 words of the sequence, to form Beam-Width candidate sequences.
S507: steps S504 to S506 are repeated until the current word y is the end symbol or the abstract reaches its maximum length, and the highest-scoring candidate sequence is selected as the final generated abstract.
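A minimal sketch of this dictionary-fused beam search (S504–S507), assuming a model_step(sequence) callable that returns each candidate word's probability and the D/S weight maps from the sketch above; β, γ, λ, the beam width and the log-space accumulation (a common implementation choice for numerical stability) are assumptions:

    import math

    def dictionary_score(prev_word, y, D, S, beta=0.5, gamma=0.5):
        # K(y) = beta*D(y) + gamma*S(y), looked up via the 2-gram with the previous word
        gram = prev_word + y
        return beta * D.get(gram, 0.0) + gamma * S.get(gram, 0.0)

    def beam_search(model_step, D, S, beam_width=4, max_len=256, lam=1.0, eos="</s>"):
        beams = [([], 0.0)]                            # (sequence Y_{1:t-1}, cumulative score)
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                if seq and seq[-1] == eos:             # finished beams pass through unchanged
                    candidates.append((seq, score))
                    continue
                prev = seq[-1] if seq else ""
                for y, p in model_step(seq).items():   # P(y) from the long-text abstract model
                    k = dictionary_score(prev, y, D, S)
                    f = math.log(p) + lam * k          # comprehensive score F(y)
                    candidates.append((seq + [y], score + f))
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
            if all(seq and seq[-1] == eos for seq, _ in beams):
                break
        return beams[0][0]                             # highest-scoring sequence = final abstract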
To demonstrate the effectiveness of the invention in practical application, several comparison models were built for both the extractive model and the generative model, with ROUGE and BLEU adopted as evaluation indexes; these are common indexes for evaluating the similarity between a machine-generated abstract and a manually written one. The comparison results are as follows:
the extraction type comparison model is introduced as follows:
(1) LEAD-3: the first three sentences are selected as the text abstract. In this experiment, considering the characteristics of judicial referee documents, it is improved to select continuous sentence fragments containing specific keywords (e.g. "litigation request") as the abstract.
(2) TextRank: an extractive summarization method based on a graph-ranking algorithm; each sentence is regarded as a node in a graph, edges are established by computing the similarity between sentences, and a weight score is computed for each sentence. Finally, the N highest-scoring sentences form the abstract.
(3) FastText: an efficient and scalable text classification method focuses on utilizing subword information to speed training and improve processing power for rare vocabulary and misspellings.
(4) TextCNN: convolutional Neural Networks (CNNs) are applied to text classification tasks, and word vector sequences are processed with convolution kernels of different sizes to capture local features.
(5) SummaRuNNer: two double-layer GRU-RNNs encode sentences and documents respectively, obtaining document-level encoded representations of the sentences, and text classification is performed on this basis.
(6) BERT: the pre-trained language model BERT is fine-tuned using the criticality labels of sentences and the classification layer and applied to text classification.
(7) WoBERT: the Chinese pre-trained language model WoBERT is fine-tuned to perform text classification.
(8) BERTSum_EXT: a BERT-based extractive summarization method; [CLS] and [SEP] tokens are inserted before and after each sentence, Segment IDs distinguish different sentences, and the importance of each sentence in the document is evaluated by an abstract judgment layer.
Table 1. Extractive model comparison experimental data:

The table compares the extractive abstract model fusing legal semantic information against other common extractive abstract models. The model outperforms the comparison models on multiple indexes, can capture the key legal information of the document more comprehensively, and provides high-quality input for the subsequent abstract generation.
The generative comparison models are introduced as follows:
(1) Seq2Seq+Attention: a generative summarization baseline model in which the encoder encodes the input text into vectors and the decoder generates the abstract according to attention weights.
(2) PGN: the pointer-generator network model, which combines the Seq2Seq model, responsible for learning abstract concepts from the source text and generating the abstract, with a pointer network used to copy key information directly from the source text.
(3) PGN_Trm: compared with PGN, PGN_Trm adopts a Transformer as the sequence network and copies on the basis of the self-attention mechanism, giving it stronger modeling capability and higher training efficiency.
(4) BERT+Seq2Seq: text summarization is performed by combining the pre-trained language model BERT with the Seq2Seq architecture.
(5) NEZHA+Seq2Seq: the pre-trained language model NEZHA is combined with the Seq2Seq architecture.
(6) RoFormer+Seq2Seq: the pre-trained language model RoFormer is combined with the Seq2Seq architecture.
It should be noted that the input of all the generative models is the key sentences extracted by the extractive model of the embodiment of the present invention.
Table 2. Generative model comparison experimental data:

The table compares the generative abstract model fused with the legal dictionary against other common generative abstract models. The model outperforms the comparison models on all indexes, showing that it can generate judicial abstracts of higher quality and stronger professionalism.
Table 3 ablation experimental results
When the conditional-normalization-based semantic function information fusion module is added to the extractive abstract baseline model (WoBERT), the recall of ROUGE-1, ROUGE-2 and ROUGE-L improves markedly, by 4.67%, 6.79% and 6.64% respectively. The result shows that fusing legal semantic information into the key-sentence classification process through the normalization layer lets the model extract sentences containing key information more effectively, improving the quality of the abstract. The extractive abstract model extracts the key information of the original text (the extractive abstract) as the input of the generative abstract model, which further refines and summarizes it. Table 3 shows that the generative baseline model (RoFormer) improves considerably over the extractive abstract, which further demonstrates the superiority of the extract-then-generate two-stage abstract model. When the beam search module fused with the legal dictionary (KBS) is added to the generative baseline model (RoFormer), all indexes improve, showing that fusing prior knowledge effectively improves the quality of the generated abstract.
TABLE 4 sentence semantic function labeling specification
The basic structure of a judicial referee document comprises three parts: the title, the body and the closing. We divide the body, according to the different semantic functions of its sentences, into: head, fact, reason, referee basis, referee main text and tail. The head states and confirms the validity of the case examination procedure; the fact part elaborates the relevant case facts after examination; the reason part is the judge's detailed explanation and argumentation of the judgment reasons; the referee basis is the legal rules and related regulations on which the judge relies in the judging process; the referee main text is the judgment result of the case and its corresponding legal effect; the tail is the signing and announcement of the referee document. In order to introduce legal semantic information on the basis of the abstract dataset, a sentence semantic function labeling guide for referee documents was formulated, and fine-grained, rigorous legal semantic annotation was performed on the original data. Table 4 details the sentence semantic function labeling specification formulated by the embodiment of the present invention.
Example 2
Embodiment 2 of the present invention provides a system corresponding to embodiment 1, including a memory, a processor, and a computer program stored in the memory; the processor executes the computer program on the memory to implement the steps of the method of embodiment 1 described above.
In some implementations, the memory may be high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk memory.
In other implementations, the processor may be a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or other general-purpose processor, which is not limited herein.
Example 3
Embodiment 3 of the present invention provides a computer-readable storage medium corresponding to embodiment 1 described above, on which a computer program/instructions is stored. The steps of the method of embodiment 1 described above are implemented when the computer program/instructions are executed by a processor.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any combination of the preceding.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The solutions in the embodiments of the present application may be implemented in various computer languages, for example, the object-oriented programming language Java and the interpreted scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (9)

1. The method for generating the judicial referee document abstract is characterized by comprising the following steps of:
S1, inputting the sentences of the judicial referee document into a semantic function classification module to obtain the semantic function label R-Label of each sentence;
S2, inputting the sentences of the judicial referee document and the corresponding semantic function labels into a semantic information fusion module to obtain sentence criticality vectors;
S3, mapping the criticality vectors to probabilities, identifying the key sentences in the referee document, and splicing all the key sentences to obtain an extractive abstract;
and S4, taking the extractive abstract as the input of a long-text abstract model, and training the long-text abstract model to obtain the abstract generation model.
2. The judicial referee document digest generation method of claim 1, further comprising:
S5, taking the extractive abstract and the generated abstract sequence as the input of the abstract generation model, fusing the legal dictionary into the probability of the predicted next word, and using beam search to find the sequence with the largest cumulative probability, which is the final generated abstract.
3. The judicial referee document digest generation method according to claim 2, wherein the specific implementation procedure of step S5 includes:
1) Segmenting the legal-field dictionary and all words of the extractive abstract of the input judicial referee document into N-grams, generating the N-gram set D of the legal dictionary and the N-gram set S of the input text;
2) Letting Y_{1:t-1} be a candidate abstract sequence, with the word y being decoded as the t-th word of the abstract sequence; the extractive abstract and the abstract sequence Y_{1:t-1} are input into the long-text abstract model to obtain the probability of each candidate word in the candidate word list being the word y;
3) Calculating the comprehensive legal dictionary score K(y) of the word y using the formula K(y) = β×D(y) + γ×S(y), wherein D(y) and S(y) respectively denote the weights of y in the sets D and S, and β and γ are tuning parameters;
4) Calculating the comprehensive score F(y) of the given word y: F(y) = P(y) + λ×K(y); wherein P(y) is the probability output by the long-text abstract model, and λ is a tuning parameter;
selecting the Beam-Width words with the highest comprehensive scores, each combined with the first t-1 words of the sequence, to form Beam-Width candidate sequences;
5) Repeating the steps 2) to 4) until the current word y is the ending symbol or the length of the abstract reaches the maximum value, and selecting the sequence with the highest score in the candidate sequences as the final generated abstract.
4. The judicial referee document digest generation method of claim 1, wherein the semantic function classification module includes a semantic information encoder and a semantic function classifier;
the semantic information encoder adopts the WoBERT model and obtains the sentence semantic information vector R-Embedding by average pooling: R-Embedding = (1/n)·Σ_{i=1}^{n} w_i, wherein n denotes the number of words in the sentence and w_i denotes the word vector obtained by WoBERT encoding the i-th word;
the semantic function classifier comprises a BiLSTM model, a linear layer and a CRF layer connected in sequence; the input of the BiLSTM model is the sequence formed by the sentence semantic information vectors R-Embedding, and the output of the CRF layer is the semantic function label R-Label.
5. The judicial referee document abstract generating method according to claim 1, wherein in step S2, the semantic information fusion module comprises multi-layer Transformer blocks, and the conditional normalization through which a Transformer block fuses the semantic function label R-Label is:
a′ = g_c ⊙ (a^l − μ^l)/σ^l + b_c;
g_c = g + W_g·R-Label;
b_c = b + W_b·R-Label;

wherein a′ denotes the normalized value, R-Label is the sentence semantic function label, and W_g and W_b are transformation matrices; l indexes the hidden layer, and μ^l and σ^l denote the mean and standard deviation of the l-th layer activations; g and b respectively denote the gain and bias parameters before the legal semantic function label condition vector is added, and g_c and b_c respectively denote the gain and bias parameters after it is added.
6. The method for generating a judicial referee document abstract according to claim 1, wherein in step S3, the criticality vector is mapped to a probability through a fully connected layer and an activation function, computed as: p_i = σ(W·M_i + b); wherein σ denotes the sigmoid activation function, W is the weight matrix of the fully connected layer, b is the bias vector, and p_i is the output probability of the classifier.
7. The method for generating a judicial referee document abstract according to claim 1, wherein in step S4, the long-text abstract model comprises a RoFormer model whose attention is calculated as:

Attention(Q, K, V) = softmax( ((R_m·Q_m)^T (R_n·K_n)) / √d_k + M ) · V

wherein M is the mask matrix whose element M_{i,j} in row i, column j is 0 if the i-th token may obtain the information of the j-th token and −∞ otherwise; Q, K, V are respectively the query, key and value vectors; Q_m and K_n denote the vectors after position information is added; d_k denotes the dimension of the K vector; R_m and R_n are the rotation matrices of positions m and n; and Attention is the attention result after the mask matrix M is added.
8. A judicial referee document digest generation system comprising a memory and at least one processor; the memory having stored thereon one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the steps of the method of any of claims 1 to 7.
9. A computer storage medium having stored thereon a computer program/instruction, which when executed by a processor, implements the steps of the method according to any of the preceding claims 1-7.
CN202311592898.1A 2023-11-27 2023-11-27 Method and system for generating judicial judge document abstract Pending CN117708644A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311592898.1A CN117708644A (en) 2023-11-27 2023-11-27 Method and system for generating judicial judge document abstract

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311592898.1A CN117708644A (en) 2023-11-27 2023-11-27 Method and system for generating judicial judge document abstract

Publications (1)

Publication Number Publication Date
CN117708644A true CN117708644A (en) 2024-03-15

Family

ID=90157954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311592898.1A Pending CN117708644A (en) 2023-11-27 2023-11-27 Method and system for generating judicial judge document abstract

Country Status (1)

Country Link
CN (1) CN117708644A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118350462A (en) * 2024-06-14 2024-07-16 人民法院信息技术服务中心 Judicial relation element extraction method and device based on label vector orthogonal constraint



Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination