CN110765889A - Legal document feature extraction method, related device and storage medium

Info

Publication number: CN110765889A
Application number: CN201910936787.5A
Authority: CN
Inventors: 何芳芳; 邵博
Original assignee: Ping An Zhitong Consulting Co Ltd Shanghai Branch
Current assignee: Ping An Zhitong Consulting Co Ltd Shanghai Branch
Priority date: 2019-09-29
Filing date: 2019-09-29
Publication date: 2020-02-07
Anticipated expiration: 2039-09-29
Also published as: CN110765889B

Abstract

A legal document feature extraction method, a related device, and a storage medium are provided. The legal document is pre-identified, and a paragraph division model and a feature extraction model corresponding to the legal document are determined, wherein the feature extraction model comprises the correspondence between document paragraphs and document elements. Document paragraph division is performed on the legal document through the paragraph division model. Document elements corresponding to the document paragraphs are then extracted, through the feature extraction model, from the legal document after paragraph division, and the extraction results of the document elements are output.

Description

Legal document feature extraction method, related device and storage medium

Technical Field

The present application relates to the field of electronic technologies, and in particular, to a method for extracting features of a legal document, a related apparatus, and a storage medium.

Background

With the continuous improvement of the legal system in China and the growing public awareness of rights protection, legal services play a very important role in daily life and are an important component of every industry, and "Internet + legal" platforms of all kinds have been launched online like bamboo shoots after spring rain. However, legal services, as a highly individualized and specialized industry, place higher demands on "Internet +" offerings.

Legal documents contain rich legal concepts and legal logic. Deconstructing a case helps the user quickly grasp the elements of the claims in the case.

In the prior art, legal documents are deconstructed only into simple elements: only simple document type classification can be achieved, complete legal logic is lacking, and it is difficult to provide effective information for sorting out a case.

Disclosure of Invention

The embodiment of the application provides a legal document feature extraction method, an electronic device and a computer-readable storage medium, which are used for deconstructing the content of specific document elements of a legal document.

A first aspect of the embodiments of the present application provides a method for extracting features of a legal document, including:

pre-identifying a legal document, and determining a paragraph division model and a feature extraction model corresponding to the legal document; wherein, the feature extraction model comprises the corresponding relation between the document paragraphs and the document elements;

performing document paragraph segmentation on the legal document through the paragraph segmentation model;

and extracting, through the feature extraction model, document elements corresponding to the document paragraphs from the legal document after paragraph division, and outputting the extraction results of the document elements.

In an implementation manner of the embodiment of the present application, before the pre-identifying the legal document, the method further includes:

pre-processing the legal instrument, the pre-processing comprising at least one of:

abnormal line break handling, Chinese monetary amount processing, conversion of Chinese numerals to Arabic numerals, unification of punctuation format, replacement of illegal characters, and correction of wrongly written characters.

In an implementation manner of the embodiment of the present application, the pre-identifying the legal document and determining the paragraph segmentation model and the feature extraction model corresponding to the legal document includes:

identifying a document title of the legal document;

determining the document type corresponding to the legal document according to the document title;

and determining a paragraph division model corresponding to the document type and a feature extraction model corresponding to the paragraph division model.

In an implementation manner of the embodiment of the present application, the extracting, by the feature extraction model, document elements from a legal document obtained by segmenting a document paragraph includes:

obtaining a document paragraph obtained after paragraph division is performed on the legal document, and taking the document paragraph as an input object of the feature extraction model, wherein the feature extraction model comprises a plurality of document element rules;

segmenting the document paragraph according to punctuation marks into a plurality of sentences to form a sentence sequence;

screening a document element rule corresponding to the document paragraph in the feature extraction model according to the document paragraph after paragraph division;

reading sentences one by one in sequence according to the sentence sequence, and performing feature matching on the read sentences by using the document element rule corresponding to the document paragraphs; and outputting the corresponding document element after matching a document element rule successfully, and matching the next sentence until all sentences in the sentence sequence are matched.

In an implementation manner of the embodiment of the present application, the feature extraction model includes: a TextCNN network, a TextRNN network, and a TextRCNN network;

the TextCNN network, the TextRNN network and the TextRCNN network are arranged in parallel, and the output ends of the three networks are connected;

the input information of the three networks is identical and is a symbolized document paragraph; the output information of each of the three networks is a label identification result, and the three label identification results are added and averaged to obtain the output of the feature extraction model.

A second aspect of the embodiments of the present application provides a feature extraction device for a legal document, including:

a pre-recognition unit, configured to pre-identify a legal document and determine a paragraph division model and a feature extraction model corresponding to the legal document, wherein the feature extraction model comprises the correspondence between document paragraphs and document elements;

a paragraph dividing unit for performing document paragraph division on the legal document through the paragraph dividing model;

and a feature extraction unit, configured to extract, through the feature extraction model, document elements corresponding to the document paragraphs from the legal document after paragraph division, and to output the extraction results of the document elements.

In an implementation manner of the embodiment of the present application, the apparatus further includes: a pre-processing unit;

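the preprocessing unit is configured to preprocess the legal document, the preprocessing comprising at least one of: abnormal line break handling, Chinese monetary amount processing, conversion of Chinese numerals to Arabic numerals, unification of punctuation format, replacement of illegal characters, and correction of wrongly written characters.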

In an implementation manner of the embodiment of the present application, the feature extraction unit is specifically configured to:

reading sentences one by one in sequence according to the sentence sequence, and performing feature matching on the read sentences according to the document element rule; and outputting the corresponding document element after matching a document element rule successfully, and matching the next sentence until all sentences in the sentence sequence are matched.

A third aspect of the embodiments of the present application provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the feature extraction method of the legal document provided by the first aspect of the embodiments of the present application.

A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for extracting features of a legal document provided in the first aspect of the embodiments of the present application.

In this scheme, the legal document is pre-identified, and the paragraph division model and the feature extraction model corresponding to the legal document are determined; document paragraph division is then performed on the legal document through the paragraph division model; finally, document elements are extracted from the divided legal document through the feature extraction model. The embodiment of the application exploits the strong correlation between document paragraphs and document features (that is, certain document elements appear with high probability in specific document paragraphs). For complex document element content, paragraph division therefore makes it possible to quickly narrow the search to the few document paragraphs in which a document element most probably occurs, and then to extract the document element from those paragraphs in a targeted manner, which improves the feature extraction efficiency for legal documents.

Drawings

FIG. 1 is a schematic flow chart of an implementation of a method for extracting features of a legal document according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a feature extraction device for legal documents according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a hardware structure of an electronic device according to another embodiment of the present application.

Detailed Description

In order to make the objects, features and advantages of the present invention more apparent and understandable, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Example one

The embodiment of the present application provides a method for extracting features of a legal document, which is applied to an electronic device. The electronic device may be a smartphone, a tablet computer, a computer, or any other device on which an application program can be installed, and its operating system may be iOS, Android, Windows, or another operating system, which is not limited herein.

Referring to fig. 1, the method for extracting features of legal documents mainly includes the following steps:

101. pre-identifying a legal document, and determining a paragraph division model and a feature extraction model corresponding to the legal document;

The feature extraction model comprises the correspondence between document paragraphs and document elements.

Illustratively, the pre-recognition may be: identifying a document title of the legal document; determining the document type corresponding to the legal document according to the document title; and determining a paragraph division model corresponding to the document type and a feature extraction model corresponding to the paragraph division model.

In practice, there are many types of legal documents, such as court case documents (civil, criminal, and administrative), arbitration documents, judgment documents ("referee documents"), and the like. Different legal documents have different general deconstruction features and individual deconstruction features. Therefore, before legal documents are deconstructed, the embodiment of the application can logically sort them into different document types, determine the general deconstruction features of each type, and set up a corresponding model for deconstruction according to those general features.

In the embodiment of the present application, different legal documents may correspond to different paragraph division models and feature extraction models. Specifically, the embodiment of the present application trains on different types of legal documents by machine learning in order to learn, for each type, the "manner of paragraph division" and "which document elements are extracted from which document paragraph". A specific legal document has specific document paragraph characteristics (for example, that it can be divided into 5 paragraphs, and what the content of each of those 5 paragraphs is); the mapping relationship is established through machine learning and is stored locally on the processing terminal.

Illustratively, when a legal document to be identified is received, a specific position in the legal document (such as the title position) is identified in order to determine the paragraph division model corresponding to the legal document and the feature extraction model (used for extracting document elements) corresponding to that paragraph division model. The processing terminal locally stores two sets of mapping relationships; by looking up the text recognized from the document title against these mapping relationships, the paragraph division model and the feature extraction model corresponding to the legal document to be identified can be obtained.
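The lookup described above could be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the text does not spell out what the two mapping relationships contain, so the sketch assumes one mapping from a title keyword to a paragraph division model and a second from that model to a feature extraction model. All keywords, model names, and the function `pre_identify` are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical mapping 1: title keyword -> paragraph division model name.
TITLE_TO_SEGMENTER = {
    "民事判决书": "civil_judgment_segmenter",      # civil judgment
    "仲裁裁决书": "arbitration_award_segmenter",   # arbitration award
}

# Hypothetical mapping 2: paragraph division model -> feature extraction model.
SEGMENTER_TO_EXTRACTOR = {
    "civil_judgment_segmenter": "civil_judgment_extractor",
    "arbitration_award_segmenter": "arbitration_award_extractor",
}


def pre_identify(document: str) -> Optional[Tuple[str, str]]:
    """Return (segmenter, extractor) names chosen from the document title."""
    # Assumption: the first non-empty line is the title position.
    title = next((ln.strip() for ln in document.splitlines() if ln.strip()), "")
    for keyword, segmenter in TITLE_TO_SEGMENTER.items():
        if keyword in title:
            return segmenter, SEGMENTER_TO_EXTRACTOR[segmenter]
    return None  # unknown document type
```

Keeping the two mappings separate mirrors the description, in which the feature extraction model is tied to the paragraph division model rather than directly to the title.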

In practical applications, in order to improve the efficiency and accuracy of legal document processing, the legal document may be preprocessed before it is pre-identified, the preprocessing including at least one of: abnormal line break handling, Chinese monetary amount processing, conversion of Chinese numerals to Arabic numerals, unification of punctuation format, replacement of illegal characters, and correction of wrongly written characters.
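Three of the listed preprocessing steps (punctuation unification, abnormal line break handling, and Chinese-to-Arabic numeral conversion) might look like the sketch below. It is only an illustration under stated assumptions: numerals are converted only inside article references of the form 第…条 and only for values below 10,000, and every name in the code is hypothetical.

```python
import re

_DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
           "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
_UNITS = {"十": 10, "百": 100, "千": 1000}
# Map half-width punctuation to the full-width forms used in Chinese text.
_PUNCT = str.maketrans({",": "，", ":": "：", ";": "；", "?": "？", "!": "！"})
_ARTICLE = re.compile(r"第([零一二两三四五六七八九十百千]+)条")


def chinese_to_int(numeral: str) -> int:
    """Convert a Chinese numeral below 10,000 (e.g. 一百九十六) to an int."""
    total, current = 0, 0
    for ch in numeral:
        if ch in _DIGITS:
            current = _DIGITS[ch]
        elif ch in _UNITS:
            # A bare unit such as 十 stands for 1 * 10.
            total += (current or 1) * _UNITS[ch]
            current = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + current


def preprocess(text: str) -> str:
    """Unify punctuation, repair broken lines, and normalize article numbers."""
    text = text.replace("\r\n", "\n").translate(_PUNCT)
    # Drop a line break unless it follows sentence-level punctuation.
    text = re.sub(r"(?<![。；：！？])\n", "", text)
    return _ARTICLE.sub(lambda m: f"第{chinese_to_int(m.group(1))}条", text)
```

Punctuation is unified first so that the line break repair only has to recognize one set of sentence-ending marks.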

102. Performing document paragraph segmentation on the legal document through the paragraph segmentation model;

Document paragraph division is performed, through the paragraph division model, on the preprocessed legal document. The present application is described taking a judgment document (also rendered "referee document"; here a first-instance judgment) as an application example.

Illustratively, the paragraph categories to be divided include: title paragraph, litigation subject paragraph, trial history paragraph, appeal (plaintiff's claim) paragraph, defense paragraph, evidence paragraph, trial findings paragraph, "this court holds" paragraph, judgment result paragraph, and court personnel paragraph.

For example, in practical application the paragraph division model may be a rule model, which is treated here as a machine learning model of the unsupervised class. A CRF (conditional random field) model, which is a Markov-type probabilistic graphical model widely used for text sequence labeling, may also be used.

Regarding document paragraph division, in practical applications the target paragraph categories to be divided may be preset as N categories (N is an integer greater than one), corresponding to N paragraph extraction rules, with target paragraphs and paragraph extraction rules in one-to-one correspondence. For example, a judgment ("referee") document may have 5 paragraph types, which correspond to 5 paragraph extraction rules. An example of a paragraph extraction rule: the paragraph start feature may be "this court holds"; the paragraph end feature may be "the judgment is as follows", or it may be the start feature of another paragraph extraction rule.

In practical application, the paragraph features of a judgment ("referee") document are obvious and exhaustible, so the rule extraction model is used preferentially. Paragraph extraction rules may be set in the paragraph division model, each including a paragraph start feature and a paragraph end feature. Taking the "this court holds" paragraph as an example, the paragraph start feature may be "this court holds", and the paragraph end feature may be "the judgment is as follows" or the start feature of another paragraph extraction rule.
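A rule-based paragraph division of this kind could be sketched as below. The sketch assumes each rule carries a start feature and an optional end feature, written here in the original Chinese (本院认为 "this court holds", 判决如下 "the judgment is as follows"); a paragraph ends at its own end feature or at the start feature of the next rule, whichever comes first. The class `ParagraphRule`, the function `divide_paragraphs`, and the category names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ParagraphRule:
    category: str              # target paragraph category
    start: str                 # paragraph start feature
    end: Optional[str] = None  # optional paragraph end feature


def divide_paragraphs(text: str, rules: List[ParagraphRule]) -> Dict[str, str]:
    """Cut `text` into paragraphs keyed by category, using start/end features."""
    # Locate the first occurrence of every start feature that is present.
    hits = [(text.find(r.start), r) for r in rules if r.start in text]
    hits.sort(key=lambda hit: hit[0])
    paragraphs: Dict[str, str] = {}
    for i, (pos, rule) in enumerate(hits):
        # Default end: the start feature of the next rule (or end of text).
        stop = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        if rule.end is not None:
            end_pos = text.find(rule.end, pos)
            if end_pos != -1:
                stop = min(stop, end_pos + len(rule.end))
        paragraphs[rule.category] = text[pos:stop]
    return paragraphs
```

When an end feature doubles as the start feature of the following rule, as with 判决如下 here, the `min` keeps that marker in the following paragraph rather than duplicating it.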

103. And extracting document elements corresponding to the document paragraphs from the legal documents divided from the document paragraphs through the feature extraction model, and outputting the extraction results of the document elements.

After the processing of step 102, N paragraphs divided according to the paragraph extraction rule are obtained, and M types of document elements are extracted according to the N paragraphs, where M is an integer greater than N.

Specifically, the document elements are logical elements related to cases in the legal document. Illustratively, M may be 7, and the document elements specifically include: parties, appeal items, dispute items, evidence items, fact items, dispute focus, court opinions, decision items.

Illustratively, the document elements of the parties (citizens, companies, legal entity information, etc.) may be extracted from the litigation subject paragraph; the document elements of the appeal items may be extracted from the appeal paragraph; the document elements of the dispute items may be extracted from the defense paragraph; the document elements of the evidence items may be extracted from the evidence paragraph; the document elements of the fact items may be extracted from the trial findings paragraph; the document elements of the dispute focus and court opinions may be extracted from the "this court holds" paragraph; and the document elements of the decision items may be extracted from the judgment result paragraph.

In the embodiment of the present application, document element extraction is mainly implemented by a rule extraction model (see Example two below). "Rule extraction" is suitable for scenarios in which the text features are obvious and exhaustible, such as an explicit paragraph-opening phrase (for example, "this court holds"). For scenarios in which the text features are not obvious and cannot be enumerated, semantic recognition is required, and a supervised neural network model can be used instead (see Example three below).

Example two

The embodiment of the present application mainly describes a scheme for implementing document element extraction through a rule-extracted machine learning model, where each document element corresponds to a document element rule, and the document element rule is used to read a feature corresponding to the document element in a sentence.

Step 1, obtaining a document paragraph obtained after paragraph division is performed on the legal document, and taking the document paragraph as an input object of the feature extraction model; the feature extraction model comprises a plurality of document element rules.

Step 2, segmenting the document paragraph according to punctuation marks into a plurality of sentences to form a sentence sequence;

Step 3, according to the document paragraph obtained after paragraph division, screening the document element rules corresponding to the document paragraph in the feature extraction model;

Step 4, reading sentences one by one in sequence according to the sentence sequence, and performing feature matching on each read sentence using the document element rules corresponding to the document paragraph; when a document element rule is matched successfully, outputting the corresponding document element and moving on to the next sentence, until all sentences in the sentence sequence have been matched.
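Steps 1 to 4 could be sketched as follows. This is an illustrative sketch only: the rule table, its keywords (判决如下 "the judgment is as follows", 争议焦点 "dispute focus", 证据 "evidence"), and the function names are assumptions, and each rule is reduced to a single regular expression.

```python
import re
from typing import List, Tuple

# Hypothetical rule table: paragraph category -> [(document element, pattern)].
ELEMENT_RULES = {
    "court_opinion": [
        ("law_citation", re.compile("判决如下")),
        ("dispute_focus", re.compile("争议焦点")),
    ],
    "evidence": [("evidence_item", re.compile("证据"))],
}


def split_sentences(paragraph: str) -> List[str]:
    """Step 2: cut a paragraph into sentences at 。？！ (marks are kept)."""
    parts = re.findall(r"[^。？！]+[。？！]?", paragraph)
    return [p.strip() for p in parts if p.strip()]


def extract_elements(category: str, paragraph: str) -> List[Tuple[str, str]]:
    """Steps 3-4: pick the rules for this paragraph, then match each sentence."""
    rules = ELEMENT_RULES.get(category, [])        # step 3: screen rules
    results = []
    for sentence in split_sentences(paragraph):    # step 4: read in order
        for element, pattern in rules:
            if pattern.search(sentence):
                results.append((element, sentence))
                break  # one rule matched; go on to the next sentence
    return results
```

Screening the rules by paragraph category first means each sentence is tested against only the few rules relevant to its paragraph.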

Illustratively, to extract a document element such as "court opinions", the "this court holds" paragraph is divided into sentences at the punctuation marks "。", "？" and "！", and the sentences are read one by one against the document element rules corresponding to "court opinions". The sentence containing "the judgment is as follows" (one of the "court opinions" document element rules) is located, for example: "After deliberation, in accordance with Article 196 of the Contract Law of the People's Republic of China, Article 6 of the Several Opinions of the Supreme People's Court on the Trial of Loan Cases by People's Courts, and Articles 142, 144 and 152 of the Civil Procedure Law of the People's Republic of China, the judgment is as follows:". In this way, the sentence citing laws and regulations in the court opinions can be extracted. Further, the individual legal provisions can be extracted from that sentence through a combined feature, yielding "Article 196 of the Contract Law of the People's Republic of China", "Article 6 of the Several Opinions of the Supreme People's Court on the Trial of Loan Cases by People's Courts", and "Articles 142, 144 and 152 of the Civil Procedure Law of the People's Republic of China".
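One way the provision-level extraction might be realized is sketched below. The exact combined feature is not recoverable from the text, so the sketch assumes it pairs a title enclosed in the book-title marks 《》 with the article references (第…条) that follow it. The pattern and the function `extract_citations` are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed feature: a title in 《》 followed by one or more 第…条 references,
# optionally joined by 、, 和 or 及.
_CITATION = re.compile(
    r"《([^》]+)》((?:第[0-9零一二两三四五六七八九十百千]+条[、和及]?)+)"
)
_ARTICLE = re.compile(r"第[^条]+条")


def extract_citations(sentence: str) -> List[Tuple[str, List[str]]]:
    """Return (law title, [article references]) pairs found in a sentence."""
    return [
        (title, _ARTICLE.findall(articles))
        for title, articles in _CITATION.findall(sentence)
    ]
```

Applied to a sentence that cites two statutes, the function returns one entry per statute, each with its list of article references.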

Illustratively, consider the extraction of party categories: plaintiff, defendant, plaintiff's agent, defendant's agent, legal representative, and so on. Take the party paragraph "Plaintiff: Tang Xiaolin, female, Han nationality.\nAuthorized agent: Zhongchunfang, lawyer at a Hubei law firm.\nDefendant: Wuhan Lucky Real Estate Development and Construction Co., Ltd., domiciled in Hanyang District, Wuhan, Hubei Province.\nLegal representative: Liu Zeng, chairman of the company.\nAuthorized agent: Yunheng, lawyer at Beijing Yingke (Wuhan) Law Firm.\nDefendant: Fengwangjie, male, Han nationality.\nDefendant: Zhang Youling, male, Han nationality.\nDefendant: Liu Huakai, male, Han nationality.\nAuthorized agent: Yunheng, lawyer at Beijing Yingke (Wuhan) Law Firm." Here the entity category corresponding to "Tang Xiaolin" is "plaintiff". The extraction rule is "paragraph statement division + locating the entity start position in the statement + taking the span of the statement from its first position to the entity start position"; finally, the punctuation in that span is removed, confirming that the category is "plaintiff".
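The party-category rule (statement division, locating the entity start position, taking the span before it, and removing punctuation) might be sketched as follows. The sketch assumes entity names are already known, for instance from an upstream recognizer; the function names and the short sample statements (原告 "plaintiff", 被告 "defendant") are illustrative.

```python
import re
from typing import List, Optional


def split_statements(paragraph: str) -> List[str]:
    """Divide a party paragraph into statements at line breaks or semicolons."""
    return [s.strip() for s in re.split(r"[\n；;]", paragraph) if s.strip()]


def party_category(statement: str, entity: str) -> Optional[str]:
    """Return the text before the entity, with punctuation removed."""
    start = statement.find(entity)      # entity start position
    if start == -1:
        return None                     # entity not in this statement
    prefix = statement[:start]          # span (first position, entity start)
    category = re.sub(r"[：:，,；;。\s]", "", prefix)
    return category or None
```

For the statement "原告：周小姐" and the entity "周小姐", the span before the entity is "原告：", and removing the colon leaves the category "原告" (plaintiff).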

If the processing target is:

"Plaintiff: Ms. Zhou;

Defendant: Mr. Lin"

"Paragraph statement division" refers to recognizing the features that separate statements within the paragraph: for example, the introductions of the two parties above are separate statements, so a separating feature such as the delimiter ";" needs to be identified. "Entity start position" refers to the first position occupied by characters of the entity, i.e., a punctuation mark is not taken as the start position. "Removing the punctuation in between" means, for example, that when the identified span is "Defendant:", the symbol ":" is removed and the remaining characters are "Defendant".

Illustratively, the categories of court opinions are: supporting the plaintiff, supporting the defendant, rejecting the plaintiff, and rejecting the defendant. Classification is mainly performed by rules. For example: "Because Tang Xiaolin did not submit evidence to this court proving that the interest of 488,000 yuan derives from interest generated respectively by the loan principal of 30 million yuan and 12.4 million yuan, Tang Xiaolin's litigation request that Liu Huakai bear liability for all interest generated by the 42.4 million yuan has no factual basis, and this court rejects it according to law." First, the entity types appearing in the sentence are identified: "Tang Xiaolin | plaintiff" and "Liu Huakai | defendant". Then, in the key clause "Tang Xiaolin's litigation request that Liu Huakai bear liability for all interest generated by the 42.4 million yuan has no factual basis, and this court rejects it according to law", the rule feature "Tang Xiaolin" + "litigation request" + "reject" appears, so the sentence is classified as "rejecting the plaintiff".
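The "party + litigation request + reject" rule could be sketched as below. Several points are assumptions rather than statements from the text: the Chinese renderings of the names (唐小林, 刘华凯), the keyword 支持 for "support", and the heuristic that the party named earliest in the sentence is the one whose request is being ruled on. The function name is hypothetical.

```python
from typing import Dict, Optional


def classify_court_view(sentence: str, parties: Dict[str, str]) -> Optional[str]:
    """Classify a court-opinion sentence, e.g. as 'reject_plaintiff'.

    `parties` maps an entity name to its role ("plaintiff" or "defendant").
    """
    if "诉讼请求" not in sentence:       # "litigation request" must appear
        return None
    present = [(sentence.find(name), role)
               for name, role in parties.items() if name in sentence]
    if not present:
        return None
    # Assumption: the earliest-mentioned party owns the request.
    _, role = min(present)
    if "驳回" in sentence:               # "reject"
        return f"reject_{role}"
    if "支持" in sentence:               # "support" (assumed keyword)
        return f"support_{role}"
    return None
```

The role comes from the previously extracted party categories, which is why entity types are identified before the rule feature is matched.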

Example three

The embodiment of the present application mainly describes a scheme for realizing document element extraction by using a supervised neural network model, and specifically includes:

First, training the neural network model.

The neural network model of the embodiment of the application integrates three single models. Model 1: based on the TextCNN network, with batch normalization added and two fully connected layers used for classification. Model 2: based on the TextRNN network, using a bidirectional Long Short-Term Memory (LSTM) network, with classification performed after K-max pooling of the hidden vectors. Model 3: the TextRCNN network. The three networks are fused, that is, their outputs are added, and the model is trained using corresponding legal document samples (annotated with document elements).

The input information of the three networks is identical and is a symbolized document paragraph. For example, in practical applications, before text data is input into the neural network it undergoes sentence breaking, word segmentation, and part-of-speech analysis, and each word is then represented by a number, such as "1" for "you", "2" for "at", and "3" for "where". The symbolized form of the text "where are you" (in the original word order, "you at where") is "123".
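The symbolization step can be sketched as a simple vocabulary lookup. The sketch assumes word segmentation has already been done upstream and reserves id 0 for unknown words; the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def build_vocab(tokenized_texts: Iterable[List[str]]) -> Dict[str, int]:
    """Assign each distinct word an integer id, starting from 1."""
    vocab: Dict[str, int] = {}
    for tokens in tokenized_texts:
        for token in tokens:
            # setdefault keeps the first id a word was given.
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], unk: int = 0) -> List[int]:
    """Replace each word by its id; unknown words map to `unk`."""
    return [vocab.get(token, unk) for token in tokens]
```

With the segmented text 你 / 在 / 哪 ("you", "at", "where"), `build_vocab` yields the ids 1, 2 and 3, and `encode` returns `[1, 2, 3]`, matching the "123" example.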

The output information of each of the three networks is a label identification result, and the three label identification results are added and averaged to obtain the output of the feature extraction model. For example, assume that the feature extraction model is preset with 6 classes of document elements to be labeled; the label identification result may then be a sequence of 6 numbers, corresponding respectively to the 6 classes of document elements, where each number represents the probability that the currently identified content is a particular document element. Adding the three label identification results means adding the probabilities at corresponding positions, after which the sum at each position is averaged.

Taking the document elements of the appeal items as an example, the document paragraphs of each sample among the legal document samples are first divided; the content of document paragraphs that may contain appeal items is then labeled, and the labeled document paragraphs are fed into the neural network model for training.

Second, extracting document elements.

A document paragraph related to the appeal items is acquired and input into the neural network model to extract document elements.
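The add-and-average fusion of the three label identification results can be sketched in plain Python. The networks themselves are not modeled here; the three input vectors simply stand in for the outputs of the TextCNN, TextRNN, and TextRCNN branches, and the function names and label names are hypothetical.

```python
from typing import List, Sequence


def average_predictions(*outputs: Sequence[float]) -> List[float]:
    """Add the per-class probabilities position by position, then average."""
    if not outputs:
        raise ValueError("at least one network output is required")
    if len({len(o) for o in outputs}) != 1:
        raise ValueError("all outputs must have the same number of classes")
    return [sum(column) / len(outputs) for column in zip(*outputs)]


def predict_label(averaged: Sequence[float], labels: Sequence[str]) -> str:
    """Pick the label whose averaged probability is highest."""
    best = max(range(len(averaged)), key=lambda i: averaged[i])
    return labels[best]


# Stand-in outputs for 6 document element classes.
cnn_out = [0.5, 0.25, 0.0, 0.0, 0.25, 0.0]
rnn_out = [0.25, 0.5, 0.0, 0.0, 0.25, 0.0]
rcnn_out = [0.75, 0.0, 0.0, 0.0, 0.25, 0.0]
fused = average_predictions(cnn_out, rnn_out, rcnn_out)
```

Averaging keeps the fused result on the same probability scale as each single network, so it can be thresholded or ranked in the same way.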

For example, for the category "request for repayment of principal", an appeal item of this category in one of the small number of annotated cases is expressed as: "Claims: 1. that the Lucky company be ordered to immediately repay Tang Xiaolin the loan principal of 42.4 million yuan and the interest of 9.652 million yuan accrued up to the date of filing suit, and to pay interest from the date of filing suit until the debt is paid off;". 10% of the cases are annotated, the category features in the relevant appeal item expressions are learned through a classification algorithm, and the trained feature model is then used to assign category labels to the appeal items of the remaining unannotated cases.

Example four

Please refer to fig. 2, which provides a feature extraction apparatus for legal documents according to an embodiment of the present application. The electronic device can be used for realizing the feature extraction method of the legal document provided by the embodiment shown in the figure 1. As shown in fig. 2, the feature extraction device of the legal document mainly includes:

the pre-recognition unit 201 is configured to pre-recognize a legal document, and determine a paragraph segmentation model and a feature extraction model corresponding to the legal document; wherein, the feature extraction model comprises the corresponding relation between the document paragraphs and the document elements;

a paragraph dividing unit 202, configured to perform document paragraph division on the legal document through the paragraph dividing model;

a feature extraction unit 203, configured to extract, through the feature extraction model, document elements corresponding to the document paragraphs from the legal document after paragraph division, and to output the extraction results of the document elements.

In an implementation manner of the embodiment of the present application, the apparatus further includes: a preprocessing unit 204;

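the preprocessing unit 204 is configured to preprocess the legal document, where the preprocessing includes at least one of: abnormal line break handling, Chinese monetary amount processing, conversion of Chinese numerals to Arabic numerals, unification of punctuation format, replacement of illegal characters, and correction of wrongly written characters.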

In an implementation manner of the embodiment of the present application, the pre-recognition unit 201 is specifically configured to:

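identifying a document title of the legal document;

determining the document type corresponding to the legal document according to the document title;

and determining a paragraph division model corresponding to the document type and a feature extraction model corresponding to the paragraph division model.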

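In an implementation manner of the embodiment of the present application, the feature extraction unit 203 is specifically configured to:

obtain a document paragraph obtained after paragraph division is performed on the legal document, and take the document paragraph as an input object of the feature extraction model, wherein the feature extraction model comprises a plurality of document element rules;

segment the document paragraph according to punctuation marks into a plurality of sentences to form a sentence sequence;

screen the document element rules corresponding to the document paragraph in the feature extraction model;

and read sentences one by one in sequence according to the sentence sequence, perform feature matching on each read sentence using the document element rules corresponding to the document paragraph, output the corresponding document element when a document element rule is matched successfully, and move on to the next sentence until all sentences in the sentence sequence have been matched.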

It should be noted that, in the embodiment of the electronic device illustrated in fig. 2, the division of the functional modules is only an example, and in practical applications, the above functions may be distributed by different functional modules according to needs, for example, configuration requirements of corresponding hardware or convenience of implementation of software, that is, the internal structure of the electronic device is divided into different functional modules to complete all or part of the functions described above. In practical applications, the corresponding functional modules in this embodiment may be implemented by corresponding hardware, or may be implemented by corresponding hardware executing corresponding software. The above description principles can be applied to various embodiments provided in the present specification, and are not described in detail below.

For a specific process of each function module in the electronic device provided in this embodiment to implement each function, please refer to the specific content described in the embodiment shown in fig. 1, which is not described herein again.

Example five

An embodiment of the present application provides an electronic device, please refer to fig. 3, which includes:

a memory 301, a processor 302 and a computer program stored in the memory 301 and capable of running on the processor 302, wherein the processor 302 executes the computer program to implement the method for extracting the features of the legal document described in the embodiment shown in fig. 1.

Further, the electronic device further includes:

at least one input device 303 and at least one output device 304.

The memory 301, the processor 302, the input device 303, and the output device 304 are connected via a bus 305.

The input device 303 may be a camera, a touch panel, a physical button, a mouse, or the like. The output device 304 may specifically be a display screen.

The memory 301 may be a Random Access Memory (RAM) or a non-volatile memory, such as a magnetic disk memory. The memory 301 is used to store a set of executable program code, and the processor 302 is coupled to the memory 301.

Further, an embodiment of the present application also provides a computer-readable storage medium, which may be provided in the electronic device of the foregoing embodiments and may be the memory in the embodiment shown in fig. 3. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the legal document feature extraction method described in the embodiment shown in fig. 1. The computer-readable storage medium may be any medium that can store program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a RAM, a magnetic disk, or an optical disk.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division of the modules is merely a logical division, and other divisions are possible in actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or of another form.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.

The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a readable storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned readable storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

It should be noted that, for simplicity of description, the above method embodiments are described as a series of combined acts. Those skilled in the art should understand, however, that the present application is not limited by the described order of acts, because some steps may be performed in other orders or simultaneously according to the present application. Those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.

In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.

The foregoing describes the legal document feature extraction method, the electronic device, and the computer-readable storage medium provided in this application. Those skilled in the art may, according to the ideas of the embodiments of the present application, make changes to the specific implementations and to the scope of application.

Claims

1. A method for extracting features of legal documents is characterized by comprising the following steps:

2. The method of extracting features of legal documents according to claim 1,

before the pre-recognition of the legal document, the method further comprises the following steps:

3. The method of extracting features of legal documents according to claim 1,

the pre-recognition of the legal document and the determination of the paragraph segmentation model and the feature extraction model corresponding to the legal document comprise:

identifying a document title of the legal document;

4. The method of extracting features of legal documents according to claim 1,

the method for extracting the document elements from the legal document after document paragraph division through the feature extraction model comprises the following steps:

5. The method of extracting features of legal documents according to claim 1,

the feature extraction model includes: a TextCNN network, a TextRNN network, and a TextRCNN network;

the three networks receive the same input information, namely a symbolic document paragraph; the output information of each of the three networks is a label identification result, and the three label identification results are added and averaged to obtain the output of the feature extraction model.
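The averaging step recited in claim 5 can be illustrated with a minimal sketch. It assumes that each network's label identification result is a vector of per-label scores over the same label set; the function name `average_label_scores`, the label names, and the score values are hypothetical and do not come from the application. The three networks themselves are not implemented here; only the combination of their outputs is shown.

```python
from typing import List, Sequence


def average_label_scores(outputs: Sequence[Sequence[float]]) -> List[float]:
    """Add the per-label scores of several networks and divide by their count."""
    if not outputs:
        raise ValueError("at least one network output is required")
    n_labels = len(outputs[0])
    if any(len(scores) != n_labels for scores in outputs):
        raise ValueError("all outputs must score the same label set")
    # zip(*outputs) groups the scores that the networks gave to the same label.
    return [sum(per_label) / len(outputs) for per_label in zip(*outputs)]


# Hypothetical label set and scores for one document paragraph (assumed values).
labels = ["plaintiff", "defendant", "claim"]
textcnn_out = [0.7, 0.2, 0.1]   # stand-in for the TextCNN result
textrnn_out = [0.6, 0.3, 0.1]   # stand-in for the TextRNN result
textrcnn_out = [0.5, 0.3, 0.2]  # stand-in for the TextRCNN result

ensemble = average_label_scores([textcnn_out, textrnn_out, textrcnn_out])

# Picking the highest averaged score is an assumption of this sketch;
# the claim only states that the averaged result is the model output.
predicted = labels[max(range(len(ensemble)), key=ensemble.__getitem__)]
```

Averaging gives each network equal weight, so a label is favored only when the networks agree on it in aggregate rather than when a single network scores it highly.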

6. A feature extraction device of a legal document, comprising:

7. The legal document feature extraction apparatus of claim 6,

the device further comprises: a pre-processing unit;

8. The legal document feature extraction apparatus of claim 6,

the feature extraction unit is specifically configured to:

9. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 5 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 5.