CN115936932A - Method and device for processing judicial documents, electronic equipment and storage medium - Google Patents

Publication number
CN115936932A
Authority
CN
China
Prior art keywords
judicial
entity information
subject matter
target
classification system
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211704216.7A
Other languages
Chinese (zh)
Inventor
郭曼
胡泽婷
张天宇
路兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing E Hualu Information Technology Co Ltd
Original Assignee
Beijing E Hualu Information Technology Co Ltd

Abstract

The invention discloses a method and a device for processing a judicial document, an electronic device, and a storage medium. The method comprises the following steps: labeling the content of each paragraph of the judicial document to obtain a first content subject and a second content subject; performing regularized representation on the regular paragraph represented by the first content subject to obtain first entity information corresponding to the first content subject; inputting the irregular paragraph represented by the second content subject into a text classification model of the classification system corresponding to the second content subject to obtain second entity information corresponding to the second content subject; and associating the first content subject with its corresponding first entity information, and associating the classification system with its corresponding second entity information, to obtain a structured representation result of the judicial document. The technical solution provided by the invention can improve the efficiency of structured representation of judicial documents to a certain extent.

Description

Method and device for processing judicial documents, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a method and a device for processing a judicial document, an electronic device, and a storage medium.
Background
Judicial documents are an important resource for research on legal text, and they supply key element indicators for legal artificial intelligence applications built on them, such as similar-case recommendation, judgment result prediction, and intelligent question answering. However, judicial documents are published essentially as plain text and are typically unstructured, which makes it difficult to accurately identify and extract the information they contain. Text recognition methods are therefore needed to represent judicial documents in a structured form. In the prior art, the structured representation of a judicial document is mainly produced by judicial staff who label the document manually. Because of the prominent imbalance between many cases and few staff, manual labeling is inefficient, and the accuracy of the structured representation depends directly on the personal experience of the person doing the labeling.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a device for processing a judicial document, an electronic device, and a storage medium, which can improve the efficiency of structured representation of judicial documents to a certain extent.
In one aspect, the invention provides a method for processing a judicial document, comprising: labeling the content of each paragraph of an obtained judicial document to obtain a first content subject representing a regular paragraph of the judicial document and a second content subject representing an irregular paragraph of the judicial document; performing regularized representation on the regular paragraph represented by the first content subject to obtain first entity information corresponding to the first content subject; inputting the irregular paragraph represented by the second content subject into a text classification model of the classification system corresponding to the second content subject to obtain second entity information corresponding to the second content subject; and associating the first content subject with its corresponding first entity information, and associating the classification system with its corresponding second entity information, to obtain a structured representation result of the judicial document.
In one embodiment, the method for processing a judicial document further comprises: segmenting the obtained judicial document into paragraphs and removing blank lines and illegal characters from it to obtain a target judicial document; correspondingly, the content of each paragraph of the target judicial document is labeled.
In one embodiment, performing regularized representation on the regular paragraph represented by the first content subject to obtain first entity information corresponding to the first content subject comprises: inputting the regular paragraph represented by the first content subject into a deep learning model based on context description labeling to obtain a regular expression for that regular paragraph; and extracting the first entity information corresponding to the first content subject based on the regular expression.
In one embodiment, inputting the irregular paragraph represented by the second content subject into a text classification model of the classification system corresponding to the second content subject to obtain second entity information corresponding to the second content subject comprises: determining a target irregular paragraph corresponding to a preset classification system; and inputting the target irregular paragraph into the text classification model corresponding to the preset classification system to obtain second entity information corresponding to the preset classification system.
In one embodiment, inputting the target irregular paragraph into the text classification model corresponding to the preset classification system to obtain second entity information corresponding to the preset classification system comprises: performing word segmentation on the target irregular paragraph to obtain a plurality of target words; generating a target word vector for each of the target words; matching the target word vectors against the classification categories included in the preset classification system to obtain the matching degree between the target irregular paragraph and each classification category; and taking the classification category with the maximum matching degree as the second entity information corresponding to the preset classification system.
In one embodiment, the second content subject includes case fact information, and the method for processing a judicial document further comprises: inputting the irregular paragraph represented by the case fact information into a named entity recognition model and recognizing the case address in the irregular paragraph to obtain third entity information; and taking the result of associating the case address with the third entity information as a structured representation result of the judicial document.
In one embodiment, the method for processing a judicial document further comprises: performing regularization processing on obtained legal texts and storing the result in the form of a graph structure to generate a legal provision knowledge base; and constructing a judicial document knowledge graph based on the legal provisions in the legal provision knowledge base, the entities in the judicial documents, and the relationships among those entities.
In one embodiment, the method for processing a judicial document further comprises: performing similarity calculation between a target irregular paragraph corresponding to a target classification system in a target judicial document and the irregular paragraphs corresponding to the target classification system in a preset judicial document library, to obtain the similarity between the target irregular paragraph and each such paragraph in the library, wherein the preset judicial document library comprises a plurality of judicial documents from which second entity information has already been extracted; and taking the classification category, under the target classification system, of the irregular paragraph with the maximum similarity as the second entity information of the target judicial document.
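The nearest-paragraph lookup in this embodiment can be sketched as follows. The text does not say how similarity is computed, so this sketch assumes a character-level ratio from Python's standard `difflib` module as a stand-in. The function name `most_similar_category` and the `(paragraph, category)` library layout are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def most_similar_category(
    target_paragraph: str,
    library: List[Tuple[str, str]],
) -> Optional[str]:
    """Return the category of the library paragraph most similar to the target.

    `library` holds (paragraph, category) pairs for one classification system,
    taken from documents whose second entity information is already known.
    """
    best_category, best_score = None, -1.0
    for paragraph, category in library:
        # Assumption: character-level similarity in [0, 1]; a production system
        # would more likely compare learned text representations.
        score = SequenceMatcher(None, target_paragraph, paragraph).ratio()
        if score > best_score:
            best_category, best_score = category, score
    return best_category
```

With an empty library the function returns `None`, which leaves the target document unclassified instead of guessing a category.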
In one embodiment, the method for processing a judicial document further comprises: extracting first entity information of the first content subject of the target judicial document; adding the result of associating the first content subject of the target judicial document with the first entity information to the judicial document knowledge graph; and adding the result of associating each classification system with the corresponding second entity information of the target judicial document to the judicial document knowledge graph.
In another aspect, the invention further provides a device for processing a judicial document, comprising: a paragraph labeling unit, configured to label the content of each paragraph of an obtained judicial document to obtain a first content subject representing a regular paragraph of the judicial document and a second content subject representing an irregular paragraph of the judicial document; a first entity information extraction unit, configured to perform regularized representation on the regular paragraph represented by the first content subject to obtain first entity information corresponding to the first content subject; a second entity information extraction unit, configured to input the irregular paragraph represented by the second content subject into a text classification model of the classification system corresponding to the second content subject to obtain second entity information corresponding to the second content subject; and a structured representation unit, configured to associate the first content subject with its corresponding first entity information and to associate the classification system with its corresponding second entity information, to obtain a structured representation result of the judicial document.
In another aspect, the present invention further provides an electronic device comprising a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, implements the method for processing a judicial document described above.
In another aspect, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for processing a judicial document described above.
The method divides a judicial document into regular paragraphs and irregular paragraphs. For the regular paragraphs, first entity information is extracted with regular expressions. For the irregular paragraphs, the paragraph corresponding to each classification system is input into the text classification model for that system to extract the corresponding second entity information. The first content subject is then associated with its first entity information, and each classification system with its second entity information, to obtain a structured representation result of the judicial document. In this way, the efficiency of structured processing of judicial documents can be improved to a certain extent.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:
FIG. 1 shows a flow diagram of a method for processing a judicial document according to an embodiment of the invention;
FIG. 2 shows a schematic diagram of a device for processing a judicial document in one embodiment of the invention;
FIG. 3 shows a schematic structural diagram of an electronic device in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings of the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
Judicial documents are the special-purpose documents produced and used by judicial authorities, such as investigation, procuratorial, trial, and notarization bodies, at each stage of handling cases. They mainly comprise legally effective documents such as judgments and rulings. Judicial documents are an important resource for research on legal text, and they supply key element indicators for legal artificial intelligence applications built on them, such as similar-case recommendation, judgment result prediction, and intelligent question answering. However, judicial documents are published essentially as plain text and are typically unstructured, which makes it difficult to accurately identify and extract the information they contain. Text recognition methods are therefore needed to represent judicial documents in a structured form. In the prior art, the structured representation of a judicial document is mainly produced by judicial staff who label the document manually, and resolving the imbalance between many cases and few staff requires the broad application of information technology. Because there are currently few judges and case trials take a long time, this imbalance makes it difficult for trial efficiency to meet demand. Storage and classification are also overly complex, so data such as judicial cases and documents is disorganized. As a result, practical AI applications in the judicial field currently face a series of problems: case element extraction is inaccurate, information is only loosely associated, data analysis lacks professional depth, and intelligent applications fall short.
Current methods for structured representation of judicial documents do not take advantage of the clear structure of these documents: paragraphs with regular features and irregular paragraphs are processed in the same way, so processing efficiency is low. A structured representation system for judicial documents is therefore needed to extract information from them intelligently and thereby improve the efficiency of structured representation in judicial practice.
Referring to FIG. 1, a method for processing a judicial document according to an embodiment of the present application may include the following steps.
S110: Label the content of each paragraph of the obtained judicial document to obtain a first content subject representing a regular paragraph of the judicial document and a second content subject representing an irregular paragraph of the judicial document.
In this embodiment, the judicial document includes regular paragraphs and irregular paragraphs. The first content subject represents the content conveyed by a particular regular paragraph. For example, regular paragraphs may contain information such as the document number, applicant, attorney, and cited statute, in which case the first content subjects are "document number", "applicant", "attorney", "cited statute", and so on.
In this embodiment, because judicial documents follow a fixed structure with characteristic features, that structure and those features can be used to label each irregular paragraph and obtain its second content subject. The second content subject may include: a party basic information paragraph, a case origin paragraph, a trial process paragraph, a case fact paragraph, a processing (request) reason paragraph, a processing (request) opinion paragraph, and the like.
S120: Perform regularized representation on the regular paragraph represented by the first content subject to obtain first entity information corresponding to the first content subject.
In this embodiment, a regular expression describes a string-matching pattern. It can be used to check whether a string contains a certain substring, to replace a matched substring, or to extract from a string the substring that meets a certain condition. For a regular paragraph, the regular expression corresponding to the first content subject can therefore be used to extract the first entity information corresponding to that subject. For example, if the first content subject is the "document number" of a judicial document and the corresponding regular paragraph is "document number: 202212220001", a regular expression can be used to extract the numeric part of the paragraph as the first entity information.
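As a minimal sketch of this extraction, assuming the regular paragraph has the `label: value` form of the example above, Python's standard `re` module can capture the numeric part. The pattern and the function name `extract_document_number` are illustrative assumptions, not the expressions generated by the described method.

```python
import re
from typing import Optional

# Hypothetical pattern: the label "document number", an ASCII or full-width
# colon, optional whitespace, then a run of digits captured as the entity.
DOCUMENT_NUMBER_PATTERN = re.compile(r"document number\s*[:：]\s*(\d+)", re.IGNORECASE)


def extract_document_number(paragraph: str) -> Optional[str]:
    """Return the digits that follow the 'document number' label, or None."""
    match = DOCUMENT_NUMBER_PATTERN.search(paragraph)
    return match.group(1) if match else None


print(extract_document_number("document number: 202212220001"))
```

Returning `None` when the label is absent lets a caller tell "no such entity in this paragraph" apart from an empty value.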
S130: Input the irregular paragraph represented by the second content subject into a text classification model of the classification system corresponding to the second content subject to obtain second entity information corresponding to the second content subject.
In this embodiment, in addition to identifying the content information in a judicial document, the document also needs to be classified under several classification systems. For each classification system, the irregular paragraph of the judicial document that corresponds to that system is looked up and classified. For example, if the classification system is the administrative management category, the second content subject corresponding to it is the applicant information paragraph; the irregular paragraph labeled as the applicant information paragraph is input into the text classification model for the administrative management category, and the model's output is used as the second entity information corresponding to that second content subject. The second entity information for the administrative management category may include public security, national security, labor and social security, judicial administration, civil affairs, land and mineral resources, environmental protection, agriculture, water conservancy, forestry, housing demolition and relocation, and the like. When the classification system is the application review item, the corresponding second content subject is the applicant's request reason paragraph, and the second entity information may include information disclosure, administrative inaction, administrative penalty, administrative permission, and the like. When the classification system is the case result, the corresponding second content subject is the processing conclusion paragraph, and the second entity information may include confirmation of illegality, termination, dismissal, upholding, partial revocation, partial upholding, and the like.
When the classification system is the cause of the complaint, the corresponding second content subject is the processing (request) opinion paragraph, and the second entity information may include violation of statutory procedure, erroneous findings of fact or insufficient evidence, incorrect application of laws and regulations, failure to perform statutory duties, and the like. Extracting information from judicial documents by means of artificial intelligence can improve the efficiency of the staff who fill in judicial information. By extracting the various entities from the text of judicial materials, it can also surface the key points of the work and support decision-making.
S140: Associate the first content subject with its corresponding first entity information, and associate the classification system with its corresponding second entity information, to obtain a structured representation result of the judicial document.
In this embodiment, the first content subject is associated with its first entity information, the classification system is associated with its second entity information, and the associated results serve as the structured representation of the judicial document, which helps with classifying and archiving judicial documents. In addition, when handling later cases, judicial staff can search previously handled cases by first entity information or second entity information to obtain reference material for the current case. For example, the first content subject "application number" is associated with its first entity information "20221220001" to obtain the data pair <application number, 20221220001>. If the classification system is the administrative management category and its second entity information is environmental protection, the association gives the data pair <administrative management category, environmental protection>. Judicial staff can later filter cases by the keyword "environmental protection", review information such as the legal basis and cited statutes of those cases, and use it as a reference when deciding the current case.
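The associated pairs and the keyword lookup described above can be sketched as a small record type. This is a minimal illustration under stated assumptions: the class `StructuredDocument` and the helper `filter_by_entity` are hypothetical names, and the text does not prescribe any storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StructuredDocument:
    """Structured representation of one judicial document (illustrative)."""
    # first content subject -> first entity information
    first_entities: Dict[str, str] = field(default_factory=dict)
    # classification system -> second entity information
    second_entities: Dict[str, str] = field(default_factory=dict)

    def pairs(self) -> List[Tuple[str, str]]:
        """All <key, entity> pairs, first entities followed by second entities."""
        return list(self.first_entities.items()) + list(self.second_entities.items())


def filter_by_entity(documents: List[StructuredDocument], keyword: str) -> List[StructuredDocument]:
    """Keep the documents that carry `keyword` as any entity value."""
    return [doc for doc in documents if any(value == keyword for _, value in doc.pairs())]


doc = StructuredDocument(
    first_entities={"application number": "20221220001"},
    second_entities={"administrative management category": "environmental protection"},
)
print(doc.pairs())
```

Keeping the two kinds of pairs in separate mappings preserves the distinction between extracted values and classification results, while `pairs()` gives a single view for searching.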
In one embodiment, the method for processing a judicial document may further include: segmenting the obtained judicial document into paragraphs and removing blank lines and illegal characters from it to obtain a target judicial document; correspondingly, the content of each paragraph of the target judicial document is labeled.
In this embodiment, the judicial document needs data cleaning before it is processed. When the text is segmented into paragraphs, meaningless content such as blank lines, space characters, and irregular characters appears, and it needs to be removed. Data cleaning can be implemented with regular expressions that remove empty strings and the illegal characters found in paragraphs.
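A minimal sketch of this cleaning step follows. The text does not define which characters count as illegal, so the sketch assumes control characters and the Unicode replacement character, and it assumes one paragraph per line. The name `clean_and_segment` is hypothetical.

```python
import re
from typing import List

# Assumption: "illegal characters" means ASCII control characters (other than
# line breaks and tabs) and the Unicode replacement character U+FFFD.
ILLEGAL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\ufffd]")


def clean_and_segment(raw_text: str) -> List[str]:
    """Split text into paragraphs, dropping blank lines and illegal characters."""
    paragraphs = []
    for line in raw_text.splitlines():
        # Remove illegal characters, then trim surrounding whitespace.
        line = ILLEGAL_CHARS.sub("", line).strip()
        if line:  # skip lines that are empty after cleaning
            paragraphs.append(line)
    return paragraphs
```

Stripping after character removal means a line that consisted only of illegal characters and spaces is dropped as a blank line.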
In one embodiment, performing regularized representation on the regular paragraph represented by the first content subject to obtain first entity information corresponding to the first content subject may include: inputting the regular paragraph represented by the first content subject into a deep learning model based on context description labeling to obtain a regular expression for that regular paragraph; and extracting the first entity information corresponding to the first content subject based on the regular expression.
In this embodiment, because the first content subject corresponds to regular content, the first entity information corresponding to it can be extracted with a regular expression. For a regular paragraph, a deep learning model based on context description labeling generates the regular expression for the corresponding entity of the judicial document, and the document is then structured, yielding entities such as the document number, applicant, attorney, and cited statute. Specifically, based on a large corpus of judicial documents, the deep learning model extracts the context and descriptive features associated with a target entity, and the regular expression is generated automatically from the extraction result. The current mainstream approach to judicial text processing relies on regular matching with a large number of complex rules. Even when the matching rules have been identified accurately, the regular expressions still have to be written by hand, so batch text processing is inefficient.
In one embodiment, inputting the irregular paragraph represented by the second content subject into a text classification model of the classification system corresponding to the second content subject to obtain second entity information corresponding to the second content subject may include: determining a target irregular paragraph corresponding to a preset classification system; and inputting the target irregular paragraph into the text classification model corresponding to the preset classification system to obtain second entity information corresponding to the preset classification system.
In this embodiment, the classification systems that apply to the judicial document are determined first, and the second content subject corresponding to each classification system is then located in the document. For the administrative management category classification system, the corresponding second content subject is the applicant information paragraph; for the application review item classification system, it is the applicant's request reason paragraph; for the case result classification system, it is the processing conclusion paragraph; and for the cause of the complaint classification system, it is the processing (request) opinion paragraph. The paragraph text corresponding to the second content subject is then input into the model for that classification system to obtain the classification category of the judicial document under that system.
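The pairing of classification systems with paragraph types amounts to a routing table. The sketch below is illustrative only: the key strings are labels chosen for this example, `classify_document` is a hypothetical name, and each classifier is assumed to be any callable that maps paragraph text to a category.

```python
from typing import Callable, Dict

# Classification system -> label of the paragraph it is classified from.
# The label strings are illustrative; they mirror the pairings in the text.
SYSTEM_TO_PARAGRAPH: Dict[str, str] = {
    "administrative management category": "applicant information paragraph",
    "application review item": "applicant's request reason paragraph",
    "case result": "processing conclusion paragraph",
    "cause of the complaint": "processing (request) opinion paragraph",
}


def classify_document(
    labeled_paragraphs: Dict[str, str],
    classifiers: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Route each system's target paragraph to that system's classifier."""
    results: Dict[str, str] = {}
    for system, paragraph_label in SYSTEM_TO_PARAGRAPH.items():
        text = labeled_paragraphs.get(paragraph_label)
        classifier = classifiers.get(system)
        if text is None or classifier is None:
            continue  # no paragraph or no model available for this system
        results[system] = classifier(text)
    return results
```

Systems without a matching paragraph or model are skipped, so a partially labeled document still yields the categories that can be determined.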
In this embodiment, a professional lexicon and an expert rule base are constructed according to the characteristics of the various entities and their attributes, the relevant paragraphs are labeled to build a training data set, and the text classification model is then trained with methods such as LSTM-Attention or ERNIE. The text classification models used for different classification systems may share the same structure or may be heterogeneous. For example, both the administrative management category and the application review item classification systems may use an LSTM-Attention model, each trained on its own data to obtain an administrative management category text classification model and an application review item text classification model. In addition, models that share the same structure may differ in the number of hidden layers.
In one embodiment, inputting the target irregular paragraph into the text classification model corresponding to the preset classification system to obtain second entity information corresponding to the preset classification system includes: performing word segmentation on the target irregular paragraph to obtain a plurality of target words; generating a target word vector for each of the target words; matching the target word vectors against the classification categories included in the preset classification system to obtain the matching degree between the target irregular paragraph and each classification category; and taking the classification category with the maximum matching degree as the second entity information corresponding to the preset classification system.
In this embodiment, classifying a judicial document starts with word segmentation of its paragraphs. Before segmentation, a professional lexicon can be built from the names of laws and from judicial terminology. The judicial text is then segmented according to this lexicon, followed by part-of-speech tagging, context description labeling, and stop-word removal. The target irregular paragraph is matched against the professional lexicon to obtain a plurality of target words. A word vector is then constructed for each target word to obtain a plurality of target word vectors, and the matching degree is computed between the target word vectors and the category vectors formed from the classification categories of the classification system. The category with the highest matching degree is taken as the text classification result of the judicial document under that classification system, that is, the second entity information corresponding to the classification system.
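The matching step can be sketched with toy vectors. The sketch makes several assumptions that are not in the text: whitespace splitting stands in for lexicon-based word segmentation, the word vectors are averaged into one paragraph vector, cosine similarity serves as the matching degree, and the small `WORD_VECTORS` and `CATEGORY_VECTORS` tables are invented for illustration.

```python
import math
from typing import Dict, List, Optional

# Toy 3-dimensional vectors (illustrative); a real system would use trained embeddings.
WORD_VECTORS: Dict[str, List[float]] = {
    "pollution": [1.0, 0.0, 0.0],
    "emission": [0.9, 0.1, 0.0],
    "wages": [0.0, 1.0, 0.0],
    "pension": [0.0, 0.9, 0.1],
}
CATEGORY_VECTORS: Dict[str, List[float]] = {
    "environmental protection": [1.0, 0.0, 0.0],
    "labor and social security": [0.0, 1.0, 0.0],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, used here as the matching degree."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(paragraph: str) -> Optional[str]:
    """Return the category whose vector best matches the paragraph's words."""
    words = paragraph.lower().split()  # stand-in for lexicon-based segmentation
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        return None  # no known target words, so no category can be assigned
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    return max(CATEGORY_VECTORS, key=lambda c: cosine(mean, CATEGORY_VECTORS[c]))
```

Words missing from the vector table are ignored, which plays the role of stop-word removal in this toy setting.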
In one embodiment, the second content subject matter includes case fact information, and the method for processing a judicial document may further include: inputting the irregular paragraphs represented by the case fact information into a named entity recognition model, and recognizing the case address in the irregular paragraphs to obtain third entity information; and taking the result of associating the case address with the third entity information as a structured representation result of the judicial document.
In this embodiment, in addition to the entity information extracted from regular paragraphs and the classification information of the judicial document, address information in the judicial document, such as the case address, also needs to be identified. Address information can be recognized with a named entity recognition model. The case address can be found in the case fact information, so the case fact information is input into the named entity recognition model to obtain the case address. Beforehand, the paragraphs containing entities are labeled with BIO tags (B marks the beginning of an entity, I marks the inside of an entity, and O marks tokens that are not part of any entity) to construct a training data set. The model is then trained with the ALBERT + BiLSTM + CRF method, in which ALBERT computes the text word vectors and BiLSTM + CRF learns the contextual feature information, yielding the named entity recognition model.
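The BIO labeling scheme mentioned above can be illustrated with a short sketch. It covers only the construction and decoding of tags, not the ALBERT + BiLSTM + CRF model itself; the label name `ADDR`, the character-level tokenization, and the sample sentence are assumptions made for the example.

```python
def bio_encode(tokens, spans):
    """Build BIO tags from entity spans given as (start, end_exclusive, label)."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

def bio_decode(tokens, tags):
    """Recover (label, entity text) pairs from a BIO tag sequence."""
    entities, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((label, "".join(current)))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)
        else:
            # "O", or an "I-" tag that does not continue the open entity.
            if current:
                entities.append((label, "".join(current)))
            current, label = [], None
    if current:
        entities.append((label, "".join(current)))
    return entities

# Hypothetical training example: one address entity in a case-fact sentence.
tokens = list("案发于北京市海淀区")
tags = bio_encode(tokens, [(3, 9, "ADDR")])
entities = bio_decode(tokens, tags)
```

A trained sequence labeling model would predict the `tags` list for unseen paragraphs, and the same decoding step would then yield the case address.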
In one embodiment, the method for processing a judicial document may further include: performing regularization processing on obtained legal documents, and storing the regularization result in the form of a graph structure to generate a legal provision knowledge base; and constructing a judicial document knowledge graph based on the legal provisions in the legal provision knowledge base, the entities in the judicial documents, and the relationships among the entities in the judicial documents.
In this embodiment, a knowledge graph makes it more intuitive and faster to find the legal provisions cited by a judicial document, or to find which judicial documents apply a given provision. First, the legal documents to be identified are obtained, such as the Criminal Law of the People's Republic of China, the Constitution of the People's Republic of China, and the Food Safety Law of the People's Republic of China. A rule base is constructed, and the legal documents are structured with regular expressions, taking the part, chapter, section, article, paragraph, item, and sub-item of the legal provisions as structural units, so that a database with a legal provision graph structure is built. Then, the entities described in the above embodiments, such as the document number, the respondent, the case address, whether the case was postponed, the cited law, the administrative management type, the application review item, the case result, the cause of the complaint, and the legal provisions, are used as nodes, and the associations among the entities in a document and the citations of law are used as relations, so that the judicial document knowledge graph is constructed. The graph can then provide legal knowledge graphs, intelligent law recommendation, and similar services to judicial departments and lawyers.
In this embodiment, the judicial document knowledge graph, constructed by means of artificial intelligence, effectively and accurately mines the association relationships, the law citation relationships, and the similarity and co-occurrence information among judicial cases and cited provisions, which previously required extensive manual intervention, so that the speed and accuracy of judicial text analysis are greatly improved. Once the judicial document knowledge graph has been generated, judicial workers can search for similar cases by keyword and thereby find the laws cited in previous judicial cases and the bases for their adjudication, which provides a reference for the current case. Law recommendation can help newcomers who are not yet proficient quickly apply the law to the facts of a case, and can also be used to check whether the law cited in a manually handled case is correct. Judicial workers can also have materials such as work experience and case evaluations, together with major and difficult cases and document files, analyzed automatically and comprehensively to generate key work priorities and decision support.
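A minimal sketch of structuring a legal document with regular expressions and storing it as a graph is given below. It assumes that headings follow the common pattern 第…编/章/节/条 at the start of a line and handles only those four units; paragraphs, items, and sub-items are left inside the article text. The node identifiers, the `contains` relation name, and the sample text are invented for illustration and are not taken from any real statute.

```python
import re

# Heading pattern for part (编), chapter (章), section (节) and article (条).
HEADING = re.compile(r"^第([一二三四五六七八九十百零]+)(编|章|节|条)\s*(.*)$")
LEVELS = {"编": 0, "章": 1, "节": 2, "条": 3}

def parse_statute(law_name, text):
    """Return (nodes, edges): nodes maps id -> attributes, edges are triples."""
    nodes = {law_name: {"type": "law", "text": ""}}
    edges = []
    stack = [(-1, law_name)]  # (level, node id) of the currently open ancestors
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            number, unit, rest = match.groups()
            level = LEVELS[unit]
            # Close every open unit at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            parent = stack[-1][1]
            node_id = f"{parent}/第{number}{unit}"
            nodes[node_id] = {"type": unit, "text": rest}
            edges.append((parent, "contains", node_id))
            stack.append((level, node_id))
            current = node_id
        elif current is not None:
            # A continuation line belongs to the most recent heading.
            nodes[current]["text"] += line
    return nodes, edges

# Placeholder statute text, not a real law.
SAMPLE = "第一章 总则\n第一条 示例条文甲。\n第二条 示例条文乙。\n第二章 附则\n第三条 示例条文丙。\n"
nodes, edges = parse_statute("示例法", SAMPLE)
```

A line inside an article that itself begins with a cross-reference such as 第三条 would be mistaken for a heading by this pattern, so a production rule base would need additional checks.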
In one embodiment, the method for processing a judicial document may further include: calculating the similarity between a target irregular paragraph that corresponds to a target classification system in a target judicial document and the irregular paragraphs that correspond to the target classification system in a preset judicial document library, where the preset judicial document library includes a plurality of judicial documents from which the second entity information has been extracted; and taking the classification category, under the target classification system, of the irregular paragraph with the maximum similarity as the second entity information of the target judicial document.
In this embodiment, after the entity library of judicial documents has been constructed, a new judicial document may need to be structurally represented. For irregular paragraphs, the matching degree between the judicial document and a classification system is sometimes insufficient to determine the category under that classification system, for example when the two highest matching degrees differ only slightly or when the maximum matching degree is itself small. In such cases the category can instead be obtained by computing the similarity between the paragraph and the corresponding paragraphs of judicial documents that have already been structurally represented. For example, when the administrative management category of the target judicial document needs to be determined, the applicant information paragraph of the target judicial document can be compared with that of each judicial document that has already been structurally represented, and the administrative management category of the judicial document with the maximum similarity is taken as the administrative management category of the target judicial document.
In one embodiment, the method for processing a judicial document may further include: extracting first entity information of the first content subject matter represented in the target judicial document; adding to the judicial document knowledge graph the result of associating the first content subject matter of the target judicial document with the first entity information; and adding to the judicial document knowledge graph the result of associating, for each classification system, the target judicial document with the second entity information corresponding to that classification system.
In this embodiment, to enrich the content of the judicial document knowledge graph, the structured representation result of the target judicial document can be added to it. In later judicial practice, judicial workers can then find the case directly through the knowledge graph. Moreover, as the number of structurally represented judicial documents grows over time, the similarity matching model can be continuously trained and iteratively updated, which can improve, to a certain extent, the accuracy of classifying subsequent judicial texts under the corresponding classification system.
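The fallback from matching degree to similarity matching can be sketched as below. The specification gives neither thresholds nor a similarity measure, so the values `min_score=0.5` and `min_margin=0.1` and the use of Jaccard similarity over word sets are assumptions chosen purely for illustration, as are the category names and library entries.

```python
def needs_fallback(scores, min_score=0.5, min_margin=0.1):
    """True when the best matching degree is small or too close to the runner-up."""
    ranked = sorted(scores.values(), reverse=True)
    if not ranked or ranked[0] < min_score:
        return True
    return len(ranked) > 1 and ranked[0] - ranked[1] < min_margin

def jaccard(a, b):
    """Similarity between two word collections, as the ratio of shared words."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def category_by_similarity(target_words, library):
    """library: list of (words, category) for paragraphs already classified."""
    best = max(library, key=lambda entry: jaccard(target_words, entry[0]))
    return best[1], jaccard(target_words, best[0])

def decide(scores, target_words, library):
    """Use the matching degree when it is decisive, otherwise the nearest paragraph."""
    if library and needs_fallback(scores):
        return category_by_similarity(target_words, library)[0]
    return max(scores, key=scores.get) if scores else None

# Hypothetical library of already structured paragraphs.
LIBRARY = [
    (["market", "license", "revoked"], "market supervision"),
    (["land", "expropriation", "compensation"], "land administration"),
]
target = ["land", "compensation", "dispute"]
ambiguous = {"market supervision": 0.41, "land administration": 0.40}
clear = {"market supervision": 0.9, "land administration": 0.1}
```

With these inputs, `decide(ambiguous, target, LIBRARY)` falls back to the library and `decide(clear, target, LIBRARY)` keeps the category with the highest matching degree.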
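One possible way to add a structured representation result to the knowledge graph is to store it as subject-relation-object triples, as in the sketch below. The `KnowledgeGraph` class, the `cites` relation name, and all sample values are hypothetical; the specification does not prescribe a storage format beyond a graph structure.

```python
class KnowledgeGraph:
    """A very small triple store: (subject, relation, object)."""

    def __init__(self):
        self.triples = set()

    def add_document(self, doc_id, first_entities, second_entities, cited_articles=()):
        # First entity information: content subject matter -> extracted value.
        for subject_matter, value in first_entities.items():
            self.triples.add((doc_id, subject_matter, value))
        # Second entity information: classification system -> category.
        for system, category in second_entities.items():
            self.triples.add((doc_id, system, category))
        # Citation relations between the document and legal provisions.
        for article in cited_articles:
            self.triples.add((doc_id, "cites", article))

    def documents_citing(self, article):
        """Find every document that cites the given provision."""
        return sorted(s for s, p, o in self.triples if p == "cites" and o == article)

    def documents_in_category(self, system, category):
        """Find every document with the given category under a classification system."""
        return sorted(s for s, p, o in self.triples if p == system and o == category)

graph = KnowledgeGraph()
graph.add_document(
    "doc-001",
    {"document number": "2022-15"},
    {"administrative management type": "food safety"},
    cited_articles=["示例法/第一章/第二条"],
)
graph.add_document(
    "doc-002",
    {"document number": "2022-16"},
    {"administrative management type": "food safety"},
)
```

Queries such as `graph.documents_citing(...)` correspond to the lookups described above, where a judicial worker starts from a provision or a category and retrieves related cases.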
Referring to fig. 2, an embodiment of the present application further provides a judicial document processing apparatus, including:
a paragraph labeling unit, configured to label the content of each paragraph of an obtained judicial document to obtain a first content subject matter representing the regular paragraphs of the judicial document and a second content subject matter representing the irregular paragraphs of the judicial document;
a first entity information extraction unit, configured to perform regularized representation on the regular paragraphs represented by the first content subject matter to obtain first entity information corresponding to the first content subject matter;
a second entity information extraction unit, configured to input the irregular paragraphs represented by the second content subject matter into a text classification model of a classification system corresponding to the second content subject matter to obtain second entity information corresponding to the second content subject matter;
and a structured representation unit, configured to associate the first content subject matter with the first entity information corresponding to the first content subject matter, and to associate the classification system with the second entity information corresponding to the classification system, to obtain a structured representation result of the judicial document.
The specific functions and effects of the judicial document processing apparatus can be understood with reference to the other embodiments in this specification and are not repeated here. The modules of the judicial document processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
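As a software-only illustration of how the extraction and structured representation units could fit together, the sketch below takes paragraphs that have already been labeled and produces the associated result. The tuple layout of a labeled paragraph, the regular expression for the document number, and the keyword-based stand-in for a text classification model are all assumptions made for this example.

```python
import re
from dataclasses import dataclass, field

@dataclass
class StructuredDocument:
    first_entities: dict = field(default_factory=dict)   # subject matter -> value
    second_entities: dict = field(default_factory=dict)  # classification system -> category

def structure_document(labelled_paragraphs, patterns, classifiers):
    """labelled_paragraphs: iterable of (subject_matter, kind, text),
    where kind is "regular" or "irregular" (output of the labeling unit)."""
    result = StructuredDocument()
    for subject, kind, text in labelled_paragraphs:
        if kind == "regular":
            # First entity extraction: apply the regular expression for this subject.
            match = patterns[subject].search(text) if subject in patterns else None
            if match:
                result.first_entities[subject] = match.group(1)
        else:
            # Second entity extraction: one classifier per classification system.
            for system, classify in classifiers.get(subject, {}).items():
                result.second_entities[system] = classify(text)
    return result

# Hypothetical configuration.
PATTERNS = {"document number": re.compile(r"No\.\s*(\S+)")}
CLASSIFIERS = {
    "case facts": {
        "administrative management type":
            lambda text: "food safety" if "food" in text else "other",
    }
}
paragraphs = [
    ("document number", "regular", "Decision No. 2022-15"),
    ("case facts", "irregular", "The applicant sold expired food."),
]
structured = structure_document(paragraphs, PATTERNS, CLASSIFIERS)
```

Keeping the patterns and classifiers in dictionaries keyed by subject matter mirrors the association step: each extracted value stays attached to the subject matter or classification system it came from.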
Referring to fig. 3, an embodiment of the present application further provides an electronic device, where the electronic device includes a processor and a memory, and the memory is used for storing a computer program, and when the computer program is executed by the processor, the method for processing the judicial documents is implemented.
The processor may be a Central Processing Unit (CPU). The processor may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods of the embodiments of the present invention. The processor executes various functional applications and data processing of the processor by executing non-transitory software programs, instructions and modules stored in the memory, that is, the method in the above method embodiment is realized.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor, and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be coupled to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present application further provides a computer-readable storage medium for storing a computer program which, when executed by a processor, implements the method for processing judicial documents described above.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The various embodiments of the present disclosure are described in a progressive manner, and each embodiment focuses on what distinguishes it from the other embodiments. After reading this specification, one skilled in the art can appreciate that the embodiments, and the features disclosed in them, can be combined in many different ways; for the sake of brevity, not all possible combinations of features are described. However, as long as a combination of these technical features involves no contradiction, it should be considered within the scope described in this specification.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
In the present specification, the embodiments are mainly intended to emphasize different portions from other embodiments, and the embodiments can be explained with reference to each other. Any combination of the embodiments in this specification based on general technical common knowledge by those skilled in the art is encompassed in the disclosure of the specification.
The above description is only an embodiment of the present disclosure, and is not intended to limit the scope of the claims of the present disclosure. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.

Claims (12)

1. A method for processing a judicial document, the method comprising:
labeling the content of each paragraph of an obtained judicial document to obtain a first content subject matter representing the regular paragraphs of the judicial document and a second content subject matter representing the irregular paragraphs of the judicial document;
performing regularized representation on the regular paragraphs represented by the first content subject matter to obtain first entity information corresponding to the first content subject matter;
inputting the irregular paragraphs represented by the second content subject matter into a text classification model of a classification system corresponding to the second content subject matter to obtain second entity information corresponding to the second content subject matter;
and associating the first content subject matter with the first entity information corresponding to the first content subject matter, and associating the classification system with the second entity information corresponding to the classification system, to obtain a structured representation result of the judicial document.
2. The method of claim 1, further comprising:
segmenting the obtained judicial document, and removing blank lines and illegal characters from the judicial document to obtain a target judicial document;
correspondingly, labeling the content of each paragraph of the target judicial document.
3. The method of claim 1, wherein performing regularized representation on the regular paragraphs represented by the first content subject matter to obtain first entity information corresponding to the first content subject matter comprises:
inputting the regular paragraphs represented by the first content subject matter into a deep learning model based on context description labeling to obtain a regular expression for the regular paragraphs represented by the first content subject matter;
extracting the first entity information corresponding to the first content subject matter based on the regular expression.
4. The method of claim 1, wherein inputting the irregular paragraphs represented by the second content subject matter into a text classification model of a classification system corresponding to the second content subject matter to obtain second entity information corresponding to the second content subject matter comprises:
determining a target irregular paragraph corresponding to a preset classification system;
and inputting the target irregular paragraphs into a text classification model corresponding to the preset classification system to obtain second entity information corresponding to the preset classification system.
5. The method of claim 4, wherein inputting the target irregular paragraph into a text classification model corresponding to the preset classification system to obtain second entity information corresponding to the preset classification system comprises:
performing word segmentation on the target irregular paragraph to obtain a plurality of target words;
generating a plurality of target word vectors for a plurality of the target words, respectively;
matching the target word vectors with a plurality of classification categories included in the preset classification system to obtain matching degrees between the target irregular paragraphs and the classification categories;
and taking the classification category corresponding to the maximum matching degree as second entity information corresponding to the preset classification system.
6. The method of claim 1, wherein the second content subject matter comprises case fact information, the method further comprising:
inputting the irregular paragraphs represented by the case fact information into a named entity recognition model, and recognizing the case address in the irregular paragraphs to obtain third entity information;
and taking the result of associating the case address with the third entity information as a structured representation result of the judicial document.
7. The method according to any one of claims 1-6, further comprising:
performing regularization processing on obtained legal documents, and storing the regularization result in the form of a graph structure to generate a legal provision knowledge base;
and constructing a judicial document knowledge graph based on the legal provisions in the legal provision knowledge base, the entities in the judicial documents, and the relationships among the entities in the judicial documents.
8. The method of claim 7, further comprising:
calculating the similarity between a target irregular paragraph that corresponds to a target classification system in a target judicial document and the irregular paragraphs that correspond to the target classification system in a preset judicial document library, wherein the preset judicial document library comprises a plurality of judicial documents from which the second entity information has been extracted;
and taking the classification category, under the target classification system, of the irregular paragraph with the maximum similarity as the second entity information of the target judicial document.
9. The method of claim 8, further comprising:
extracting first entity information of the first content subject matter represented in the target judicial document;
and adding to the judicial document knowledge graph the result of associating the first content subject matter of the target judicial document with the first entity information, and adding to the judicial document knowledge graph the result of associating, for each classification system, the target judicial document with the second entity information corresponding to that classification system.
10. A judicial document processing apparatus, characterized in that the apparatus comprises:
a paragraph labeling unit, configured to label the content of each paragraph of an obtained judicial document to obtain a first content subject matter representing the regular paragraphs of the judicial document and a second content subject matter representing the irregular paragraphs of the judicial document;
a first entity information extraction unit, configured to perform regularized representation on the regular paragraphs represented by the first content subject matter to obtain first entity information corresponding to the first content subject matter;
a second entity information extraction unit, configured to input the irregular paragraphs represented by the second content subject matter into a text classification model of a classification system corresponding to the second content subject matter to obtain second entity information corresponding to the second content subject matter;
and a structured representation unit, configured to associate the first content subject matter with the first entity information corresponding to the first content subject matter, and to associate the classification system with the second entity information corresponding to the classification system, to obtain a structured representation result of the judicial document.
11. An electronic device, characterized in that the electronic device comprises a processor and a memory for storing a computer program which, when executed by the processor, implements the method of any one of claims 1 to 9.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program which, when executed by a processor, implements the method of any one of claims 1 to 9.
CN202211704216.7A 2022-12-29 2022-12-29 Method and device for processing judicial documents, electronic equipment and storage medium Pending CN115936932A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211704216.7A CN115936932A (en) 2022-12-29 2022-12-29 Method and device for processing judicial documents, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115936932A true CN115936932A (en) 2023-04-07

Family

ID=86648881

Country Status (1)

Country Link
CN (1) CN115936932A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116629258A (en) * 2023-07-24 2023-08-22 北明成功软件(山东)有限公司 Structured analysis method and system for judicial document based on complex information item data
CN116629258B (en) * 2023-07-24 2023-10-13 北明成功软件(山东)有限公司 Structured analysis method and system for judicial document based on complex information item data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination