CN117453856A

CN117453856A - Method and device for extracting calendar and examination case series based on multi-source data fusion

Info

Publication number: CN117453856A
Application number: CN202311361252.2A
Authority: CN
Inventors: 聂欣慧; 辛国忠; 徐昊天; 张瑞; 许远航; 蔡志新
Original assignee: China Judicial Big Data Research Institute Co ltd
Current assignee: China Judicial Big Data Research Institute Co ltd
Priority date: 2023-10-19
Filing date: 2023-10-19
Publication date: 2024-01-26
Anticipated expiration: 2043-10-19
Also published as: CN117453856B

Abstract

The invention discloses a method and a device for extracting calendar and examination cases in series based on multi-source data fusion. The method comprises the following steps: 1) acquiring cases from a plurality of data sources, extracting set core element information of the cases to obtain structured case data of each case, and generating a case standard library from the structured case data; 2) generating a case series tree of the calendar and examination cases based on the relation between the case number and the previous examination case number stored in each case record in the case standard library; 3) generating, according to the case series tree and the service requirements, a visual field of each case and the calendar and examination case information within the corresponding case life cycle, then storing the calendar and examination case information in a database and establishing the index items required by the service; 4) when a user's query is received, returning the queried case and its associated calendar and examination cases according to the index items. The method solves the problem of many-to-one relations between previous-examination and subsequent case nodes, and is suitable for actual litigation business.

Description

Method and device for extracting calendar and examination case series based on multi-source data fusion

Technical Field

The invention belongs to the technical field of computer software, and relates to a method and a device for extracting calendar and examination cases in series based on multi-source data fusion.

Background

A "case" refers to a conflict or dispute that a party submits to the national courts for resolution, including civil, criminal, and administrative matters. A "part" refers to each trial or execution procedure that the dispute goes through after entering the national courts, including criminal, civil, administrative, and judicial compensation cases under review and the subsequent cases derived from them. Because a case and its derived cases may be handled by courts at different levels, it is difficult to find all the derived cases of a case directly among hundreds of millions of case records in the existing business systems, and the relation between the previous trial and the subsequent trial of a case cannot be managed clearly.

In application scenarios such as litigation-related financial services, case portraits, and trial quality evaluation, the litigation flow of all cases derived from a given case and the trial relations among them need to be searched or displayed, so there is an urgent and significant need to extract the previous-examination and subsequent relations of cases.

The invention provides a method and a device for the series connection of calendar and examination cases based on multi-source data fusion, which optimize the storage of case series results, enable them to be queried quickly, and meet practical application requirements.

Patent publication number CN115730561A, "case association method, apparatus, computing device, and computer storage medium", discloses a case association method, apparatus, computing device, and computer storage medium. The method comprises the following steps: acquiring legal document data and dividing it into a plurality of domain blocks, including a case number domain block and an approval process domain block; extracting the case number corresponding to the legal document data from the case number domain block data, and extracting candidate case numbers corresponding to the legal document data from the approval process domain block data; for each candidate case number, acquiring its position in the approval process domain block data; screening out the calendar and examination case numbers corresponding to the legal document data from the candidate case numbers according to those positions; and establishing an association relation between the case number and the calendar and examination case numbers.

The method depends on the approval process block in which the case numbers appear. If that block is inaccurate, the association between the case number and the historical case numbers is also inaccurate, so the method is largely unsuitable for practical application.

The patent with authorized announcement number CN110209760B, "a method and a device for associating a calendar and a case, an electronic device, and a computer readable medium", provides a method for associating calendar and examination cases. The method comprises: acquiring a case document, wherein the case number of the case corresponding to the case document is taken as the current case number; querying the case document according to a preset case number template to obtain at least one candidate case number; determining at least one candidate case number as a front examination case number according to the related text of each candidate case number, wherein the related text of a candidate case number comprises text of a preset length before the candidate case number and/or text of a preset length after it; and associating the current case number of the case document with the front examination case number to obtain a case number group, which is added to a case number database.

The method extracts candidate case numbers using text of a predetermined length and treats the case numbers that appear in that text as the front examination case numbers of the case. However, a case number in the text may be a merely associated case number rather than a previous examination of the present case. For example, if a document states "criminal B working on a case number: XXXXXX No. XXX", the method would take "XXXXXX No. XXX" as the front examination case number, resulting in an association error.

Patent publication number CN114036170A, "associated case acquisition method, apparatus, device, storage medium, and program product", discloses an associated case acquisition method, apparatus, device, storage medium, and program product. The method comprises the following steps: acquiring change information of related data, wherein the change information carries a case identifier; analyzing the change information to update the detail data corresponding to the case identifier; constructing a new case relation according to the detail data; acquiring the current case number group corresponding to the case identifier according to the new case relation; acquiring the history case number group corresponding to the case identifier; and, when the current case number group differs from the history case number group, deleting the history case number group and adding the current case number group.

The method provides a way of acquiring associated cases that speeds up changes to the case number group. However, that invention emphasizes the efficiency of changing the case number group, which differs from the solution of the present invention.

Patent publication number CN112948571A, "a method and apparatus for associating a calendar and a case, an electronic device, and a computer readable medium based on a referee document," relates to a method and apparatus for associating calendar and examination cases based on referee documents, an electronic device, and a computer readable medium. The method comprises the following steps: acquiring a target referee document, and taking the case number of the target referee document as the current case number; querying the target referee document, according to a preset case number template, for all case numbers other than its own, and taking them as candidate case numbers after deduplication; screening each candidate case number according to preset screening conditions, and determining the candidate case numbers that meet the conditions as front examination case numbers, to obtain a group of association relations between the current case number and the front examination case numbers; acquiring the referee document corresponding to a front examination case number as the new target referee document, taking that front examination case number as the current case number, and repeating the above steps until no candidate case number or front examination case number is found; and combining the sequentially associated current case numbers and front examination case numbers and returning them as the association result.

That invention is similar to patent CN110209760B: it is based on referee documents and iteratively extracts case association relations according to a preset case number template, so whether its accuracy can reach application level cannot yet be assessed.

In summary, the existing technology is mainly based on a single data source: the calendar and examination case number information is extracted entirely from judge documents. Because judge documents are unstructured data, the quality of the subsequent series results depends on the accuracy of parsing the documents, and the accuracy and coverage of case association analysis performed on this basis cannot reach application level. Moreover, although a plurality of candidate case numbers can be extracted from a document, the above patents only extract the relation between one group of candidate case numbers and the case number, and none of them mentions a method for storing or visualizing a case chain.

Disclosure of Invention

Aiming at the problems in the prior art, the invention aims to provide a calendar and examination case series extraction method and device based on multi-source data fusion. The invention does not rely only on judge documents to extract case number information; it also fuses case information from a variety of data sources, so that the front examination case number and the core elements of a case can be corrected and filled. On this basis, a case series tree generation method is provided, which solves the problem of many-to-one relations between previous-examination and subsequent case nodes and is therefore suitable for actual litigation business. A device capable of querying a case and its associated cases is also provided.

The method solves the problems of incomplete and nonstandard case series information, as well as the problem of generating the series result of calendar and examination cases. The invention also provides a calendar and examination case indexing device.

The technical scheme of the invention is as follows:

a method for extracting calendar and examination cases in series based on multi-source data fusion comprises the following steps:

1) Acquiring cases from a plurality of data sources, extracting set core element information of the cases, obtaining structured case data of each case, and generating a case standard library according to the structured case data of each case; the set core element information comprises a case number, a front examination case number, a filing date and a closing date;

2) Based on the relation between the case number and the previous examination case number stored in each case record in the case standard library, generating a case series tree of the calendar examination case;

3) Generating, according to the case serial tree and the service requirement, a visual field of each case and the calendar and examination case information in the corresponding case life cycle; then, the calendar and examination case information is stored in a database and index items required by the service are established;

4) When a user's query is received, the queried case and its associated calendar review case are returned according to the index item.

Further, the method for generating the case standard library comprises the following steps:

11) Extracting set core element information of the case according to the metadata information of each type of the data sources;

12) Generating a data structure of a standard library by table naming, field naming and data type required by law;

13) Establishing a mapping relation between table fields in the data structure of each type of the data source and table fields in the data structure of the standard library;

14) Correcting and removing nonstandard data in the extracted set core element information according to the constructed data cleaning rule base;

15) Filling the cleaned data in the step 14) into the standard library according to the mapping relation; and searching candidate cases for the cases with data loss according to the case numbers and the institutional courts, and supplementing the cases with data loss to generate the case standard library.

Further, the method for generating the case serial tree comprises the following steps:

21) Generating the previous-examination and subsequent relations of each case type according to the litigation flow of judicial case trial, and forming a case series rule base corresponding to each case type;

22) For each case type, acquiring the case subtype, the case number and the front examination case number of the cases of that type from the case standard library; first connecting in series the trial cases of the approval stage according to the case series rule base of the case type, and then connecting in series the execution cases of the execution stage; then connecting the last node of the approval stage with the first node of the execution stage, and generating the case series tree of the case type from the series information of the nodes.

Further, the case series tree of each case type is generated by a case tree generating function as follows: pairs of associated nodes are acquired from the series information of the nodes of the same case type, and link information is generated from each pair and denoted c_link; then, starting from the head node, an array is generated from the c_links corresponding to each node, the case series tree of the case type is generated using left-right tree generation logic, and a map array containing the information of all nodes is returned, wherein the information of each node comprises the case ID of the case number, the case ID of the front examination case number, the left and right node marking information of the node and the level to which the node belongs.

Further, the calendar and examination case information includes: a case id, a case series head node id, a case subtype, a case previous examination id, a case subsequent id, case tree information, a case progress stage, a manager, a case setting date and a case setting mode.

Further, the visual field comprises case acceptance information and examination and conclusion information; the acceptance information comprises a manager court, a case setting date, a case number, a front examination number, a sponsor and a case setting result, and the examination information comprises an examination state, a case setting result, a case setting amount, a case setting mode and a case setting date.

Further, the visual field further comprises a case other information field and a record information field, wherein the case other information comprises whether to disclose, open the examination, deduct the examination days, apply for prolonging the examination days and send back the review reason; the recording information includes recording time, recording status, and partition.

Further, the data sources comprise examination case data, structured data parsed from judge documents, execution disclosure case information and bankruptcy reorganization case information.

Further, the database is an Elasticsearch database.

The extraction device for the calendar and examination case series connection based on the multi-source data fusion is characterized by comprising a data source fusion module, a case series tree generation module and a database index module;

the data source fusion module is used for acquiring cases from a plurality of data sources and extracting set core element information of the cases to obtain structured case data of each case, and generating a case standard library according to the structured case data of each case; the set core element information comprises a case number, a front examination case number, a filing date and a closing date;

the case serial tree generation module is used for iteratively searching for the relation between cases based on the case number and the front examination case number stored in each case record in the case standard library to generate a case serial tree of the calendar and examination cases;

the database index module is used for generating a visual field of each case and calendar and examination case information in the corresponding case life cycle according to the case serial tree and the service requirement; the calendar and case information is then stored in a database and index items required by the business are created.

The invention has the following advantages:

The invention provides a case-centered method for fusing case data, referee document data, execution disclosure data, and bankruptcy reorganization data, which improves the quality of case data and provides better basic data for case series connection. It also provides a method for generating the series relation of calendar and examination cases, which supports incremental updating and stores the association relations of the calendar and examination cases. The service device for querying case series results provides data service capability for multiple application scenarios such as litigation-related financial services and case portraits.

Firstly, a data fusion method for multi-source judicial case information is provided. The key information of the same case is mutually verified, supplemented, and fused using the information of multiple data sources, which lays the groundwork for extracting the previous-examination and subsequent relations of cases.

Secondly, a generation method of a case series tree of the calendar examination cases is provided. Based on the case data after the multi-source data fusion, the relation between the cases is iteratively searched according to the case number and the front examination case number stored in the case records, a left-right tree storage mode is adopted, the derivative relation of the cases is expanded into the form of a case series tree, and a user can conveniently know the litigation process of the whole case and the condition of each case node.

Finally, a service device for querying the case association results is provided. Based on the case series tree, an interface service for case query is formed, which provides case query service for litigation-related financial services.

Drawings

Fig. 1 shows the generation flow of the method and device for extracting the previous-examination and subsequent series connection of cases based on multi-source data fusion.

Fig. 2 is a data fusion method of judicial case information of multi-source data.

Fig. 3 shows the method of generating a case series tree.

Fig. 4 is a logic diagram of a case series tree.

Fig. 5 is a service device for case related result query.

Detailed Description

The invention will now be described in further detail with reference to the accompanying drawings, which are given by way of illustration only and are not intended to limit the scope of the invention.

The invention relates to a case pre-examination subsequent series extraction method and device based on multi-source data fusion, wherein the generation flow of the method and device is shown in figure 1, and the method and device mainly comprise the following three aspects:

1. Data fusion method of multi-source judicial case information, as shown in Fig. 2.

The data fusion method of the multi-source judicial case information comprises the following processing flows:

First, four types of case related information are collected from the service systems: the trial case information aggregated by the national courts (hereinafter referred to as legal standard data), the structured data obtained by parsing judge documents, the execution public information, and the bankruptcy reorganization information.

Secondly, data cleaning and standardization are carried out on the collected data, and a case standard library is constructed. The structures of the aggregated data sources vary. For example, the structured data parsed from judge documents contains a large amount of case-related and party-related information, whereas the execution public data is organized by party (such as dishonest judgment debtors and judgment debtors subject to consumption restrictions), and the field names differ greatly. Case standard libraries for the four types of data sources are therefore formed by cleaning and structurally standardizing the core case information fields, such as the case number, front examination case number, cause of action, and case date. The cleaning and standardization process comprises the following steps: 1. analyzing the core service data, summarizing the service system data that needs to enter the standard library, and identifying the core fields; 2. generating the data structure of the standard library; 3. defining the mapping relation between the source data structures and the standard library table fields; 4. analyzing the data problems and the current state of the fields in the existing data sources, and formulating a cleaning rule base; 5. standardizing the four types of data sources according to the cleaning rules to form four case-dimension standard libraries. Each implementation step is described in detail below:

Step 1, analyzing the data sources and identifying the core fields: the fields required by the standard library are extracted, according to the information required by the core service, from the metadata of the four types of data sources (metadata is data that describes data, such as field names, data types, and service meanings). From the service data of the four types of data sources, the element fields of the cases are extracted for each of the 11 case categories (jurisdiction cases, criminal cases, civil cases, administrative cases, national compensation and judicial rescue cases, regional judicial assistance cases, international judicial assistance cases, judicial sanction cases, non-litigation preservation review cases, execution cases, and forced liquidation and bankruptcy cases), including: case acceptance information (court, date of filing, case number, number of previous trial, contractor, list of filing, etc.), examination information (examination state, list of filing, sum of filing, mode of filing, date of filing), other information of the case (whether examination is open, number of days of filing deduction, number of days of filing extension, number of review matters sent back, etc.), and recording information (recording time, recording state, partition, etc.).

Step 2, constructing the standard library and generating its data structure. Because the legal standard data follows the strictest standard specification among the aggregated data, the four types of data sources are named according to the table naming, field naming, data type, and other specifications required by the legal standard, and the data structure of the standard library is generated from the extracted core fields.

Step 3, defining the mapping relation between the source data structures and the standard library table fields. Because the data structures, field names, and so on of the four types of data sources all differ, a mapping relation between source and target needs to be established. Information with the same business meaning is stored in different fields in different data sources, and even within one data source the fields differ between case types. Therefore, for each of the four types of data sources and each of the 11 case categories, the mapping relation between every source field and its target field is defined, and a metadata mapping information table is constructed. A sample of the metadata mapping table for the previous examination case number:

In the table, data source type 1 represents legal standard data, 2 represents execution public data, 3 represents referee document data, and 4 represents bankruptcy data.
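The mapping idea in step 3 can be sketched as a lookup keyed by data source type, case type, and source field. This is only an illustration under stated assumptions: the data source type codes 1 to 4 come from the text, but every field name and case-type name below is a hypothetical placeholder, not a name from the actual metadata mapping table.

```python
from typing import Dict, Tuple

# Data source type codes as described in the text.
LEGAL_STANDARD, EXECUTION_PUBLIC, REFEREE_DOCUMENT, BANKRUPTCY = 1, 2, 3, 4

# (source type, case type, source field) -> standard-library field.
# All field and case-type names are hypothetical placeholders.
FIELD_MAPPING: Dict[Tuple[int, str, str], str] = {
    (LEGAL_STANDARD, "civil", "ah"): "case_number",
    (LEGAL_STANDARD, "civil", "qsah"): "previous_case_number",
    (EXECUTION_PUBLIC, "execution", "basis_case_no"): "previous_case_number",
    (REFEREE_DOCUMENT, "civil", "prior_case_no"): "previous_case_number",
}


def to_standard(source_type: int, case_type: str, record: dict) -> dict:
    """Rename the fields of one source record to standard-library names.

    Fields without a mapping entry are dropped, because only the
    identified core fields enter the standard library.
    """
    standard = {}
    for field, value in record.items():
        target = FIELD_MAPPING.get((source_type, case_type, field))
        if target is not None:
            standard[target] = value
    return standard
```

Keying the table on the case type as well as the source reflects the statement that fields differ between case types even within one data source.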

Step 4, analyzing the problems of the existing data, constructing a data cleaning rule base, and correcting and eliminating nonstandard data. Case numbers that are filled in irregularly, for example with inconsistent brackets or special characters, are corrected by establishing case number standardization cleaning rules. Field coding problems, such as illegal codes and codes inconsistent with the data standard, and illegal value problems, such as wrong values, format errors, redundant characters, and garbled characters, are checked and corrected. Some content contains stray characters, such as leading, trailing, or embedded spaces; special symbols and garbled characters may appear in party names; and problems such as numeric symbols and Chinese characters exist in identification numbers.
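A case number cleaning rule of the kind described in step 4 might look like the following minimal sketch. The concrete rules (normalizing every bracket style to ASCII parentheses, removing whitespace and special symbols, requiring a leading four-digit year) are assumptions chosen for illustration, and the sample case numbers are made up; the actual cleaning rule base is not given in the text.

```python
import re
import unicodedata
from typing import Optional

# Bracket variants that NFKC normalization does not already convert.
_BRACKETS = str.maketrans({"〔": "(", "〕": ")", "[": "(", "]": ")",
                           "【": "(", "】": ")"})


def clean_case_number(raw: Optional[str]) -> Optional[str]:
    """Return a normalized case number, or None if it cannot be repaired."""
    if raw is None:
        return None
    # Full-width characters (including full-width parentheses) -> half-width.
    text = unicodedata.normalize("NFKC", raw)
    text = text.translate(_BRACKETS)
    # Remove leading, trailing, and embedded whitespace.
    text = re.sub(r"\s+", "", text)
    # Drop special symbols; keep letters, digits, CJK characters, parentheses.
    text = re.sub(r"[^\w()]", "", text)
    # Assumed shape: "(YYYY)" followed by the rest of the number.
    if not re.fullmatch(r"\(\d{4}\).+", text):
        return None
    return text
```

Returning `None` for unrepairable values, rather than guessing, leaves the field empty so that the later fusion step can try to fill it from another data source.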

Step 5, generating the standard library. After cleaning, the standard library is produced according to the mapping relation between the source data structures and the standard library table fields.

Finally, the case information of the same case is corrected and completed across the different data sources. A case is specified to be uniquely determined by two fields, the case number and the institutional court; if records from different data sources have the same case number and institutional court, they are regarded as records of the same case from different data sources. Taking the examination case data as the reference, its standard library is corrected and filled with the other data sources according to the constructed correction and filling rules. That is, for cases with missing data, candidate cases are searched by case number and institutional court, and the missing data is supplemented. For example, if the front examination case number is missing from the examination case information and the judge document data source for the same case contains a front examination case number, the front examination field of the case is filled with the case number from the document, and the filling rule applied to the field is recorded so that the information can be traced later. This forms a data fusion library with case information as the analysis subject.
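The filling step can be sketched as follows, assuming each standardized record is a dictionary and that records are matched on the pair (case number, institutional court) as the text specifies. The field names, the `fill_log` structure, and the rule that the first non-empty candidate wins are hypothetical choices for this sketch.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (case_number, court)


def fuse(reference: List[dict],
         others: Dict[str, List[dict]],
         fields: Tuple[str, ...] = ("previous_case_number",)) -> List[dict]:
    """Fill missing fields of reference records from the other data sources.

    `others` maps a source name to its standardized records. For every
    filled field, the source that supplied the value is recorded in
    `fill_log` so the value can be traced back later.
    """
    index: Dict[Key, List[Tuple[str, dict]]] = {}
    for source_name, records in others.items():
        for rec in records:
            key = (rec["case_number"], rec["court"])
            index.setdefault(key, []).append((source_name, rec))

    fused = []
    for rec in reference:
        merged = dict(rec)
        merged["fill_log"] = {}
        candidates = index.get((rec["case_number"], rec["court"]), [])
        for source_name, cand in candidates:
            for field in fields:
                # Only fill when the reference value is missing or empty.
                if not merged.get(field) and cand.get(field):
                    merged[field] = cand[field]
                    merged["fill_log"][field] = source_name
        fused.append(merged)
    return fused
```

Because the match key includes the court, two records that share a case number but belong to different courts are never merged.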

2. Method for generating the case series tree of calendar and examination cases, as shown in Fig. 3.

The method for generating the case series tree of the calendar and examination cases comprises the following steps:

Firstly, the previous-examination and subsequent relations of each case type are generated according to the litigation flow of judicial case trial, forming a case series rule base for each case type. The rule base is the basis for generating the examination case tree: under these business rules, the relations between the trial case nodes of the litigation flow are connected accurately. The case series rules mainly sort out the trial relations of 7 types of cases, including: civil cases, criminal cases, administrative compensation cases among national compensation and judicial rescue cases, non-litigation preservation review cases, forced liquidation and bankruptcy cases, execution cases, etc.

The case tandem rule base is specifically as follows (partial cases):

Secondly, information such as the case subtype, case number, and front examination case number is acquired from the case fusion library. According to the case series rules, the trial cases of the approval stage are first connected in series, and then the execution cases of the execution stage are connected in series; the last node of the approval stage is then connected to the first node of the execution stage, forming the series result across the two stages, and the series information is stored in a plurality of temporary tables. Taking the series connection of civil cases as an example: first, the civil first-instance cases are taken from the standard library and stored as head nodes in temporary table node1; the civil second-instance cases are then extracted according to rule 1, and the head node information and the information of its second node are stored in temporary table node2; the retrial review and retrial cases are then associated with the civil second-instance cases in table node2, and the head node and the associated information are stored in temporary table node3; and so on, yielding a plurality of node records that are associated with one another. Finally, the previous-examination and subsequent relations under the head nodes of cases of the same type are passed as input parameters to a case tree generating function, which produces the left and right node marking information of the case series tree; this information is stored in a case series information table. The case tree generating function is self-developed.
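The level-by-level joining described above (node1, node2, node3, and so on) can be sketched in memory as follows. This is a simplification under stated assumptions: the rule base entries and subtype names are hypothetical, head nodes are taken to be first-instance cases without a front examination case number, and the approval and execution stages are handled in one pass by including execution links in the rule set rather than in two separate stages.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical rule base for civil cases:
# allowed (previous subtype, subsequent subtype) pairs.
CIVIL_RULES: Set[Tuple[str, str]] = {
    ("first_instance", "second_instance"),
    ("second_instance", "retrial_review"),
    ("retrial_review", "retrial"),
    ("first_instance", "execution"),
    ("second_instance", "execution"),
}


def link_cases(cases: List[dict],
               rules: Set[Tuple[str, str]],
               head_subtype: str) -> List[Tuple[str, str]]:
    """Return (previous case number, subsequent case number) pairs."""
    followers: Dict[str, List[dict]] = defaultdict(list)
    for case in cases:
        if case.get("previous_case_number"):
            followers[case["previous_case_number"]].append(case)

    # Level 1 plays the role of the node1 table: the head nodes.
    level = [c for c in cases
             if c["subtype"] == head_subtype
             and not c.get("previous_case_number")]
    seen = {c["case_number"] for c in level}
    links: List[Tuple[str, str]] = []
    while level:
        next_level = []
        for parent in level:
            for child in followers.get(parent["case_number"], []):
                allowed = (parent["subtype"], child["subtype"]) in rules
                # `seen` guards against cycles in dirty data.
                if allowed and child["case_number"] not in seen:
                    seen.add(child["case_number"])
                    links.append((parent["case_number"],
                                  child["case_number"]))
                    next_level.append(child)
        level = next_level
    return links
```

A link whose subtype pair is not in the rule base is discarded even if the case numbers match, which is how the business rules filter out case numbers that are merely mentioned rather than truly preceding.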

The data processing steps of the case tree generating function are as follows: 1. link information is generated from each pair of associated nodes (the current node information and its previous examination node information) and is denoted c_link; 2. the c_links associated with the same head node are assembled into an array, which is passed to the function as an input parameter; 3. the function generates a case series tree from the input parameter using left-right tree generation logic and returns a map array of all node information, where the information of each node comprises the case ID of the case number, the case ID of the front examination case number, the left subscript (left value) and right subscript (right value) of the node, and the level of the case tree to which the node belongs. The logical representation of the case series tree is shown in Fig. 4.
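The text does not disclose the internals of the self-developed function, but its inputs and outputs match the common left-right (nested set) numbering of a tree. The following is a minimal sketch of that general technique, assuming each c_link is a pair (case ID, front examination case ID); the function and key names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def build_case_tree(head_id: str,
                    c_links: List[Tuple[str, str]]) -> List[dict]:
    """Assign left/right values and levels by depth-first traversal.

    Each c_link is (case_id, previous_case_id). The result lists one
    dict per node, in visiting order.
    """
    children: Dict[str, List[str]] = defaultdict(list)
    for case_id, previous_id in c_links:
        children[previous_id].append(case_id)

    nodes: List[dict] = []
    visited = set()
    counter = 0

    def visit(case_id: str, previous_id: Optional[str], level: int) -> None:
        nonlocal counter
        visited.add(case_id)
        counter += 1
        node = {"case_id": case_id, "previous_case_id": previous_id,
                "left": counter, "right": 0, "level": level}
        nodes.append(node)
        for child in children.get(case_id, []):
            if child not in visited:  # ignore cycles and duplicate links
                visit(child, case_id, level + 1)
        counter += 1
        node["right"] = counter

    visit(head_id, None, 1)
    return nodes
```

With this numbering a node may have any number of subsequent nodes, and a leaf always satisfies right = left + 1.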

Fig. 4 shows the case series tree generated from a case and all cases derived from it. It is an 8-layer case relation tree in which each elliptical node represents a case, labeled with its case number (desensitized by removing the court code), and each arrow points to a subsequent case of the node. Node "(2017) civil terminal 1600" has two subsequent case nodes. The rectangles show the left and right subscripts of the nodes. The left and right subscripts have the following meaning: 1. the left value and right value fields of a node delimit the range of its child nodes; 2. the left value of a node is smaller than the left values of all its child nodes, and its right value is larger than the right values of all its child nodes. The case tree generated by the method solves the problem that the prior art generates only a single chain of cases and cannot represent one node with a plurality of subsequent nodes. The case series tree has no limit on the number of levels and offers high query efficiency, although its modification efficiency is low.
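The two properties of the left and right subscripts explain the query efficiency: all subsequent cases of a node, or its whole previous-examination chain, can be found with a simple range comparison and no recursion. A small sketch, using made-up node data and hypothetical key names:

```python
from typing import List

# Made-up example tree: A -> (B -> D), C
NODES: List[dict] = [
    {"case_id": "A", "left": 1, "right": 8, "level": 1},
    {"case_id": "B", "left": 2, "right": 5, "level": 2},
    {"case_id": "D", "left": 3, "right": 4, "level": 3},
    {"case_id": "C", "left": 6, "right": 7, "level": 2},
]


def _find(nodes: List[dict], case_id: str) -> dict:
    return next(n for n in nodes if n["case_id"] == case_id)


def subsequent_cases(nodes: List[dict], case_id: str) -> List[str]:
    """All cases derived from `case_id`: strictly inside its range."""
    me = _find(nodes, case_id)
    inside = [n for n in nodes
              if me["left"] < n["left"] and n["right"] < me["right"]]
    return [n["case_id"] for n in sorted(inside, key=lambda n: n["left"])]


def previous_cases(nodes: List[dict], case_id: str) -> List[str]:
    """The chain of earlier cases: every range that encloses `case_id`."""
    me = _find(nodes, case_id)
    outside = [n for n in nodes
               if n["left"] < me["left"] and me["right"] < n["right"]]
    return [n["case_id"] for n in sorted(outside, key=lambda n: n["left"])]
```

The trade-off noted in the text follows from the same numbering: inserting a new case shifts the left and right values of every node that comes after it in the traversal.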

3. Case series result query service device, as shown in FIG. 5

First, the case fusion library is processed according to the case series tree to generate the key information of each case and the calendar and examination case information within the life cycle of the case. The calendar and examination case information comprises: a case id, a case series head node id, a case subtype, a previous examination case id, a subsequent case id, case tree information, a case progress stage, a manager, a case setting date, a case setting mode and the like. The key information links the data to business requirements and is used for providing visualization fields, including case acceptance information, examination and conclusion information and the like. The case acceptance information comprises a handling court, a case setting date, a case number, a previous examination case number, a case handler, a case setting result and the like; the examination and conclusion information comprises an examination state, a case setting result, a case setting amount, a case setting mode and a case setting date. The other information of the case comprises whether the case is disclosed, whether the hearing is open, the number of days deducted from the examination time limit, the number of days applied for to extend the examination time limit, the reason for sending the case back for re-examination, and the like. The recording information comprises a recording time, a recording status, a partition and the like.

Then, the calendar and examination case information is stored in an Elasticsearch library, the index items required by the business are established, and a query service is provided to users. The data are packaged as an interface service, which is applied online to services such as litigation finance and user profiling.
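As an illustration of how such index items could serve a query, the sketch below only builds a request body in Elasticsearch Query DSL (a `bool` query with `term` and `range` filters) and does not call any client. The field names `head_node_id`, `left` and `right` are assumptions, since the actual index layout is not given in the description.

```python
def build_series_query(head_node_id, left, right):
    """Request body selecting a case and every case derived from it.

    Field names are hypothetical. The body restricts the search to one
    case series tree and to left/right values within the queried range.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"head_node_id": head_node_id}},
                    {"range": {"left": {"gte": left}}},
                    {"range": {"right": {"lte": right}}},
                ]
            }
        },
        # Ascending left values list each case before the cases derived from it.
        "sort": [{"left": "asc"}],
    }


body = build_series_query("head-001", 2, 7)
```

Filtering on the head node id first keeps the range comparison within a single tree, because left and right values are only meaningful relative to one head node.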

The case series connection method based on the fused data achieves an accuracy of more than 95% and can handle a single subsequent case that follows from a plurality of cases.

In an actual application scenario the method processes about 300 million records and generates the case trees in about half an hour, which meets the requirements of practical use.

The invention is mainly applied in scenarios such as litigation financial services, where it meets user requirements, provides business support and brings benefits to the company.

Although specific embodiments of the invention have been disclosed for illustrative purposes, it will be appreciated by those skilled in the art that various alternatives, variations and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention is not intended to be limited to the particular embodiment disclosed as the best mode contemplated for carrying it out; its scope is that defined by the appended claims.

Claims

1. A method for extracting calendar and examination cases in series based on multi-source data fusion comprises the following steps:

2. The method of claim 1, wherein the method of generating the case criteria library is:

3. The method of claim 1, wherein the method of generating the case tandem tree is:

4. A method according to claim 3, wherein the case series tree for each case type is generated using a case tree generating function by: acquiring pairs of mutually associated nodes according to the series information of the nodes corresponding to the same case type, generating link information from each pair of nodes, and denoting the link information as c_link; and then, starting from the head node, generating an array from the plurality of c_links corresponding to each node, generating the case series tree of the case type using left-right value tree generation logic, and returning a map array containing the information of all nodes, wherein the information of each node comprises the case ID corresponding to the case number, the case ID corresponding to the previous examination case number, the left and right node marking information of the node, and the level to which the node belongs.

5. A method according to claim 1 or 2 or 3, wherein the calendar and examination case information comprises: a case id, a case series head node id, a case subtype, a previous examination case id, a subsequent case id, case tree information, a case progress stage, a manager, a case setting date and a case setting mode.

6. A method according to claim 1 or 2 or 3, wherein the visualization field comprises case acceptance information and examination and conclusion information; the case acceptance information comprises a handling court, a case setting date, a case number, a previous examination case number, a case handler and a case setting result, and the examination and conclusion information comprises an examination state, a case setting result, a case setting amount, a case setting mode and a case setting date.

7. The method of claim 6, wherein the visualization field further comprises a case other information field and a recording information field; the other information of the case comprises whether the case is disclosed, whether the hearing is open, the number of days deducted from the examination time limit, the number of days applied for to extend the examination time limit, and the reason for sending the case back for re-examination; the recording information comprises a recording time, a recording status and a partition.

8. The method of claim 1, wherein the data sources include trial cases, parsed and structured judgment documents, publicly disclosed execution case information, and bankruptcy reorganization case information.

9. The method of claim 1, wherein the database is an Elasticsearch library.

10. The extraction device for the calendar and examination case series connection based on the multi-source data fusion is characterized by comprising a data source fusion module, a case series tree generation module and a database index module;