CN117407491A

CN117407491A - Intelligent pre-judging method and system for digital case treatment

Info

Publication number: CN117407491A
Application number: CN202311478370.1A
Authority: CN
Inventors: 林蓥; 胡玉梅; 高茜; 桂瑶; 罗双; 丘嘉苑; 周子健; 王建永; 陈颖璇
Original assignee: Guangdong Power Grid Co Ltd
Current assignee: Guangdong Power Grid Co Ltd
Priority date: 2023-11-07
Filing date: 2023-11-07
Publication date: 2024-01-16

Abstract

The invention provides an intelligent pre-judging method and system for digital case treatment. The method comprises the following steps: acquiring legal case data based on a web page crawling algorithm and constructing a legal knowledge base; acquiring a text classification of the input text, acquiring the case category corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring the case data corresponding to the classification result based on the text classification and the case category; and searching the legal knowledge base based on the case data and constructing a case knowledge graph to analyze and pre-judge the case. In the method provided by the invention, case categories are obtained with the controllable tensor decomposition algorithm, related material is retrieved from the constructed legal knowledge base, and a case knowledge graph is constructed to analyze and pre-judge the case. This provides users with reference analysis for case handling and greatly improves working efficiency.

Description

Intelligent pre-judging method and system for digital case treatment

Technical Field

The invention relates to the technical field of big data processing, in particular to an intelligent pre-judging method and system for digital case treatment.

Background

At present, traditional forms of public legal education consist mainly of publishing and disseminating legal information such as legal news and laws and regulations, and people who want this information must search for and view it on each legal-education platform separately. When law enforcement personnel carry out professional work such as case analysis and handling, they must perform a large amount of repetitive work, such as extracting basic information and constructing case relationships, and must also search each platform for similar cases. However, existing rule-of-law knowledge is scattered across platforms, mismatched and inaccurate, which makes it inconvenient to use and gives users a poor experience. Handling business cases requires a great deal of manpower for basic information extraction, case relationship construction, and searching for applicable laws and similar cases, and cases cannot be pre-judged.

Disclosure of Invention

The invention aims to provide an intelligent pre-judging method and system for digital case treatment that solve the above technical problems: case categories are obtained based on a controllable tensor decomposition algorithm, related material is retrieved from a constructed legal knowledge base, and a case knowledge graph is constructed to analyze and pre-judge cases, thereby providing users with reference analysis for case handling and greatly improving working efficiency.

In order to solve the technical problems, the invention provides an intelligent pre-judging method for a digital case, which comprises the following steps:

acquiring legal case data based on a web page crawling algorithm, and constructing a legal knowledge base;

acquiring text classification according to the input text, acquiring case categories corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring case data corresponding to classification results based on the text classification and the case categories;

and searching the legal knowledge base based on the case data, and constructing a case knowledge graph to analyze and pre-judge the case.

According to the scheme, the categories of the cases are obtained based on the controllable tensor decomposition algorithm and are retrieved from the constructed legal knowledge base, and the case knowledge graph is constructed to realize analysis and prejudgment of the cases, so that reference analysis of case processing is provided for users, and the working efficiency is greatly improved.

Further, acquiring the legal case data based on the web page crawling algorithm and constructing the legal knowledge base specifically comprises:

parsing a webpage into a text sequence with the web page crawling algorithm, wherein each html tag in the webpage corresponds to a subsequence of the text sequence;

scoring the text sequence according to a preset scoring rule to obtain a score sequence corresponding to the text sequence;

finding the subsequence with the maximum score sum in the score sequence and taking the corresponding content as the webpage body text;

and constructing the legal knowledge base from the extracted webpage body text.

Further, the text classification is obtained according to the input text, the case category corresponding to the input text is obtained based on the controllable tensor decomposition algorithm, and the case data corresponding to the classification result is obtained based on the text classification and the case category, specifically:

performing vocabulary matching and word segmentation on the input text, and acquiring text classification based on a vocabulary matching result and a word segmentation result;

acquiring a case category corresponding to an input text based on a controllable tensor decomposition algorithm;

and determining a classification result of the input text based on the text classification and the case category, and acquiring the case data corresponding to the classification result.
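The source does not disclose its vocabulary or word segmentation model. The following is a minimal sketch, assuming a small hypothetical vocabulary (`LEGAL_VOCAB`) and substituting forward maximum matching, a classic dictionary-based Chinese segmentation method, for the unspecified model; `forward_max_match` and `vocabulary_matches` are illustrative names only.

```python
# Hypothetical legal vocabulary ("electric shock", "personal injury",
# "liability dispute" and their parts). Not the patent's real vocabulary.
LEGAL_VOCAB = {"触电", "人身损害", "责任纠纷", "人身", "损害", "责任", "纠纷"}


def forward_max_match(text, vocab, max_len=4):
    """Segment `text` by greedily taking the longest vocabulary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is accepted when no longer word matches.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def vocabulary_matches(tokens, vocab):
    """Vocabulary-matching result: the segmented tokens that appear in the vocabulary."""
    return [t for t in tokens if t in vocab]


tokens = forward_max_match("王某触电人身损害责任纠纷", LEGAL_VOCAB)
print(tokens)                                   # ['王', '某', '触电', '人身损害', '责任纠纷']
print(vocabulary_matches(tokens, LEGAL_VOCAB))  # ['触电', '人身损害', '责任纠纷']
```

Both outputs (the segmented tokens and the vocabulary hits) would then feed the text classification step; a production system would more likely use a trained segmentation model.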

Further, searching the legal knowledge base based on the case data and constructing the case knowledge graph to analyze and pre-judge the case specifically comprises:

searching in a legal knowledge base based on case data by adopting a position semantic searching algorithm to obtain cases related to the case data;

and constructing a case knowledge graph based on the associated cases to realize analysis and prejudgment of the cases.

The invention provides an intelligent pre-judging system for digital case treatment, which comprises:

the data crawling module is used for acquiring legal case data based on a web page crawling algorithm and constructing a legal knowledge base;

the text input module is used for inputting a text to be prejudged;

the case data acquisition module is used for acquiring text classification according to the input text, acquiring case categories corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring case data corresponding to classification results based on the text classification and the case categories;

the case pre-judging module is used for searching in the legal knowledge base based on the case data and constructing a case knowledge graph so as to analyze and pre-judge the case.

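Further, the data crawling module is configured to acquire legal case data based on a web page crawling algorithm and construct a legal knowledge base, specifically by: parsing a webpage into a text sequence in which each html tag is a subsequence; scoring the text sequence according to a preset scoring rule to obtain a score sequence; taking the subsequence with the maximum score sum as the webpage body text; and constructing the legal knowledge base from the extracted webpage body text.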

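Further, the case data acquisition module is configured to acquire a text classification of the input text, acquire the case category corresponding to the input text based on a controllable tensor decomposition algorithm, and acquire the case data corresponding to the classification result based on the text classification and the case category, specifically by: performing vocabulary matching and word segmentation on the input text and obtaining the text classification from the matching and segmentation results; obtaining the case category with the controllable tensor decomposition algorithm; and determining the classification result from the text classification and the case category and acquiring the corresponding case data.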

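Further, the case pre-judging module is configured to search the legal knowledge base based on the case data and construct a case knowledge graph to analyze and pre-judge the case, specifically by: searching the legal knowledge base with a position semantic search algorithm to obtain cases associated with the case data; and constructing the case knowledge graph from the associated cases to analyze and pre-judge the case.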

Further, the system also comprises a case retrieval module for retrieving in the legal knowledge base based on the input text and obtaining case data related to the input text.

Further, the case pre-judging module is further used for constructing a case knowledge graph based on the case input by the text input module so as to analyze and pre-judge the case.

Drawings

FIG. 1 is a schematic flow chart of a digital case intelligent pre-judging method according to an embodiment of the invention;

FIG. 2 is a schematic diagram of a digital case intelligent pre-judgment system according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a multi-node deployment according to an embodiment of the present invention;

FIG. 4 is a flowchart of a controllable tensor decomposition algorithm according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.

Referring to fig. 1, the present embodiment provides an intelligent pre-judging method for a digital case, which includes the following steps:

S1: acquiring legal case data based on a web page crawling algorithm, and constructing a legal knowledge base;

S2: acquiring a text classification of the input text, acquiring the case category corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring the case data corresponding to the classification result based on the text classification and the case category;

S3: searching the legal knowledge base based on the case data, and constructing a case knowledge graph to analyze and pre-judge the case.

In this intelligent pre-judging method for digital cases, case categories are obtained based on the controllable tensor decomposition algorithm, related cases are retrieved from the constructed legal knowledge base, and a case knowledge graph is constructed to analyze and pre-judge the case, thereby providing users with reference analysis for case handling and greatly improving working efficiency.

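Specifically, step S1 comprises: parsing a webpage into a text sequence in which each html tag is a subsequence; scoring the text sequence according to a preset scoring rule to obtain a score sequence; taking the subsequence with the maximum score sum as the webpage body text; and constructing the legal knowledge base from the extracted webpage body text.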

It should be noted that the web page crawling algorithm runs as a scheduled (timed) task that periodically crawls legal case data from designated websites or from the website sections (columns) to be crawled.

To illustrate the web page crawling algorithm more clearly, this embodiment provides a specific implementation: the web crawler service is implemented with a web page crawling algorithm based on heuristic rules and unsupervised learning. This algorithm is highly general and is suitable for extracting content from web pages in different languages or with different structures. Specifically:

the web page crawling algorithm may parse the web page into a token sequence, for example: tag(body), tag(div), text × 8, tag(/div), tag(div), text × 500, tag(/div), tag(div), text × 6, tag(/div), tag(/body). Here, tag denotes an html tag in the webpage, text denotes the text contained in an html tag, and × 8, × 6 and so on give the number of times the corresponding token is repeated.
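As a minimal sketch of this parsing step, Python's standard `html.parser` can flatten a page into such a token sequence. The class and function names are hypothetical, and treating each whitespace-separated word as one text token is an assumption; Chinese pages would need character-level or segmented tokens instead.

```python
from html.parser import HTMLParser


class TokenSequenceParser(HTMLParser):
    """Flatten an HTML page into ("tag", name) and ("text", word) tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("tag", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("tag", "/" + tag))

    def handle_data(self, data):
        # Assumption: one text token per whitespace-separated word.
        for word in data.split():
            self.tokens.append(("text", word))


def to_token_sequence(html):
    parser = TokenSequenceParser()
    parser.feed(html)
    parser.close()
    return parser.tokens


print(to_token_sequence("<body><div>court ruling text</div></body>"))
# [('tag', 'body'), ('tag', 'div'), ('text', 'court'), ('text', 'ruling'), ('text', 'text'), ('tag', '/div'), ('tag', '/body')]
```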

Then, each token in the token sequence is given a score according to a preset scoring rule, for example:

each tag scores -3.25 points, and each text token scores 1 point.

Scoring the token sequence according to this rule yields a score sequence in which each html tag corresponds to a subsequence:

-3.25, -3.25, 1 × 8, -3.25, -3.25, 1 × 500, -3.25, -3.25, 1 × 6, -3.25, -3.25.
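The scoring rule can be sketched as a simple mapping over the token sequence. This assumes tokens of the form `("tag", name)` / `("text", word)`; `score_tokens` and `example_tokens` are hypothetical helpers, and the latter rebuilds the example page above.

```python
TAG_SCORE = -3.25  # score of one html tag token (example rule)
TEXT_SCORE = 1.0   # score of one text token (example rule)


def score_tokens(tokens):
    """Map ("tag", ...) / ("text", ...) tokens to their rule-based scores."""
    return [TAG_SCORE if kind == "tag" else TEXT_SCORE for kind, _ in tokens]


def example_tokens():
    """Rebuild the example page: three divs holding 8, 500 and 6 text tokens."""
    tokens = [("tag", "body")]
    for count in (8, 500, 6):
        tokens.append(("tag", "div"))
        tokens.extend(("text", "w") for _ in range(count))
        tokens.append(("tag", "/div"))
    tokens.append(("tag", "/body"))
    return tokens


scores = score_tokens(example_tokens())
print(len(scores), sum(scores))  # 522 488.0
```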

Finally, the subsequence with the largest sum is found in the score sequence; the corresponding subsequence of the token sequence is the body text of the webpage.
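Finding the contiguous subsequence with the largest sum is the classic maximum-subarray problem. The following minimal sketch uses Kadane's algorithm, a standard linear-time method chosen here for illustration; the source does not say which algorithm it uses.

```python
def max_sum_span(scores):
    """Kadane's algorithm: return (best_sum, start, end) of the max-sum contiguous run."""
    best_sum, best_start, best_end = float("-inf"), 0, -1
    run_sum, run_start = 0.0, 0
    for i, s in enumerate(scores):
        if run_sum <= 0:
            # A non-positive prefix can only lower the total, so restart here.
            run_sum, run_start = s, i
        else:
            run_sum += s
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i
    return best_sum, best_start, best_end


# Score sequence of the example: tags -3.25, text tokens 1.
example = ([-3.25] * 2 + [1.0] * 8 + [-3.25] * 2 + [1.0] * 500
           + [-3.25] * 2 + [1.0] * 6 + [-3.25] * 2)
print(max_sum_span(example))  # (501.5, 2, 511)
```

Under the example rule, the maximum-sum span (sum 501.5) runs from index 2 to index 511. It covers the 500-token block and also absorbs the 8-token block before it, because those 8 points outweigh the 2 × 3.25 = 6.5 points of tags in between. The 6-token block is left out (6 < 6.5). The tag score therefore controls how far the extracted body text spreads into neighbouring blocks.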

It should be noted that, in this embodiment, dynamic programming, which is suited to problems with overlapping sub-problems, may be used to break the problem into smaller sub-problems and build the overall solution from theirs. Compared with searching for the maximum subsequence directly, this is more efficient, has lower algorithmic complexity, is more widely applicable and is more accurate. Overlapping sub-problems are sub-problems whose solutions have already been computed: when an html tag contains multiple nested layers, multiple tags or multiple pieces of text, a score is computed for each descendant tag, and the score of the current html tag is then obtained from those scores.
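The source describes this dynamic programming only in outline. One possible reading, given here as an assumption rather than the patent's exact procedure, is a bottom-up pass over the DOM tree. In this reading, each element's score is its own tags and text plus the stored scores of its descendants, so each subtree is scored once and reused by every ancestor. The `Element` structure and the helper names are hypothetical.

```python
from dataclasses import dataclass, field

TAG_SCORE = -3.25  # example rule: each html tag (opening or closing)
TEXT_SCORE = 1.0   # example rule: each text token


@dataclass
class Element:
    name: str
    text_tokens: int = 0                          # text tokens directly inside this element
    children: list = field(default_factory=list)


def element_scores(root):
    """Post-order DP: an element's score reuses the already-computed scores of its children."""
    scores = {}

    def visit(node):
        total = 2 * TAG_SCORE + node.text_tokens * TEXT_SCORE  # opening + closing tag + own text
        for child in node.children:
            total += visit(child)
        scores[id(node)] = total
        return total

    visit(root)
    return scores


def best_element(root):
    """Return (name, score) of the element with the highest score (candidate body container)."""
    scores = element_scores(root)
    best, best_score = root, scores[id(root)]
    stack = list(root.children)
    while stack:
        node = stack.pop()
        if scores[id(node)] > best_score:
            best, best_score = node, scores[id(node)]
        stack.extend(node.children)
    return best.name, best_score


page = Element("body", children=[Element("div#1", 8), Element("div#2", 500), Element("div#3", 6)])
print(best_element(page))  # ('div#2', 493.5)
```

On the example page this tree-based reading selects the 500-token div alone (score 493.5; the body scores only 488). A flat maximum-sum search over the token sequence would instead also take in the preceding 8-token div. Which behaviour is preferable is a design choice the source leaves open.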

The web crawler service of this embodiment can be deployed on a server in the DMZ. On a test set of webpages, both the accuracy and the recall of body-text extraction exceed 90%, indicating good extraction quality. Semantic analysis, such as keyword extraction, can further be applied to the extracted text to discard web pages whose body text does not match the title or is empty, which further improves extraction quality.

It should be noted that searches in the legal knowledge base may use a position semantic search algorithm based on vertex and edge labels, which enables searching over the vertex and edge labels of the knowledge graph. The data behind similar-case recommendation results in the knowledge graph come from the legal section of the knowledge base, and each case has its own graph. Specifically, the keywords of the vertices and edges in the knowledge graph may be inverted-indexed as follows: a list stores the vertices or edges corresponding to each label, so that the vertices and edges matching a query keyword can be found quickly.
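A minimal sketch of such an inverted index, assuming vertex and edge labels are available as plain keyword lists. The graph below and the helper names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(vertex_labels, edge_labels):
    """Map each label keyword to the list of vertices and edges that carry it."""
    index = defaultdict(list)
    for vertex, labels in vertex_labels.items():
        for label in labels:
            index[label].append(("vertex", vertex))
    for edge, labels in edge_labels.items():
        for label in labels:
            index[label].append(("edge", edge))
    return index


def lookup(index, keywords):
    """Collect the vertices/edges matching any query keyword."""
    hits = []
    for keyword in keywords:
        hits.extend(index.get(keyword, []))
    return hits


# Hypothetical labelled case graph.
vertices = {"v_plaintiff": ["person", "injury"], "v_company": ["company"]}
edges = {("v_plaintiff", "v_company"): ["electric shock", "liability"]}
index = build_inverted_index(vertices, edges)
print(lookup(index, ["injury", "liability"]))
# [('vertex', 'v_plaintiff'), ('edge', ('v_plaintiff', 'v_company'))]
```

Each lookup costs one dictionary access per keyword, instead of a scan over every vertex and edge in the graph.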

In this embodiment, a queue may be constructed from the keywords in the case data, and a keyword score is then computed with a scoring function. The scoring function may follow search-engine ranking practice, combining factors such as the position, density, frequency and importance of the keyword in a page, and the keywords are arranged in descending order of score. Next, based on a preset score threshold, an optimal qualified position semantics algorithm is used to search for the optimal qualified position semantics. The scoring-function value of each candidate is computed and compared with the threshold. A candidate whose value is below the threshold does not enter the queue; a candidate whose value exceeds the threshold enters the queue, and the threshold is updated. The specific steps are as follows:

constructing, from the keywords of the case data, a queue that is initially empty and holds at most k elements, arranged in descending order of the scoring-function values of the optimal qualified position semantics;

using the inverted index to find the vertices and edges corresponding to each query keyword, and converting them with a mapping structure into 'vertex/edge-keyword' form, wherein each vertex or edge carries a piece of descriptive text, to obtain a set U;

presetting a threshold θ with initial value -∞, and removing from set U the vertices and edges whose scores are below the threshold;

computing, with the TF-IDF algorithm, the keyword scores of the remaining vertices and edges in set U;

and finally, sorting set U by score and taking the top-k elements together with their corresponding position vertices, to obtain the cases related to the case data.
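The threshold-and-queue procedure above can be sketched with a bounded min-heap. The sketch assumes each element of U is identified by an id and described by a tokenised text. It uses a smoothed TF-IDF variant (`log(n/df) + 1`, an assumption, since the source does not give its formula). The threshold θ starts at -∞ and is raised to the k-th best score once the queue is full; the data and helper names are hypothetical.

```python
import heapq
import math


def tfidf_scores(docs, keywords):
    """Score each tokenised description by the summed TF-IDF weight of the query keywords."""
    n = len(docs)
    df = {kw: sum(1 for d in docs if kw in d) for kw in keywords}
    scores = []
    for doc in docs:
        total = 0.0
        for kw in keywords:
            if df[kw] == 0 or not doc:
                continue
            tf = doc.count(kw) / len(doc)
            idf = math.log(n / df[kw]) + 1.0  # smoothed IDF (an assumption)
            total += tf * idf
        scores.append(total)
    return scores


def threshold_top_k(items, scores, k):
    """Keep at most k items; theta starts at -inf and rises to the k-th best score seen."""
    if k <= 0:
        return []
    theta = float("-inf")
    heap = []  # min-heap of (score, arrival order, item)
    for order, (item, score) in enumerate(zip(items, scores)):
        if score <= theta:
            continue  # below the current threshold: does not enter the queue
        heapq.heappush(heap, (score, order, item))
        if len(heap) > k:
            heapq.heappop(heap)
        if len(heap) == k:
            theta = heap[0][0]  # update the threshold
    ranked = sorted(heap, key=lambda entry: (-entry[0], entry[1]))
    return [(item, score) for score, _, item in ranked]


# Hypothetical set U: element ids with tokenised text descriptions.
U = {"c0": ["shock", "injury", "shock"], "c1": ["contract", "loan"], "c2": ["injury", "liability"]}
result = threshold_top_k(list(U), tfidf_scores(list(U.values()), ["shock", "injury"]), k=2)
print([item for item, _ in result])  # ['c0', 'c2']
```

Once the queue is full, every later candidate is compared only against θ, so low-scoring elements are discarded without touching the heap.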

It should be noted that the position semantic search algorithm must take into account the number of nodes in the graph and how they are connected, to ensure accurate and efficient results. When shortest paths are computed, node weights must be taken into account so that path lengths truly reflect the relationships between entities. When large-scale graph data is processed, the scalability and performance of the algorithm must be considered so that it can handle large numbers of nodes and edges.
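To show how node weights can enter a shortest-path computation, the following minimal sketch uses Dijkstra's algorithm. It assumes that the cost of a path is the sum of its edge weights plus the weights of the nodes it passes through, all non-negative. This cost model and the example graph are assumptions, not taken from the source.

```python
import heapq


def weighted_shortest_path(graph, node_weight, source, target):
    """Dijkstra where entering a node also costs its (non-negative) node weight.

    graph: dict node -> list of (neighbor, edge_weight)
    node_weight: dict node -> weight
    """
    dist = {source: node_weight.get(source, 0.0)}
    prev = {}
    heap = [(dist[source], source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w + node_weight.get(v, 0.0)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


graph = {"A": [("B", 1.0), ("C", 1.0)], "B": [("D", 1.0)], "C": [("D", 1.0)]}
weights = {"A": 0.0, "B": 5.0, "C": 1.0, "D": 0.0}
print(weighted_shortest_path(graph, weights, "A", "D"))  # (3.0, ['A', 'C', 'D'])
```

Without node weights both routes would cost 2. The heavy node B makes the route through C preferable.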

Referring to FIG. 2, this embodiment provides an intelligent pre-judging system for digital case treatment, comprising:

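the data crawling module, used for acquiring legal case data based on a web page crawling algorithm and constructing a legal knowledge base;

the text input module, used for inputting a text to be pre-judged;

the case data acquisition module, used for acquiring a text classification of the input text, acquiring the case category corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring the case data corresponding to the classification result based on the text classification and the case category;

the case pre-judging module, used for searching the legal knowledge base based on the case data and constructing a case knowledge graph to analyze and pre-judge the case.

The data crawling module parses each webpage into a text sequence, scores it according to the preset scoring rule, takes the subsequence with the maximum score sum as the webpage body text, and constructs the legal knowledge base from the extracted body text.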

In practice, there are many websites related to laws and regulations, judicial cases, contract texts, legal documents, legal resources and the like, so the web page crawling algorithm must be able to run crawler tasks on multiple nodes simultaneously, that is, to manage crawlers on distributed nodes. For this purpose, a crawler management platform can be built for uniformly acquiring webpage body text. The platform allows the relevant crawler scripts to be run, monitored and operated on a server cluster, with centralized viewing and management. See FIG. 3 for details.

Each crawler management platform service is deployed on an independent server. MongoDB and Redis databases at the center serve as the communication medium among all servers and connect the master node (Master) with each worker node (Worker); there is only one master node. A multi-node cluster can thus be formed on the crawler management platform, and the web page crawling algorithm can be executed on any node in the cluster. Crawled data can be sent back to the master node through Redis and then displayed on the front-end interface, and the master node can also send signals to the worker nodes through Redis. MongoDB additionally stores information about each node for the front-end interface. Redis stores task information, including the crawler's execution schedule: day of the week, day of the month, month, hour and minute. Finally, the title, body text, publication time and source are extracted from each webpage, and the webpage can be classified by text category, for example into rule-of-law hot-topic categories. The extracted data is stored in MongoDB, and a scheduled task transfers the structured data in MongoDB to the intranet for storage through a channel between the internal and external networks, thereby constructing the legal knowledge base.
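The source only lists which schedule fields Redis holds. The following minimal sketch assumes that each field is stored cron-style as `"*"` or a comma-separated list of integers (the record layout and key names are hypothetical) and shows how a worker could decide whether a crawler task is due. Unlike standard cron, all five fields are combined with AND for simplicity.

```python
from datetime import datetime


def field_matches(spec, value):
    """'*' matches anything; otherwise spec is a comma-separated list of integers."""
    if spec == "*":
        return True
    return value in {int(part) for part in spec.split(",")}


def is_due(task, now):
    """Return True if the crawler task should run in the minute given by `now`."""
    return (
        field_matches(task["minute"], now.minute)
        and field_matches(task["hour"], now.hour)
        and field_matches(task["day_of_month"], now.day)
        and field_matches(task["month"], now.month)
        and field_matches(task["day_of_week"], now.isoweekday() % 7)  # 0 = Sunday
    )


# Hypothetical task record, e.g. as read back from a Redis hash: every Monday at 02:00.
task = {"minute": "0", "hour": "2", "day_of_month": "*", "month": "*", "day_of_week": "1"}
print(is_due(task, datetime(2024, 1, 15, 2, 0)))  # True  (a Monday)
print(is_due(task, datetime(2024, 1, 16, 2, 0)))  # False (a Tuesday)
```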

Furthermore, the data crawling module can be implemented with the web page crawling algorithm based on heuristic rules and unsupervised learning. It crawls and imports customized news, information and similar content from several authoritative legal news websites to build the legal knowledge base, so that users can browse it easily and conveniently.

In this embodiment, the case data acquisition module may introduce a vocabulary, a word segmentation model and a text classification model to perform vocabulary matching on the input text, segment it into words and classify the segmentation result. The scores from vocabulary matching, from classification of the segmented text and from the pre-judgment of the controllable tensor decomposition model are then added to obtain the pre-judgment result. Referring to FIG. 4, the case-outcome pre-judgment algorithm based on controllable tensor decomposition mainly comprises a legal case modeling method based on controllable tensor decomposition and a regression algorithm with intermediate tensor optimization.
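The score addition can be sketched as summing per-category scores from the three sources. The source describes plain addition; the optional `weights` argument is an added assumption, and the category names and score values are hypothetical.

```python
from collections import Counter


def fuse_scores(*score_maps, weights=None):
    """Add per-category scores from several sources; return (best category, totals)."""
    weights = weights if weights is not None else [1.0] * len(score_maps)
    totals = Counter()
    for weight, scores in zip(weights, score_maps):
        for category, score in scores.items():
            totals[category] += weight * score
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Hypothetical per-category scores from the three sources named in the text.
vocab_match = {"personal injury": 1.0}
segment_classifier = {"personal injury": 0.6, "contract": 0.4}
tensor_model = {"personal injury": 0.3, "contract": 0.7}
best, totals = fuse_scores(vocab_match, segment_classifier, tensor_model)
print(best)  # personal injury
```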

The legal case modeling method based on controllable tensor decomposition comprises the following steps:

Step one: each legal case is represented as a three-dimensional original tensor using TENR. Here, an intermediate tensor is a tensor used to store intermediate results during computation and may be of any shape and size;

Step two: the set of mapping matrices is computed with the controllable tensor algorithm, according to the relationships among the original tensor, the intermediate tensor and the target tensor and the mapping matrices between the original tensor and the intermediate tensor;

Step three: the core tensor is solved from the set of mapping matrices and the original tensor.
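The source does not define the "controllable" variant or TENR. As a stand-in, a standard truncated Tucker decomposition computed by HOSVD (higher-order SVD) shows the generic shape of steps two and three: compute a set of mapping (factor) matrices, then project the original tensor onto them to obtain the core tensor. The case × element × feature axes and the ranks are assumptions, and the block requires NumPy.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape new_dim x old_dim) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def hosvd(tensor, ranks):
    """Truncated HOSVD: mapping matrices from SVDs of the unfoldings, then the core tensor."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each mapping matrix
    return core, factors


def reconstruct(core, factors):
    """Map the core tensor back to the original space."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))  # hypothetical case x element x feature tensor
core, factors = hosvd(X, (2, 3, 3))
print(core.shape, [u.shape for u in factors])  # (2, 3, 3) [(4, 2), (5, 3), (6, 3)]
```

With full ranks the reconstruction is exact. With smaller ranks the core tensor is a compressed summary whose entries could serve as features for the downstream regression.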

The regression algorithm with intermediate tensor optimization optimizes the intermediate tensor through a loss function. Different strategies for the preceding tensor decomposition are selected according to the values of the intermediate tensor, the intermediate tensor is optimized, and the optimized intermediate tensor in turn guides the preceding tensor decomposition. As a result, the core tensor captures the tensor elements and structural information most useful for improving the accuracy of the prediction algorithm. For example, if the intermediate tensor is a sparse matrix, a sparse matrix decomposition algorithm may be chosen to decompose it, improving both efficiency and accuracy.
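The sparse-matrix example can be sketched as a strategy switch on the density of the intermediate matrix. The strategy names and the 0.1 density threshold are hypothetical placeholders; the source names neither.

```python
def density(matrix, tol=0.0):
    """Fraction of entries whose magnitude exceeds `tol`."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for x in row if abs(x) > tol)
    return nonzero / total if total else 0.0


def choose_decomposition(matrix, sparse_threshold=0.1):
    """Pick a decomposition strategy for the intermediate matrix from its sparsity.

    The strategy names and the 0.1 threshold are hypothetical placeholders.
    """
    return "sparse_factorization" if density(matrix) < sparse_threshold else "dense_svd"


identity = [[1.0 if i == j else 0.0 for j in range(20)] for i in range(20)]
print(density(identity), choose_decomposition(identity))  # 0.05 sparse_factorization
```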

Objective functions 1, 2 and 3 are defined according to the size, shape, type, dimensions, constraints and other properties of the matrices. The original tensor is the tensor being controlled in the controllable tensor algorithm. It is typically a matrix or vector, and its shape and size depend on the specific implementation of the algorithm. The core tensor is the tensor used in the controllable tensor algorithm to represent the relationships between tensors; it can be viewed as a special tensor whose properties describe those relationships. The intermediate tensor is the tensor used in the controllable tensor algorithm to represent the information passed between tensors. It is typically a matrix or vector that maps the original tensor to the core tensor and maps the core tensor back to the original tensor.

In this embodiment, the controllable tensor decomposition algorithm overcomes the shortcomings of conventional case-outcome pre-judgment algorithms, and the modeling method overcomes the inherent limitations of feature-based models. The legal case modeling method based on controllable tensor decomposition can describe a case at multiple levels and capture the associations between case modules, which helps improve the accuracy of the subsequent prediction algorithm. On the prediction side, the regression algorithm with intermediate tensor optimization controls the preceding tensor decomposition by optimizing the intermediate tensor. The prediction algorithm therefore captures the tensor elements and structural information most useful for its accuracy, and it obtains more accurate pre-judgment results than a classification algorithm.

Specifically, when a text is input, its graph is extracted in real time, and the text is classified using the controllable tensor decomposition algorithm. According to the classification result, the case prompt, litigation pre-judgment, complaint/defense statement and evidence list preset for that category are returned as the case data corresponding to the classification result.


In this embodiment, in addition to the position semantic search algorithm, retrieval from the legal knowledge base introduces a semantic analysis model that extracts keywords from the input text. Before a document is added to the legal knowledge base, a model trained on annotated data extracts the entities, attributes, relationships and keywords of its content, and these are stored together with the document. This enables multi-field retrieval of the legal knowledge base and improves retrieval results. For example, when the judgment "XX Company, Wang X, electric-shock personal injury liability dispute" is input for a knowledge-base search, keywords are first extracted from the input text. Several fields of the knowledge base, such as title, body text, keywords and entities, are then searched, and the graph results are retrieved with the position semantic search algorithm. Finally, the combined results are precisely ranked and the ranked documents are returned.
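The multi-field search and final ranking can be sketched as a weighted keyword hit count per field plus a graph-based score. The field weights, the mixing factor `alpha`, the substring matching and the sample documents are all hypothetical. `graph_scores` stands in for whatever score the position semantic search assigns to each document.

```python
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "entities": 2.0, "text": 1.0}  # hypothetical


def field_score(doc, keywords, weights=FIELD_WEIGHTS):
    """Weighted count of query keywords found (as substrings) in each indexed field."""
    score = 0.0
    for name, weight in weights.items():
        value = doc.get(name, "")
        haystack = value if isinstance(value, str) else " ".join(value)
        score += weight * sum(1 for kw in keywords if kw in haystack)
    return score


def rank_documents(docs, keywords, graph_scores=None, alpha=0.5):
    """Combine field scores with graph-based scores and return document ids, best first."""
    graph_scores = graph_scores or {}
    ranked = []
    for doc in docs:
        s = field_score(doc, keywords) + alpha * graph_scores.get(doc["id"], 0.0)
        ranked.append((s, doc["id"]))
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return [doc_id for _, doc_id in ranked]


docs = [
    {"id": "d1", "title": "electric shock personal injury liability dispute",
     "keywords": ["electric shock", "personal injury"], "entities": ["XX Company"],
     "text": "The plaintiff suffered an electric shock."},
    {"id": "d2", "title": "loan contract dispute", "keywords": [], "entities": [],
     "text": "A minor injury was mentioned."},
]
print(rank_documents(docs, ["electric shock", "injury"]))  # ['d1', 'd2']
```

Weighting the title and extracted-keyword fields above the body text is one common design choice; a strong graph score can still reorder the list.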

In this embodiment, the system can also construct a case knowledge graph directly for an input case in order to analyze and pre-judge that case.

In this embodiment, customized web crawler technology based on heuristic rules and an unsupervised-learning web page extraction algorithm crawls and imports customized news, information and similar content from several authoritative legal news websites, so that users can browse it easily and conveniently in a single system. A position semantic search algorithm based on vertex and edge labels enables precise advanced retrieval of synonyms, legal provisions, judicial cases (multiple search entries) and other rule-of-law information. A controllable tensor decomposition algorithm extracts defined intervention parameters from the input case, and a case knowledge graph is constructed, from which legal pre-judgment, case analysis and case recommendation results are derived. This provides legal practitioners with reference analysis for case handling and saves a large amount of repetitive query work.

While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.

Claims

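1. An intelligent pre-judging method for digital case treatment, characterized by comprising the following steps:

acquiring legal case data based on a web page crawling algorithm, and constructing a legal knowledge base;

acquiring a text classification of the input text, acquiring the case category corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring the case data corresponding to the classification result based on the text classification and the case category;

and searching the legal knowledge base based on the case data, and constructing a case knowledge graph to analyze and pre-judge the case.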

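2. The intelligent pre-judging method for digital case treatment according to claim 1, wherein acquiring the legal case data based on the web page crawling algorithm and constructing the legal knowledge base specifically comprises:

parsing a webpage into a text sequence with the web page crawling algorithm, wherein each html tag in the webpage corresponds to a subsequence of the text sequence;

scoring the text sequence according to a preset scoring rule to obtain a score sequence corresponding to the text sequence;

finding the subsequence with the maximum score sum in the score sequence and taking the corresponding content as the webpage body text;

and constructing the legal knowledge base from the extracted webpage body text.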

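3. The intelligent pre-judging method for digital case treatment according to claim 1, wherein acquiring a text classification of the input text, acquiring the case category corresponding to the input text based on the controllable tensor decomposition algorithm, and acquiring the case data corresponding to the classification result based on the text classification and the case category specifically comprises:

performing vocabulary matching and word segmentation on the input text, and acquiring the text classification based on the vocabulary matching result and the word segmentation result;

acquiring the case category corresponding to the input text based on the controllable tensor decomposition algorithm;

and determining a classification result of the input text based on the text classification and the case category, and acquiring the case data corresponding to the classification result.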

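4. The intelligent pre-judging method for digital case treatment according to claim 3, wherein searching the legal knowledge base based on the case data and constructing the case knowledge graph to analyze and pre-judge the case specifically comprises:

searching the legal knowledge base based on the case data with a position semantic search algorithm to obtain cases associated with the case data;

and constructing the case knowledge graph based on the associated cases to analyze and pre-judge the case.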

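5. An intelligent pre-judging system for digital case treatment, characterized by comprising:

a data crawling module, used for acquiring legal case data based on a web page crawling algorithm and constructing a legal knowledge base;

a text input module, used for inputting a text to be pre-judged;

a case data acquisition module, used for acquiring a text classification of the input text, acquiring the case category corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring the case data corresponding to the classification result based on the text classification and the case category;

a case pre-judging module, used for searching the legal knowledge base based on the case data and constructing a case knowledge graph to analyze and pre-judge the case.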

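6. The intelligent pre-judging system for digital case treatment according to claim 5, wherein the data crawling module is configured to acquire legal case data based on a web page crawling algorithm and construct a legal knowledge base, specifically by:

parsing a webpage into a text sequence with the web page crawling algorithm, wherein each html tag in the webpage corresponds to a subsequence of the text sequence;

scoring the text sequence according to a preset scoring rule to obtain a score sequence corresponding to the text sequence;

finding the subsequence with the maximum score sum in the score sequence and taking the corresponding content as the webpage body text;

and constructing the legal knowledge base from the extracted webpage body text.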

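7. The intelligent pre-judging system for digital case treatment according to claim 5, wherein the case data acquisition module is configured to acquire a text classification of the input text, acquire the case category corresponding to the input text based on a controllable tensor decomposition algorithm, and acquire the case data corresponding to the classification result based on the text classification and the case category, specifically by:

performing vocabulary matching and word segmentation on the input text, and acquiring the text classification based on the vocabulary matching result and the word segmentation result;

acquiring the case category corresponding to the input text based on the controllable tensor decomposition algorithm;

and determining a classification result of the input text based on the text classification and the case category, and acquiring the case data corresponding to the classification result.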

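8. The intelligent pre-judging system for digital case treatment according to claim 7, wherein the case pre-judging module is configured to search the legal knowledge base based on the case data and construct a case knowledge graph to analyze and pre-judge the case, specifically by:

searching the legal knowledge base based on the case data with a position semantic search algorithm to obtain cases associated with the case data;

and constructing the case knowledge graph based on the associated cases to analyze and pre-judge the case.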

9. The intelligent digital legal case pre-judging system according to any one of claims 5-8, further comprising a case search module for searching in a legal knowledge base based on the input text to obtain case data related to the input text.

10. The intelligent case pre-judging system for digital case treatment according to claim 9, wherein the case pre-judging module is further configured to construct a case knowledge graph based on the case input by the text input module to analyze and pre-judge the case.