CN117407491A - Intelligent pre-judging method and system for digital case treatment - Google Patents

Info

Publication number
CN117407491A
Authority
CN
China
Prior art keywords
case
text
classification
acquiring
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311478370.1A
Other languages
Chinese (zh)
Inventor
林蓥
胡玉梅
高茜
桂瑶
罗双
丘嘉苑
周子健
王建永
陈颖璇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd
Priority to CN202311478370.1A
Publication of CN117407491A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/353: Clustering; Classification into predefined classes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/18: Legal services
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Computational Linguistics (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Technology Law (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an intelligent pre-judging method and system for digital case treatment. The method comprises the following steps: acquiring legal case data based on a web page crawling algorithm, and constructing a legal knowledge base; acquiring a text classification from the input text, acquiring the case category corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring case data corresponding to the classification result based on the text classification and the case category; and searching the legal knowledge base based on the case data, and constructing a case knowledge graph to analyze and pre-judge the case. In the method provided by the invention, case categories are obtained with the controllable tensor decomposition algorithm, related cases are retrieved from the constructed legal knowledge base, and a case knowledge graph is constructed to analyze and pre-judge the case, so that reference analysis of case handling is provided for users and working efficiency is greatly improved.

Description

Intelligent pre-judging method and system for digital case treatment
Technical Field
The invention relates to the technical field of big data processing, in particular to an intelligent pre-judging method and system for digital case treatment.
Background
At present, traditional legal education (law popularization) relies on publishing and disseminating legal information such as legal news and laws and regulations, and people who want this information must search and browse the individual law-popularization platforms. When law enforcement personnel carry out professional work such as case analysis and handling, they must perform a large amount of repetitive work, such as extracting basic information and constructing case relationships, and must also search each platform for similar cases. However, existing legal knowledge is scattered, inconsistent and inaccurate across platforms, which makes it inconvenient to use and gives users a poor experience; moreover, handling business cases requires a great deal of manual effort for basic information extraction, case relationship construction, and searching for laws and similar cases, and the cases cannot be pre-judged.
Disclosure of Invention
The invention aims to provide an intelligent pre-judging method and system for digital case treatment that solve the above technical problems: the category of a case is obtained based on a controllable tensor decomposition algorithm, related cases are retrieved from a constructed legal knowledge base, and a case knowledge graph is constructed to analyze and pre-judge the case, so that reference analysis of case handling is provided for users and working efficiency is greatly improved.
In order to solve the technical problems, the invention provides an intelligent pre-judging method for a digital case, which comprises the following steps:
acquiring legal case data based on a web page crawling algorithm, and constructing a legal knowledge base;
acquiring text classification according to the input text, acquiring case categories corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring case data corresponding to classification results based on the text classification and the case categories;
and searching in a legal knowledge base based on the case data, and constructing a case knowledge graph to realize analysis and pre-judgment of the case.
According to the scheme, the categories of the cases are obtained based on the controllable tensor decomposition algorithm and are retrieved from the constructed legal knowledge base, and the case knowledge graph is constructed to realize analysis and prejudgment of the cases, so that reference analysis of case processing is provided for users, and the working efficiency is greatly improved.
Further, acquiring legal case data based on a web page crawling algorithm and constructing a legal knowledge base specifically comprises:
parsing a webpage into a text sequence based on the web page crawling algorithm, wherein each html tag in the webpage corresponds to a subsequence of the text sequence;
scoring the text sequence based on a preset scoring rule to obtain a score sequence corresponding to the text sequence;
obtaining the subsequence with the maximum score sum from the score sequence, thereby obtaining the webpage body text;
and constructing the legal knowledge base based on the webpage body text.
Further, the text classification is obtained according to the input text, the case category corresponding to the input text is obtained based on the controllable tensor decomposition algorithm, and the case data corresponding to the classification result is obtained based on the text classification and the case category, specifically:
performing vocabulary matching and word segmentation on the input text, and acquiring text classification based on a vocabulary matching result and a word segmentation result;
acquiring a case category corresponding to an input text based on a controllable tensor decomposition algorithm;
and determining a classification result of the input text based on the text classification and the case category, and acquiring case data corresponding to the classification result.
Further, searching the legal knowledge base based on the case data and constructing a case knowledge graph to analyze and pre-judge the case specifically comprises:
searching the legal knowledge base based on the case data by adopting a position semantic search algorithm to obtain cases related to the case data;
and constructing a case knowledge graph based on the related cases to realize analysis and pre-judgment of the case.
The invention provides an intelligent pre-judging system for digital case treatment, which comprises:
the data crawling module is used for acquiring legal case data based on a web page crawling algorithm and constructing a legal knowledge base;
the text input module is used for inputting a text to be prejudged;
the case data acquisition module is used for acquiring text classification according to the input text, acquiring case categories corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring case data corresponding to classification results based on the text classification and the case categories;
the case pre-judging module is used for searching in the legal knowledge base based on the case data and constructing a case knowledge graph so as to analyze and pre-judge the case.
Further, the data crawling module is configured to obtain legal case data based on a web page crawling algorithm, and construct a legal knowledge base, specifically:
parsing a webpage into a text sequence based on the web page crawling algorithm, wherein each html tag in the webpage corresponds to a subsequence of the text sequence;
scoring the text sequence based on a preset scoring rule to obtain a score sequence corresponding to the text sequence;
obtaining the subsequence with the maximum score sum from the score sequence, thereby obtaining the webpage body text;
and constructing the legal knowledge base based on the webpage body text.
Further, the case data obtaining module is configured to obtain a text classification according to an input text and obtain a case category corresponding to the input text based on a controllable tensor decomposition algorithm, and obtain case data corresponding to a classification result based on the text classification and the case category, specifically:
performing vocabulary matching and word segmentation on the input text, and acquiring text classification based on a vocabulary matching result and a word segmentation result;
acquiring a case category corresponding to an input text based on a controllable tensor decomposition algorithm;
and determining a classification result of the input text based on the text classification and the case category, and acquiring case data corresponding to the classification result.
Further, the case prejudging module is configured to search in a legal knowledge base based on case data, and construct a case knowledge graph to realize analysis and prejudgment of cases, specifically:
searching in a legal knowledge base based on case data by adopting a position semantic searching algorithm to obtain cases related to the case data;
and constructing a case knowledge graph based on the associated cases to realize analysis and prejudgment of the cases.
Further, the system also comprises a case retrieval module for retrieving in the legal knowledge base based on the input text and obtaining case data related to the input text.
Further, the case pre-judging module is further used for constructing a case knowledge graph based on the case input by the text input module so as to analyze and pre-judge the case.
Drawings
FIG. 1 is a schematic flow chart of an intelligent pre-judging method for digital case treatment according to an embodiment of the invention;
FIG. 2 is a schematic diagram of an intelligent pre-judging system for digital case treatment according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a multi-node deployment according to an embodiment of the invention;
FIG. 4 is a flowchart of a controllable tensor decomposition algorithm according to an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the present embodiment provides an intelligent pre-judging method for a digital case, which includes the following steps:
S1: acquiring legal case data based on a web page crawling algorithm, and constructing a legal knowledge base;
S2: acquiring text classification according to the input text, acquiring case categories corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring case data corresponding to classification results based on the text classification and the case categories;
S3: and searching in a legal knowledge base based on the case data, and constructing a case knowledge graph to realize analysis and pre-judgment of the case.
According to the intelligent pre-judging method for the digital case provided by this embodiment, the categories of the cases are obtained based on the controllable tensor decomposition algorithm, related cases are retrieved from the constructed legal knowledge base, and the case knowledge graph is constructed to realize analysis and pre-judgment of the cases, so that reference analysis of case handling is provided for users, and the working efficiency is greatly improved.
Further, acquiring legal case data based on a web page crawling algorithm and constructing a legal knowledge base specifically comprises:
parsing a webpage into a text sequence based on the web page crawling algorithm, wherein each html tag in the webpage corresponds to a subsequence of the text sequence;
scoring the text sequence based on a preset scoring rule to obtain a score sequence corresponding to the text sequence;
obtaining the subsequence with the maximum score sum from the score sequence, thereby obtaining the webpage body text;
and constructing the legal knowledge base based on the webpage body text.
It should be noted that the web page crawling algorithm runs as a scheduled task and periodically crawls legal case data from the designated websites or columns.
In order to more clearly illustrate the specific implementation process of the web crawling algorithm, the embodiment provides a specific implementation manner, namely, the web crawling algorithm based on heuristic rules and unsupervised learning is adopted to realize the web crawler service. The web page crawling algorithm based on heuristic rules and unsupervised learning has high universality and is suitable for crawling the content of web pages in different languages or different structures. Specifically:
The web page crawling algorithm parses the webpage into a token sequence, for example: tag(body), tag(div), text, text, ... (8 times), tag(/div), tag(div), text, text, ... (500 times), tag(/div), tag(div), text, text, ... (6 times), tag(/div), tag(/body). Here, "tag" denotes an html tag in the webpage, "text" denotes a piece of text contained in an html tag, and "8 times", "500 times" and "6 times" give the number of times the text token is repeated.
Then, each token in the token sequence is scored according to a preset scoring rule, for example:
each tag scores -3.25 points and each text token scores 1 point.
Scoring the token sequence by this rule yields a score sequence in which each html tag corresponds to a subsequence, for example:
-3.25, -3.25, 1, 1, 1, ... (8 times), -3.25, -3.25, 1, 1, 1, ... (500 times), -3.25, -3.25, 1, 1, 1, ... (6 times), -3.25, -3.25.
Finally, the subsequence with the largest sum is found in the score sequence; the corresponding subsequence of the token sequence is the body text of the webpage.
It should be noted that this embodiment may use dynamic programming, which handles overlapping sub-problems by breaking the problem into smaller sub-problems whose solutions are combined to solve the whole problem; compared with searching for the maximum-sum subsequence directly, this is more efficient, has lower algorithmic complexity, applies more widely and is more precise. An overlapping sub-problem is one that has already been solved: when an html tag contains multiple nested layers, multiple tags or multiple pieces of text, a score is computed for each descendant tag, and the score of the current html tag is then obtained from those scores.
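The scoring and maximum-sum search described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that the subsequence is contiguous, that Kadane's algorithm (a standard dynamic-programming solution to the maximum-sum contiguous run) is an acceptable stand-in for the dynamic programming mentioned in the embodiment, and that one text token corresponds to one whitespace-separated word. The names `Tokenizer`, `max_sum_span` and `extract_body` are hypothetical.

```python
from html.parser import HTMLParser

TAG_SCORE = -3.25   # score per html tag, as in the embodiment
TEXT_SCORE = 1.0    # score per text token, as in the embodiment


class Tokenizer(HTMLParser):
    """Flatten a page into ('tag', name) and ('text', word) tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("tag", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("tag", "/" + tag))

    def handle_data(self, data):
        # Assumption: one text token per whitespace-separated word.
        self.tokens.extend(("text", word) for word in data.split())


def max_sum_span(scores):
    """Kadane's algorithm: (start, end) of the max-sum contiguous run, end exclusive."""
    best_sum, best_span = float("-inf"), (0, 0)
    current_sum, current_start = 0.0, 0
    for i, score in enumerate(scores):
        if current_sum <= 0:
            # A non-positive running sum cannot help; restart the run here.
            current_sum, current_start = score, i
        else:
            current_sum += score
        if current_sum > best_sum:
            best_sum, best_span = current_sum, (current_start, i + 1)
    return best_span


def extract_body(html):
    """Return the text tokens inside the highest-scoring token span."""
    parser = Tokenizer()
    parser.feed(html)
    parser.close()
    scores = [TAG_SCORE if kind == "tag" else TEXT_SCORE for kind, _ in parser.tokens]
    start, end = max_sum_span(scores)
    return " ".join(value for kind, value in parser.tokens[start:end] if kind == "text")
```

On a page with a short navigation block, a long article block and a short footer, the tag penalties outweigh the few text tokens in the navigation and footer, so only the article block is returned.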
The web crawler service provided by this embodiment can be deployed on a server in the DMZ; on the test-set webpages, the accuracy and recall of body text extraction both exceed 90%. Semantic analysis, such as keyword extraction, can also be performed on the extracted text, so that web pages whose body text does not match the title or is empty are eliminated, further improving the extraction result.
Further, the text classification is obtained according to the input text, the case category corresponding to the input text is obtained based on the controllable tensor decomposition algorithm, and the case data corresponding to the classification result is obtained based on the text classification and the case category, specifically:
performing vocabulary matching and word segmentation on the input text, and acquiring text classification based on a vocabulary matching result and a word segmentation result;
acquiring a case category corresponding to an input text based on a controllable tensor decomposition algorithm;
and determining a classification result of the input text based on the text classification and the case category, and acquiring case data corresponding to the classification result.
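The vocabulary matching and word segmentation step can be illustrated with a small sketch. The source does not name a segmentation method, so forward maximum matching is used here purely as an assumed stand-in, and the vocabulary, category term lists and function names are hypothetical. The sample words mean "labor", "contract", "labor contract", "dispute", "wages", "traffic" and "accident".

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy segmentation: at each position take the longest vocabulary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing in the vocabulary matches.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def classify_by_vocabulary(words, category_terms):
    """Pick the category whose term list matches the most segmented words."""
    scores = {
        category: sum(word in terms for word in words)
        for category, terms in category_terms.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


vocab = {"劳动", "合同", "劳动合同", "纠纷", "工资"}
category_terms = {
    "labor": {"劳动合同", "工资"},
    "traffic": {"交通", "事故"},
}
words = forward_max_match("劳动合同纠纷", vocab)
category = classify_by_vocabulary(words, category_terms)
```

Here `words` is `["劳动合同", "纠纷"]` and `category` is `"labor"`; a production system would more likely use a trained segmentation model and text classifier, as the embodiment suggests.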
Further, searching the legal knowledge base based on the case data and constructing a case knowledge graph to analyze and pre-judge the case specifically comprises:
searching the legal knowledge base based on the case data by adopting a position semantic search algorithm to obtain cases related to the case data;
and constructing a case knowledge graph based on the related cases to realize analysis and pre-judgment of the case.
It should be noted that the search in the legal knowledge base may use a position semantic search algorithm based on vertex and edge labels, which retrieves vertex and edge labels on the knowledge graph. The similar-case recommendation data in the knowledge graph comes from the data under the legal column of the knowledge base, and each case has a corresponding graph. Specifically, the keywords of the vertices and edges in the knowledge graph may be indexed with an inverted index: a list stores the vertices or edges corresponding to each label, so that the vertices and edges matching a query keyword can be found quickly.
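A label-to-element index of this kind can be sketched in a few lines. The element identifiers and labels below are invented for illustration, and the function name `build_inverted_index` is hypothetical; the source only states that a list stores the vertices or edges for each label.

```python
from collections import defaultdict


def build_inverted_index(elements):
    """Map each label to the list of vertex/edge ids that carry it.

    elements: iterable of (element_id, labels) pairs.
    """
    index = defaultdict(list)
    for element_id, labels in elements:
        for label in labels:
            index[label].append(element_id)
    return index


# Hypothetical graph elements: two vertices and one edge.
graph_elements = [
    ("v1", ["contract", "dispute"]),
    ("v2", ["contract"]),
    ("e1", ["plaintiff_of"]),
]
index = build_inverted_index(graph_elements)
matches = index.get("contract", [])
```

Looking up a keyword is then a single dictionary access instead of a scan over every vertex and edge.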
In this embodiment, a queue may be constructed from the keywords in the case data, and a score is computed for each keyword by a scoring function. The scoring function may follow search engine practice, taking into account factors such as the position, density, frequency and importance of the keyword in a page, and the keywords are arranged in descending order of score. Then, based on a preset scoring threshold, an optimal qualified position semantics algorithm is used to search for the optimal qualified position semantics: the scoring function value of each candidate is computed and compared with the threshold; a candidate whose value is smaller than the threshold does not enter the queue, and a candidate whose value is larger than the threshold enters the queue and the threshold is updated. The method specifically comprises the following steps:
constructing, from the keywords of the case data, a queue that is initially empty and holds at most k elements, kept in descending order of the scoring function values of the optimal qualified position semantics;
finding the vertices and edges corresponding to each query keyword by using the inverted index, and converting them into "vertex/edge-keyword" form with a mapping structure, wherein each vertex or edge carries a piece of text description, to obtain a set U;
presetting a threshold θ with an initial value of ±∞, and excluding the vertices and edges in the set U that fall below the threshold;
calculating, by the TF-IDF algorithm, the keyword score of each of the remaining vertices and edges in the set U;
and finally, sorting the set U by score and taking the top k elements and their corresponding vertices to obtain the cases related to the case data.
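The threshold-and-queue selection can be illustrated with a bounded min-heap, whose smallest element plays the role of the threshold θ. This is a sketch under several assumptions that the source does not state: θ starts at negative infinity so that the first k candidates always enter the queue, TF-IDF uses a smoothed IDF, and each vertex or edge description is already tokenized. The names `tf_idf_scores` and `top_k` and the sample data are hypothetical.

```python
import heapq
import math
from collections import Counter


def tf_idf_scores(query_keywords, descriptions):
    """Score each element's description against the query keywords.

    descriptions: dict mapping element id -> list of tokens.
    """
    n = len(descriptions)
    document_frequency = Counter()
    for tokens in descriptions.values():
        document_frequency.update(set(tokens))
    scores = {}
    for element_id, tokens in descriptions.items():
        term_frequency = Counter(tokens)
        score = 0.0
        for keyword in query_keywords:
            if term_frequency[keyword]:
                # Smoothed IDF (an assumption; other variants would also work).
                idf = math.log((1 + n) / (1 + document_frequency[keyword])) + 1.0
                score += (term_frequency[keyword] / len(tokens)) * idf
        scores[element_id] = score
    return scores


def top_k(scores, k):
    """Keep the k best elements; heap[0] holds the current threshold theta."""
    if k <= 0:
        return []
    heap = []
    theta = float("-inf")  # assumed initial threshold: everything qualifies
    for element_id, score in scores.items():
        if len(heap) < k:
            heapq.heappush(heap, (score, element_id))
        elif score > theta:
            heapq.heapreplace(heap, (score, element_id))
        else:
            continue  # below the threshold: does not enter the queue
        if len(heap) == k:
            theta = heap[0][0]  # update the threshold to the weakest kept score
    return [element_id for _, element_id in sorted(heap, reverse=True)]


descriptions = {
    "v1": ["contract", "dispute", "loan"],
    "v2": ["contract", "contract", "breach"],
    "v3": ["traffic", "accident"],
}
ranked = top_k(tf_idf_scores(["contract", "breach"], descriptions), k=2)
```

With this sample data `ranked` is `["v2", "v1"]`: `v2` mentions both query keywords, `v1` mentions one, and `v3` mentions none and never displaces an element in the queue.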
It should be noted that the position semantic search algorithm needs to consider the number of nodes in the graph and how they are connected, to ensure the accuracy and efficiency of the calculation result; when calculating the shortest path, the weights of the nodes in the graph need to be considered, so that the path length truly reflects the relationship between the entities; and when processing large-scale graph data, the scalability and performance of the algorithm need to be considered, so that it can handle a large number of nodes and edges.
Referring to FIG. 2, the present embodiment provides an intelligent pre-judging system for digital case treatment, including:
the data crawling module is used for acquiring legal case data based on a web page crawling algorithm and constructing a legal knowledge base;
the text input module is used for inputting a text to be prejudged;
the case data acquisition module is used for acquiring text classification according to the input text, acquiring case categories corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring case data corresponding to classification results based on the text classification and the case categories;
the case pre-judging module is used for searching in the legal knowledge base based on the case data and constructing a case knowledge graph so as to analyze and pre-judge the case.
Further, the data crawling module is configured to obtain legal case data based on a web page crawling algorithm, and construct a legal knowledge base, specifically:
parsing a webpage into a text sequence based on the web page crawling algorithm, wherein each html tag in the webpage corresponds to a subsequence of the text sequence;
scoring the text sequence based on a preset scoring rule to obtain a score sequence corresponding to the text sequence;
obtaining the subsequence with the maximum score sum from the score sequence, thereby obtaining the webpage body text;
and constructing the legal knowledge base based on the webpage body text.
In practical application, because there are numerous websites related to laws and regulations, judicial cases, contract texts, legal documents, legal resources and the like, the web page crawling algorithm must be able to run crawler tasks on multiple nodes at the same time, that is, crawlers must be managed across distributed nodes. To this end, a crawler management platform for uniformly acquiring webpage body text can be constructed; the platform allows the crawler scripts for the related websites to be run, monitored and operated on a server cluster, and to be viewed and managed centrally. See FIG. 3 for details.
Each crawler management platform service is deployed on an independent server. The MongoDB and Redis databases at the center serve as the communication medium for all servers and connect the master node (Master) with each working node (Worker); there is only one master node. A multi-node cluster can thus be formed on the crawler management platform, and the web page crawling algorithm can be executed on any node in the cluster. Crawled data can be transmitted back to the master node through Redis and then presented on the front-end interface; the master node can also issue commands to the working nodes through Redis; MongoDB additionally stores information about each node for the front-end interface. Redis stores task information, including the crawler's execution schedule: day of the week, day of the month, month, hour and minute. Finally, the title, body text, publication time and source can be extracted from each webpage, and the webpage can be classified by text, for example into categories such as legal hot spots; the extracted data is stored in MongoDB, and a timing task transmits the structured data in MongoDB to the intranet for storage through the channel between the internal and external networks, so as to construct the legal knowledge base.
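The schedule fields listed for the task information (day of the week, day of the month, month, hour, minute) resemble a cron expression. The sketch below shows one way such a record could be represented and checked; the class `CrawlSchedule`, its field names and the `is_due` method are hypothetical, and how the record is serialized into Redis is deliberately left out because the source does not describe it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Optional

Field = Optional[FrozenSet[int]]  # None means "any value"


@dataclass(frozen=True)
class CrawlSchedule:
    """Cron-like schedule for one crawler task (hypothetical layout)."""

    minutes: Field = None
    hours: Field = None
    days_of_month: Field = None
    months: Field = None
    days_of_week: Field = None  # 0 = Monday ... 6 = Sunday, as in datetime.weekday()

    def is_due(self, now: datetime) -> bool:
        """Return True when every restricted field matches the given time."""
        checks = [
            (self.minutes, now.minute),
            (self.hours, now.hour),
            (self.days_of_month, now.day),
            (self.months, now.month),
            (self.days_of_week, now.weekday()),
        ]
        return all(allowed is None or value in allowed for allowed, value in checks)


# Run at 02:00 every Monday.
weekly = CrawlSchedule(
    minutes=frozenset({0}),
    hours=frozenset({2}),
    days_of_week=frozenset({0}),
)
```

A worker node could evaluate `weekly.is_due(datetime.now())` once a minute and start the crawler when it returns `True`.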
Furthermore, the data crawling module can be implemented with a web page crawling algorithm based on heuristic rules and unsupervised learning, and can crawl and import news, information and the like from a number of designated authoritative legal news websites, so as to construct a legal knowledge base that users can consult easily and conveniently.
Further, the case data obtaining module is configured to obtain a text classification according to an input text and obtain a case category corresponding to the input text based on a controllable tensor decomposition algorithm, and obtain case data corresponding to a classification result based on the text classification and the case category, specifically:
performing vocabulary matching and word segmentation on the input text, and acquiring text classification based on a vocabulary matching result and a word segmentation result;
acquiring a case category corresponding to an input text based on a controllable tensor decomposition algorithm;
and determining a classification result of the input text based on the text classification and the case category, and acquiring case data corresponding to the classification result.
In this embodiment, the case data acquisition module may introduce a vocabulary, a word segmentation model and a text classification model to perform vocabulary matching, word segmentation and text classification of the segmentation result on the input text; the pre-judgment result is obtained by adding the scores from vocabulary matching, text classification of the segmented text, and the controllable tensor decomposition model. Referring to FIG. 4, the case result pre-judging algorithm based on controllable tensor decomposition mainly comprises a legal case modeling method based on controllable tensor decomposition and a regression algorithm with intermediate tensor optimization.
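The score addition described above can be sketched as a per-category sum over the three sources. The score values, category names and function names here are invented for illustration, and unweighted addition is an assumption; the source does not say whether the scores are weighted or normalized.

```python
from collections import Counter


def combine_scores(*score_maps):
    """Add per-category scores from several sources."""
    total = Counter()
    for score_map in score_maps:
        total.update(score_map)  # Counter.update adds values for existing keys
    return total


def prejudge(vocabulary_scores, classifier_scores, tensor_scores):
    """Return the category with the highest combined score, or None if empty."""
    total = combine_scores(vocabulary_scores, classifier_scores, tensor_scores)
    return max(total, key=total.get) if total else None


# Hypothetical per-category scores from the three sources.
vocabulary_scores = {"labor dispute": 1.0}
classifier_scores = {"labor dispute": 0.6, "contract dispute": 0.4}
tensor_scores = {"contract dispute": 0.7, "labor dispute": 0.2}

result = prejudge(vocabulary_scores, classifier_scores, tensor_scores)
```

With these sample values the combined scores are about 1.8 for "labor dispute" and 1.1 for "contract dispute", so `result` is `"labor dispute"`.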
The legal case modeling method based on controllable tensor decomposition comprises the following steps:
step one, representing each legal case as a three-dimensional original tensor by using TENR, wherein an intermediate tensor is a tensor used to store intermediate results during calculation and may have any shape and size;
step two, calculating the mapping matrix set by using the controllable tensor algorithm, according to the relationship among the original tensor, the intermediate tensor and the target tensor and the mapping matrices between the original tensor and the intermediate tensor;
and step three, solving for the core tensor by using the mapping matrix set and the original tensor.
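The last step has the same shape as a Tucker-style decomposition, which gives one way to read it. The formula below is an interpretation, not the patent's stated method: it assumes a three-dimensional original tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, a mapping matrix set $\{U_1, U_2, U_3\}$ with $U_n \in \mathbb{R}^{I_n \times J_n}$ and orthonormal columns, and a core tensor $\mathcal{G} \in \mathbb{R}^{J_1 \times J_2 \times J_3}$; all of these symbols are introduced here for illustration.

```latex
\mathcal{G} = \mathcal{X} \times_1 U_1^{\top} \times_2 U_2^{\top} \times_3 U_3^{\top},
\qquad
g_{abc} = \sum_{i=1}^{I_1} \sum_{j=1}^{I_2} \sum_{k=1}^{I_3}
          x_{ijk}\,(U_1)_{ia}\,(U_2)_{jb}\,(U_3)_{kc},
\qquad
\mathcal{X} \approx \mathcal{G} \times_1 U_1 \times_2 U_2 \times_3 U_3 .
```

Under this reading, the core tensor is a compressed representation of the case tensor, and the mapping matrices determine which elements and structure of the original tensor it retains.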
The regression algorithm with intermediate tensor optimization optimizes the intermediate tensor through a loss function: different strategies for the preceding tensor decomposition are selected according to the values of the intermediate tensor, the values of the intermediate tensor are optimized, and the optimized intermediate tensor in turn guides the preceding tensor decomposition, so that the resulting core tensor represents the tensor elements and structural information most favorable to the accuracy of the prediction algorithm. For example, if the intermediate tensor is a sparse matrix, a sparse matrix decomposition algorithm may be chosen to decompose it, improving the efficiency and accuracy of the algorithm.
Objective function 1, objective function 2 and objective function 3 are defined according to the size, shape, type, dimension, constraint conditions and the like of the matrix. The original tensor is the tensor that is controlled in the controllable tensor algorithm, typically a matrix or vector; its shape and size depend on the specific implementation of the algorithm. The core tensor is the tensor used in the controllable tensor algorithm to represent the relationship between tensors; it can be viewed as a special tensor whose properties describe that relationship. The intermediate tensor is the tensor used in the controllable tensor algorithm to represent the information passed between tensors; it is typically a matrix or vector that maps the original tensor to the core tensor and maps the core tensor back to the original tensor.
In this embodiment, the controllable tensor decomposition algorithm overcomes the shortcomings of conventional case result prejudgment algorithms. In terms of modeling, it overcomes the inherent defects of the feature model: the legal case modeling method based on controllable tensor decomposition describes cases at multiple levels and captures the associated information among case modules, which helps improve the accuracy of the subsequent prediction algorithm. In terms of prediction, the regression algorithm with intermediate tensor optimization controls the preceding tensor decomposition process by optimizing the intermediate tensor, so that the prediction algorithm captures the tensor elements and structural information most beneficial to its accuracy and obtains a more accurate prejudgment result than a classification algorithm.
Specifically, the graph of an input text is extracted in real time; the text is classified by using the controllable tensor decomposition algorithm; and, according to the classification result, the case prompt, spring prejudgement, prosecution/debate and evidence list corresponding to the preset category are returned, yielding the case data corresponding to the classification result.
Further, the case prejudging module is configured to search in a legal knowledge base based on case data, and construct a case knowledge graph to realize analysis and prejudgment of cases, specifically:
searching in a legal knowledge base based on case data by adopting a position semantic searching algorithm to obtain cases related to the case data;
and constructing a case knowledge graph based on the associated cases to realize analysis and prejudgment of the cases.
Further, the system also comprises a case retrieval module for retrieving in the legal knowledge base based on the input text and obtaining case data related to the input text.
It should be noted that the search in the legal knowledge base may use a position semantic search algorithm based on vertex and edge labels, so that vertex and edge labels on the knowledge graph can be searched. The data for the similar-case recommendation results in the knowledge graph comes from the data under the legal column of the knowledge base, and each case has a corresponding graph.
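The position semantic search algorithm itself is not detailed in this passage. As a rough illustration of what searching by vertex and edge labels can look like, the sketch below stores a knowledge graph as labeled vertices and labeled directed edges and matches a simple one-hop pattern. The class `LabeledGraph`, its methods, and the example labels are hypothetical and are not taken from the described system.

```python
from collections import defaultdict

class LabeledGraph:
    """Toy knowledge graph with a label on every vertex and every edge."""

    def __init__(self):
        self.vertex_labels = {}         # vertex id -> vertex label
        self.edges = defaultdict(list)  # source id -> [(edge label, target id)]

    def add_vertex(self, vertex_id, label):
        self.vertex_labels[vertex_id] = label

    def add_edge(self, source, edge_label, target):
        self.edges[source].append((edge_label, target))

    def search(self, source_label, edge_label, target_label):
        """Return (source, target) pairs matching the one-hop label pattern."""
        hits = []
        for source, label in self.vertex_labels.items():
            if label != source_label:
                continue
            for found_label, target in self.edges.get(source, []):
                if (found_label == edge_label
                        and self.vertex_labels.get(target) == target_label):
                    hits.append((source, target))
        return hits

# Usage with made-up data: which cases cite which legal provisions?
graph = LabeledGraph()
graph.add_vertex("case-1", "Case")
graph.add_vertex("case-2", "Case")
graph.add_vertex("provision-A", "Provision")
graph.add_vertex("court-X", "Court")
graph.add_edge("case-1", "cites", "provision-A")
graph.add_edge("case-2", "heard_by", "court-X")
matches = graph.search("Case", "cites", "Provision")
```

A production system would more likely delegate this pattern matching to a graph database; the linear scan here only keeps the example self-contained.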
In this embodiment, in addition to the position semantic search algorithm, the retrieval of the legal knowledge base introduces a semantic analysis model for extracting keywords from the input text. Before a document is stored in the legal knowledge base, a model trained on labeled data extracts entities, attributes, relationships and keywords from the document content, and these are stored together with the document, so that the legal knowledge base supports multi-field retrieval and returns better results. For example, when the judgment document 'XX Company, Wang XX electric shock personal injury liability dispute' is input as a query, keywords are first extracted from the input text; multiple fields in the knowledge base, such as title, body text, keywords and entities, are then searched; the graph results are searched with the position semantic search algorithm; and the search results are precisely ranked and the ranked documents are returned.
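The multi-field retrieval flow described above (extract keywords from the query, match them against several stored fields, then rank) can be sketched as follows. This is a minimal illustration under stated assumptions: `extract_keywords` is a trivial stand-in for the semantic analysis model, the field weights in `FIELD_WEIGHTS` are invented, and the graph search step is omitted.

```python
# Assumed per-field weights; the source does not specify a ranking formula.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "entities": 2.0, "text": 1.0}

def extract_keywords(text, stopwords=frozenset({"a", "the", "of", "and"})):
    """Stand-in for the semantic analysis model: lowercase tokens minus stopwords."""
    return [token for token in text.lower().split() if token not in stopwords]

def score(doc, keywords):
    """Weighted count of query keywords that occur in each field of the document."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        content = doc.get(field, "").lower()
        total += weight * sum(1 for keyword in keywords if keyword in content)
    return total

def search(docs, query):
    """Return matching documents, highest score first."""
    keywords = extract_keywords(query)
    scored = [(score(doc, keywords), doc) for doc in docs]
    ranked = sorted((pair for pair in scored if pair[0] > 0),
                    key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked]

# Usage with made-up documents.
docs = [
    {"id": 1, "title": "Electric shock personal injury liability dispute",
     "text": "judgment", "keywords": "electric shock injury", "entities": "XX company"},
    {"id": 2, "title": "Contract dispute",
     "text": "payment for electric equipment", "keywords": "contract", "entities": ""},
    {"id": 3, "title": "Land lease", "text": "lease", "keywords": "lease", "entities": ""},
]
results = search(docs, "electric shock dispute")
```

Weighting the title and extracted fields above the body text is one common design choice; any real deployment would tune or replace these weights.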
Further, the case pre-judging module is further used for constructing a case knowledge graph based on the case input by the text input module so as to analyze and pre-judge the case.
In this embodiment, the system can also construct the case knowledge graph directly from the input case, thereby analyzing and prejudging the specified case.
This embodiment adopts a customized web crawler technology based on heuristic rules and an unsupervised-learning web page extraction algorithm to crawl and import customized news, information and the like from multiple authoritative legal news websites, so that a user can conveniently consult them within a single system; adopts a position semantic search algorithm based on vertex and edge labels to realize accurate advanced retrieval (with multiple retrieval entries) of synonyms, legal provisions, judicial cases and the like in legal information; and adopts a controllable tensor decomposition algorithm to extract the defined intervention parameters from an input case and construct a case knowledge graph, from which analysis results for case prejudgment, case analysis and similar-case recommendation are derived, providing legal workers with a reference analysis for case handling and saving much repetitive query work.
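The web page extraction step, in which the page is parsed into a sequence of html tags and text, each element is scored, and the contiguous subsequence with the maximum score sum is taken as the page body, can be sketched as follows. The scoring rule here (a fixed penalty `TAG_PENALTY` per tag, one point per word of text) is purely an assumption, as the preset scoring rule is not given; the maximum-sum search uses Kadane's algorithm.

```python
import re

TAG_PENALTY = -3.0  # assumed cost of each html tag

def tokenize(html):
    """Split a page into a sequence of html tags and non-empty text pieces."""
    return [piece for piece in re.split(r"(<[^>]+>)", html) if piece.strip()]

def score_token(token):
    """Hypothetical scoring rule: tags are penalised, text earns one point per word."""
    if token.startswith("<"):
        return TAG_PENALTY
    return float(len(token.split()))

def max_sum_span(scores):
    """Kadane's algorithm: (start, end) of the max-sum contiguous run, end exclusive."""
    best_sum, best_span = float("-inf"), (0, 0)
    current_sum, current_start = 0.0, 0
    for i, value in enumerate(scores):
        if current_sum <= 0:
            # A non-positive running sum cannot help; restart the run here.
            current_sum, current_start = value, i
        else:
            current_sum += value
        if current_sum > best_sum:
            best_sum, best_span = current_sum, (current_start, i + 1)
    return best_span

def extract_body(html):
    """Return the text inside the highest-scoring contiguous run of the page."""
    tokens = tokenize(html)
    start, end = max_sum_span([score_token(token) for token in tokens])
    return " ".join(t.strip() for t in tokens[start:end] if not t.startswith("<"))

page = ("<html><nav><a>Home</a><a>Login</a></nav><div>"
        "<p>The court held that the company bears liability for the electric shock injury.</p>"
        "<p>Compensation was awarded to the plaintiff accordingly.</p>"
        "</div><footer><a>About</a></footer></html>")
body = extract_body(page)
```

Because navigation and footer links are short and surrounded by many tags, their runs score negatively and fall outside the selected span, while dense paragraphs of text dominate it.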
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.

Claims (10)

1. The intelligent digital case pre-judging method is characterized by comprising the following steps of:
acquiring legal case data based on a web page crawling algorithm, and constructing a legal knowledge base;
acquiring text classification according to the input text, acquiring case categories corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring case data corresponding to classification results based on the text classification and the case categories;
and searching in the legal knowledge base based on the case data, and constructing a case knowledge graph to realize analysis and prejudgment of the case.
2. The intelligent prejudging method for digital case treatment according to claim 1, wherein the acquiring legal case data based on a web page crawling algorithm and constructing a legal knowledge base is specifically:
analyzing a webpage into a text sequence based on a webpage crawling algorithm, wherein each html tag in the webpage is a subsequence of the text sequence;
scoring the text sequence based on a preset scoring rule to obtain a score sequence corresponding to the text sequence;
obtaining a subsequence with the maximum score sum from the score sequence, and obtaining a webpage text;
and constructing a legal knowledge base based on the webpage text.
3. The intelligent pre-judging method for digital case treatment according to claim 1, wherein the acquiring text classification according to the input text, acquiring case categories corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring case data corresponding to classification results based on the text classification and the case categories specifically comprises:
performing vocabulary matching and word segmentation on the input text, and acquiring text classification based on a vocabulary matching result and a word segmentation result;
acquiring a case category corresponding to an input text based on a controllable tensor decomposition algorithm;
and determining a classification result of the input text based on the text classification and the case category, and acquiring case data corresponding to the classification result.
4. The intelligent pre-judging method for digital case treatment according to claim 3, wherein the searching in the legal knowledge base based on the case data and constructing a case knowledge graph to realize analysis and prejudgment of the case is specifically:
searching in a legal knowledge base based on case data by adopting a position semantic searching algorithm to obtain cases related to the case data;
and constructing a case knowledge graph based on the associated cases to realize analysis and prejudgment of the cases.
5. An intelligent pre-judging system for digital case treatment, which is characterized by comprising:
the data crawling module is used for acquiring legal case data based on a web page crawling algorithm and constructing a legal knowledge base;
the text input module is used for inputting a text to be prejudged;
the case data acquisition module is used for acquiring text classification according to the input text, acquiring case categories corresponding to the input text based on a controllable tensor decomposition algorithm, and acquiring case data corresponding to classification results based on the text classification and the case categories;
the case pre-judging module is used for searching in the legal knowledge base based on the case data and constructing a case knowledge graph so as to analyze and pre-judge the case.
6. The intelligent prejudging system for digital case treatment according to claim 5, wherein the data crawling module is configured to acquire legal case data based on a web page crawling algorithm and construct a legal knowledge base, specifically:
analyzing a webpage into a text sequence based on a webpage crawling algorithm, wherein each html tag in the webpage is a subsequence of the text sequence;
scoring the text sequence based on a preset scoring rule to obtain a score sequence corresponding to the text sequence;
obtaining a subsequence with the maximum score sum from the score sequence, and obtaining a webpage text;
and constructing a legal knowledge base based on the webpage text.
7. The intelligent digital case pre-judging system according to claim 5, wherein the case data obtaining module is configured to obtain a text classification according to an input text and obtain a case category corresponding to the input text based on a controllable tensor decomposition algorithm, and obtain case data corresponding to a classification result based on the text classification and the case category, specifically:
performing vocabulary matching and word segmentation on the input text, and acquiring text classification based on a vocabulary matching result and a word segmentation result;
acquiring a case category corresponding to an input text based on a controllable tensor decomposition algorithm;
and determining a classification result of the input text based on the text classification and the case category, and acquiring case data corresponding to the classification result.
8. The intelligent case pre-judging system according to claim 7, wherein the case pre-judging module is configured to search in a legal knowledge base based on case data, and construct a case knowledge graph to analyze and pre-judge cases, specifically:
searching in a legal knowledge base based on case data by adopting a position semantic searching algorithm to obtain cases related to the case data;
and constructing a case knowledge graph based on the associated cases to realize analysis and prejudgment of the cases.
9. The intelligent digital legal case pre-judging system according to any one of claims 5-8, further comprising a case search module for searching in a legal knowledge base based on the input text to obtain case data related to the input text.
10. The intelligent case pre-judging system for digital case treatment according to claim 9, wherein the case pre-judging module is further configured to construct a case knowledge graph based on the case input by the text input module to analyze and pre-judge the case.
CN202311478370.1A 2023-11-07 2023-11-07 Intelligent pre-judging method and system for digital case treatment Pending CN117407491A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311478370.1A CN117407491A (en) 2023-11-07 2023-11-07 Intelligent pre-judging method and system for digital case treatment

Publications (1)

Publication Number Publication Date
CN117407491A true CN117407491A (en) 2024-01-16

Family

ID=89490682

Country Status (1)

Country Link
CN (1) CN117407491A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination