CN111522955A - Litigation case classification method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN111522955A
Authority
CN
China
Prior art keywords
case
category
cases
litigation
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010358113.4A
Other languages
Chinese (zh)
Other versions
CN111522955B (en)
Inventor
董卓达
张亦龙
芦惠娟
顾正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huayun Zhongsheng Technology Co ltd
Original Assignee
Shenzhen Huayun Zhongsheng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huayun Zhongsheng Technology Co ltd
Priority to CN202010358113.4A
Publication of CN111522955A
Application granted
Publication of CN111522955B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a litigation case classification method and device, computer equipment and a storage medium. The method comprises: determining an acquisition rule; acquiring a case update channel to obtain a case source; crawling litigation cases from the case source according to the acquisition rule to obtain cases to be classified; applying document source classification, case segmentation classification and keyword classification to the cases to be classified to obtain corresponding classification results; judging whether the corresponding classification results are of the same category; if so, taking the classification result as the case category and storing the case category and the corresponding litigation case to form a database; if not, inputting the case to be classified into a classification model to obtain the case category, and storing the case category and the corresponding litigation case to form the database. The method accurately classifies existing litigation cases, so that operators can quickly find related litigation cases directly by category, which improves the efficiency of information acquisition.

Description

Litigation case classification method and device, computer equipment and storage medium
Technical Field
The invention relates to a case classification method, in particular to a litigation case classification method, a litigation case classification device, a computer device and a storage medium.
Background
Public welfare litigation includes civil and administrative public welfare litigation, divided according to the nature of the applicable procedural law or the party being sued. Under procedural law, a victim whose interests are damaged is entitled to sue in court and request judicial relief. By the party that brings the suit, public welfare litigation can be divided into litigation brought by procuratorial organs and litigation brought by other social groups and individuals; the former is called civil or administrative public prosecution, and the latter is called general public welfare litigation.
Conventionally, public welfare litigation cases are stored together without classification. When a prosecutor handling a public welfare litigation case needs related information, the relevant file materials or historical cases must be queried manually; once the content is retrieved, someone must manually determine whether it belongs to the field the prosecutor wants to consult and only then pass it on. This requires a large amount of manpower and is inefficient.
Therefore, a new method is needed that accurately classifies existing litigation cases, so that operators can quickly find related litigation cases directly by category, improving their efficiency in acquiring information.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a litigation case classification method, a litigation case classification device, a computer device and a storage medium.
In order to achieve the purpose, the invention adopts the following technical scheme: a method of litigation case classification, comprising:
determining an acquisition rule according to the sources of different litigation cases;
acquiring an update channel of litigation cases needing to be classified so as to obtain case sources;
crawling related litigation cases from case sources according to the acquisition rules to obtain cases to be classified;
applying document source classification, case segmentation classification and keyword classification to the cases to be classified to obtain corresponding classification results;
judging whether the corresponding classification results are of the same category;
if the corresponding classification results are of the same category, taking the classification results as case categories;
storing the case categories and the corresponding litigation cases to form a database, wherein the database is used for operators to call the related litigation cases according to the categories;
and if the corresponding classification results are not of the same category, inputting the cases to be classified into a classification model for classification to obtain case categories, and then performing the step of storing the case categories and the corresponding litigation cases to form the database, wherein the database is used for operators to call related litigation cases according to the categories.
The further technical scheme is as follows: the collection rules comprise a data interface collection mode, a website customized crawler mode, a data import mode and a centralized collection mode.
The further technical scheme is as follows: applying document source classification, case segmentation classification and keyword classification to the cases to be classified to obtain corresponding classification results comprises:
classifying the cases to be classified according to document sources to obtain a source category;
extracting keywords related to the cause of action from the cases to be classified, and determining the cause of action from those keywords to obtain a case category;
extracting keywords related to the fields of public welfare litigation from the cases to be classified, and classifying the extracted keywords to obtain a field category;
and integrating the source category, the case category and the field category to form the corresponding classification results.
The further technical scheme is as follows: the determining whether the corresponding classification results are of the same category includes:
judging whether at least two of the source category, the case category and the field category belong to the same category;
if at least two of the source category, the case category and the field category belong to the same category, the corresponding classification results are of the same category;
if no two of the source category, the case category and the field category belong to the same category, the corresponding classification results are not of the same category.
The further technical scheme is as follows: the classification model is obtained by training a machine learning algorithm by using a plurality of litigation cases with category labels as a sample set.
The invention also provides a litigation case classification device, which comprises:
the rule determining unit is used for determining the acquisition rule according to the sources of different litigation cases;
the channel acquisition unit is used for acquiring an update channel of the litigation cases needing to be classified so as to obtain case sources;
the case acquisition unit is used for crawling related litigation cases from a case source according to the acquisition rules to obtain cases to be classified;
the result obtaining unit is used for adopting document source classification, case segmentation classification and keyword classification according to the cases to be classified so as to obtain corresponding classification results;
a judging unit for judging whether the corresponding classification results are of the same category;
the classification forming unit is used for taking the classification result as a case classification if the corresponding classification results are the same;
the storage unit is used for storing the case categories and the corresponding litigation cases to form a database, wherein the database is used for operators to call the related litigation cases according to the categories;
and the model classification unit is used for inputting the cases to be classified into the classification model for classification to obtain case categories if the corresponding classification results are not of the same category, and then storing the case categories and the corresponding litigation cases to form the database, wherein the database is used for operators to call the related litigation cases according to the categories.
The further technical scheme is as follows: the result acquisition unit includes:
the source classification subunit is used for classifying the cases to be classified according to document sources to obtain source classes;
the case classification subunit is used for extracting keywords related to the cause of action from the cases to be classified and determining the cause of action from those keywords to obtain a case category;
the keyword classification subunit is used for extracting keywords related to the fields of public welfare litigation from the cases to be classified and classifying the extracted keywords to obtain the field category;
and the integration subunit is used for integrating the source category, the case category and the field category to form a corresponding classification result.
The further technical scheme is as follows: the judging unit is used for judging whether at least two of the source category, the case category and the field category belong to the same category; if at least two of them belong to the same category, the corresponding classification results are of the same category; if no two of them belong to the same category, the corresponding classification results are not of the same category.
The invention also provides computer equipment which comprises a memory and a processor, wherein the memory is stored with a computer program, and the processor realizes the method when executing the computer program.
The invention also provides a storage medium storing a computer program which, when executed by a processor, is operable to carry out the method as described above.
Compared with the prior art, the invention has the following beneficial effects: acquisition rules are set first, and litigation cases from different update channels are collected in different modes; the collected litigation cases are then classified in three ways, namely document source classification, case segmentation classification and keyword classification. If two of the results belong to the same category, that category is taken as the category of the litigation case; otherwise, a classification model trained with a machine learning algorithm classifies the case automatically. The categories and the corresponding litigation cases are combined into a database. Existing litigation cases are thus classified accurately, operators can quickly find related litigation cases directly by category, and their efficiency in acquiring information is improved.
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application scenario of the method for classifying litigation cases provided by the embodiment of the invention;
FIG. 2 is a schematic flow chart of a method for classifying litigation cases according to an embodiment of the present invention;
FIG. 3 is a sub-flow diagram of a method for classifying litigation cases according to an embodiment of the present invention;
FIG. 4 is a sub-flow diagram of a method for classifying litigation cases according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a litigation case classification device provided by an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a result acquisition unit of a litigation case classification device provided by an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of the litigation case classification method provided by the embodiment of the present invention, and fig. 2 is a schematic flow chart of the method. The litigation case classification method is applied to a server. The server sets the acquisition rules, then collects and classifies litigation cases from different sources to form a database. The server communicates with a terminal: when a category is entered at the terminal, the server retrieves the corresponding litigation cases from the database and displays them on the terminal, so that operators can quickly find related litigation cases directly by category, improving their efficiency in acquiring information.
As shown in fig. 2, the method includes the following steps S110 to S180.
S110, determining acquisition rules according to the sources of different litigation cases.
In this embodiment, the acquisition rules are the rules for collecting public welfare litigation cases from different information sources.
Specifically, the collection rules include a data interface collection mode, a website customized crawler mode, a data import mode and a centralized collection mode.
Public welfare litigation cases come mainly from three kinds of sources: the "two-law linkage" platform and the business systems inside the procuratorate, the websites of administrative organs, and the Internet, which covers public opinion, news, forums (Tieba), microblogs and other resources. For information from the two-law linkage platform and the internal business systems of the procuratorate, the difficulty lies in connecting to complex and varied interfaces, so an interface tool for those business systems needs to be developed to reduce the difficulty of integration; cases from such sources are collected in the data interface collection mode. For information from the websites of administrative organs, targeted real-time monitoring is adopted, so that the latest information is acquired in near real time as it is released. For information from the Internet, the key to collection is extracting effective information from massive data, and cases can be collected in the website customized crawler mode; knowledge acquisition based on semantic understanding is used to actively narrow the search range during collection, so that retrieval and acquisition are performed accurately within a small range.
Specifically, the acquisition characteristics of the different sources are summarized and analyzed to determine an information acquisition mechanism, that is, the acquisition rules. An adaptive information acquisition model can also be built on the acquisition rules to collect cases.
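The mapping from information source to collection mode described in S110 can be sketched as a simple lookup. This is a minimal illustration only: the source labels, the assignment of administrative websites to the crawler mode, and the fallback to centralized collection are assumptions, since the description names the four modes but does not specify a data structure.

```python
from enum import Enum


class CollectionMode(Enum):
    """The four collection modes named in the description."""
    DATA_INTERFACE = "data interface collection"
    CUSTOM_CRAWLER = "website customized crawler"
    DATA_IMPORT = "data import"
    CENTRALIZED = "centralized collection"


# Hypothetical source labels (not from the patent) mapped to a mode.
COLLECTION_RULES = {
    "procuratorate_system": CollectionMode.DATA_INTERFACE,
    "administrative_website": CollectionMode.CUSTOM_CRAWLER,  # assumption
    "internet": CollectionMode.CUSTOM_CRAWLER,
    "offline_archive": CollectionMode.DATA_IMPORT,  # assumption
}


def determine_collection_rule(source_type: str) -> CollectionMode:
    """Return the collection mode for a source; unknown sources fall back
    to centralized collection (an assumed default)."""
    return COLLECTION_RULES.get(source_type, CollectionMode.CENTRALIZED)
```

Keeping the rules in a table means a new source can be supported by adding one entry rather than changing the collection logic.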
S120, obtaining an updating channel of the litigation cases needing to be classified so as to obtain case sources.
In this embodiment, the source of the case is the source of acquisition of the litigation case.
S130, crawling related litigation cases from case sources according to the acquisition rules to obtain cases to be classified.
In this embodiment, a case to be classified is a litigation case obtained by crawling that still needs to be classified.
Information sources currently have numerous channels and huge data volumes, so data acquisition is started whenever new data appears in any information source. For example, when a followed verified account ("big V") or super topic in a social media source such as a microblog posts new content, the server starts the information acquisition mechanism. It must then identify which channel of the information source was updated; if the microblog was updated, the microblog collection mode, namely the website customized crawler mode, is started, and the crawler runs against the updated content. In general, the updated channel is determined, the acquisition rule is selected according to that channel, the corresponding tool captures the cases, and the captured cases are stored, so that public welfare litigation cases are collected and stored in full.
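Steps S120 and S130 (detect which channel was updated, then pick the collection mode for it) can be sketched as follows. The channel names, the use of an increasing post id to detect updates, and the mode strings are illustrative assumptions; the description does not say how updates are detected.

```python
from typing import Dict, List, Tuple

# Hypothetical channel-to-mode table; only the mode names come from the text.
CHANNEL_MODE: Dict[str, str] = {
    "microblog": "website customized crawler",
    "procuratorate_system": "data interface collection",
}


def find_updated_channels(last_seen: Dict[str, int],
                          latest: Dict[str, int]) -> List[str]:
    """Return channels whose latest post id is newer than the last one collected."""
    return sorted(ch for ch, post_id in latest.items()
                  if post_id > last_seen.get(ch, -1))


def plan_collection(last_seen: Dict[str, int],
                    latest: Dict[str, int]) -> List[Tuple[str, str]]:
    """Pair every updated channel with the collection mode to start for it."""
    return [(ch, CHANNEL_MODE.get(ch, "centralized collection"))
            for ch in find_updated_channels(last_seen, latest)]
```

Only channels with new content appear in the plan, which matches the idea of starting acquisition when an information source is updated rather than polling everything.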
S140, according to the cases to be classified, document source classification, case segmentation classification and keyword classification are adopted to obtain corresponding classification results.
In this embodiment, the classification result refers to a result obtained by classifying the litigation case from the perspective of the source, the case origin, and the keyword.
In an embodiment, referring to fig. 3, the step S140 may include steps S141 to S144.
S141, classifying the cases to be classified according to document sources to obtain a source category.
In this embodiment, the source type is a type formed by classifying cases to be classified according to sources of documents related to the cases.
Each case to be classified has a corresponding source document, and source documents are divided according to their origin; for example, a document from the ecological environment bureau is preliminarily judged to belong to "environmental resources". This is done using the lists of powers and responsibilities issued by different regions and organizations: the case is analyzed against the scope of public welfare litigation mentioned in the list that corresponds to it. The general responsibility list of a municipal ecological environment bureau is taken as an example:
The responsibility list may be updated as the municipal government's administration changes. Suppose the list states that the ecological environment bureau is the penalizing authority for acts that cause hazardous waste to be scattered, lost or leaked, or that cause other environmental pollution, without taking the corresponding precautionary measures, and that the applicable law is the "Law of the People's Republic of China on the Prevention and Control of Environmental Pollution by Solid Waste", which is a special law on environmental matters. Keyword analysis then determines that the municipal ecological environment bureau has administrative authority over environmental resources, so documents it issues belong to the field of "environmental resources"; that is, the source category is the environmental resources field.
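A minimal sketch of the document source classification in S141, assuming the responsibility lists have already been reduced to a table from issuing body to field. The table contents beyond the ecological environment bureau example and the substring matching are assumptions for illustration.

```python
from typing import Optional

# Issuing body -> field. Only the first entry comes from the description;
# the second is a hypothetical addition.
SOURCE_FIELD = {
    "ecological environment bureau": "environmental resources",
    "market supervision bureau": "food and drug safety",  # hypothetical
}


def classify_by_source(issuer: str) -> Optional[str]:
    """Return the field of the body that issued the source document,
    or None when the issuer is not in the table."""
    name = issuer.lower()
    for body, field in SOURCE_FIELD.items():
        if body in name:
            return field
    return None
```

For example, `classify_by_source("Municipal Ecological Environment Bureau")` yields the environmental resources field, mirroring the example in the text.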
S142, extracting keywords related to the cause of action from the cases to be classified, and determining the cause of action from those keywords to obtain the case category.
In this embodiment, the case category is the category formed by analyzing the cause of action of a case to be classified and then assigning the cause of action to a specific field.
Specifically, past public welfare litigation cases are numerous and complicated, and their sources may span criminal, civil and administrative cases. To save manpower and operating cost, cases are first classified by cause of action, and each cause of action is assigned to the field it relates to, forming the corresponding case category. For example, criminal law defines many offences, and several of them can relate to the food and drug safety field; one of them is the "crime of producing and selling counterfeit drugs". After the information of a public welfare litigation case is obtained, the keywords of the case summary under that cause of action are analyzed to check whether the case falls within the scope of drugs. If it does, the cause of action "crime of producing and selling counterfeit drugs" is assigned to the food and drug safety field; that is, the case category is the food and drug safety field.
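The cause-of-action step S142 can be sketched as a lookup followed by a confirmation against the case summary. The rule table and the confirming words are hypothetical; only the counterfeit drugs example comes from the description.

```python
from typing import Dict, Optional, Set, Tuple

# Cause of action -> (field, words in the summary that confirm the field).
# The confirming words are assumptions chosen for illustration.
CAUSE_RULES: Dict[str, Tuple[str, Set[str]]] = {
    "producing and selling counterfeit drugs": (
        "food and drug safety",
        {"drug", "medicine", "pharmaceutical"},
    ),
}


def classify_by_cause(cause: str, summary: str) -> Optional[str]:
    """Map a cause of action to a field, but only when the case summary
    confirms that the case falls within that field's scope."""
    rule = CAUSE_RULES.get(cause.strip().lower())
    if rule is None:
        return None
    field, confirming_words = rule
    text = summary.lower()
    return field if any(word in text for word in confirming_words) else None
```

The confirmation step reflects the text's point that the summary keywords are checked before the cause of action is assigned to a field.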
S143, extracting keywords related to the field of the public welfare litigation from the cases to be classified, and classifying the extracted keywords to obtain the field category.
In this embodiment, the field category is the category obtained by extracting keywords related to the five major fields of public welfare litigation and mapping them to the corresponding field.
The characteristics and structural expression of the main clue elements in the different fields of public welfare litigation are studied, and a sample semantic dictionary suitable for describing case clues is built; that is, the keywords related to the five fields of public welfare litigation are determined and grouped according to the characteristics of each field. For example, the field of environmental pollution includes many sub-items, such as water pollution, soil pollution, solid waste pollution and air pollution. For soil pollution, one needs to know the pollution condition, the type and weight of the hazardous material discharged or dumped, the names of the toxic pollutants, and so on, and each of these items is refined further. Under "name of toxic substance", for instance, the permitted values may be "mercury, copper, zinc, lead, cadmium, beryllium, barium, nickel, arsenic, total chromium, hexavalent chromium, selenium". If a case mentions any of these pollutant names, it is classified with high probability into the environmental resources field; that is, the field category is the environmental resources field.
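The keyword classification in S143 can be illustrated with a small semantic dictionary and a hit count per field. The environmental keywords are the pollutant names listed above; the second field's keywords and the "most hits wins" rule are assumptions, as the description does not state how hits are scored.

```python
import re
from collections import Counter
from typing import Dict, Optional, Set

FIELD_KEYWORDS: Dict[str, Set[str]] = {
    "environmental resources": {
        "mercury", "copper", "zinc", "lead", "cadmium", "beryllium",
        "barium", "nickel", "arsenic", "chromium", "selenium",
    },
    # Hypothetical entry, added only so there is more than one field.
    "food and drug safety": {"counterfeit", "drug", "expired"},
}


def classify_by_keywords(text: str) -> Optional[str]:
    """Return the field whose dictionary matches the most words in the text,
    or None when no dictionary word appears."""
    # Whole-word tokens avoid false hits such as "lead" inside "leading".
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = Counter()
    for field, keywords in FIELD_KEYWORDS.items():
        hits[field] = sum(1 for token in tokens if token in keywords)
    field, count = hits.most_common(1)[0]
    return field if count > 0 else None
```

A production dictionary would be in Chinese and would need word segmentation rather than a regular expression; the structure of the lookup stays the same.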
S144, integrating the source category, case category and field category to form a corresponding classification result.
S150, judging whether the corresponding classification results are of the same category.
In an embodiment, referring to fig. 4, the step S150 may include steps S151 to S153.
S151, judging whether at least two of the source category, the case category and the field category belong to the same category;
S152, if at least two of the source category, the case category and the field category belong to the same category, the corresponding classification results are of the same category;
S153, if no two of the source category, the case category and the field category belong to the same category, the corresponding classification results are not of the same category.
S160, if the corresponding classification results are of the same category, taking the classification result as the case category;
S170, storing the case categories and the corresponding litigation cases to form a database, wherein the database is used for operators to retrieve the related litigation cases by category;
S180, if the corresponding classification results are not of the same category, inputting the cases to be classified into a classification model for classification to obtain the case category;
and then performing the step S170.
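The two-of-three check in steps S151 to S153 can be sketched as a vote over the three rule-based results. Treating a missing result as `None` and ignoring it in the vote is an assumption; the description does not say what happens when a classifier returns nothing.

```python
from collections import Counter
from typing import Optional


def agreed_category(source: Optional[str],
                    case: Optional[str],
                    field: Optional[str]) -> Optional[str]:
    """Return the category shared by at least two of the three results,
    or None when no two results agree."""
    votes = Counter(c for c in (source, case, field) if c is not None)
    if not votes:
        return None
    category, count = votes.most_common(1)[0]
    return category if count >= 2 else None
```

A return value of `None` corresponds to the "not the same category" branch, in which the case is passed to the classification model in S180.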
Specifically, the database allows an operator to enter a category through the terminal, retrieve the related litigation cases and have them displayed on the terminal.
Wherein the classification model is obtained by training a machine learning algorithm by using a plurality of litigation cases with category labels as a sample set.
The elements of the different fields of public welfare litigation are mined from case studies and are identified and extracted along multiple dimensions, such as field and time, based on semantic features. This divides cases into categories and prepares for subsequent unsupervised learning classification.
The classification model takes fully determined classification content as sample data, that is, accurately classified litigation cases form the sample set, and the model is trained until it can automatically and accurately identify the specific category of an input case. For example, public welfare litigation cases whose category label is the environmental resources field are collected into a sample set; all text in this set is taken by default to represent the environmental resources direction of public welfare litigation, and a fixed environmental resources model is built from it with a machine learning algorithm. When a new case arrives, its type does not need to be subdivided or known in advance: the text of the new case is converted directly into a text vector by the machine learning algorithm and compared with the environmental resources model built from the sample set, which yields a similarity. If the similarity reaches a set threshold, the case belongs to the environmental resources field; if it is below the threshold, the case does not belong to that field. The threshold is set from combined feedback on actual results.
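The vector comparison and threshold test described above can be sketched with bag-of-words vectors and cosine similarity. Both choices, along with the 0.3 threshold, are assumptions made for illustration: the description does not name the machine learning algorithm, the vector representation or the similarity measure.

```python
import math
import re
from collections import Counter
from typing import Iterable


def vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(samples: Iterable[str]) -> Counter:
    """Sum the vectors of all labeled samples of one field into a profile."""
    profile: Counter = Counter()
    for sample in samples:
        profile.update(vectorize(sample))
    return profile


def belongs_to_field(text: str, profile: Counter, threshold: float = 0.3) -> bool:
    """True when the new case is at least `threshold` similar to the field profile."""
    return cosine(vectorize(text), profile) >= threshold
```

In practice the threshold would be tuned from feedback, as the text notes, and one profile would be built per field so that a new case can be compared against all of them.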
A fusion mechanism based on the semantic association of similar clue features is studied: from the discovered case clues, information on individual cases is matched with the content of similar cases, and methods such as sequential pattern mining are used to study information aggregation analysis for case clues, covering law enforcement facts and the consequences of infringement, so that case clue information is fused.
With these classification modes, each litigation case is labeled with a field. The judgement is based on document source classification, case segmentation classification and keyword classification: if two of the three results agree, the classification model is not needed; if the three modes give three different results, classification by the classification model is executed automatically and the relevant content is displayed, and the information is also submitted to a professional for judgement, whose final decision is fed back to the database, further improving the accuracy of the classification. Each litigation case obtained from an information source is first divided into "public welfare litigation information" or "not public welfare litigation information". For public welfare litigation information, it is further determined which of the five fields of public welfare litigation it belongs to, for example environmental resources. Litigation cases in the environmental resources field are then gathered and pushed to the operator through the public welfare litigation information source platform, that is, the platform on the terminal. The operator only needs to search the platform for the public welfare litigation information required, for example "soil pollution" within the environmental resources category, and the related information can be queried on the result page. This greatly eases information acquisition and classification for operators handling public welfare litigation cases.
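The overall decision described above (use the agreed rule-based category, otherwise fall back to the model and flag the case for professional review, then store by category) can be sketched as follows. The in-memory dictionary standing in for the database, the review flag and the callable model interface are all assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Callable, DefaultDict, List, Optional, Sequence, Tuple


def classify_case(text: str,
                  rule_results: Sequence[Optional[str]],
                  model: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (category, needs_review).

    If at least two rule-based results agree, use that category.
    Otherwise ask the model and flag the case for professional review.
    """
    votes = Counter(r for r in rule_results if r is not None)
    if votes:
        category, count = votes.most_common(1)[0]
        if count >= 2:
            return category, False
    return model(text), True


def store_case(db: DefaultDict[str, List[str]], category: str, text: str) -> None:
    """Store the case under its category so it can be retrieved by category."""
    db[category].append(text)
```

A caller would run the three rule-based classifiers, pass their results to `classify_case`, store the outcome with `store_case`, and route any case with `needs_review` set to a professional whose decision overwrites the stored category.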
A plurality of litigation cases are taken as samples, and cluster analysis is performed on the collected sample data from the perspective of how case clues are formed. Case clues are comprehensively analyzed and evaluated in terms of clue integrity, severity of damage, scope of social impact, and the like, so that the clues are classified and graded, laying a solid foundation for subsequent case handling.
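The grading of clues along several dimensions can be illustrated with a simple weighted score. This sketch is not the cluster analysis named above, which the text does not specify; it is a stand-in that only shows how integrity, severity, and social impact might be combined into a grade. The weights, thresholds, grade names, and the `Clue` type are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Clue:
    name: str
    integrity: float   # completeness of the clue, assumed scaled to 0..1
    severity: float    # severity of damage, assumed scaled to 0..1
    influence: float   # scope of social impact, assumed scaled to 0..1

# Hypothetical weights; they sum to 1 so the score stays in 0..1.
WEIGHTS = {"integrity": 0.3, "severity": 0.4, "influence": 0.3}

def score(clue):
    """Weighted sum of the three evaluation dimensions."""
    return (
        WEIGHTS["integrity"] * clue.integrity
        + WEIGHTS["severity"] * clue.severity
        + WEIGHTS["influence"] * clue.influence
    )

def grade(clue):
    """Map the score to a coarse grade using hypothetical thresholds."""
    s = score(clue)
    if s >= 0.7:
        return "high"
    if s >= 0.4:
        return "medium"
    return "low"

clues = [
    Clue("river dumping", 0.9, 0.8, 0.7),
    Clue("minor signage issue", 0.2, 0.1, 0.1),
]
for clue in clues:
    print(clue.name, grade(clue))
```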
When the system finds during case handling that public interests have been infringed, it can promptly notify the civil and administrative procuratorial department so that the related case clues are transferred to the corresponding department in time. This forms a standing mechanism of regular inspection and transfer of clues at any time, so that more case clues come into the scope of supervision and clues are acquired promptly and effectively. The system can also help handle the relationship between the prosecution procedure and litigation, assist in preparing requests and reports, and help build understanding and support for public interest litigation work across society. It helps ensure that the trial scope and conditions are strictly observed and that approval procedures are strictly followed, so that the work develops in a standardized and orderly way. Focus issues, such as the unlawful performance of administrative acts and the degree of harm to the public interest, can be located quickly and argued, so that the core issues of the case are grasped.
In the above litigation case classification method, collection rules are set first, and litigation cases are collected in different modes according to their different update channels. The collected cases are classified in three modes: document source classification, case segmentation classification, and keyword classification. If two of the results belong to the same category, that category is used as the category of the litigation case; otherwise, a classification model trained with a machine learning algorithm classifies the case automatically. The categories and the corresponding litigation cases are combined into a database. Existing litigation cases are thus classified accurately, operators can find related cases directly and quickly by category, and the efficiency with which they obtain information is improved.
Fig. 5 is a schematic block diagram of a litigation case classification device 300 provided by an embodiment of the invention. As shown in FIG. 5, the present invention also provides a litigation case classifying device 300 corresponding to the above litigation case classifying method. The litigation case classification apparatus 300 includes means for executing the aforementioned litigation case classification method, and may be configured in a server. Specifically, referring to fig. 5, the litigation case classifying device 300 includes a rule determining unit 301, a channel obtaining unit 302, a case obtaining unit 303, a result obtaining unit 304, a judging unit 305, a category forming unit 306, a storage unit 307, and a model classifying unit 308.
A rule determining unit 301, configured to determine an acquisition rule according to the sources of different litigation cases; a channel obtaining unit 302, configured to obtain an update channel of litigation cases that need to be classified, so as to obtain a case source; the case obtaining unit 303 is configured to crawl related litigation cases from a case source according to the collection rule to obtain cases to be classified; a result obtaining unit 304, configured to obtain a corresponding classification result by document source classification, case segmentation classification, and keyword classification according to the case to be classified; a judging unit 305 for judging whether the corresponding classification results are of the same category; a category forming unit 306, configured to take the classification result as a case category if the corresponding classification results are of the same category; the storage unit 307 is configured to store the case categories and corresponding litigation cases to form a database, where the database is used for an operator to retrieve related litigation cases according to the categories; and the model classification unit 308 is configured to, if the corresponding classification results are not of the same category, input the case to be classified into a classification model for classification to obtain a case category, and execute the storing of the case category and the corresponding litigation case to form a database, where the database is used for an operator to call up a relevant litigation case according to the category.
In one embodiment, as shown in FIG. 6, the result obtaining unit 304 includes a source classification subunit 3041, a case classification subunit 3042, a keyword classification subunit 3043, and an integration subunit 3044.
A source classification subunit 3041, configured to classify the cases to be classified according to document sources to obtain a source category; a case classification subunit 3042, configured to extract case-related keywords from the cases to be classified, and determine the cause of action from those keywords, so as to obtain a cause-of-action category; a keyword classification subunit 3043, configured to extract keywords related to the field of public interest litigation from the cases to be classified, and classify the extracted keywords to obtain a field category; and an integration subunit 3044, configured to integrate the source category, the cause-of-action category, and the field category to form a corresponding classification result.
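The three subunits can be sketched as three small functions whose outputs are integrated into one result. This is an illustrative sketch only: the lookup tables, the keyword lists, the `.example` source names, the label "food and drug safety", and the simple substring counting used in place of real word segmentation are all assumptions.

```python
# Hypothetical lookup tables; a real system would maintain far larger ones.
SOURCE_TO_CATEGORY = {
    "ecology-bureau.example": "environmental resources",
    "market-regulator.example": "food and drug safety",
}
CAUSE_KEYWORDS = {
    "environmental resources": ["illegal discharge", "dumping"],
    "food and drug safety": ["expired", "counterfeit drug"],
}
FIELD_KEYWORDS = {
    "environmental resources": ["pollution", "soil", "wastewater"],
    "food and drug safety": ["food", "drug"],
}

def classify_by_source(case):
    """Source category: look up the document source, None if unknown."""
    return SOURCE_TO_CATEGORY.get(case["source"])

def classify_by_keywords(text, table):
    """Pick the category whose keywords occur most often, None if none occur."""
    text = text.lower()
    counts = {
        category: sum(text.count(keyword) for keyword in keywords)
        for category, keywords in table.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None

def classify(case):
    """Integrate the three intermediate categories into one result."""
    return {
        "source": classify_by_source(case),
        "cause": classify_by_keywords(case["text"], CAUSE_KEYWORDS),
        "field": classify_by_keywords(case["text"], FIELD_KEYWORDS),
    }

case = {
    "source": "ecology-bureau.example",
    "text": "The plant's illegal discharge of wastewater caused soil pollution.",
}
print(classify(case))
```

Keeping the three classifiers independent means a failure of one (for example an unknown source) does not prevent the other two from producing a category.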
In addition, the judging unit 305 is configured to judge whether at least two of the source category, cause-of-action category, and field category belong to the same category; if at least two of them belong to the same category, the corresponding classification results are of the same category; if no two of them belong to the same category, the corresponding classification results are not of the same category.
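The two-of-three rule with a model fallback can be sketched as below. The function name `decide_category`, the `stub_model` callable, and the choice to treat a missing result (`None`) as casting no vote are assumptions made for illustration; the text does not say how missing results are handled.

```python
from collections import Counter

def decide_category(source_cat, cause_cat, field_cat, case_text, model):
    """Return (category, decided_by).

    If at least two of the three intermediate categories agree, that
    category wins. Otherwise the case text is passed to the fallback
    classification model.
    """
    votes = Counter(
        c for c in (source_cat, cause_cat, field_cat) if c is not None
    )
    if votes:
        category, count = votes.most_common(1)[0]
        if count >= 2:
            return category, "vote"
    return model(case_text), "model"

def stub_model(text):
    """Placeholder for a trained classification model (hypothetical)."""
    return "environmental resources"

print(decide_category("A", "A", "B", "some case text", stub_model))
print(decide_category("A", "B", "C", "some case text", stub_model))
```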
It should be noted that, as will be clear to those skilled in the art, the detailed implementation process of the litigation case classifying device 300 and each unit can refer to the corresponding description in the foregoing method embodiment, and for convenience and brevity of description, no further description is provided herein.
The litigation case classification device 300 may be embodied as a computer program that can be executed on a computer device as shown in FIG. 7.
Referring to fig. 7, fig. 7 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a server, wherein the server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 7, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform the litigation case classification method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of the computer program 5032 on the non-volatile storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 can be caused to execute a litigation case classification method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the configuration shown in FIG. 7 is a block diagram of only the part of the configuration relevant to the present application and does not limit the computer device 500 to which the present application may be applied; a particular computer device 500 may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
Wherein the processor 502 is configured to run the computer program 5032 stored in the memory to implement the following steps:
determining an acquisition rule according to the sources of different litigation cases; acquiring an update channel of the litigation cases to be classified, so as to obtain a case source; crawling related litigation cases from the case source according to the acquisition rule to obtain cases to be classified; applying document source classification, case segmentation classification, and keyword classification to the cases to be classified to obtain corresponding classification results; judging whether the corresponding classification results are of the same category; if the corresponding classification results are of the same category, taking the classification result as the case category; storing the case category and the corresponding litigation case to form a database, wherein the database is used by operators to retrieve related litigation cases by category; and if the corresponding classification results are not of the same category, inputting the case to be classified into a classification model for classification to obtain the case category, and then performing the step of storing the case category and the corresponding litigation case to form the database.
The collection rules comprise a data interface collection mode, a website customized crawler mode, a data import mode and a centralized collection mode.
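Choosing one of the four collection modes per source can be sketched as a small dispatch function. The four mode names come from the text; the criteria used to pick a mode (`has_api`, `kind`) and the source descriptors are purely hypothetical, since this section does not state how a mode is matched to a source.

```python
from enum import Enum

class CollectionMode(Enum):
    DATA_INTERFACE = "data interface collection"
    CUSTOM_CRAWLER = "website customized crawler"
    DATA_IMPORT = "data import"
    CENTRALIZED = "centralized collection"

def choose_mode(source):
    """Pick a collection mode from a source descriptor (assumed criteria)."""
    if source.get("has_api"):
        return CollectionMode.DATA_INTERFACE
    if source.get("kind") == "website":
        return CollectionMode.CUSTOM_CRAWLER
    if source.get("kind") == "file":
        return CollectionMode.DATA_IMPORT
    return CollectionMode.CENTRALIZED

# Hypothetical source descriptors.
sources = [
    {"name": "court open data", "has_api": True},
    {"name": "agency news page", "kind": "website"},
    {"name": "spreadsheet export", "kind": "file"},
    {"name": "mixed feeds"},
]
for source in sources:
    print(source["name"], "->", choose_mode(source).value)
```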
The classification model is obtained by training a machine learning algorithm by using a plurality of litigation cases with category labels as a sample set.
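Training a classifier on labeled cases can be illustrated with a small multinomial naive Bayes model. The text only says "a machine learning algorithm", so naive Bayes is an assumption chosen here for brevity, as are the whitespace tokenizer, the four training samples, and the label "food and drug safety".

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Whitespace tokenizer; real Chinese text would need word segmentation."""
    return text.lower().split()

def train(samples):
    """Fit word and class counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}

def predict(model, text):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    total = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    words = [w for w in tokenize(text) if w in model["vocab"]]
    best_label, best_logp = None, -math.inf
    for label, n in model["class_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + vocab_size
        logp = math.log(n / total)
        for word in words:
            logp += math.log((counts[word] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Hypothetical labeled sample set.
SAMPLES = [
    ("factory discharged wastewater causing soil pollution", "environmental resources"),
    ("illegal dumping polluted the river", "environmental resources"),
    ("vendor sold expired food products", "food and drug safety"),
    ("pharmacy sold counterfeit drug", "food and drug safety"),
]

model = train(SAMPLES)
print(predict(model, "soil pollution from wastewater"))
print(predict(model, "expired drug sold"))
```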
In an embodiment, when implementing the step of obtaining the corresponding classification result by document source classification, case segmentation classification and keyword classification according to the case to be classified, the processor 502 specifically implements the following steps:
classifying the cases to be classified according to document sources to obtain a source category; extracting case-related keywords from the cases to be classified, and determining the cause of action from those keywords to obtain a cause-of-action category; extracting keywords related to the field of public interest litigation from the cases to be classified, and classifying the extracted keywords to obtain a field category; and integrating the source category, cause-of-action category, and field category to form a corresponding classification result.
In an embodiment, when the processor 502 implements the step of determining whether the corresponding classification results are the same category, the following steps are specifically implemented:
judging whether at least two of the source category, cause-of-action category, and field category belong to the same category; if at least two of them belong to the same category, the corresponding classification results are of the same category; if no two of them belong to the same category, the corresponding classification results are not of the same category.
It should be understood that, in the embodiments of the present application, the processor 502 may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program includes program instructions, and the computer program may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the steps of:
determining an acquisition rule according to the sources of different litigation cases; acquiring an update channel of the litigation cases to be classified, so as to obtain a case source; crawling related litigation cases from the case source according to the acquisition rule to obtain cases to be classified; applying document source classification, case segmentation classification, and keyword classification to the cases to be classified to obtain corresponding classification results; judging whether the corresponding classification results are of the same category; if the corresponding classification results are of the same category, taking the classification result as the case category; storing the case category and the corresponding litigation case to form a database, wherein the database is used by operators to retrieve related litigation cases by category; and if the corresponding classification results are not of the same category, inputting the case to be classified into a classification model for classification to obtain the case category, and then performing the step of storing the case category and the corresponding litigation case to form the database.
The collection rules comprise a data interface collection mode, a website customized crawler mode, a data import mode and a centralized collection mode.
The classification model is obtained by training a machine learning algorithm by using a plurality of litigation cases with category labels as a sample set.
In an embodiment, when the processor executes the computer program to implement the step of obtaining the corresponding classification result by document source classification, case segmentation classification and keyword classification according to the case to be classified, the following steps are specifically implemented:
classifying the cases to be classified according to document sources to obtain a source category; extracting case-related keywords from the cases to be classified, and determining the cause of action from those keywords to obtain a cause-of-action category; extracting keywords related to the field of public interest litigation from the cases to be classified, and classifying the extracted keywords to obtain a field category; and integrating the source category, cause-of-action category, and field category to form a corresponding classification result.
In an embodiment, when the processor executes the computer program to implement the step of determining whether the corresponding classification results are of the same category, the following steps are specifically implemented:
judging whether at least two of the source category, cause-of-action category, and field category belong to the same category; if at least two of them belong to the same category, the corresponding classification results are of the same category; if no two of them belong to the same category, the corresponding classification results are not of the same category.
The storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, an optical disk, or any of various other computer-readable storage media capable of storing a computer program.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To illustrate clearly the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, various elements or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be merged, divided and deleted according to actual needs. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on this understanding, the part of the technical solution of the present invention that in essence contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of litigation case classification, comprising:
determining an acquisition rule according to the sources of different litigation cases;
acquiring an update channel of litigation cases needing to be classified so as to obtain case sources;
crawling related litigation cases from case sources according to the acquisition rules to obtain cases to be classified;
applying document source classification, case segmentation classification, and keyword classification to the cases to be classified to obtain corresponding classification results;
judging whether the corresponding classification results are of the same category;
if the corresponding classification results are of the same category, taking the classification results as case categories;
storing the case categories and the corresponding litigation cases to form a database, wherein the database is used for operators to call the related litigation cases according to the categories;
and if the corresponding classification results are not of the same category, inputting the case to be classified into a classification model for classification to obtain the case category, and then performing the step of storing the case category and the corresponding litigation case to form the database, wherein the database is used by operators to retrieve related litigation cases by category.
2. The method of claim 1, wherein the collection rules comprise a data interface collection method, a web site customized crawler collection method, a data import method, and a centralized collection method.
3. The litigation case classification method of claim 1, wherein applying document source classification, case segmentation classification, and keyword classification to the cases to be classified to obtain corresponding classification results comprises:
classifying the cases to be classified according to document sources to obtain a source category;
extracting case-related keywords from the cases to be classified, and determining the cause of action from those keywords to obtain a cause-of-action category;
extracting keywords related to the field of public interest litigation from the cases to be classified, and classifying the extracted keywords to obtain a field category;
and integrating the source category, cause-of-action category, and field category to form a corresponding classification result.
4. The method of litigation case classification recited in claim 1, wherein the determining whether the corresponding classification results are of the same category comprises:
judging whether at least two of the source category, cause-of-action category, and field category belong to the same category;
if at least two of the source category, cause-of-action category, and field category belong to the same category, the corresponding classification results are of the same category;
if no two of the source category, cause-of-action category, and field category belong to the same category, the corresponding classification results are not of the same category.
5. The method of claim 1, wherein the classification model is a model obtained by training a machine learning algorithm with a number of litigation cases with category labels as a sample set.
6. A litigation case classification device, comprising:
the rule determining unit is used for determining the acquisition rule according to the sources of different litigation cases;
the channel acquisition unit is used for acquiring an update channel of the litigation cases needing to be classified so as to obtain case sources;
the case acquisition unit is used for crawling related litigation cases from a case source according to the acquisition rules to obtain cases to be classified;
the result obtaining unit is used for adopting document source classification, case segmentation classification and keyword classification according to the cases to be classified so as to obtain corresponding classification results;
a judging unit for judging whether the corresponding classification results are of the same category;
the category forming unit is used for taking the classification result as the case category if the corresponding classification results are of the same category;
the storage unit is used for storing the case categories and the corresponding litigation cases to form a database, wherein the database is used for operators to call the related litigation cases according to the categories;
and the model classification unit is used for, if the corresponding classification results are not of the same category, inputting the case to be classified into a classification model for classification to obtain the case category, and then performing the storing of the case category and the corresponding litigation case to form the database, wherein the database is used by operators to retrieve related litigation cases by category.
7. The litigation case classification apparatus of claim 6, wherein the result obtaining unit comprises:
the source classification subunit is used for classifying the cases to be classified according to document sources to obtain source classes;
the case classification subunit is used for extracting case-related keywords from the cases to be classified and determining the cause of action from those keywords, so as to obtain a cause-of-action category;
the keyword classification subunit is used for extracting keywords related to the field of public interest litigation from the cases to be classified and classifying the extracted keywords to obtain a field category;
and the integration subunit is used for integrating the source category, the cause-of-action category, and the field category to form a corresponding classification result.
8. The litigation case classification device of claim 7, wherein the judging unit is configured to judge whether at least two of the source category, cause-of-action category, and field category belong to the same category; if at least two of them belong to the same category, the corresponding classification results are of the same category; if no two of them belong to the same category, the corresponding classification results are not of the same category.
9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program, and the processor implementing the method according to any one of claims 1 to 5 when executing the computer program.
10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 5.
CN202010358113.4A 2020-04-29 2020-04-29 Litigation case classification method, litigation case classification device, computer equipment and storage medium Active CN111522955B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010358113.4A CN111522955B (en) 2020-04-29 2020-04-29 Litigation case classification method, litigation case classification device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111522955A true CN111522955A (en) 2020-08-11
CN111522955B CN111522955B (en) 2023-10-03

Family

ID=71903213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010358113.4A Active CN111522955B (en) 2020-04-29 2020-04-29 Litigation case classification method, litigation case classification device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111522955B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170011481A1 (en) * 2014-02-04 2017-01-12 Ubic, Inc. Document analysis system, document analysis method, and document analysis program
CN108021605A (en) * 2017-10-30 2018-05-11 北京奇艺世纪科技有限公司 A kind of keyword classification method and apparatus
CN110909914A (en) * 2019-10-12 2020-03-24 中国平安财产保险股份有限公司 Litigation success rate prediction method, litigation success rate prediction device, computer device, and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270633A (en) * 2020-10-26 2021-01-26 河南金明源信息技术有限公司 Public welfare litigation clue studying and judging system and method based on big data drive
CN112270633B (en) * 2020-10-26 2024-02-06 河南金明源信息技术有限公司 Public welfare litigation clue studying and judging system and method based on big data driving
CN112801489A (en) * 2021-01-21 2021-05-14 招商银行股份有限公司 Litigation case risk detection method, litigation case risk detection device, litigation case risk detection equipment and readable storage medium
CN112801489B (en) * 2021-01-21 2024-05-31 招商银行股份有限公司 Litigation case risk detection method, litigation case risk detection device, litigation case risk detection equipment and readable storage medium
CN113222443A (en) * 2021-05-25 2021-08-06 支付宝(杭州)信息技术有限公司 Case shunting method and device
CN113220888A (en) * 2021-06-01 2021-08-06 上海交通大学 Case clue element extraction method and system based on Ernie model
CN115511668A (en) * 2022-10-12 2022-12-23 金华智扬信息技术有限公司 Case supervision method, device, equipment and medium based on artificial intelligence
CN115511668B (en) * 2022-10-12 2023-09-08 金华智扬信息技术有限公司 Case supervision method, device, equipment and medium based on artificial intelligence

Also Published As

Publication number Publication date
CN111522955B (en) 2023-10-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant