CN113641825A - Smart court system big data processing method and device based on objective information theory - Google Patents

Smart court system big data processing method and device based on objective information theory

Publication number
CN113641825A
Authority
CN
China
Prior art keywords
data
text
subdata
measurement
text set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111201097.9A
Other languages
Chinese (zh)
Other versions
CN113641825B (en)
Inventor
许建峰
孙福辉
陈奇伟
李晓慧
刘振宇
陈宝贵
余超
王晓燕
张雅雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
People's Court Information Technology Service Center
Original Assignee
People's Court Information Technology Service Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by People's Court Information Technology Service Center
Priority to CN202111201097.9A
Publication of CN113641825A
Application granted
Publication of CN113641825B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Technology Law (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The method comprises: obtaining data text sets from a target court system, and determining a plurality of sub-data text sets in each data text set according to a court data model; extracting a metric data set from the sub-data text sets according to a measurement model of objective information theory; calculating an information value of each sub-data text set in the target court system; calculating a data scoring result of the target court system according to the information values of the sub-data text sets; and, when the data scoring result does not meet a preset requirement, outputting an adjustment instruction to the target court system to optimize the data in the target court system.

Description

Smart court system big data processing method and device based on objective information theory
Technical Field
This document belongs to the technical field of data processing, and in particular relates to a big data processing method and device for a smart court system based on objective information theory.
Background
Currently, the 5V metric method is commonly used to evaluate the quality of big data. The 5V metrics are Volume, Velocity, Variety, Value, and Veracity. However, the 5V metric method has the following drawbacks:
(1) The metric definitions are unclear: because big data comes from a wide range of sources, different industries, systems, and personnel may understand the same data differently and reach different evaluation conclusions about the same big data, so complete consistency is difficult to achieve.
(2) The metric dimensions are not comprehensive enough: the big data of a smart court passes through multiple stages such as aggregation, quality inspection, governance, and application; it takes various forms such as text, images, audio, and video, and various formats such as structured, semi-structured, and unstructured. The existing 5V metric method cannot meet the need for comprehensive evaluation of smart court big data.
(3) The metrics are difficult to put into practice: the big data generated by different industries differs, so the specific way of measuring it should differ from industry to industry. However, the existing 5V metric system is not designed for any particular industry and does not consider industry-specific measurement requirements, so its results hardly reflect the actual state of an industry's big data.
In view of this, the prior art can hardly perform reliable, comprehensive, and quantitative analysis of the data of a smart court system, and therefore cannot give effective guidance for the operation of the smart court. A technical scheme that can quantitatively process smart court data and thereby improve its data quality is urgently needed.
Disclosure of Invention
In view of the above problems in the prior art, an object of the present disclosure is to provide a method and an apparatus for processing big data of a smart court system based on objective information theory, which can improve reliability of evaluation of operational data of the court system.
In order to solve the technical problems, the specific technical scheme is as follows:
in one aspect, provided herein is a method for processing big data of a smart court system based on objective information theory, the method comprising:
acquiring a data text set in a specified time period from a target court system, wherein the data text set comprises a rule text set, an entity text set and a case text set;
determining a plurality of subdata text sets in each data text set according to the data text sets and a court data model;
according to a measurement model of an objective information theory, extracting measurement data corresponding to each measurement item in the measurement model from the sub-data text set to obtain a measurement data set;
determining different measure item combinations corresponding to different sub-data text sets and based on the measurement model;
according to the measure item combination, performing cluster analysis on the measure data sets to obtain a measure data combination for each subdata text set;
calculating and obtaining an information value of each subdata text set in the target court system according to the measurement data combination, wherein the information value is used for representing the value of the subdata text set;
calculating to obtain a data scoring result of the target court system according to the information value of each subdata text set in the target court system and by combining the data text set and the subdata text set in a court data model;
and when the data scoring result does not meet the preset requirement, outputting an adjusting instruction to the target court system to optimize the data in the target court system.
Further, the acquiring a data text set in a specified time period from the target court system includes:
determining the data type of a data text set to be extracted;
determining the storage position of the text set of the data to be extracted according to the data type of the text set of the data to be extracted;
and extracting the data text in a specified time period according to the storage position of the data text set to be extracted to form the data text set.
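The three acquisition steps above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the `STORAGE_LOCATIONS` mapping, the `DataText` record, and the in-memory `store` dictionary are hypothetical stand-ins for whatever storage the target court system actually uses.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical mapping from data type to storage location (an assumption;
# the source does not name any storage locations).
STORAGE_LOCATIONS = {
    "rule": "rule_texts",
    "entity": "entity_texts",
    "case": "case_texts",
}


@dataclass(frozen=True)
class DataText:
    created: date   # date used to test membership in the specified period
    content: str


def acquire_data_text_set(store, data_type, start, end):
    """Return the data texts of `data_type` created within [start, end]."""
    # Step 1 is the caller's choice of `data_type`.
    # Step 2: resolve the storage location from the data type.
    location = STORAGE_LOCATIONS[data_type]
    # Step 3: extract the texts that fall inside the specified time period.
    return [t for t in store.get(location, []) if start <= t.created <= end]
```

A caller would invoke `acquire_data_text_set` once per data type (rule, entity, case) to assemble the full data text set for the period.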
Further, the measurement model of the objective information theory comprises the following measure items: breadth, fineness, persistence, richness, volume, delay, coverage, reality, and adaptation;
the method for extracting the measurement data corresponding to each measurement item in the measurement model from the sub-data text set according to the measurement model of the objective information theory to obtain a measurement data set includes:
determining a measurement calculation formula of each measurement item according to a measurement model of the objective information theory;
determining the measurement data required by each measurement item according to the measurement calculation formula;
and extracting the measurement data corresponding to each measurement item from the sub-data text set to obtain a measurement data set.
Further, the extracting metric data corresponding to each measure item from the sub-data text set to obtain a metric data set includes:
for each sub-data text set:
acquiring subdata texts in the subdata text set;
sequentially extracting the measurement data corresponding to each measurement item from the subdata text to obtain an initial measurement data set;
calculating the standard deviation of the measurement data corresponding to each measure item;
and screening out the metric data meeting preset conditions from the initial metric data set according to the standard deviation of the metric data corresponding to each metric item to obtain a metric data set.
Further, the preset condition is as follows:
Figure 100002_DEST_PATH_IMAGE002
wherein,
Figure 100002_DEST_PATH_IMAGE004
is the average value of the metric data corresponding to the i-th measure item,
Figure 100002_DEST_PATH_IMAGE006
is the standard deviation of the metric data corresponding to the i-th measure item, and
Figure 100002_DEST_PATH_IMAGE008
is the j-th metric data corresponding to the i-th measure item.
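The screening step can be illustrated with a short sketch. The exact preset condition is available here only as a formula image, so the code assumes one common form of standard-deviation screening: a value is kept when its distance from the mean is at most `k` standard deviations. The `k`-sigma rule, the default `k = 3.0`, and the function name are all assumptions, not the condition stated in the source.

```python
from statistics import mean, pstdev


def screen_metric_data(values, k=3.0):
    """Keep the values lying within k population standard deviations of the mean.

    The k-sigma rule is an assumed stand-in for the preset condition,
    which the source gives only as an image.
    """
    if not values:
        return []
    mu = mean(values)        # average of the metric data for one measure item
    sigma = pstdev(values)   # standard deviation of the same metric data
    return [x for x in values if abs(x - mu) <= k * sigma]


# Applied once per measure item, e.g. with a tighter bound:
screened = screen_metric_data([10, 11, 9, 10, 10, 50], k=2.0)
```

With `k=2.0` the outlying value 50 is dropped from this sample, while the default `k=3.0` would keep it, so the choice of bound materially changes the resulting metric data set.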
Further, the determining different combinations of measure items based on the metric model corresponding to different sub-data text sets includes:
acquiring the attribute of the text content of the subdata text set;
determining a metric attribute of each measure item in the measurement model of the objective information theory;
calculating to obtain the attribute association degree of the sub data text set and each measure item according to the attribute of the text content and the measure attribute;
determining the measure items with the attribute relevance exceeding a preset value as measure items corresponding to the sub-data text sets so as to obtain measure item combinations corresponding to the sub-data text sets.
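A minimal sketch of this selection step follows. The source does not say how the attribute association degree is computed, so the code assumes Jaccard similarity between attribute sets; the attribute names and the threshold value are likewise hypothetical.

```python
def association_degree(text_attributes, measure_attributes):
    """Jaccard similarity of two attribute sets (an assumed association measure)."""
    a, b = set(text_attributes), set(measure_attributes)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_measure_items(text_attributes, measure_item_attributes, threshold):
    """Return the measure items whose association degree exceeds `threshold`."""
    return [
        item
        for item, attrs in measure_item_attributes.items()
        if association_degree(text_attributes, attrs) > threshold
    ]


# Hypothetical metric attributes for three of the nine measure items.
MEASURE_ITEM_ATTRIBUTES = {
    "breadth": {"scope", "subjects"},
    "delay": {"timestamp", "latency"},
    "persistence": {"timestamp", "span"},
}
combination = select_measure_items(
    {"timestamp", "span", "subjects"}, MEASURE_ITEM_ATTRIBUTES, threshold=0.3
)
```

Here only `persistence` shares enough attributes with the text content to pass the threshold; lowering the threshold widens the measure item combination.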
Further, the calculating and obtaining the information value of each sub-data text set in the target court system according to the metric data combination includes:
determining a metric calculation formula of each measure item;
determining a calculation function of each sub-data text set information value according to a measurement calculation formula of each measurement item and a measurement item combination corresponding to each sub-data text set;
and calculating to obtain the information value of each subdata text set according to the measurement data combination and the calculation function of the subdata text set information value.
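One way to picture the construction of the calculation function is sketched below. The per-item formulas and the way they are combined are not given in this part of the text, so both are assumptions: each formula is a plain Python callable, and the item scores are combined by an unweighted mean.

```python
from statistics import mean


def build_value_function(formulas, combination):
    """Build the information-value function for one sub-data text set.

    `formulas` maps a measure item to its metric calculation formula;
    `combination` lists the measure items selected for the sub-data text set.
    Combining item scores by an unweighted mean is an assumption.
    """
    def information_value(metric_data):
        scores = [formulas[item](metric_data[item]) for item in combination]
        return mean(scores)

    return information_value


# Hypothetical formulas: breadth as an average, delay inverted so that
# a smaller delay yields a higher score.
formulas = {"breadth": mean, "delay": lambda xs: 1 - mean(xs)}
value_of = build_value_function(formulas, ["breadth", "delay"])
info_value = value_of({"breadth": [0.5, 0.75], "delay": [0.25, 0.25]})
```

Building the function once per sub-data text set keeps the measure item combination fixed while new metric data combinations are evaluated.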
Further, according to the information value of each subdata text set in the target court system, combining the data text set and the subdata text set in the court data model, calculating to obtain a data scoring result of the target court system, including:
constructing a hierarchical big data scoring system, wherein the hierarchical big data scoring system comprises multi-level scoring indexes, the multi-level scoring indexes correspond to the hierarchical structure of the court data model, the data types corresponding to the top-level scoring indexes are the rule text set, the entity text set and the case text set, the bottom-level scoring indexes are the measure item combinations corresponding to the sub-data types, and the scoring indexes at the same level are provided with a corresponding combination of scoring index weights;
and calculating to obtain a data scoring result of the target court system according to the information value of each subdata text set in the target court system and the hierarchical big data scoring system.
Further, the step of calculating a data scoring result of the target court system according to the information value of each subdata text set in the target court system and the hierarchical big data scoring system includes:
defining the information value of the subdata text set as a bottom-level scoring index corresponding to the subdata text;
and according to the bottom-level scoring indexes corresponding to the subdata texts, calculating by weighted summation to obtain a data scoring result of the target court system.
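The weighted summation over the hierarchy can be sketched as a short recursion. The tree layout, the weights, and the sub-data names below are hypothetical; only the idea of rolling bottom-level information values up through weighted levels comes from the text.

```python
def weighted_score(node, info_values):
    """Recursively compute the weighted score of a scoring-index node.

    A leaf is a string naming a sub-data text set, whose score is its
    information value; an inner node is a list of (weight, child) pairs.
    """
    if isinstance(node, str):
        return info_values[node]
    return sum(weight * weighted_score(child, info_values) for weight, child in node)


# Hypothetical two-level scoring system (weights and names are assumptions).
scoring_system = [
    (0.3, [(1.0, "laws")]),                                 # rule text set
    (0.3, [(0.5, "personnel"), (0.5, "organization")]),     # entity text set
    (0.4, [(1.0, "filings")]),                              # case text set
]
info_values = {"laws": 0.8, "personnel": 0.6, "organization": 0.4, "filings": 0.5}
total = weighted_score(scoring_system, info_values)
```

Because the recursion does not care how deep the tree is, the same function would also cover a three-level court data model with an intermediate layer.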
Further, when the data scoring result does not meet the preset requirement, outputting an adjustment instruction to the target court system to optimize the data in the target court system, including:
when the data scoring result does not meet the preset requirement, acquiring a weight combination of a subdata text set corresponding to the entity text set;
and, according to the weight combination, assigning adjustment proportions to the sub-data text sets that decrease in order of weight from high to low, and outputting the adjustment proportions to the target court system so as to optimize the smart court data in the target court system.
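The adjustment step might look like the sketch below. Two points are assumptions rather than statements from the source: the preset requirement is modeled as a minimum score, and each adjustment proportion is taken to be the sub-data text set's share of the total weight, which makes the proportions fall as the weights fall.

```python
def adjustment_instruction(score, required, weights):
    """Return [(sub_data_set, proportion), ...] from highest to lowest weight.

    An empty list means the score already meets the requirement.
    Treating the requirement as a minimum score, and deriving each
    proportion as weight / total weight, are both assumptions.
    """
    if score >= required:
        return []
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, weight / total) for name, weight in ranked]


# Hypothetical weights for sub-data text sets of the entity text set.
entity_weights = {"organization": 0.3, "personnel": 0.5, "equipment_assets": 0.2}
instruction = adjustment_instruction(0.59, 0.7, entity_weights)
```

The highest-weight sub-data text set receives the largest share of the adjustment, which matches the high-to-low ordering described above.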
In another aspect, this document also provides a device for processing big data of a smart court system based on objective information theory, the device comprising:
the data acquisition module is used for acquiring a data text set in a specified time period from a target court system, wherein the data text set comprises a rule text set, an entity text set and a case text set;
the classification module is used for determining a plurality of subdata text sets in each data text set according to the data text sets and a court data model;
the measurement data acquisition module is used for extracting measurement data corresponding to each measurement item in the measurement model from the sub-data text set according to the measurement model of the objective information theory to obtain a measurement data set;
the measurement item combination determining module is used for determining different measurement item combinations corresponding to different sub-data text sets and based on the measurement model;
the clustering module is used for carrying out clustering analysis on the measurement data sets according to the measurement item combination to obtain a measurement data combination aiming at each subdata text set;
an information value calculation module, configured to calculate and obtain an information value of each sub-data text set in the target court system according to the metric data combination, where the information value is used to represent a value attribute of the sub-data text set;
the evaluation module is used for calculating and obtaining a data scoring result of the target court system according to the information value of each subdata text set in the target court system by combining the data text set and the subdata text set in the court data model;
and the optimization module is used for outputting an adjusting instruction to the target court system when the data scoring result does not meet the preset requirement so as to optimize the data in the target court system.
In another aspect, a computer device is also provided herein, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method as described above when executing the computer program.
Finally, a computer-readable storage medium is also provided herein, which stores a computer program that, when executed by a processor, implements the method as described above.
By adopting the above technical scheme, the method, device, and equipment for processing big data of a smart court system based on objective information theory acquire the data text sets of the target court system in a specified time period and determine a plurality of sub-data text sets in each data text set according to a court data model; then, according to a measurement model of objective information theory, extract the metric data corresponding to each measure item in the measurement model from the sub-data text sets to obtain a metric data set; determine the different measure item combinations, based on the measurement model, that correspond to different sub-data text sets; perform cluster analysis on the metric data sets according to the measure item combinations to obtain a metric data combination for each sub-data text set; calculate an information value of each sub-data text set in the target court system according to the metric data combination, and calculate a data scoring result of the target court system according to those information values; and, when the data scoring result does not meet the preset requirement, output an adjustment instruction to the target court system to optimize the data in the target court system. In this way, the data in the target court system can be comprehensively and quantitatively analyzed and evaluated, the reliability of the analysis and evaluation of the court system is improved, and adjustment suggestions are provided for the court system.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments herein or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments herein, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 shows a schematic representation of an implementation environment for a method provided by embodiments herein;
FIG. 2 is a schematic diagram illustrating steps of a smart court system big data processing method based on objective information theory according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating the metric data set determination step in an embodiment herein;
FIG. 4 is a diagram illustrating the combination determination step of the measure items in the embodiment of the present invention;
FIG. 5 is a schematic diagram showing the data scoring result calculation step in the embodiment herein;
FIG. 6 shows a schematic diagram of a court data model in an embodiment herein;
FIG. 7 is a schematic diagram illustrating classification of rule-type data in an embodiment herein;
FIG. 8 is a diagram illustrating classification of entity-type data in an embodiment herein;
FIG. 9 is a diagram illustrating classification of process type data in an embodiment herein;
FIG. 10 is a schematic structural diagram illustrating a smart court system big data processing device based on objective information theory according to an embodiment of the present disclosure;
FIG. 11 shows a schematic structural diagram of a computer device provided in an embodiment herein.
Description of the symbols of the drawings:
10. a court system;
20. a database;
30. a server;
100. a data acquisition module;
200. a classification module;
300. a metric data acquisition module;
400. a measure item combination determination module;
500. a clustering module;
600. an information value calculation module;
700. an evaluation module;
800. an optimization module;
1102. a computer device;
1104. a processor;
1106. a memory;
1108. a drive mechanism;
1110. an input/output module;
1112. an input device;
1114. an output device;
1116. a presentation device;
1118. a graphical user interface;
1120. a network interface;
1122. a communication link;
1124. a communication bus.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments herein without making any creative effort, shall fall within the scope of protection.
It should be noted that the terms "first," "second," and the like in the description and claims herein and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments herein described are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or device.
The embodiments of this specification provide a big data processing method for a smart court system based on objective information theory, which can realize comprehensive quantitative processing of court system data and thereby improve the reliability of court system evaluation. FIG. 1 is a schematic diagram of an implementation environment of the method, which may include a court system 10, a database 20, and a server 30 that can interact with one another.
The database 20 may be configured inside the court system 10 to store its data, or may be a dedicated database storing the data texts of the court system 10. The database 20 may also store the data texts of a plurality of court systems 10, for example the data texts of court systems 10 divided by region. The specific form of the database 20 is not limited in the embodiments of this specification.
The server 30 may extract the data texts of the court system 10 stored in the database 20 to form data text sets, and then determine a plurality of sub-data text sets in each data text set according to the data text sets and a court data model; according to a measurement model of objective information theory, extract the metric data corresponding to each measure item in the measurement model from the sub-data text sets to obtain a metric data set; determine the different measure item combinations, based on the measurement model, that correspond to different sub-data text sets; perform cluster analysis on the metric data sets according to the measure item combinations to obtain a metric data combination for each sub-data text set; calculate an information value of each sub-data text set in the target court system according to the metric data combination, where the information value represents the value of the sub-data text set; calculate a data scoring result of the target court system according to the information value of each sub-data text set and a scoring system comprising multiple levels of scoring indexes with preset weights; and, when the data scoring result does not meet the preset requirement, output an adjustment instruction to the target court system to optimize the data in the target court system.
In an optional embodiment, the server 30 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like.
In addition, it should be noted that fig. 1 shows only one application environment provided by the present disclosure, and in practical applications, other application environments may also be included, for example, acquisition of a metric data set may also be implemented by other terminal devices.
Specifically, the embodiments herein provide a big data processing method for a smart court system based on objective information theory, which can realize comprehensive quantitative analysis of court system data and improve the reliability of data evaluation. FIG. 2 is a schematic diagram of the steps of this method. This specification provides the operation steps of the method as described in the embodiments or the flowchart, but more or fewer operation steps may be included on the basis of conventional or non-creative work. The order of steps recited in the embodiments is only one of many possible execution orders and does not represent the only order of execution. When an actual system or apparatus product executes the method, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or the figures. Specifically, as shown in FIG. 2, the method may include:
S101: acquiring a data text set in a specified time period from a target court system, wherein the data text set comprises a rule text set, an entity text set and a case text set;
S102: determining a plurality of subdata text sets in each data text set according to the data text sets and a court data model;
S103: according to a measurement model of an objective information theory, extracting measurement data corresponding to each measurement item in the measurement model from the sub-data text set to obtain a measurement data set;
S104: determining different measure item combinations corresponding to different sub-data text sets and based on the measurement model;
S105: according to the measure item combination, performing cluster analysis on the measure data sets to obtain a measure data combination for each subdata text set;
S106: calculating and obtaining an information value of each subdata text set in the target court system according to the measurement data combination, wherein the information value is used for representing the value of the subdata text set;
S107: calculating to obtain a data scoring result of the target court system according to the information value of each subdata text set in the target court system and by combining the data text set and the subdata text set in a court data model;
S108: and when the data scoring result does not meet the preset requirement, outputting an adjusting instruction to the target court system to optimize the data in the target court system.
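The control flow of steps S101 to S108 can be summarized in a short skeleton. Every stage is passed in as a callable because the text does not fix how each stage is implemented; all parameter names are hypothetical, and modeling the preset requirement as a minimum score is an assumption.

```python
def process_court_data(acquire, split, extract, combine, cluster, value, score, required):
    """Run steps S101-S108 with caller-supplied stage functions.

    Returns (data_scoring_result, needs_adjustment). All stage functions
    are hypothetical placeholders for the operations named in the steps.
    """
    data_text_sets = acquire()                               # S101
    sub_sets = split(data_text_sets)                         # S102: {name: texts}
    info_values = {}
    for name, texts in sub_sets.items():
        metric_data = extract(texts)                         # S103
        combination = combine(name)                          # S104
        metric_combo = cluster(metric_data, combination)     # S105
        info_values[name] = value(metric_combo)              # S106
    result = score(info_values)                              # S107
    needs_adjustment = result < required                     # S108 (assumed minimum score)
    return result, needs_adjustment
```

Keeping the stages as parameters makes the order of the steps explicit while leaving each stage free to be replaced, for example by a real clustering routine in S105.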
It can be understood that the objective information theory-based measurement model is used as a basis for evaluating the court system data, and the value of the court system data, namely the value of the expression information in the data, can be comprehensively reflected through different measurement items in the measurement model.
Furthermore, by classifying the court system data texts, data of different levels, namely the data texts and a plurality of subdata texts corresponding to the data texts, can be obtained, and different measure item combinations are set for each subdata text, so that the pertinence evaluation of the expressed content of the data texts can be realized, and the reliability of the data evaluation is improved.
The information value of a sub-data text set can be understood as a measurement value of that set. It allows the data value of the sub-data text set, that is, the data value of the corresponding data type, to be judged directly; the data value may reflect the strength and accuracy with which the data expresses information.
The court system has many data types and a large data volume, and the judicial big data in the court information system is mainly unstructured or semi-structured text data. By constructing a court data model, multi-level data types can be obtained, such as top-level data types and bottom-level sub-data types, and correspondingly multi-level data texts, namely the data texts and the sub-data texts corresponding to each data text. The data texts may include rule texts, entity texts, and case texts. A rule text is rule-type data that describes operation rules and the association relationships between entities, between entities and processes, and between processes. An entity text is entity-type data that describes the subjects involved in court business, such as court system personnel data, organization data, and equipment asset data. A case text is process-type data, that is, data generated around cases during the operation of the court system; it describes the data generated by entities in various court business activities, including behavior information, result information, and business process information among entities. Together, these data allow the court system data to be described accurately and completely.
In some optional embodiments, the court data model may also be a three-level data model. Because court system data has many types and each first-level data type (i.e., data text) has extensive content, it is difficult to divide the first-level data types directly into the second-level data types (i.e., sub-data texts). An intermediate data type may therefore be set between the first-level and second-level data types, so that classifying step by step improves the classification efficiency and makes the data texts easier to process. Table 1 below shows one way of dividing the court data model in an embodiment of this specification.
TABLE 1 court data model
Figure 100002_DEST_PATH_IMAGE010
Figure 100002_DEST_PATH_IMAGE012
By establishing the court data model, the different second-level data types (i.e., sub-data texts) can be determined. The sub-data texts in the same sub-data text set represent data of the same type, which makes it convenient to evaluate the information value of each sub-data text set; combining the evaluations of all sub-data text sets of the target court system then yields the evaluation result of the target court system as a whole.
The measurement model of the objective information theory is an evaluation model of data information value, and can comprise nine dimensions of breadth, fineness, persistence, richness, volume, delay, coverage, reality and adaptation, and can comprehensively and reliably evaluate and analyze data information, wherein the meaning of each dimension is expressed as follows:
1) Breadth: Data is an objective description of information, and the value of data is directly related to its coverage. Generally, other conditions being equal, the wider the coverage of the data, the higher the value it embodies; conversely, the lower the value.
2) Fineness: The value of data is also directly related to the level of detail of its description; intuitively, the level of detail can be understood as the granularity into which the data can be decomposed. Generally, other conditions being equal, the finer the granularity described by the data, the higher the value it embodies; conversely, the lower the value.
3) Persistence: The value of data is also related to the persistence of its description, which is intuitively reflected in the density and the span of the distribution of its occurrence times. Generally, other conditions being equal, the denser the distribution of occurrence times and the larger the span, the higher the value the data embodies; conversely, the lower the value.
4) Richness: The value of data is also directly related to the richness of its description, which is reflected in how rich the data content is. Generally, other conditions being equal, the richer the content of the data, the higher the value it embodies; conversely, the lower the value.
5) Volume: The value of data is also closely related to its inherent capacity, which is reflected in its demand for carrier capacity. Generally, other conditions being equal, the smaller the capacity the data requires, the higher the value it embodies; conversely, the lower the value.
6) Delay: The value of data is also closely related to its timeliness, which is reflected in the delay of the reflection time relative to the occurrence time of the data content. Generally, other conditions being equal, the smaller this delay, the higher the value the data embodies; conversely, the lower the value.
7) Coverage: The value of data is in many cases closely related to how widely it is disseminated, which is reflected in the distribution of its information carriers. For some data, the wider the distribution of the carriers within a certain range, the higher the value; for other data, the narrower the distribution of the carriers, the higher the value.
8) Reality: The value of data depends to a large extent on whether a suitable semantic mapping can be found, so that the semantic state of the data is as close as possible to the actual state of the ontology. Generally, other conditions being equal, the smaller the difference between the semantic state reflected by the data and the actual state, the higher the value of the data; conversely, the lower the value.
9) Adaptation: The value of data ultimately depends on the degree to which it satisfies user requirements, which is reflected in the overall suitability of the constituent elements of the data to those requirements. Generally, the stronger the overall suitability, the higher the value of the data; conversely, the lower the value.
As described above, each measure item in the metric model based on objective information theory represents the value degree of data in a different dimension. That is, for each sub-data text set, the value degree of its data can be obtained through the metric model, which enables a reliable and effective evaluation of the court system.
In an embodiment of the present specification, the acquiring a data text set in a specified time period from a target court system includes:
determining the data type of a data text set to be extracted;
determining the storage position of the text set of the data to be extracted according to the data type of the text set of the data to be extracted;
and extracting the data text in a specified time period according to the storage position of the data text set to be extracted to form the data text set.
It can be understood that, since the data text is complete data information in the target court system, different data information can be stored in different storage locations, and the storage locations can be determined by determining the data types of the different data texts, thereby facilitating extraction of the data text.
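The three acquisition steps above can be sketched as follows. This is a minimal illustration only: the `DataText` record, the `STORAGE_LOCATIONS` mapping and the in-memory `store` dictionary are hypothetical stand-ins for whatever storage the target court system actually uses.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical mapping from data type to storage location (illustrative names only).
STORAGE_LOCATIONS = {
    "rule": "store/rules",
    "entity": "store/entities",
    "case": "store/cases",
}


@dataclass
class DataText:
    data_type: str   # "rule", "entity" or "case"
    created: date    # date on which the text was produced
    content: str


def extract_text_set(store, data_type, start, end):
    """Return the data texts of one type created within [start, end]."""
    # Steps 1 and 2: the data type determines the storage location.
    location = STORAGE_LOCATIONS[data_type]
    # Step 3: keep only the texts that fall inside the specified time period.
    return [t for t in store.get(location, []) if start <= t.created <= end]
```

Here `store` is assumed to map a storage location to the list of texts held there; a real system would query a database or file system instead.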
The method provided by the embodiments of this specification determines, on the basis of the data evaluation result of the target court system, whether the court system needs to be adjusted, thereby improving the efficiency and effect of smart court design. Accordingly, the evaluation result (i.e., the value degree of the data) of the target court system data within the specified time period is calculated to determine the operating efficiency of the target court system in that period (the higher the value degree, the higher the operating efficiency and the higher the validity and value of the data), which provides a reference for subsequent adjustment of the target court system. The specified time period is set according to the actual situation, for example 3 months or 6 months; the specific value is not limited in the embodiments of this specification.
In an embodiment of this specification, as shown in fig. 3, the extracting, from the sub-data text set, metric data corresponding to each measure item in the metric model according to the metric model of objective information theory to obtain a metric data set includes:
S201: determining a metric calculation formula for each measure item according to the metric model of objective information theory;
S202: determining the metric data required by each measure item according to the metric calculation formula;
S203: extracting the metric data corresponding to each measure item from the sub-data text set to obtain a metric data set.
It can be understood that different measure items (such as breadth) require attributes of different dimensions of the data text (such as its degree of coverage), and these attributes are the metric data. By determining the metric data corresponding to each measure item in each sub-data text set to obtain a metric data set, the measure items corresponding to each sub-data text set can be calculated from that metric data set.
In actual operation, because each sub-data text set has a large number of sub-data texts, a set of metric data can be obtained by determining the metric data corresponding to each measure item for each sub-data text, and each set of metric data includes the metric data corresponding to nine measure items, so that all the metric data corresponding to all the sub-data texts in each sub-data text set form a metric data combination.
Because each measure item corresponds to different measure data, semantic information (for example, related to the number of court systems, data types, data formats, and the like) of the sub-data text can be acquired from different dimensions by using a text recognition technology, measure data corresponding to different measure items are obtained through the semantic information, and a specific recognition process is not limited in the embodiment of the description.
In the embodiment of the present specification, the metric calculation formulas of different measure items are different, and exemplarily:
1) The breadth may be calculated as:
Figure 100002_DEST_PATH_IMAGE014
where O is the sub-data text; C is a constant, namely the total number of court systems nationwide;
Figure 100002_DEST_PATH_IMAGE016
is the i-th court system covered by the sub-data text;
Figure 100002_DEST_PATH_IMAGE018
is the value weight coefficient of the i-th court system covered by the sub-data text; and n is the total number of court systems covered by the sub-data text.
2) The fineness may be calculated as:
Figure 100002_DEST_PATH_IMAGE020
where G is the fineness; O is the sub-data text; C is a constant, namely the total number of data types;
Figure 100002_DEST_PATH_IMAGE022
is the i-th (1 ≤ i ≤ n) data type involved in the sub-data text;
Figure 855720DEST_PATH_IMAGE018
is the value weight coefficient of the i-th data type involved in the sub-data text; and n is the total number of data types involved in the sub-data text.
3) The persistence may be calculated as:
Figure 100002_DEST_PATH_IMAGE024
where T is the time set;
Figure 100002_DEST_PATH_IMAGE026
is the time span; O is the sub-data text; n is the total number of data of different formats in the sub-data text;
Figure 100002_DEST_PATH_IMAGE028
indicates whether the data of the i-th format keeps changing continuously within the time set T: if so,
Figure 100002_DEST_PATH_IMAGE030
; otherwise,
Figure 100002_DEST_PATH_IMAGE032
; and SUS is the data duration of the sub-data text within the time span from
Figure 100002_DEST_PATH_IMAGE034
to
Figure 100002_DEST_PATH_IMAGE036
.
4) The richness may be calculated as:
Figure 100002_DEST_PATH_IMAGE038
where R is the richness; O is the sub-data text; n is the total number of cases involved in the sub-data text;
Figure 100002_DEST_PATH_IMAGE040
is the total number of entities involved in the sub-data text; and
Figure 100002_DEST_PATH_IMAGE042
is the number of entities associated with the i-th case.
5) The volume may be calculated as:
Figure 100002_DEST_PATH_IMAGE044
where V is the volume; O is the sub-data text;
Figure 100002_DEST_PATH_IMAGE046
is the value weight coefficient of the i-th sub-data text; and
Figure 100002_DEST_PATH_IMAGE048
is the size of the physical space required by the i-th sub-data text.
6) The delay may be calculated as:
Figure 100002_DEST_PATH_IMAGE050
where D is the delay; T is the time set; O is the sub-data text;
Figure 897756DEST_PATH_IMAGE046
is the value weight coefficient of the i-th sub-data text;
Figure 100002_DEST_PATH_IMAGE052
is the total number of data aggregations (i.e., data updates) of the sub-data text within time t_i; and
Figure 633631DEST_PATH_IMAGE040
is the total number of data updates of all the sub-data texts.
7) The coverage may be calculated as:
Figure 100002_DEST_PATH_IMAGE054
where C is the coverage; S is the sub-data text;
Figure 100002_DEST_PATH_IMAGE056
is the number of users related to the sub-data text; and U is a constant, namely the total number of users across all the sub-data texts.
8) The reality (degree of truth) may be calculated as:
Figure 100002_DEST_PATH_IMAGE058
where V is the reality; O is the sub-data text;
Figure 100002_DEST_PATH_IMAGE060
is the number of qualified records in the sub-data text; and
Figure 488324DEST_PATH_IMAGE040
is the total number of sub-data texts.
9) The adaptation may be calculated as:
Figure 100002_DEST_PATH_IMAGE062
where S is the adaptation; O is the sub-data text;
Figure DEST_PATH_IMAGE064
is the amount of data of the sub-data text that is effectively used; and
Figure 842688DEST_PATH_IMAGE040
is the total amount of effectively used data across all the sub-data texts.
The metric data of the measure items corresponding to each sub-data text can be obtained through the above formulas, and statistical analysis is then performed over the sub-data texts to obtain the metric data set corresponding to each data text set.
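As an illustrative sketch only, and assuming that coverage, reality and adaptation each reduce to a simple ratio of the two quantities defined for them (an assumption, since the formulas themselves are given only as figures), these three measure items could be computed as follows. All function and parameter names are hypothetical.

```python
def ratio(part, total):
    """Return part / total, or 0.0 when the total is zero."""
    return part / total if total else 0.0


def coverage(related_users, total_users):
    # Assumed form: users related to the sub-data text over all users (U).
    return ratio(related_users, total_users)


def reality(qualified_records, total_count):
    # Assumed form: qualified records over the total count defined for reality.
    return ratio(qualified_records, total_count)


def adaptation(used_in_text, used_in_all_texts):
    # Assumed form: effectively used data of this text over the overall total.
    return ratio(used_in_text, used_in_all_texts)
```

Under this assumption each of the three values lies between 0 and 1 whenever the numerator does not exceed the denominator, which makes them directly comparable before weighting.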
Once the metric data set has been obtained, reliable data may be screened from it and abnormal data removed in order to improve the reliability and accuracy of the data. Optionally, extracting the metric data corresponding to each measure item from the sub-data text set to obtain a metric data set includes:
for each sub-data text set:
acquiring subdata texts in the subdata text set;
sequentially extracting the measurement data corresponding to each measurement item from the subdata text to obtain an initial measurement data set;
calculating the standard deviation of the measurement data corresponding to each measure item;
and screening out the metric data meeting preset conditions from the initial metric data set according to the standard deviation of the metric data corresponding to each metric item to obtain a metric data set.
It can be understood that, in the embodiments of this specification, the metric data in each sub-data text set is screened to remove abnormal data, which ensures the reliability of the metric data and improves the reliability and accuracy of the calculation of each measure item. The preset condition may be:
Figure 304894DEST_PATH_IMAGE002
where
Figure 324802DEST_PATH_IMAGE004
is the average value of the metric data corresponding to the i-th measure item;
Figure 265076DEST_PATH_IMAGE006
is the standard deviation of the metric data corresponding to the i-th measure item; and
Figure 054041DEST_PATH_IMAGE008
is the j-th metric data corresponding to the i-th measure item.
In some other embodiments, the preset condition may also have other expression modes, and may be set according to an actual situation, which is not limited in the embodiments of the present specification.
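A minimal sketch of such screening is given below. It assumes that the preset condition keeps a metric value when its distance from the mean is at most k standard deviations; this specific form and the multiplier `k` are assumptions, since the condition itself is given only as a figure.

```python
from statistics import mean, pstdev


def screen_metric_data(values, k=3.0):
    """Keep values within k population standard deviations of the mean.

    `values` holds the metric data of one measure item; `k` is a
    hypothetical tuning parameter, not a value taken from the source.
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [x for x in values if abs(x - mu) <= k * sigma]
```

Applying the function once per measure item to the initial metric data set would yield the screened metric data set; a smaller `k` removes more values.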
In this embodiment of the present specification, as shown in fig. 4, the determining different combinations of measure items based on the metric model corresponding to different sub-data text sets includes:
S301: acquiring the attribute of the text content of the sub-data text set;
S302: determining the metric attribute of each measure item in the metric model of objective information theory;
S303: calculating the attribute relevance between the sub-data text set and each measure item according to the attribute of the text content and the metric attribute;
S304: determining the measure items whose attribute relevance exceeds a preset value as the measure items corresponding to the sub-data text set, so as to obtain the measure item combination corresponding to the sub-data text set.
It can be understood that, because the text contents and data types of different sub-data text sets differ, not every measure item in the metric model of objective information theory is suitable for every sub-data text set; only some of the measure items are needed to express the data value degree of a given sub-data text set. To determine which measure item combination suits a given sub-data text set, the relevance can be calculated from the attributes of the text content of the sub-data text set and the metric attributes of the measure items, and the measure items with higher relevance are then taken as the measure items corresponding to that sub-data text set.
For example, the metric attribute of each measure item may be set in advance: the metric attribute of breadth is the coverage of court systems, the metric attribute of fineness relates to the number of data types, and so on. A preset threshold may be set for the metric attribute of each measure item as a judgment criterion. The attribute of the text content of the sub-data text set may be determined by a text recognition technology: starting from the metric data corresponding to each measure item, the metric data of each sub-data text with respect to each measure item is obtained, and then the average value of the metric data of all sub-data texts in the sub-data text set with respect to each measure item is obtained. For each measure item, the ratio of the average value to the preset threshold is calculated as the attribute relevance, and the measure item combination suitable for each sub-data text set is determined according to the attribute relevance and the preset value.
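The ratio-based selection described here can be sketched as follows. The dictionary layout, the function name and the default `preset_value` are illustrative assumptions rather than details from the specification.

```python
from statistics import mean


def select_measure_items(metric_values, thresholds, preset_value=1.0):
    """Select measure items whose attribute relevance exceeds preset_value.

    metric_values: measure item -> metric data of all sub-data texts in the set
    thresholds:    measure item -> preset threshold for that item
    """
    selected = []
    for item, values in metric_values.items():
        # Attribute relevance = average metric value / preset threshold.
        relevance = mean(values) / thresholds[item]
        if relevance > preset_value:
            selected.append(item)
    return selected
```

For instance, with averages of 0.7 for breadth and 0.15 for fineness against thresholds of 0.5 each, the relevances are 1.4 and 0.3, so only breadth would be selected when the preset value is 1.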
In some other embodiments, the measure item combination corresponding to each sub-data text set may also be set in other ways. For example, the data type corresponding to each sub-data text set is determined and sent to a plurality of experts for scoring, where each expert scores the degree of matching between the data type corresponding to the sub-data text set and each measure item, and the average of all expert scores is calculated for each measure item. When the average exceeds a specified value, the measure item can be taken as a measure item adapted to the sub-data text set. Illustratively, 10 experts participate in the scoring, and scoring the degree of matching between a sub-data text set A and coverage yields 10 scores: 3, 5, 6, 4, 8, 4, 6, 9, 5 and 6, whose average is 5.6. Since the average exceeds the specified value (for example, 5), coverage is taken as a measure item corresponding to sub-data text set A. The other measure items are then scored in turn to obtain the measure item combination corresponding to sub-data text set A. It should be noted that the specified values corresponding to different measure items may differ and are set according to the actual situation.
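The expert-scoring example can be reproduced with a few lines of code. The function name is hypothetical; the scores and the specified value of 5 are those of the example.

```python
from statistics import mean


def is_adapted(expert_scores, specified_value):
    """Return True when the average expert score exceeds the specified value."""
    return mean(expert_scores) > specified_value


# Scores given by the 10 experts for one measure item and sub-data text set A.
scores = [3, 5, 6, 4, 8, 4, 6, 9, 5, 6]
average = mean(scores)             # 56 / 10
adapted = is_adapted(scores, 5)    # the average exceeds the specified value
```

Repeating the check for each of the nine measure items, each with its own specified value, gives the measure item combination for the sub-data text set.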
In some other embodiments, the combination of the measure items corresponding to the sub data text set may also be determined in other manners, which are not limited in this embodiment of the present specification.
As shown in table 2 below, the combination of measure items corresponding to each sub-data text set is determined on the basis of table 1:
TABLE 2 court System different data type measure item combinations
Figure DEST_PATH_IMAGE066
Figure DEST_PATH_IMAGE068
Figure DEST_PATH_IMAGE070
In this embodiment of the present specification, cluster analysis is performed on the metric data sets according to the measure item combinations to obtain a metric data combination for each sub-data text set.
It can be understood that, once the above steps have determined the metric data of each sub-data text set for each measure item and the measure item combination corresponding to each sub-data text set, the metric data combination corresponding to each measure item combination can be extracted, which gives the metric data combination corresponding to each sub-data text set. Here, the cluster analysis consists in extracting the metric data combination corresponding to a sub-data text set using the measure item combination as keywords.
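Read this way, the extraction amounts to keeping, for each sub-data text, only the metric data whose measure item belongs to the combination. A minimal sketch, in which the record layout (one dictionary per sub-data text, keyed by measure item) is an assumption:

```python
def metric_data_combination(metric_data_set, measure_items):
    """Keep only the metric data of the measure items in the combination.

    metric_data_set: list of dicts, one per sub-data text, mapping each
                     measure item to its metric data
    measure_items:   measure item combination of the sub-data text set
    """
    return [
        {item: record[item] for item in measure_items}
        for record in metric_data_set
    ]
```

The measure item names act as the keywords by which the relevant metric data is selected.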
In an embodiment of this specification, the calculating and obtaining an information value of each sub-data text set in the target court system according to the metric data combination includes:
determining a metric calculation formula of each measure item;
determining a calculation function of each sub-data text set information value according to a measurement calculation formula of each measurement item and a measurement item combination corresponding to each sub-data text set;
and calculating to obtain the information value of each subdata text set according to the measurement data combination and the calculation function of the subdata text set information value.
It can be understood that the data information value (i.e., the data value degree) of each sub-data text set is obtained by a comprehensive calculation over its measure item combination. Further, different weight coefficients may be set for the different measure items in the measure item combination. Illustratively, the calculation function of the information value can be represented by the following formula:
Figure DEST_PATH_IMAGE072
where T is the information value of any sub-data text set;
Figure DEST_PATH_IMAGE074
represents the metric calculation formula of the i-th measure item in the measure item combination corresponding to the sub-data text set;
Figure DEST_PATH_IMAGE076
represents the weight coefficient of the i-th measure item in the measure item combination corresponding to the sub-data text set;
Figure DEST_PATH_IMAGE078
and n is the total number of measure items corresponding to the sub-data text set.
Illustratively, a first weight combination is set for the measure item combination corresponding to each sub-data text set. The first weight combination consists of the weight coefficients of the measure items in the measure item combination corresponding to the same sub-data text set, that is, it is a measure item weight combination. As shown in Table 2, for the sub-data text set of legal rule data, the corresponding measure item combination is breadth, persistence, coverage and adaptation, and the weight coefficients of these measure items are Q1, Q2, Q3 and Q4 respectively, so Q1, Q2, Q3 and Q4 constitute the first weight combination of the legal rule data.
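Assuming that the calculation function is a weighted sum of the measure item values (an assumption, since the formula is given only as a figure), the information value of one sub-data text set could be computed as sketched below. The numeric values stand in for Q1 to Q4 and for the measure results and are purely illustrative.

```python
def information_value(measure_values, weights):
    """Weighted sum of the measure item values of one sub-data text set."""
    if set(measure_values) != set(weights):
        raise ValueError("measure items and weights must cover the same items")
    return sum(weights[item] * value for item, value in measure_values.items())


# Illustrative values for the legal rule data example (not from the source).
measures = {"breadth": 0.8, "persistence": 0.5, "coverage": 0.6, "adaptation": 0.9}
first_weights = {"breadth": 0.25, "persistence": 0.25, "coverage": 0.25, "adaptation": 0.25}
value = information_value(measures, first_weights)
```

Keeping the weights in a mapping keyed by measure item makes it explicit which coefficient belongs to which item of the combination.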
In this embodiment of the present description, as shown in fig. 5, calculating, according to an information value of each sub-data text set in the target court system and in combination with a data text set and a sub-data text set in a court data model, a data scoring result of the target court system, includes:
S401: constructing a hierarchical big data scoring system, wherein the hierarchical big data scoring system comprises multi-level scoring indexes corresponding to the hierarchical structure of the court data model, the data types corresponding to the top-level scoring indexes are the rule text set, the entity text set and the case text set, the bottom-level scoring indexes are the measure item combinations corresponding to the sub-data types, and the scoring indexes at the same level are provided with a corresponding scoring index weight combination;
S402: calculating a data scoring result of the target court system according to the information value of each sub-data text set in the target court system and the hierarchical big data scoring system.
It can be understood that, on the basis of the data model established in the above steps, a hierarchical big data scoring system is established for a multi-level data model, for example, a sub-data text set is used as a bottom scoring index (such as a second scoring index), a data text set is used as a middle scoring index (such as a first scoring index), a different weight coefficient (such as a second weight combination) is set for the second scoring index under each first scoring index, a different weight coefficient (such as a third weight combination) is also set between different first scoring indexes, and then, weighting and summing layer by layer is performed to obtain a scoring result of the target court system.
The second weight combination and the third weight combination may be set according to the actual situation; for example, the weight coefficient corresponding to each scoring index may be 1. The second and third weight combinations may also be dynamically changing values that are adjusted in feedback according to the data scoring result of the hierarchical big data scoring system. For example, the scoring result of each scoring index is determined from the data scoring result, and the scoring result distance between any two scoring indexes at the same level is calculated; when the scoring result distance is lower than a specified result distance, the weight coefficients of the two scoring indexes concerned may be adjusted, yielding dynamically adjusted second and third weight combinations. The specified result distance may differ between different pairs of scoring indexes and is set according to the actual situation. The scoring result distance may be a ratio, a difference or another functional relationship between the scoring results of the two scoring indexes, which is not limited in the embodiments of this specification.
In some other embodiments, the weight combinations may also be determined from expert scoring results; the specific process may be similar to the process of determining the measure item combinations, and is not limited in the embodiments of this specification.
As shown in Table 2 and FIG. 6, the hierarchical data structure can be derived from the court data model: the top level comprises rule-type data (corresponding to the rule text), entity-type data (corresponding to the entity text) and process-type data (corresponding to the case text). As shown in FIG. 7, FIG. 8 and FIG. 9, each data type has a plurality of sub-data texts. For example, the rule-type data includes legal rule data and administrative rule data, whose information values (i.e., the second scoring indexes) can be calculated through their corresponding measure item combinations. A second weight combination can then be set for the legal rule data and the administrative rule data, such as D1 (the weight coefficient of the legal rule data) and D2 (the weight coefficient of the administrative rule data), and the scoring index corresponding to the rule-type data (i.e., the first scoring index) can be calculated using D1 and D2. In the same way, the scoring indexes corresponding to the entity-type data and the process-type data can be calculated.
The third weight combination consists of the weight coefficients corresponding to the rule-type data, the entity-type data and the process-type data respectively, such as E1, E2 and E3; a weighted summation with these coefficients gives the scoring result of the court system big data.
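The layer-by-layer weighted summation can be sketched as follows. The nested dictionary layout and every numeric value are illustrative assumptions; the D-type and E-type weights correspond to the second and third weight combinations.

```python
def weighted_sum(scores, weights):
    """Sum of weight * score over the entries of one level."""
    return sum(weights[name] * score for name, score in scores.items())


def court_system_score(info_values, second_weights, third_weights):
    """Aggregate sub-data text set information values into one system score.

    info_values:    data type -> {sub-data type -> information value}
    second_weights: data type -> {sub-data type -> weight}   (e.g. D1, D2)
    third_weights:  data type -> weight                      (e.g. E1, E2, E3)
    """
    # First scoring indexes: one weighted sum per top-level data type.
    first_level = {
        data_type: weighted_sum(subs, second_weights[data_type])
        for data_type, subs in info_values.items()
    }
    # Final score: weighted sum over the top-level data types.
    return weighted_sum(first_level, third_weights)
```

A model with an intermediate level, such as the three-level data model, would add one more weighted summation between the two shown here.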
In one embodiment of the present specification, the hierarchical big data scoring system may also be constructed as follows:
Step 1: obtaining multi-level data types according to a preset court system data model, wherein the multi-level data types comprise data types and a plurality of sub-data types corresponding to each data type, and the court system data can be described accurately and completely through the data types;
Step 2: on the basis of the data model, extracting measurable semantic items from the different sub-data types of the court system big data, and semantically matching the measurable semantic items against the metric model based on objective information theory to determine the measure item combination of objective information measurement indexes corresponding to each sub-data type, wherein a measurable semantic item is semantic information of a sub-data type in a specified dimension;
Step 3: determining the calculation formula of each measure item according to the definition, in the objective information theory metric model, corresponding to each matched measure item;
Step 4: establishing a hierarchical big data scoring framework comprising multi-level scoring items, wherein the top-level scoring items are rule-type data, entity-type data and process-type data, and the bottom-level scoring items are the measure item combinations corresponding to the sub-data types;
Step 5: constructing the hierarchical big data scoring system according to the hierarchical big data scoring framework and the scoring item weights of each level.
In some other embodiments, the data scoring result of the target court system may also be obtained through other calculation methods, which are not limited in this embodiment of the specification.
On the basis of the above hierarchical big data scoring system, the data scoring result of the target court system can be calculated. Optionally, calculating the data scoring result of the target court system according to the information value of each sub-data text set in the target court system and the hierarchical big data scoring system includes:
defining the information value of the subdata text set as a bottom-level scoring index corresponding to the subdata text;
and according to the bottom-level scoring indexes corresponding to the subdata texts, calculating by weighted summation to obtain a data scoring result of the target court system.
After the information value of each subdata text set in the target court system is obtained, a data scoring result of the target court system can be obtained through calculation by combining a constructed scoring system, wherein the data scoring result can be the value degree of data in the target court system, and the higher the data scoring result is, the higher the value degree of the data is, and the higher the operation efficiency of the target court system is.
To judge whether the data scoring result of the target court system is qualified, a preset scoring threshold may be set. When the data scoring result of the target court system reaches the preset scoring threshold, it may be determined that the data value of the target court system is high, that its operating efficiency is high, and that it meets the design requirements of the smart court. The preset scoring threshold may be set according to the actual situation; alternatively, a designated court system may be used as the design or construction standard of the smart court, and the data scoring result calculated for that designated court system may be used as the preset scoring threshold. In some other embodiments, the preset scoring threshold may be determined in other ways, which are not limited in the embodiments of this specification.
When the data scoring result of the target court system does not meet the preset requirement (that is, is unqualified), outputting an adjustment instruction to the target court system to improve the operation efficiency of the target court system, and further optimizing the data in the target court system, which optionally may include:
when the data scoring result does not meet the preset requirement, acquiring a weight combination of a subdata text set corresponding to the specified entity text set;
and according to the weight combination, sequentially reducing the adjustment proportion of the sub-data text set from high to low according to the weight, and outputting the adjustment proportion to the target court system so as to optimize the data of the intelligent court in the target court system.
It can be understood that different data texts in the court system represent different data types and sources, for example, a rule text is a text of an operation rule of the court system, and it is basically difficult to actively adjust the operation rule, the case text is text information in a case handling process, and is also generated based on the operation rule and entity data of the court system, so that it is also difficult to actively change, when adjusting the operation efficiency of the court system, the entity text in the court system can be changed by adjusting entity information (such as a personnel structure, an organization and the like) of the court system, and further the generation of data in the operation process of the court system can be adjusted, so that internal information in the court system can be selectively adjusted based on the entity text, for example, the entity text corresponding to adjustable information can be selected as a designated entity text, and then according to weight coefficients corresponding to different designated entity texts, different adjustment proportions are selected, so that entity texts (subdata text sets) with higher importance (namely, larger weight coefficients) can be adjusted greatly, the adjustment efficiency is improved through targeted adjustment, the operation efficiency of the court system is further improved, and the data of the court system is optimized.
It should be noted that the adjustment proportion of a subdata text set is in effect an adjustment suggestion for the court system. The subdata text set itself can be adjusted only by adjusting the internal structure of the court system, which must be arranged according to the actual situation; any part that cannot be adjusted may be ignored.
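The weight-ordered assignment of adjustment proportions can be sketched as follows. This is a minimal illustration only: the function name `adjustment_proportions`, the starting proportion `max_proportion`, the fixed decrement `step`, and the example set names are assumptions, since the text does not specify how the proportions are computed, only that they decrease from the highest weight to the lowest.

```python
from typing import Dict


def adjustment_proportions(weights: Dict[str, float],
                           max_proportion: float = 0.5,
                           step: float = 0.1) -> Dict[str, float]:
    """Give each subdata text set a proportion that shrinks as its weight rank drops.

    `max_proportion` and `step` are hypothetical tuning values, not taken
    from the source text.
    """
    # Rank the subdata text sets from the largest weight to the smallest.
    ranked = sorted(weights, key=weights.get, reverse=True)
    # The highest-weight set gets max_proportion; each later rank gets `step` less,
    # never dropping below zero.
    return {name: max(max_proportion - rank * step, 0.0)
            for rank, name in enumerate(ranked)}


# Example with hypothetical weights for three designated entity subdata text sets.
proportions = adjustment_proportions(
    {"personnel": 0.5, "organization": 0.3, "equipment": 0.2}
)
```

A rank-based decrement is only one possible choice; proportions scaled directly by the weight values would satisfy the same ordering.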
According to the objective-information-theory-based big data processing method for a smart court system provided herein, different measure item combinations corresponding to different data texts can be determined based on the metric model of objective information theory; the information values (that is, data value degrees) of the different subdata text sets can be obtained by calculating the metric data corresponding to each measure item; and a data scoring result of the target court system can be obtained based on a pre-built scoring system. The court system can thus be scored comprehensively and quantitatively, which provides a direction of adjustment for its operation.
Based on the same inventive concept, the present specification further provides an objective information theory-based big data processing apparatus for a smart court system, as shown in fig. 10, where the apparatus includes:
a data acquisition module 100, configured to acquire a data text set in a specified time period from a target court system, where the data text set includes a rule text set, an entity text set and a case text set;
the classification module 200 is configured to determine, according to the data text sets and according to a court data model, a plurality of sub-data text sets in each data text set;
a metric data obtaining module 300, configured to extract, according to a metric model of an objective information theory, metric data corresponding to each measure item in the metric model from the sub-data text set, so as to obtain a metric data set;
a measurement item combination determining module 400, configured to determine different measurement item combinations based on the metric model corresponding to different sub-data text sets;
a clustering module 500, configured to perform clustering analysis on the metric data sets according to the measure item combinations to obtain a metric data combination for each sub-data text set;
an information value calculating module 600, configured to calculate and obtain an information value of each sub-data text set in the target court system according to the metric data combination, where the information value is used to represent a value attribute of the sub-data text set;
the evaluation module 700 is configured to calculate and obtain a data scoring result of the target court system according to the information value of each subdata text set in the target court system by combining the data text set and the subdata text set in the court data model;
and the optimizing module 800 is configured to output an adjusting instruction to the target court system to optimize the data in the target court system when the data scoring result does not meet a preset requirement.
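The ordering of modules 100 to 800 can be pictured as a chain of steps that share intermediate results. The sketch below is only an illustration of that arrangement: the `Context` dictionary, the `run_modules` function, and the toy steps in the usage example are assumptions and do not reproduce the actual processing performed by each module.

```python
from typing import Callable, Dict, List, Tuple

# Shared state passed between modules (hypothetical structure).
Context = Dict[str, object]
# (reference numeral, module name, step that reads and updates the context)
Module = Tuple[int, str, Callable[[Context], None]]


def run_modules(modules: List[Module], context: Context) -> Context:
    """Run each module once, in ascending order of its reference numeral."""
    for _numeral, _name, step in sorted(modules, key=lambda m: m[0]):
        step(context)
    return context


# Toy usage: two placeholder steps standing in for modules 100 and 200.
modules: List[Module] = [
    (200, "classification", lambda c: c.update(subset_count=len(c["texts"]))),
    (100, "data acquisition", lambda c: c.update(texts=["rule text", "case text"])),
]
result = run_modules(modules, {})
```

A shared context is used here because several modules consume the outputs of more than one earlier module, for example the clustering module 500 uses both the metric data set and the measure item combinations.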
The advantages of the apparatus are the same as those of the method and are not repeated here.
Fig. 11 shows a computer device provided in this embodiment. The apparatus in this embodiment may be such a computer device, which performs the method in this embodiment. The computer device 1102 may include one or more processors 1104, such as one or more Central Processing Units (CPUs), each of which may implement one or more hardware threads. The computer device 1102 may also include any memory 1106 for storing any kind of information, such as code, settings, data, etc. For example, and without limitation, memory 1106 may include any one or more of the following in combination: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, etc. More generally, any memory may use any technology to store information. Further, any memory may provide volatile or non-volatile retention of information. Further, any memory may represent fixed or removable components of computer device 1102. In one case, when the processor 1104 executes the associated instructions, which are stored in any memory or combination of memories, the computer device 1102 can perform any of the operations of the associated instructions. The computer device 1102 also includes one or more drive mechanisms 1108, such as a hard disk drive mechanism, an optical disk drive mechanism, etc., for interacting with any memory.
Computer device 1102 may also include an input/output module 1110 (I/O) for receiving various inputs (via input device 1112) and for providing various outputs (via output device 1114). One particular output mechanism may include a presentation device 1116 and an associated Graphical User Interface (GUI) 1118. In other embodiments, the input/output module 1110 (I/O), input device 1112, and output device 1114 may be omitted, for example when the computer device serves only as one computer device in a network. Computer device 1102 can also include one or more network interfaces 1120 for exchanging data with other devices via one or more communication links 1122. One or more communication buses 1124 couple the above-described components together.
Communication link 1122 may be implemented in any manner, e.g., via a local area network, a wide area network (e.g., the Internet), a point-to-point connection, etc., or any combination thereof. Communications link 1122 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, etc., governed by any protocol or combination of protocols.
Corresponding to the methods in fig. 2-5, the embodiments herein also provide a computer-readable storage medium having stored thereon a computer program, which, when executed by a processor, performs the steps of the above-described method.
Embodiments herein also provide computer-readable instructions which, when executed by a processor, cause the processor to perform the methods shown in fig. 2-5.
It should be understood that, in various embodiments herein, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments herein.
It should also be understood that, in the embodiments herein, the term "and/or" merely describes an association between associated objects and indicates that three relations may exist. For example, "A and/or B" may represent: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both. To illustrate clearly the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided herein, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only a logical division, and other divisions are possible in practice; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purposes of the embodiments herein.
In addition, functional units in the embodiments herein may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present invention may be implemented in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various other media capable of storing program code.
Specific examples are used herein to explain the principles and embodiments of this document; the description of the embodiments above is intended only to aid in understanding the method and its core concepts. Meanwhile, for those of ordinary skill in the art, the specific implementation and the scope of application may vary in accordance with the ideas of this document. In summary, the content of this description should not be construed as limiting this document.

Claims (14)

1. A big data processing method of a smart court system based on objective information theory is characterized by comprising the following steps:
acquiring a data text set in a specified time period from a target court system, wherein the data text set comprises a rule text set, an entity text set and a case text set;
determining a plurality of subdata text sets in each data text set according to the data text sets and a court data model;
according to a measurement model of an objective information theory, extracting measurement data corresponding to each measurement item in the measurement model from the sub-data text set to obtain a measurement data set;
determining, based on the measurement model, different measure item combinations corresponding to different sub-data text sets;
according to the measure item combination, performing cluster analysis on the measure data sets to obtain a measure data combination for each subdata text set;
calculating and obtaining an information value of each subdata text set in the target court system according to the measurement data combination, wherein the information value is used for representing the value of the subdata text set;
calculating to obtain a data scoring result of the target court system according to the information value of each subdata text set in the target court system and by combining the data text set and the subdata text set in a court data model;
and when the data scoring result does not meet the preset requirement, outputting an adjusting instruction to the target court system to optimize the data in the target court system.
2. The method of claim 1, wherein obtaining the set of data texts within a specified time period from the target court system comprises:
determining the data type of a data text set to be extracted;
determining the storage position of the text set of the data to be extracted according to the data type of the text set of the data to be extracted;
and extracting the data text in a specified time period according to the storage position of the data text set to be extracted to form the data text set.
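The three acquisition steps of claim 2 (data type, storage position, time window) can be illustrated with the following minimal sketch. The mapping `STORAGE_LOCATIONS`, the store names, the in-memory `stores` structure, and the function `extract_text_set` are all hypothetical; the source does not describe how or where the court system actually stores its texts.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical mapping from data type to storage location.
STORAGE_LOCATIONS: Dict[str, str] = {
    "rule": "rule_store",
    "entity": "entity_store",
    "case": "case_store",
}


def extract_text_set(stores: Dict[str, List[Tuple[datetime, str]]],
                     data_type: str,
                     start: datetime,
                     end: datetime) -> List[str]:
    """Return the texts of one data type whose timestamp lies in [start, end]."""
    # Step 1 and 2: the data type determines the storage location.
    location = STORAGE_LOCATIONS[data_type]
    # Step 3: keep only the texts recorded within the specified time period.
    return [text for timestamp, text in stores[location]
            if start <= timestamp <= end]


# Toy usage with an in-memory store (assumed data).
stores = {
    "case_store": [
        (datetime(2021, 1, 5), "case A"),
        (datetime(2021, 3, 1), "case B"),
    ],
}
january_cases = extract_text_set(
    stores, "case", datetime(2021, 1, 1), datetime(2021, 1, 31)
)
```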
3. The method of claim 2, wherein the measure items of the metric model of the objective information theory include breadth, detail, persistence, richness, volume, delay, pervasion, reality, and fitness;
and the extracting, according to the metric model of the objective information theory, of metric data corresponding to each measure item in the metric model from the sub-data text set to obtain a metric data set comprises:
determining a measurement calculation formula of each measurement item according to a measurement model of the objective information theory;
determining the measurement data required by each measurement item according to the measurement calculation formula;
and extracting the measurement data corresponding to each measurement item from the sub-data text set to obtain a measurement data set.
4. The method of claim 3, wherein the extracting metric data corresponding to each measure item from the sub-data text set to obtain a metric data set comprises:
for each subdata text set:
acquiring subdata texts in the subdata text set;
sequentially extracting the measurement data corresponding to each measurement item from the subdata text to obtain an initial measurement data set;
calculating the standard deviation of the measurement data corresponding to each measure item;
and screening out the metric data meeting preset conditions from the initial metric data set according to the standard deviation of the metric data corresponding to each metric item to obtain a metric data set.
5. The method according to claim 4, wherein the preset condition is:
[Figure DEST_PATH_IMAGE002]
wherein [Figure DEST_PATH_IMAGE004] is the average value of the metric data corresponding to the i-th measure item, [Figure DEST_PATH_IMAGE006] is the standard deviation of the metric data corresponding to the i-th measure item, and [Figure DEST_PATH_IMAGE008] is the j-th metric data corresponding to the i-th measure item.
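The standard-deviation screening of claims 4 and 5 can be sketched as follows. The exact preset condition is given only as a formula image that is not recoverable from this text, so the condition used here, keeping a value when it lies within `k` standard deviations of the mean of its measure item, is an assumption, as are the function name `screen_metric_data`, the default `k`, and the use of the population standard deviation.

```python
from statistics import mean, pstdev
from typing import Dict, List


def screen_metric_data(initial: Dict[str, List[float]],
                       k: float = 3.0) -> Dict[str, List[float]]:
    """Keep, per measure item, the values within k standard deviations of the mean.

    ASSUMPTION: |x_ij - mean_i| <= k * std_i stands in for the preset
    condition, which the source shows only as an image.
    """
    screened: Dict[str, List[float]] = {}
    for item, values in initial.items():
        if not values:
            # Nothing to screen for a measure item without metric data.
            screened[item] = []
            continue
        mu = mean(values)       # average of the metric data of this measure item
        sigma = pstdev(values)  # population standard deviation of the same data
        screened[item] = [x for x in values if abs(x - mu) <= k * sigma]
    return screened


# Toy usage: the value 50.0 is far from the others and is dropped with k = 1.5.
result = screen_metric_data({"delay": [10.0, 11.0, 9.0, 10.0, 50.0]}, k=1.5)
```

When all values of a measure item are identical the standard deviation is zero and every value is kept, which avoids discarding a constant series.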
6. The method of claim 1, wherein determining different combinations of measure items based on the metric model for different sub-data text sets comprises:
acquiring the attribute of the text content of the subdata text set;
determining a metric attribute of each measure item in the metric model of the objective information theory;
calculating to obtain the attribute association degree of the sub data text set and each measure item according to the attribute of the text content and the measure attribute;
determining the measure items with the attribute relevance exceeding a preset value as measure items corresponding to the sub-data text sets so as to obtain measure item combinations corresponding to the sub-data text sets.
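The selection of measure items by attribute association degree can be sketched as follows. The source does not say how the association degree is calculated, so the Jaccard similarity between attribute sets used here is a stand-in technique chosen purely for illustration; the function names, the attribute labels, and the threshold value are likewise assumptions.

```python
from typing import Dict, List, Set


def attribute_association(text_attrs: Set[str], measure_attrs: Set[str]) -> float:
    """Jaccard similarity of two attribute sets (assumed association measure)."""
    union = text_attrs | measure_attrs
    return len(text_attrs & measure_attrs) / len(union) if union else 0.0


def select_measure_items(text_attrs: Set[str],
                         measure_attr_map: Dict[str, Set[str]],
                         threshold: float = 0.3) -> List[str]:
    """Return the measure items whose association degree exceeds the preset value."""
    return [item for item, attrs in measure_attr_map.items()
            if attribute_association(text_attrs, attrs) > threshold]


# Toy usage with hypothetical attribute labels.
combination = select_measure_items(
    {"time", "update", "court"},
    {
        "delay": {"time", "update"},      # association 2/3, selected
        "volume": {"storage", "size"},    # association 0, rejected
        "breadth": {"court", "coverage"}, # association 1/4, rejected at 0.3
    },
)
```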
7. The method of claim 1, wherein the computing the information value of each subdata text set in the target court system from the metric data combination comprises:
determining a metric calculation formula of each measure item;
determining a calculation function of each sub-data text set information value according to a measurement calculation formula of each measurement item and a measurement item combination corresponding to each sub-data text set;
and calculating to obtain the information value of each subdata text set according to the measurement data combination and the calculation function of the subdata text set information value.
8. The method of claim 3, wherein:
the metric calculation formula of the breadth may be:
[Figure DEST_PATH_IMAGE010]
wherein O is the subdata text, C is a constant, namely the total number of court systems nationwide, [Figure DEST_PATH_IMAGE012] is the i-th court system covered by the subdata text, [Figure DEST_PATH_IMAGE014] is the value weight coefficient of the i-th court system covered by the subdata text, and n is the total number of court systems covered by the subdata text;
the metric calculation formula of the detail may be:
[Figure DEST_PATH_IMAGE016]
wherein G is the detail, O is the subdata text, C is a constant, namely the total number of data types, [Figure DEST_PATH_IMAGE018] is the i-th (1 ≤ i ≤ n) data type involved in the subdata text, [Figure DEST_PATH_IMAGE020] is the value weight coefficient of the i-th data type involved in the subdata text, and n is the total number of data types involved in the subdata text;
the metric calculation formula of the persistence may be:
[Figure DEST_PATH_IMAGE022]
wherein T is the time set, [Figure DEST_PATH_IMAGE024] is the time span, O is the subdata text, n is the total number of data items of different formats in the subdata text, and [Figure DEST_PATH_IMAGE026] indicates whether the data of the i-th format keeps changing continuously within the time set T: if so, [Figure DEST_PATH_IMAGE028]; otherwise, [Figure DEST_PATH_IMAGE030]; and SUS is the data persistence of the subdata text over the time span from [Figure DEST_PATH_IMAGE032] to [Figure DEST_PATH_IMAGE034];
the metric calculation formula of the richness may be:
[Figure DEST_PATH_IMAGE036]
wherein R is the richness, O is the subdata text, n is the total number of cases involved in the subdata text, [Figure DEST_PATH_IMAGE038] is the total number of entities involved in the subdata text, and [Figure DEST_PATH_IMAGE040] is the number of entities associated with the i-th case;
the metric calculation formula of the volume may be:
[Figure DEST_PATH_IMAGE042]
wherein V is the volume, O is the subdata text, [Figure DEST_PATH_IMAGE044] is the value weight coefficient of the i-th subdata text, and [Figure DEST_PATH_IMAGE046] is the size of the physical space required by the i-th subdata text;
the metric calculation formula of the delay may be:
[Figure DEST_PATH_IMAGE048]
wherein D is the delay, T is the time set, O is the subdata text, [Figure DEST_PATH_IMAGE044] is the value weight coefficient of the i-th subdata text, [Figure DEST_PATH_IMAGE050] is the total number of data items of the i-th subdata text aggregated in time (that is, data updates), and [Figure DEST_PATH_IMAGE038] is the total number of data updates of all the subdata texts;
the metric calculation formula of the pervasion may be:
[Figure DEST_PATH_IMAGE052]
wherein C is the pervasion, S is the subdata text, [Figure DEST_PATH_IMAGE054] is the number of users related to the subdata text, and U is a constant, namely the total number of users across all the subdata texts;
the metric calculation formula of the reality may be:
[Figure DEST_PATH_IMAGE056]
wherein V is the reality, O is the subdata text, [Figure DEST_PATH_IMAGE058] is the number of qualified records of the subdata text, and [Figure DEST_PATH_IMAGE038] is the total number of the subdata texts;
the metric calculation formula of the fitness may be:
[Figure DEST_PATH_IMAGE060]
wherein S is the fitness, O is the subdata text, [Figure DEST_PATH_IMAGE062] is the amount of data of the subdata text that is effectively used, and [Figure DEST_PATH_IMAGE038] is the total amount of data that is effectively used across all the subdata texts.
9. The method of claim 1, wherein calculating a data scoring result of the target court system according to the information value of each subdata text set in the target court system and by combining the data text set and the subdata text set in the court data model comprises:
constructing a hierarchical big data scoring system, wherein the hierarchical big data scoring system comprises multi-level scoring indexes, the multi-level scoring indexes correspond to the hierarchical structure of the court data model, the data types corresponding to the top-level scoring indexes are a rule text set, an entity text set and a case text set, the bottom-level scoring indexes are the measure item combinations corresponding to the sub-data types, and the scoring indexes at the same level are provided with corresponding scoring index weight combinations;
and calculating to obtain a data scoring result of the target court system according to the information value of each subdata text set in the target court system and the hierarchical big data scoring system.
10. The method of claim 9, wherein the calculating a data scoring result of the target court system according to the information value of each subdata text set in the target court system and the hierarchical big data scoring system comprises:
defining the information value of the subdata text set as a bottom-level scoring index corresponding to the subdata text;
and according to the bottom-level scoring indexes corresponding to the subdata texts, calculating by weighted summation to obtain a data scoring result of the target court system.
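The weighted summation over a hierarchical scoring system, as described in claims 9 and 10, can be sketched as a recursive roll-up. The nested-dictionary layout, the key names `weight`, `value` and `children`, the function `score_node`, and all numbers below are illustrative assumptions; the source specifies only that bottom-level indexes carry the information values and that indexes at the same level share a weight combination.

```python
from typing import Any, Dict


def score_node(node: Dict[str, Any]) -> float:
    """Return a node's score: its own information value if it is a bottom-level
    index, otherwise the weighted sum of the scores of its child indexes."""
    if "value" in node:
        return float(node["value"])
    return sum(child["weight"] * score_node(child) for child in node["children"])


# Toy hierarchy: top level = rule / entity / case text sets (weights assumed),
# bottom level = information values of the subdata text sets (values assumed).
court_system = {
    "children": [
        {"weight": 0.2, "children": [{"weight": 1.0, "value": 0.8}]},  # rule
        {"weight": 0.5, "children": [{"weight": 0.6, "value": 0.5},
                                     {"weight": 0.4, "value": 1.0}]},  # entity
        {"weight": 0.3, "children": [{"weight": 1.0, "value": 0.6}]},  # case
    ]
}
total_score = score_node(court_system)  # 0.2*0.8 + 0.5*0.7 + 0.3*0.6
```

The resulting score would then be compared against the preset requirement to decide whether an adjusting instruction is output.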
11. The method of claim 9, wherein when the data scoring result does not meet a preset requirement, outputting an adjustment instruction to the target court system to optimize data in the target court system comprises:
when the data scoring result does not meet the preset requirement, acquiring a weight combination of a subdata text set corresponding to the specified entity text set;
and, according to the weight combination, assigning the subdata text sets adjustment proportions that decrease in order of weight from high to low, and outputting the adjustment proportions to the target court system so as to optimize the smart court data in the target court system.
12. An objective information theory-based big data processing device of a smart court system, the device comprising:
the data acquisition module is used for acquiring a data text set in a specified time period from a target court system, wherein the data text set comprises a rule text set, an entity text set and a case text set;
the classification module is used for determining a plurality of subdata text sets in each data text set according to the data text sets and a court data model;
the measurement data acquisition module is used for extracting measurement data corresponding to each measurement item in the measurement model from the sub-data text set according to the measurement model of the objective information theory to obtain a measurement data set;
the measurement item combination determining module is used for determining, based on the measurement model, different measurement item combinations corresponding to different sub-data text sets;
the clustering module is used for carrying out clustering analysis on the measurement data sets according to the measurement item combination to obtain a measurement data combination aiming at each subdata text set;
an information value calculation module, configured to calculate and obtain an information value of each sub-data text set in the target court system according to the metric data combination, where the information value is used to represent a value of the sub-data text set;
the evaluation module is used for calculating and obtaining a data scoring result of the target court system according to the information value of each subdata text set in the target court system by combining the data text set and the subdata text set in the court data model;
and the optimization module is used for outputting an adjusting instruction to the target court system when the data scoring result does not meet the preset requirement so as to optimize the data in the target court system.
13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 11 when executing the computer program.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 11.
CN202111201097.9A 2021-10-15 2021-10-15 Smart court system big data processing method and device based on objective information theory Active CN113641825B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111201097.9A CN113641825B (en) 2021-10-15 2021-10-15 Smart court system big data processing method and device based on objective information theory

Publications (2)

Publication Number Publication Date
CN113641825A true CN113641825A (en) 2021-11-12
CN113641825B CN113641825B (en) 2022-01-04

Family

ID=78427077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111201097.9A Active CN113641825B (en) 2021-10-15 2021-10-15 Smart court system big data processing method and device based on objective information theory

Country Status (1)

Country Link
CN (1) CN113641825B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113900921A (en) * 2021-12-07 2022-01-07 人民法院信息技术服务中心 Court information system running state evaluation method, device, equipment and storage medium
CN117633488A (en) * 2023-12-01 2024-03-01 重庆金微科技有限公司 Product feature mining method and system based on user feedback data

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008053949A1 (en) * 2006-11-01 2008-05-08 Intellectual Property Bank Corp. Document group analysis device
CN110543628A (en) * 2018-05-29 2019-12-06 南京大学 text information quality measurement method under rule constraint
CN111192176A (en) * 2019-12-30 2020-05-22 华中师范大学 Online data acquisition method and device supporting education informatization assessment
US10671483B1 (en) * 2016-04-22 2020-06-02 EMC IP Holding Company LLC Calculating data value via data protection analytics
CN112633679A (en) * 2020-12-21 2021-04-09 贵州电网有限责任公司电力科学研究院 Information quality quantization method, information quality quantization device, computer equipment and storage medium
CN113470799A (en) * 2021-07-09 2021-10-01 深圳前海天智信息技术有限公司 Intelligent editor of hospital comprehensive quality supervision platform
CN113469571A (en) * 2021-07-22 2021-10-01 广东电网有限责任公司广州供电局 Data quality evaluation method and device, computer equipment and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张弛: "数据资产价值分析模型与交易体系研究", 《中国优秀博士学位论文全文数据库》 *

Also Published As

Publication number Publication date
CN113641825B (en) 2022-01-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant