CN116595173A - Data processing method, device, equipment and storage medium for policy information management - Google Patents

Info

Publication number
CN116595173A
Authority
CN
China
Prior art keywords
policy
text
structured
knowledge graph
texts
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310506130.1A
Other languages
Chinese (zh)
Inventor
高书增
倪尉添
章杨新
姚倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongpu Software Co Ltd
Original Assignee
Dongpu Software Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of big data and discloses a data processing method, a device, equipment and a storage medium for policy information management. The method comprises the following steps: preprocessing the collected policy texts to obtain structured policy texts; classifying the structured policy texts based on the policy subjects and the fields to obtain policy types corresponding to the structured policy texts; storing the structured policy text into a policy database according to the policy type; constructing a policy information knowledge graph based on the structured policy text, and linking the policy information knowledge graph with a policy database; the policy information knowledge graph is shared based on a preset data sharing mechanism.

Description

Data processing method, device, equipment and storage medium for policy information management
Technical Field
The present invention relates to the field of big data technologies, and in particular, to a data processing method, apparatus, device, and storage medium for policy information management.
Background
Policy data refers to information contained in documents of various policies, regulations, etc. issued by governments. Policy data includes content such as policy topics, industry categories, timelines, applicability, keywords, projects/measures, rewards/penalties, and the like.
Policy data is of significant value to both governments and businesses. Governments need to collect and analyze policy data in order to formulate and refine policy guidelines, implement them accurately, and supervise them continuously; enterprises need to understand and comply with policies and regulations, standardize their operations, and keep fully abreast of policy changes, thereby reducing risk and improving competitiveness.
Policy data is typically stored in unstructured form, with content that is unordered, difficult to sort and retrieve, and difficult to manage and utilize efficiently and quantitatively.
Accordingly, there is a need for improvement and development in the art.
Disclosure of Invention
The invention provides a data processing method, a device, equipment and a storage medium for policy information management, which are used for storing a policy text into a policy database according to a policy type, constructing a policy information knowledge graph and linking the policy information knowledge graph with the policy database so as to facilitate centralized management, inquiry and update of policies.
The first aspect of the present invention provides a data processing method for policy information management, the data processing method for policy information management comprising: preprocessing the collected policy texts to obtain structured policy texts; classifying the structured policy texts based on the policy subjects and the fields to obtain policy types corresponding to the structured policy texts; storing the structured policy text into a policy database according to a policy type; constructing a policy information knowledge graph based on the structured policy text and linking the policy information knowledge graph with the policy database; and sharing the policy information knowledge graph based on a preset data sharing mechanism.
Optionally, in a first implementation manner of the first aspect of the present invention, preprocessing the collected policy texts to obtain structured policy texts includes: performing data cleaning on the collected policy texts to obtain cleaned policy texts; performing data conversion and normalization processing on the cleaned policy text to obtain normalized policy text; and carrying out structuring processing on the normalized policy text to obtain a structured policy text.
Optionally, in a second implementation manner of the first aspect of the present invention, the structuring the normalized policy text to obtain a structured policy text includes: extracting and dividing the normalized policy text by chapter; intercepting the text of each chapter according to the extraction result and adding metadata to obtain a plurality of chapter texts, wherein the metadata comprises a theme, an abstract and an implementation period; and ordering the chapter texts to obtain a structured policy text.
Optionally, in a third implementation manner of the first aspect of the present invention, the classifying the structured policy text based on the policy theme and the domain to obtain a policy type corresponding to the structured policy text includes: acquiring a classification system of policy themes and domains, and forming policy classification rules based on that classification system; performing word segmentation, stop-word removal and stemming on the structured policy text to obtain a processed policy text; and, based on the policy classification rules, mapping the processed policy text to the corresponding policy themes and domains using a rule engine.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the storing the structured policy text into a policy database according to a policy type includes: creating a policy text table and a policy classification table in a policy database, and connecting the policy text table and the policy classification table to create a policy classification relation table; storing the structured policy text into a corresponding policy text table based on a policy type corresponding to the structured policy text; establishing full text indexes for the fields of 'policy text contents' in the policy text table, respectively establishing single-column indexes and multi-column indexes for the fields of 'issuing institutions' and 'issuing dates' in the policy text table, establishing indexes for the fields of 'policy classification names' in the policy classification table, and establishing composite indexes for the fields of 'policy text numbers' and 'policy classification numbers' in the policy classification relation table.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the constructing a policy information knowledge graph based on the structured policy text and linking the policy information knowledge graph with the policy database includes: extracting key information of the structured policy text; establishing a relationship between the policy entity and the attribute based on the extracted key information, and constructing a policy information knowledge graph by adopting a knowledge graph technology; the policy entity is a policy file or a policy theme, and the attributes include targets, contents and enforcement mechanisms of the policy; and establishing a mapping relation between the policy information knowledge graph and the policy database, and linking the policy information knowledge graph with the policy database.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the sharing the policy information knowledge graph based on a preset data sharing mechanism includes: setting a first access right level for the sensitivity degree of the structured policy text; setting a second access right level according to the user role; the policy information knowledge graph is shared based on the first access right level and the second access right level.
The second aspect of the present invention provides a data processing apparatus for policy information management, including a preprocessing module, configured to preprocess collected policy texts to obtain structured policy texts; the classification module is used for classifying the structured policy texts based on the policy subjects and the domain to obtain policy types corresponding to the structured policy texts; the storage module is used for storing the structured policy text into a policy database according to the policy type; the construction module is used for constructing a policy information knowledge graph based on the structured policy text and linking the policy information knowledge graph with the policy database; and the sharing module is used for sharing the policy information knowledge graph based on a preset data sharing mechanism.
Optionally, in a first implementation manner of the second aspect of the present invention, the preprocessing module includes: the data cleaning unit is used for cleaning the data of the collected policy texts to obtain cleaned policy texts; the normalization processing unit is used for performing data conversion and normalization processing on the cleaned policy text to obtain a normalized policy text; and the structuring processing unit is used for carrying out structuring processing on the normalized policy text to obtain a structured policy text.
Optionally, in a second implementation manner of the second aspect of the present invention, the classification module includes: the forming unit is used for acquiring a classification system of policy themes and domains and forming policy classification rules based on that classification system; the text processing unit is used for performing word segmentation, stop-word removal and stemming on the structured policy text to obtain a processed policy text; and the classification unit is used for mapping the processed policy text to the corresponding policy themes and domains by using a rule engine based on the policy classification rules.
Optionally, in a third implementation manner of the second aspect of the present invention, the storage module includes: the creating unit is used for creating a policy text table and a policy classification table in the policy database, connecting the policy text table with the policy classification table and creating a policy classification relation table; the storage unit is used for storing the structured policy text into the corresponding policy text table based on the policy type corresponding to the structured policy text; the index establishing unit is used for establishing full-text indexes for the fields of the 'policy text contents' in the policy text table, respectively establishing single-column indexes and multi-column indexes for the fields of the 'issuing institution' and the 'issuing date' in the policy text table, establishing indexes for the fields of the 'policy classification names' in the policy classification table, and establishing composite indexes for the fields of the 'policy text numbers' and the 'policy classification numbers' in the policy classification relation table.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the building module includes: the extraction unit is used for extracting key information of the structured policy text; the knowledge graph construction unit is used for establishing a relationship between the policy entity and the attribute based on the extracted key information and constructing a policy information knowledge graph by adopting a knowledge graph technology; and the link unit is used for establishing a mapping relation between the policy information knowledge graph and the policy database and linking the policy information knowledge graph with the policy database.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the sharing module includes: a first authority setting unit for setting a first access authority level for a sensitivity degree of the structured policy text; the second authority setting unit is used for setting a second access authority level according to the user role; and the sharing unit is used for sharing the policy information knowledge graph based on the first access right level and the second access right level.
A third aspect of the present invention provides a data processing device for policy information management, comprising: a memory and at least one processor, the memory having computer readable instructions stored therein, the memory and the at least one processor being interconnected by a line; the at least one processor invokes the computer readable instructions in the memory to cause the data processing device for policy information management to perform the steps of the data processing method for policy information management as described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein computer-readable instructions which, when run on a computer, cause the computer to perform the steps of a data processing method of policy information management as described above.
According to the technical solution provided by the invention, the collected policy texts are preprocessed to obtain structured policy texts, which provides a sound basis for processing and analyzing policy information. The structured policy text is stored in the policy database according to policy type, a policy information knowledge graph is constructed, and the knowledge graph is linked with the policy database. This facilitates centralized management, querying and updating of policy information, allows the relevance and hierarchical structure among items of policy information to be processed and presented, and helps users understand policies more quickly and easily. In addition, sharing the policy information knowledge graph through a preset data sharing mechanism improves the efficiency with which policy information is managed and used and promotes joint construction and sharing.
Drawings
FIG. 1 is a first flowchart of a data processing method for policy information management according to an embodiment of the present invention;
FIG. 2 is a second flowchart of a data processing method for policy information management according to an embodiment of the present invention;
FIG. 3 is a third flowchart of a data processing method for policy information management according to an embodiment of the present invention;
FIG. 4 is a fourth flowchart of a data processing method for policy information management according to an embodiment of the present invention;
FIG. 5 is a fifth flowchart of a data processing method for policy information management according to an embodiment of the present invention;
FIG. 6 is a sixth flowchart of a data processing method for policy information management according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a data processing apparatus for policy information management according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a data processing apparatus for policy information management according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a data processing method, a device, equipment and a storage medium for policy information management, which are used for storing a policy text into a policy database according to a policy type, constructing a policy information knowledge graph and linking the policy information knowledge graph with the policy database so as to facilitate centralized management, inquiry and update of policy information. The method comprises the following steps: preprocessing the collected policy texts to obtain structured policy texts; classifying the structured policy texts based on the policy subjects and the fields to obtain policy types corresponding to the structured policy texts; storing the structured policy text into a policy database according to the policy type; constructing a policy information knowledge graph based on the structured policy text, and linking the policy information knowledge graph with a policy database; the policy information knowledge graph is shared based on a preset data sharing mechanism.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below. Referring to FIG. 1, a first embodiment of a data processing method for policy information management in an embodiment of the present invention includes:
S101, preprocessing the collected policy texts to obtain the structured policy texts.
It should be understood that the execution subject of the present invention may be a data processing apparatus for policy information management, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as an execution main body as an example.
It will be appreciated that policy documents may be collected from government official websites.
In this embodiment, preprocessing the collected policy text may include data cleaning. Data cleaning refers to operations such as screening, conversion, integration and deduplication performed on the original data during the preprocessing stage in order to obtain clean, usable and meaningful data. Cleaning effectively reduces dirty, erroneous and invalid data, which safeguards the accuracy and precision of the data. It also reduces redundant and invalid data, which shortens data processing time and saves computing resources.
S102, classifying the structured policy texts based on the policy subjects and the fields to obtain the policy types corresponding to the structured policy texts.
In this embodiment, based on the classification of policy topics and domains, structured policy texts may be grouped into policy types that share similar characteristics, for example: finance, science and education, medical and health, urban and rural construction, industry and energy, politics and law, and culture and entertainment.
Finance policies include fiscal policies, tax policies, financial policies, and the like. Science and education policies include scientific research and technology development policies, education and human resources policies, and the like. Medical and health policies include medical institution management policies, medicine and equipment policies, and the like. Urban and rural construction policies include urban planning and construction policies, rural development policies, environmental protection policies, and the like. Industry and energy policies include industrial policies, energy policies, transportation policies, and the like. Politics and law policies include constitutional and legal policies, personnel and social security policies, national defense and foreign affairs policies, and the like. Culture and entertainment policies include culture and tourism policies, sports policies, and the like.
In this embodiment, the structured policy text may be classified by constructing a machine learning classification model or a rule engine. The machine learning classification model may employ classification algorithms such as a naive Bayes classifier, a support vector machine (SVM) classifier, a decision tree classifier, or a random forest classifier.
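As an illustration of the machine learning option, the following is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing, written against the Python standard library only. The class name, the whitespace tokenization and the toy training data are assumptions made for this example; the patent does not specify an implementation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesPolicyClassifier:
    """Toy multinomial naive Bayes over whitespace-separated tokens (illustrative only)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per policy type
        self.word_counts = defaultdict(Counter)    # word frequencies per policy type
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in doc.lower().split():
                if word not in self.vocab:
                    continue                       # ignore words never seen in training
                # Add-one (Laplace) smoothed log likelihood
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: already segmented, lower-cased policy snippets.
train_docs = [
    "tax bank loan",
    "bank interest rate",
    "hospital medicine equipment",
    "medicine clinic doctor",
]
train_labels = ["finance", "finance", "medical and health", "medical and health"]
model = NaiveBayesPolicyClassifier().fit(train_docs, train_labels)
```

In practice a library implementation trained on a labelled policy corpus would replace this toy model; the sketch only shows how word statistics per policy type drive the classification.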
S103, storing the structured policy text into a policy database according to the policy type.
In this embodiment, a plurality of fields may be preset in the policy database to store basic information about each policy, such as the policy title, the date of issuance and the scope of implementation. These fields allow multi-dimensional search conditions to be defined over the structured policy text, which improves the accuracy of policy information searches. Storing the structured policy text in the policy database according to policy type therefore improves both the efficiency and the accuracy of policy retrieval. In addition, indexes can be created in the policy database: when a user queries a certain type of policy, the system locates the corresponding policy texts through the index, avoiding the considerable time and computing resources that a full-text scan would consume.
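The storage step can be sketched with SQLite from the Python standard library. All table names, column names and helper functions below are hypothetical, chosen only to mirror the fields and indexes mentioned in the text. A full-text index on the policy content is omitted because it depends on the database engine in use.

```python
import sqlite3


def create_policy_db():
    """Create an in-memory policy database with text, category and relation tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE policy_text (
            text_id    INTEGER PRIMARY KEY,
            title      TEXT NOT NULL,
            issuer     TEXT NOT NULL,
            issue_date TEXT NOT NULL,          -- ISO 8601 date string
            scope      TEXT,
            content    TEXT NOT NULL
        );
        CREATE TABLE policy_category (
            category_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL UNIQUE   -- UNIQUE also creates an index on name
        );
        CREATE TABLE policy_category_relation (
            text_id     INTEGER NOT NULL REFERENCES policy_text(text_id),
            category_id INTEGER NOT NULL REFERENCES policy_category(category_id),
            PRIMARY KEY (text_id, category_id) -- composite index on both numbers
        );
        CREATE INDEX idx_text_issuer ON policy_text(issuer);
        CREATE INDEX idx_text_issuer_date ON policy_text(issuer, issue_date);
        CREATE INDEX idx_relation_category ON policy_category_relation(category_id, text_id);
    """)
    return conn


def store_policy(conn, title, issuer, issue_date, scope, content, category):
    """Insert one structured policy text and link it to its policy type."""
    conn.execute("INSERT OR IGNORE INTO policy_category(name) VALUES (?)", (category,))
    category_id = conn.execute(
        "SELECT category_id FROM policy_category WHERE name = ?", (category,)
    ).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO policy_text(title, issuer, issue_date, scope, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, issuer, issue_date, scope, content),
    )
    conn.execute(
        "INSERT INTO policy_category_relation(text_id, category_id) VALUES (?, ?)",
        (cur.lastrowid, category_id),
    )
    return cur.lastrowid


def titles_in_category(conn, category):
    """Return the titles of all policies of one type, oldest first."""
    rows = conn.execute(
        "SELECT t.title FROM policy_text t "
        "JOIN policy_category_relation r ON r.text_id = t.text_id "
        "JOIN policy_category c ON c.category_id = r.category_id "
        "WHERE c.name = ? ORDER BY t.issue_date",
        (category,),
    )
    return [row[0] for row in rows]
```

The relation table keeps the many-to-many link between texts and policy types separate from both, so a text can belong to more than one type without duplicating its content.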
S104, constructing a policy information knowledge graph based on the structured policy text, and linking the policy information knowledge graph with a policy database.
It will be appreciated that by semantic parsing, more important elements in the structured policy text can be extracted according to various rules and models.
In this embodiment, the policy information knowledge graph brings together the data, relationships and rules of the policy information, and can be continuously expanded and optimized as needed. Constructing the policy information knowledge graph mainly comprises entity extraction and relation extraction. Entity extraction refers to identifying and labeling entities in the structured policy text, such as policy topics, places, institutions and people, to form an entity library. Relation extraction refers to identifying and labeling relationships between entities in the structured policy text, such as the relationship between a policy issuing department and a policy issuing date, to form a relation library. On the basis of the entity library and the relation library, the policy information knowledge graph is constructed through techniques such as knowledge mapping and association.
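One way to picture the entity library, the relation library and the link to the policy database is a small in-memory triple store. The sketch below is an assumption-laden illustration: the class, its method names and the sample entities are hypothetical, and a real deployment would more likely use a graph database.

```python
class PolicyGraph:
    """Minimal in-memory knowledge graph: entity library, relation library, database links."""

    def __init__(self):
        self.entities = {}    # entity name -> entity type (the entity library)
        self.triples = set()  # (subject, relation, object) (the relation library)
        self.db_links = {}    # policy entity name -> row id in the policy database

    def add_entity(self, name, entity_type):
        self.entities[name] = entity_type

    def add_relation(self, subject, relation, obj):
        # Relations may only connect entities that were extracted beforehand.
        if subject not in self.entities or obj not in self.entities:
            raise KeyError("both ends of a relation must be known entities")
        self.triples.add((subject, relation, obj))

    def link_to_database(self, policy_name, text_id):
        # Mapping from a graph node to the corresponding database record.
        if policy_name not in self.entities:
            raise KeyError("unknown policy entity")
        self.db_links[policy_name] = text_id

    def neighbors(self, subject):
        """Return the (relation, object) pairs of one entity in sorted order."""
        return sorted((rel, obj) for s, rel, obj in self.triples if s == subject)


# Hypothetical extraction results for a single policy document.
graph = PolicyGraph()
graph.add_entity("Notice on Tax Relief", "policy")
graph.add_entity("Finance Bureau", "institution")
graph.add_entity("2023-01-05", "date")
graph.add_relation("Notice on Tax Relief", "issued_by", "Finance Bureau")
graph.add_relation("Notice on Tax Relief", "issued_on", "2023-01-05")
graph.link_to_database("Notice on Tax Relief", 1)
```

Keeping the database row id on the graph node is what lets a graph query hand the full policy text back from the policy database.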
S105, sharing the policy information knowledge graph based on a preset data sharing mechanism.
In this embodiment, the preset data sharing mechanism includes customizing the sharing manner and the sharing objects. For data security and privacy protection, the sharing objects need to be classified and evaluated, and the corresponding sharing scope, permissions, conditions of use, and the like need to be defined. Different access rights and control mechanisms can be set for sharing objects of different levels, fields and roles, ensuring the confidentiality and security of the data access process.
The embodiment provides a data processing method for policy information management that preprocesses the collected policy texts to obtain structured policy texts, which provides a sound basis for processing and analyzing policy information. The structured policy text is stored in the policy database according to policy type, a policy information knowledge graph is constructed, and the knowledge graph is linked with the policy database. This facilitates centralized management, querying and updating of policy information, allows the relevance and hierarchical structure among items of policy information to be processed and presented, and helps users understand policies more quickly and easily. In addition, sharing the policy information knowledge graph through a preset data sharing mechanism improves the efficiency with which policy information is managed and used and promotes joint construction and sharing.
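A two-level permission check of this kind might look like the following sketch, in which one level is derived from the sensitivity of the policy text and the other from the user role. The level names, the role names and the rule that a role's clearance must be at least the text's sensitivity are all assumptions made for illustration.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """First access right level: how sensitive a structured policy text is."""
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


# Second access right level: the highest sensitivity each (hypothetical) role may read.
ROLE_CLEARANCE = {
    "public_user": Sensitivity.PUBLIC,
    "enterprise_analyst": Sensitivity.INTERNAL,
    "government_admin": Sensitivity.RESTRICTED,
}


def can_access(role, sensitivity):
    """A role may read a node only if its clearance covers the node's sensitivity."""
    clearance = ROLE_CLEARANCE.get(role)
    return clearance is not None and clearance >= sensitivity


def shared_view(nodes, role):
    """Filter (name, sensitivity) graph nodes down to those the role may see."""
    return [name for name, level in nodes if can_access(role, level)]


nodes = [
    ("Published tax notice", Sensitivity.PUBLIC),
    ("Draft subsidy rules", Sensitivity.INTERNAL),
    ("Unreleased defense policy", Sensitivity.RESTRICTED),
]
```

Unknown roles have no clearance at all in this sketch, so they see nothing rather than defaulting to public access; whether that is the right default is a policy decision outside the scope of the example.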
Referring to FIG. 2, a second embodiment of a data processing method for policy information management according to an embodiment of the present invention includes:
S201, performing data cleaning on the collected policy texts to obtain cleaned policy texts.
In this embodiment, data cleansing of the collected policy text includes: unnecessary information in the original data is filtered and deleted, and the unnecessary information comprises null values, repeated values, noise data, abnormal data and the like.
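A minimal sketch of such a cleaning pass is shown below. It treats `None` and empty strings as null values, control characters and irregular whitespace as noise, and exact matches after normalization as repeated values; these interpretations and the function name are assumptions, and detecting abnormal data would need domain-specific rules that are not shown.

```python
import re


def clean_policy_texts(raw_texts):
    """Drop null, empty and duplicate texts and strip simple noise from the rest."""
    cleaned, seen = [], set()
    for text in raw_texts:
        if text is None:
            continue  # null value
        # Noise removal: delete control characters, then collapse runs of whitespace.
        text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", text)
        text = re.sub(r"\s+", " ", text).strip()
        if not text:
            continue  # empty after cleaning
        if text in seen:
            continue  # repeated value
        seen.add(text)
        cleaned.append(text)
    return cleaned
```

Deduplication is applied after normalization so that two copies of the same text that differ only in spacing are recognized as duplicates.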
S202, performing data conversion and normalization processing on the cleaned policy text to obtain a normalized policy text.
It will be appreciated that the collected policy text is typically provided as a PDF or Word document, which is unstructured data; the cleaned policy text therefore needs to be converted to a structured data format (e.g., Excel, CSV).
In addition, if policy data originates from different data sources, there may be different data formats and units, requiring normalization processing to facilitate unified analysis. For example, the time format is converted into a unified date format.
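The date example can be sketched as follows: each incoming date string is tried against a list of known source formats and, if one matches, rewritten in ISO 8601 form. The list of formats and the function name are assumptions for illustration; real data sources would determine which formats are needed.

```python
from datetime import datetime

# Hypothetical set of date formats found across data sources.
KNOWN_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d", "%Y年%m月%d日")


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None
```

Returning `None` for unrecognized input, rather than guessing, lets the caller route such records to manual review.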
S203, carrying out structuring processing on the normalized policy text to obtain a structured policy text.
Specifically, performing structuring processing on the normalized policy text to obtain a structured policy text includes: extracting and dividing the normalized policy text by chapter; intercepting the text of each chapter according to the extraction result and adding metadata to obtain a plurality of chapter texts, wherein the metadata comprises a theme, an abstract and an implementation period; and ordering the chapter texts to obtain the structured policy text.
It will be appreciated that structuring the normalized policy text may also include extracting its key attributes, including the policy enforcement deadline, the issuing authority, the scope of applicability, and policy content keywords, to form query tags, and displaying the terms that occur most frequently in the normalized policy text with a visualization tool (e.g., a word cloud).
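The chapter-based structuring can be sketched as below. The sketch assumes headings of the form "Chapter N Title" on their own line, uses the heading title as the theme, takes the first sentence of each chapter as its abstract, and receives the implementation period from the caller. None of these choices comes from the patent; they are placeholders for whatever extraction rules or models are actually used.

```python
import re
from dataclasses import dataclass

# Assumed heading convention: "Chapter <number> <title>" on a line of its own.
CHAPTER_HEADING = re.compile(r"^Chapter[ \t]+(\d+)[ \t]+(.*)$", re.MULTILINE)


@dataclass
class ChapterText:
    number: int
    theme: str
    abstract: str
    implementation_period: str
    body: str


def structure_policy_text(text, implementation_period):
    """Split a normalized policy text into chapters, attach metadata, and order them."""
    matches = list(CHAPTER_HEADING.finditer(text))
    chapters = []
    for i, match in enumerate(matches):
        # A chapter body runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        if body:
            # Placeholder abstract: the first sentence of the chapter body.
            abstract = body.split(". ")[0].rstrip(".") + "."
        else:
            abstract = ""
        chapters.append(
            ChapterText(
                number=int(match.group(1)),
                theme=match.group(2).strip(),
                abstract=abstract,
                implementation_period=implementation_period,
                body=body,
            )
        )
    return sorted(chapters, key=lambda chapter: chapter.number)
```

Sorting by chapter number restores document order even when chapters were extracted out of sequence.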
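The term frequencies that a word-cloud tool would render can be computed with a simple counter. The sketch below assumes English text, lower-cases it, and ignores words shorter than a hypothetical minimum length; it does not draw the visualization itself.

```python
import re
from collections import Counter


def top_terms(text, k=3, min_length=4):
    """Return the k most frequent words of at least min_length letters, with counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if len(word) >= min_length)
    return counts.most_common(k)
```

The resulting (term, count) pairs could serve both as weights for a word cloud and as candidate keywords for query tags.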
In this embodiment, by preprocessing and structuring the policy text, the structured policy text can be conveniently stored in the database, and data retrieval and analysis can be performed through query languages such as SQL, which is more efficient and provides a good basis for processing policy information.
Referring to FIG. 3, a third embodiment of a data processing method for policy information management according to an embodiment of the present invention includes:
S301, acquiring a classification system of the policy subjects and the domain, and forming policy classification rules based on the classification system of the policy subjects and the domain.
In this embodiment, the classification system of policy themes and domains may be customized in advance; it may refer to the national economic industry classification issued by the National Bureau of Statistics and be adjusted according to actual needs.
For example, for policies related to the service industry, the service industry is divided into business services, accommodation and catering services, financial and insurance services, real estate services, public facility management services, leasing and business services, and so on, each corresponding to its own policy themes and domains.
Rules are then designed according to the obtained classification system of policy topics and domains to form the policy classification rules.
S302, performing word segmentation, stop-word removal and stemming on the structured policy text to obtain the processed policy text.
In this embodiment, word segmentation of the structured policy text refers to dividing the structured policy text at word boundaries to obtain a series of discrete words; Chinese text requires a Chinese word segmenter, and commonly used ones include jieba and pkuseg.
Stop-word removal refers to removing from the text common high-frequency words that carry no actual meaning, such as "on" and "off", because such words contribute little to text analysis and processing.
Stemming refers to reducing or normalizing English words so that words of different morphological forms can be matched and compared. For example, the different forms "running", "run" and "runner" are converted into their root form "run".
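The three preprocessing steps of S302 can be sketched for English text as below. To stay self-contained, the sketch tokenizes with a regular expression instead of a Chinese segmenter such as jieba, and uses a toy suffix-stripping stemmer rather than an established algorithm; the stop-word list and the suffix rules are assumptions chosen only to reproduce the "running"/"runner" example.

```python
import re

# Illustrative stop-word list (an assumption).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for"}


def stem(word):
    """Toy stemmer: strip one common suffix, then undo consonant doubling."""
    for suffix in ("ing", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # "runn" -> "run": collapse a doubled final consonant.
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word


def preprocess(text):
    """Word segmentation, stop-word removal, then stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())         # segmentation
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [stem(t) for t in tokens]                     # stemming
```

A real pipeline would replace `stem` with a tested stemmer and the regular expression with a proper segmenter; the toy rules here will mis-stem many ordinary words.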
S303, mapping the processed policy text to corresponding policy subjects and fields by using a rule engine based on the policy classification rules.
In this embodiment, the rule engine is a software tool that maps "if-then" conditional statements to actions or outputs and automatically executes rules on input data, enabling automated decisions. The processed policy texts are matched against the policy classification rules and mapped to the corresponding policy topics and domains, and the mapping results are output and stored for subsequent analysis and processing.
For example, a policy text describing a financial policy can be matched to the financial insurance service category by looking up keywords such as "bank financing" and "stock market supervision", which identifies financial policy as its topic. If the policy text also mentions other content, such as property tax collection or preferential tax policies, it may additionally be mapped to the corresponding policy domains and topics.
For example, for a policy text matched to a financial insurance service, the corresponding policy topic is output as "financial policy", and the policy field is "financial insurance service" under "business service".
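The keyword-matching behavior described in this example can be sketched as a very small rule engine. The rule table below is hypothetical: the first rule mirrors the financial example in the text, while the second rule (its keywords, topic and domain) is invented purely to show a text matching more than one rule.

```python
# Each rule is an "if any keyword appears, then assign topic and domain" statement.
RULES = [
    {
        "keywords": ("bank financing", "stock market supervision"),
        "topic": "financial policy",
        "domain": "financial insurance services",
    },
    {
        # Hypothetical second rule, for illustration only.
        "keywords": ("property tax", "tax preference"),
        "topic": "tax policy",
        "domain": "real estate services",
    },
]


def classify(text, rules=RULES):
    """Return (topic, domain) for every rule whose keywords occur in the text."""
    lowered = text.lower()
    results = []
    for rule in rules:
        if any(keyword in lowered for keyword in rule["keywords"]):
            results.append((rule["topic"], rule["domain"]))
    return results
```

A text that triggers several rules receives several (topic, domain) pairs, which corresponds to the case where a policy text mentions more than one area.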
In this embodiment, the use of the rule engine to classify the processed policy text may improve efficiency, accuracy, and reliability.
Referring to fig. 4, a fourth embodiment of a data processing method for policy information management according to an embodiment of the present invention includes:
S401, creating a policy text table and a policy classification table in a policy database, and connecting the policy text table and the policy classification table to create a policy classification relation table.
In this embodiment, templates of a policy text table, a policy classification table and a policy classification relation table may be preset, then the policy text table and the policy classification table are created in the policy database according to the preset templates, and then the policy text table and the policy classification table are connected to create the policy classification relation table.
The policy text table is used to store detailed information of policy text, including fields of policy number, policy title, policy body content, issuing authority, issuing time, update time, etc.
The policy classification table is used for storing policy class information, including fields of policy classification number, policy classification name, policy classification description, etc.
The policy classification relation table is used for recording the relation between the policy text and the policy classification, and comprises fields such as a policy text ID, a policy classification ID and the like. By connecting the policy text table and the policy classification table, a policy classification relation table is established, and the IDs of the policy text table and the policy classification table are mapped with the IDs in the policy classification relation table.
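The three tables described above could be laid out as in the following sketch, which uses SQLite through Python's standard `sqlite3` module. The table names, column names and types are assumptions derived from the field lists in the text; the embodiment does not name a database engine.

```python
import sqlite3

SCHEMA = """
CREATE TABLE policy_text (
    id                INTEGER PRIMARY KEY,
    policy_number     TEXT NOT NULL UNIQUE,
    title             TEXT NOT NULL,
    body              TEXT NOT NULL,
    issuing_authority TEXT,
    issuing_time      TEXT,
    update_time       TEXT
);
CREATE TABLE policy_classification (
    id                    INTEGER PRIMARY KEY,
    classification_number TEXT NOT NULL UNIQUE,
    name                  TEXT NOT NULL,
    description           TEXT
);
-- Relation table: one row per (policy text, policy classification) pair.
CREATE TABLE policy_classification_relation (
    policy_text_id           INTEGER NOT NULL REFERENCES policy_text(id),
    policy_classification_id INTEGER NOT NULL REFERENCES policy_classification(id),
    PRIMARY KEY (policy_text_id, policy_classification_id)
);
"""


def create_policy_database(path=":memory:"):
    """Create the three tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```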
S402, storing the structured policy text into a corresponding policy text table based on the policy type corresponding to the structured policy text.
In this embodiment, a record is created for the corresponding policy text table, and its fields, including the policy number, policy title, policy body content, issuing authority, issuing time and update time, are filled in; the record is then added to the corresponding policy text table, which may be done with an SQL INSERT statement. If there are multiple structured policy texts to be stored, the above steps are repeated, creating a record for each structured policy text and adding it to the corresponding policy text table.
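A minimal sketch of the storage step, again assuming SQLite and hypothetical column names, is shown below. Parameterized INSERT statements are used so that policy text containing quotes cannot break the statement.

```python
import sqlite3

CREATE_POLICY_TEXT = """
CREATE TABLE IF NOT EXISTS policy_text (
    id                INTEGER PRIMARY KEY,
    policy_number     TEXT NOT NULL UNIQUE,
    title             TEXT NOT NULL,
    body              TEXT NOT NULL,
    issuing_authority TEXT,
    issuing_time      TEXT,
    update_time       TEXT
)
"""

INSERT_POLICY_TEXT = """
INSERT INTO policy_text
    (policy_number, title, body, issuing_authority, issuing_time, update_time)
VALUES
    (:policy_number, :title, :body, :issuing_authority, :issuing_time, :update_time)
"""


def store_policy_texts(conn, records):
    """Insert one row per structured policy text (records is a list of dicts)."""
    conn.execute(CREATE_POLICY_TEXT)
    conn.executemany(INSERT_POLICY_TEXT, records)
    conn.commit()
```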
S403, establishing a full-text index on the 'policy text content' field in the policy text table, establishing single-column indexes and a multi-column index on the 'issuing authority' and 'issuing date' fields in the policy text table, establishing an index on the 'policy classification name' field in the policy classification table, and establishing a composite index on the 'policy text number' and 'policy classification number' fields in the policy classification relation table.
It can be understood that the full text index refers to indexing the text type field, so that the operation speed of matching, searching, sorting and the like of the text content can be increased. In the policy management system, the "policy text content" field is a text type field, and the content retrieval efficiency of the policy text can be improved by establishing a full text index for the field.
A single-column index refers to indexing one field, while a multi-column index refers to indexing multiple fields together. In the policy management system, the "issuing authority" and "issuing date" fields are commonly queried, so establishing a single-column index on each of them and a multi-column index on both accelerates the querying and filtering of policies.
The policy classification table is an important table in the policy management system, in which a "policy classification name" field is an important field frequently used for filtering policy information, and thus, indexing the field can accelerate the query speed of policy classification information.
The policy classification relation table is a table that manages correspondence between policy classifications and policy texts. Here, in order to improve query efficiency, a form of a composite index is adopted. Composite indexing refers to indexing multiple columns simultaneously, so that the query efficiency of screening policy text by policy classification can be optimized.
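The single-column, multi-column and composite indexes of S403 can be sketched with standard `CREATE INDEX` statements, shown here against SQLite with assumed table and column names. The full-text index is left out of the sketch because its syntax is engine-specific.

```python
import sqlite3

INDEX_DDL = """
CREATE TABLE policy_text (
    policy_text_number TEXT PRIMARY KEY,
    content            TEXT,
    issuing_authority  TEXT,
    issuing_date       TEXT
);
CREATE TABLE policy_classification (
    policy_classification_number TEXT PRIMARY KEY,
    name                         TEXT
);
CREATE TABLE policy_classification_relation (
    policy_text_number           TEXT,
    policy_classification_number TEXT
);
-- Single-column indexes on the two commonly queried fields.
CREATE INDEX idx_text_authority ON policy_text (issuing_authority);
CREATE INDEX idx_text_date ON policy_text (issuing_date);
-- Multi-column index for queries that filter on both fields.
CREATE INDEX idx_text_authority_date ON policy_text (issuing_authority, issuing_date);
-- Index on the classification name.
CREATE INDEX idx_class_name ON policy_classification (name);
-- Composite index on the relation table.
CREATE INDEX idx_relation_pair
    ON policy_classification_relation (policy_text_number, policy_classification_number);
"""


def build_indexed_database(path=":memory:"):
    """Create the tables together with their indexes."""
    conn = sqlite3.connect(path)
    conn.executescript(INDEX_DDL)
    return conn
```

Column order matters in the multi-column and composite indexes: an index on (issuing_authority, issuing_date) helps queries that filter on the authority alone or on both fields, but generally not queries that filter on the date alone, which is one reason to keep the separate single-column index on the date.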
Referring to fig. 5, a fifth embodiment of a data processing method for policy information management according to an embodiment of the present invention includes:
S501, extracting key information of the structured policy text.
In this embodiment, the key information includes a policy name, a regulation standard, a responsibility department, a policy theme, a policy object, and a supporting measure.
In this embodiment, the structured policy text is semantically analyzed using natural language processing techniques, from which key information of the structured policy text is extracted.
S502, establishing a relation between a policy entity and an attribute based on the extracted key information, and constructing a policy information knowledge graph by adopting a knowledge graph technology; the policy entity is a policy document or a policy topic, and the attributes include the goals, content and enforcement agencies of the policy.
In this embodiment, a named entity recognition technique is used to extract various entities, such as policy main body, policy domain, policy target, etc., from the extracted key information. By applying the technologies of dependency syntactic analysis, keyword extraction and the like, various relations among different entities, such as the belonging relation between a main body and a domain, the relation between the domain and a target and the like, are extracted from the extracted key information.
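Once entities and relations have been extracted, the knowledge graph can be represented as subject-predicate-object triples. The sketch below assumes the extraction step has already produced a dictionary of key information; the dictionary keys and the `build_triples` name are hypothetical, and the entity recognition and dependency parsing themselves are not shown.

```python
# Attribute keys assumed to be present in the extracted key information.
ATTRIBUTES = (
    "policy_theme",
    "responsible_department",
    "policy_object",
    "supporting_measure",
)


def build_triples(key_info):
    """Turn extracted key information into (entity, relation, value) triples."""
    entity = key_info["policy_name"]
    triples = []
    for attribute in ATTRIBUTES:
        for value in key_info.get(attribute, []):
            triples.append((entity, attribute, value))
    return triples


key_info = {
    "policy_name": "Service Industry Support Policy",
    "policy_theme": ["financial policy"],
    "responsible_department": ["Finance Bureau"],
    "supporting_measure": ["loan subsidy", "tax reduction"],
}
triples = build_triples(key_info)
```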
S503, establishing a mapping relation between the policy information knowledge graph and the policy database, and linking the policy information knowledge graph with the policy database.
In this embodiment, a unique identifier is established in the policy database for each entity and attribute in the policy information knowledge graph to enable linking to the relevant locations of the policy database. For example, the policy document may correspond to a unique number in the policy database, and the policy theme may correspond to a related policy term in the policy database.
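The linking step can be sketched as a lookup from graph nodes to database rows by unique policy number. The table layout, the policy numbers and the `link_graph_to_database` name are assumptions made for illustration.

```python
import sqlite3


def link_graph_to_database(conn, node_to_policy_number):
    """Map each graph node to the primary key of its policy_text row.

    Nodes whose policy number is not found in the database are left out.
    """
    links = {}
    for node, number in node_to_policy_number.items():
        row = conn.execute(
            "SELECT id FROM policy_text WHERE policy_number = ?", (number,)
        ).fetchone()
        if row is not None:
            links[node] = row[0]
    return links


# Usage with an in-memory database holding a single policy document.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policy_text "
    "(id INTEGER PRIMARY KEY, policy_number TEXT UNIQUE, title TEXT)"
)
conn.execute(
    "INSERT INTO policy_text (policy_number, title) "
    "VALUES ('P-2023-001', 'Service Industry Support Policy')"
)
links = link_graph_to_database(
    conn,
    {"Service Industry Support Policy": "P-2023-001", "Unknown Policy": "P-0000"},
)
```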
In this embodiment, the policy information knowledge graph is constructed based on the structured policy text and linked to the relevant locations of the policy database, which facilitates a comprehensive understanding of the policy information and makes it easier to match policies accurately to actual needs.
Referring to fig. 6, a sixth embodiment of a data processing method for policy information management according to an embodiment of the present invention includes:
S601, setting a first access right level for the sensitivity degree of the structured policy text.
In this embodiment, policy information is treated as a sensitive data type, corresponding permission restrictions are set according to factors such as its propagation range, the level at which the policy was formulated and its content characteristics, and technical means such as encryption are adopted to ensure information security. The degrees of sensitivity include public, internal, confidential and top secret, and different degrees of sensitivity correspond to different first access right levels.
S602, setting a second access right level according to the user role.
In this embodiment, different access rights are authorized for users with different roles, so as to realize different dimensionality viewing and analysis of the policy information knowledge graph. The user roles include general users, academic researchers, government agency related workers and scientific and technological enterprise related workers. The different user roles correspond to different second access permission levels. The browsing, searching and inquiring operations of the common user on the policy information knowledge graph are not limited. Under the condition of meeting basic query conditions, academic researchers can acquire more complete policy information data and can analyze and process the data. Government agency related personnel can obtain more detailed and accurate policy information and can edit and export related policy text. Related personnel of science and technology enterprises can dock and process the policy information with data in the self field so as to obtain more accurate policy guiding directions.
S603, sharing a policy information knowledge graph based on the first access right level and the second access right level.
It can be appreciated that when sharing the policy information knowledge graph is implemented, a sharing data format and a sharing data form may also be preset, for example, the policy information knowledge graph may use a standardized data format and an API interface, so as to facilitate data interaction and sharing between different service systems. Therefore, the data safety, effectiveness and reliability can be ensured, and the data utilization value is improved.
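As one possible standardized exchange format, the knowledge graph can be serialized to JSON as a list of nodes and edges. The field names ("nodes", "edges", "source", "relation", "target") are assumptions; the text does not specify a format.

```python
import json


def export_graph(triples):
    """Serialize (subject, relation, object) triples to a JSON string."""
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    edges = [{"source": s, "relation": r, "target": o} for s, r, o in triples]
    # ensure_ascii=False keeps Chinese policy names readable in the output.
    return json.dumps({"nodes": nodes, "edges": edges}, ensure_ascii=False, sort_keys=True)
```

An API endpoint could return this string directly, and a consuming system can rebuild the triples from the "edges" list.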
In this embodiment, the first access right level and the second access right level are set according to the sensitivity degree of the structured policy text and the user role, respectively, so that the security and the effective use of the policy information knowledge graph can be ensured.
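The two-level check can be sketched as follows. The numeric ordering of the sensitivity degrees, the name `top_secret` for the most restricted degree, the clearance assigned to each role and the operation names are all assumptions for illustration; the embodiment only states that both levels are consulted when sharing.

```python
# First access right level: derived from the sensitivity of the policy text.
SENSITIVITY_LEVEL = {"public": 0, "internal": 1, "confidential": 2, "top_secret": 3}

# Second access right level: derived from the user role (values are assumed).
ROLE_POLICY = {
    "general_user": {
        "clearance": 0,
        "operations": {"browse", "search", "query"},
    },
    "academic_researcher": {
        "clearance": 1,
        "operations": {"browse", "search", "query", "analyze"},
    },
    "government_worker": {
        "clearance": 3,
        "operations": {"browse", "search", "query", "edit", "export"},
    },
    "tech_enterprise_worker": {
        "clearance": 1,
        "operations": {"browse", "search", "query", "integrate"},
    },
}


def can_access(sensitivity, role, operation):
    """Allow an operation only if both access right levels permit it."""
    policy = ROLE_POLICY.get(role)
    if policy is None or sensitivity not in SENSITIVITY_LEVEL:
        return False  # unknown role or sensitivity: deny by default
    within_clearance = SENSITIVITY_LEVEL[sensitivity] <= policy["clearance"]
    return within_clearance and operation in policy["operations"]
```

Denying by default for unknown roles or sensitivity labels is a deliberate choice in this sketch, so that a misconfigured label never widens access.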
The data processing method for policy information management in the embodiments of the present invention has been described above; the data processing apparatus for policy information management is described below. Referring to fig. 7, an embodiment of the data processing apparatus for policy information management according to an embodiment of the present invention includes:
a preprocessing module 701, configured to preprocess the collected policy text to obtain a structured policy text;
the classification module 702 is configured to classify the structured policy text based on the policy topic and the domain, and obtain a policy type corresponding to the structured policy text;
a storage module 703, configured to store the structured policy text into a policy database according to a policy type;
a construction module 704, configured to construct a policy information knowledge graph based on the structured policy text, and link the policy information knowledge graph with the policy database;
the sharing module 705 is configured to share the policy information knowledge graph based on a preset data sharing mechanism.
In this embodiment, the preprocessing module 701 includes: a data cleaning unit 7011, configured to perform data cleaning on the collected policy text to obtain a cleaned policy text; the normalization processing unit 7012 is used for performing data conversion and normalization processing on the cleaned policy text to obtain a normalized policy text; the structuring processing unit 7013 is configured to perform structuring processing on the normalized policy text to obtain a structured policy text.
In this embodiment, the classification module 702 includes: a forming unit 7021, configured to obtain a classification system of policy topics and domains, and form policy classification rules based on the classification system of policy topics and domains; a text processing unit 7022, configured to perform word segmentation, stop-word removal and stemming on the structured policy text to obtain a processed policy text; and a classification unit 7023, configured to map the processed policy text to the corresponding policy topic and domain using a rule engine based on the policy classification rules.
In this embodiment, the storage module 703 includes: a creating unit 7031, configured to create a policy text table and a policy classification table in the policy database, and connect the policy text table and the policy classification table to create a policy classification relation table; a storage unit 7032, configured to store the structured policy text into the corresponding policy text table based on the policy type corresponding to the structured policy text; and an index establishing unit 7033, configured to establish a full-text index on the "policy text content" field in the policy text table, establish single-column indexes and a multi-column index on the "issuing authority" and "issuing date" fields in the policy text table, establish an index on the "policy classification name" field in the policy classification table, and establish a composite index on the "policy text number" and "policy classification number" fields in the policy classification relation table.
In this embodiment, the building block 704 includes: an extracting unit 7041 for extracting key information of the structured policy text; a knowledge graph construction unit 7042, configured to establish a relationship between the policy entity and the attribute based on the extracted key information, and construct a policy information knowledge graph by using a knowledge graph technology; a linking unit 7043, configured to establish a mapping relationship between the policy information knowledge graph and the policy database, and link the policy information knowledge graph and the policy database.
In this embodiment, the sharing module 705 includes: a first authority setting unit 7051 for setting a first access authority level for the sensitivity level of the structured policy text; a second authority setting unit 7052 for setting a second access authority level according to the user role; a sharing unit 7053 for sharing the policy information knowledge graph based on the first access right level and the second access right level.
In the embodiment, the collected policy text is preprocessed to obtain the structured policy text, so that a good basis can be provided for processing and analyzing policy information; the structured policy text is stored in the policy database according to the policy type, a policy information knowledge graph is constructed, the policy information knowledge graph is linked with the policy database, centralized management, inquiry and updating of the policy information are facilitated, the information processing and presentation of relevance, hierarchical structure and the like among the policy information can be realized, and a user can be helped to understand the policy more quickly and easily by establishing the policy information knowledge graph; in addition, the policy information knowledge graph is shared through a preset data sharing mechanism, so that the management and use efficiency of policy information can be improved, and the co-establishment sharing is promoted.
The above-described data processing apparatus for policy information management in the embodiment of the present invention is described in detail in fig. 7 from the point of view of a modularized functional entity, and the following describes the data processing device for policy information management in the embodiment of the present invention in detail from the point of view of hardware processing.
Fig. 8 is a schematic structural diagram of a data processing device for policy information management according to an embodiment of the present invention. The device 800 may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPU) 810, a memory 820, and one or more storage media 830 (e.g., one or more mass storage devices) storing application programs 833 or data 832. The memory 820 and the storage medium 830 may provide transient or persistent storage. The program stored on the storage medium 830 may include one or more modules (not shown), each of which may include a series of instruction operations on the device 800. Further, the processor 810 may be configured to communicate with the storage medium 830 and to execute, on the device 800, the series of instruction operations in the storage medium 830.
The device 800 may also include one or more power supplies 840, one or more wired or wireless network interfaces 850, one or more input/output interfaces 860, and/or one or more operating systems 831, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc.
The embodiments of the present invention also provide a computer readable storage medium, which may be a non-volatile computer readable storage medium, or may be a volatile computer readable storage medium, in which instructions are stored which, when executed on a computer, cause the computer to perform the steps of a data processing method for policy information management.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the system or apparatus and unit described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A data processing method for policy information management, the data processing method for policy information management comprising:
preprocessing the collected policy texts to obtain structured policy texts;
classifying the structured policy texts based on the policy subjects and the fields to obtain policy types corresponding to the structured policy texts;
storing the structured policy text into a policy database according to a policy type;
constructing a policy information knowledge graph based on the structured policy text and linking the policy information knowledge graph with the policy database;
and sharing the policy information knowledge graph based on a preset data sharing mechanism.
2. The data processing method for policy information management according to claim 1, wherein said preprocessing the collected policy texts to obtain structured policy texts comprises:
data cleaning is carried out on the collected policy texts, and cleaned policy texts are obtained;
performing data conversion and normalization processing on the cleaned policy text to obtain normalized policy text;
and carrying out structuring processing on the normalized policy text to obtain a structured policy text.
3. The method for processing policy information management data according to claim 2, wherein said structuring the normalized policy text to obtain a structured policy text comprises:
extracting and dividing the normalized policy text according to chapters;
intercepting the text of each chapter according to the extraction result, and adding metadata to obtain a plurality of chapter texts, wherein the metadata comprises a theme, an abstract and an implementation period;
and sequencing the chapter texts to obtain a structured policy text.
4. The method for processing policy information according to claim 1, wherein said classifying the structured policy text based on the policy topic and the domain to obtain the policy type corresponding to the structured policy text comprises:
acquiring a classification system of a policy theme and a domain, and forming a policy classification rule based on the classification system of the policy theme and the domain;
performing word segmentation, stop-word removal and stemming on the structured policy text to obtain a processed policy text;
based on the policy classification rules, the processed policy text is mapped to corresponding policy topics and fields using a rules engine.
5. The data processing method for policy information management according to claim 1, wherein said storing the structured policy text in the policy database according to the policy type comprises:
creating a policy text table and a policy classification table in a policy database, and connecting the policy text table and the policy classification table to create a policy classification relation table;
storing the structured policy text into a corresponding policy text table based on a policy type corresponding to the structured policy text;
establishing full text indexes for the fields of 'policy text contents' in the policy text table, respectively establishing single-column indexes and multi-column indexes for the fields of 'issuing institutions' and 'issuing dates' in the policy text table, establishing indexes for the fields of 'policy classification names' in the policy classification table, and establishing composite indexes for the fields of 'policy text numbers' and 'policy classification numbers' in the policy classification relation table.
6. The data processing method for policy information management according to claim 1, wherein said constructing a policy information knowledge graph based on said structured policy text and linking said policy information knowledge graph with said policy database comprises:
extracting key information of the structured policy text;
establishing a relationship between the policy entity and the attribute based on the extracted key information, and constructing a policy information knowledge graph by adopting a knowledge graph technology; the policy entity is a policy file or a policy theme, and the attributes include targets, contents and enforcement mechanisms of the policy;
and establishing a mapping relation between the policy information knowledge graph and the policy database, and linking the policy information knowledge graph with the policy database.
7. The data processing method for policy information management according to claim 1, wherein said sharing the policy information knowledge graph based on a preset data sharing mechanism comprises:
setting a first access right level for the sensitivity degree of the structured policy text;
setting a second access right level according to the user role;
the policy information knowledge graph is shared based on the first access right level and the second access right level.
8. A data processing apparatus for policy information management, comprising:
the preprocessing module is used for preprocessing the collected policy texts to obtain structured policy texts;
the classification module is used for classifying the structured policy texts based on the policy subjects and the domain to obtain policy types corresponding to the structured policy texts;
the storage module is used for storing the structured policy text into a policy database according to the policy type;
the construction module is used for constructing a policy information knowledge graph based on the structured policy text and linking the policy information knowledge graph with the policy database;
and the sharing module is used for sharing the policy information knowledge graph based on a preset data sharing mechanism.
9. A data processing apparatus for policy information management, comprising a memory and at least one processor, said memory having computer readable instructions stored therein;
the at least one processor invokes the computer readable instructions in the memory to perform the steps of the data processing method of policy information management as claimed in any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the data processing method of policy information management according to any of claims 1-7.
CN202310506130.1A 2023-05-06 2023-05-06 Data processing method, device, equipment and storage medium for policy information management Pending CN116595173A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310506130.1A CN116595173A (en) 2023-05-06 2023-05-06 Data processing method, device, equipment and storage medium for policy information management

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310506130.1A CN116595173A (en) 2023-05-06 2023-05-06 Data processing method, device, equipment and storage medium for policy information management

Publications (1)

Publication Number Publication Date
CN116595173A true CN116595173A (en) 2023-08-15

Family

ID=87610811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310506130.1A Pending CN116595173A (en) 2023-05-06 2023-05-06 Data processing method, device, equipment and storage medium for policy information management

Country Status (1)

Country Link
CN (1) CN116595173A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117574009A (en) * 2023-10-31 2024-02-20 灵犀科技有限公司 Structured policy data generation method, device, electronic equipment and readable medium
CN117708350A (en) * 2024-02-06 2024-03-15 成都草根有智创新科技有限公司 Enterprise policy information association method and device and electronic equipment
CN117708350B (en) * 2024-02-06 2024-05-14 成都草根有智创新科技有限公司 Enterprise policy information association method and device and electronic equipment
CN118069790A (en) * 2024-04-18 2024-05-24 苏州城方信息技术有限公司 Rescue policy matching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination