CN112214615A - Policy document processing method and device based on knowledge graph and storage medium
- Publication number
- CN112214615A, CN202011117084.9A
- Authority
- CN
- China
- Prior art keywords
- policy
- file
- information
- entity
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/322—Trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Creation or modification of classes or clusters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention discloses a knowledge-graph-based policy document processing method, device, storage medium, and computer equipment. It relates to the field of artificial intelligence and can be applied to smart cities or smart government affairs. The method comprises: acquiring a target policy file and a comparison file of the target policy file; obtaining at least one piece of difference information between the target policy file and the comparison file according to a policy knowledge graph; and generating comparative analysis suggestion information for the target policy file according to the at least one piece of difference information and the labeling information of that difference information in the policy knowledge graph. Because the target policy file and the comparison file are processed according to the policy knowledge graph, and the comparative analysis suggestion information is generated automatically from the difference information and the labeling information, the method can effectively capture the key information and the differences in the two files, improves the efficiency of analyzing policy files, and reduces the user's workload.
Description
Technical Field
The invention relates to the field of artificial intelligence, in particular to a policy document processing method and device based on a knowledge graph, a storage medium and computer equipment.
Background
With the continuous development of big data and computer technology, data analysis and processing play an increasingly important role in many industries. Take the policy documents issued by government departments as an example. The government plays a vital macro-level regulatory role in economic and social development and comprises many functional institutions, and the policy information issued by each institution affects industries, enterprises, and products to some extent. Analyzing and processing policy documents has therefore become very important work.
However, existing approaches to analyzing policy documents issued by government departments mainly focus on collecting, displaying, and managing the documents, classifying them along dimensions such as industry, or extracting and displaying the structure of a single policy document. The specific content of the documents can be obtained only when an analyst reads and compares them level by level, relying on personal experience, so the analysis efficiency is low.
Disclosure of Invention
In view of this, the present application provides a method, a system, a storage medium, and a computer device for processing policy documents based on a knowledge graph, and mainly aims to solve the technical problem of low efficiency of analyzing policy documents in the prior art.
According to a first aspect of the present invention, there is provided a method for processing a policy document based on a knowledge-graph, the method comprising:
acquiring a target policy file and a comparison file of the target policy file;
obtaining at least one piece of difference information between the target policy file and the comparison file according to the policy knowledge graph;
and generating comparative analysis suggestion information of the target policy file according to the at least one difference information and the labeling information of the at least one difference information in the policy knowledge graph.
According to a second aspect of the present invention, there is provided a knowledge-graph-based policy document processing apparatus, comprising:
the information acquisition module is used for acquiring a target policy file and a comparison file of the target policy file;
the information processing module is used for obtaining at least one piece of difference information between the target policy file and the comparison file according to the policy knowledge graph;
and the information generation module is used for generating comparative analysis suggestion information of the target policy file according to the at least one piece of difference information and the labeling information of the at least one piece of difference information in the policy knowledge graph.
According to a third aspect of the present invention, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method of knowledge-graph based policy document processing.
According to a fourth aspect of the present invention, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above-mentioned method for processing a knowledge-graph based policy document when executing the program.
According to the knowledge-graph-based policy document processing method, device, storage medium, and computer equipment provided by the invention, a target policy document and its comparison document are first obtained; the two documents are then analyzed according to the policy knowledge graph to obtain their difference information; and finally comparative analysis suggestion information for the target policy document is generated automatically from the difference information and its labeling information in the policy knowledge graph. Because the method processes the target policy document and the comparison document automatically according to the policy knowledge graph, it can effectively capture the key information in both documents and extract the differences between them, which improves the efficiency of analyzing policy documents and greatly reduces the user's workload. In addition, generating the comparative analysis suggestion information automatically from the labeling information in the knowledge graph can improve the accuracy of the analysis and provides a solid basis for comparing policy documents.
In addition, the knowledge-graph-based policy document processing method, device, storage medium, and computer equipment can be applied to fields such as smart cities and smart government affairs, thereby promoting their construction and, in turn, improving the daily life and work of urban residents and the performance of government functions.
The foregoing is only an overview of the technical solutions of the present application. So that the technical means can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages are easier to understand, specific embodiments of the present application are set out below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart diagram illustrating a method for processing a knowledge-graph-based policy document according to an embodiment of the present invention;
FIG. 2 is a flow chart diagram illustrating another method for processing a knowledge-graph based policy document according to an embodiment of the invention;
FIG. 3 illustrates a schematic diagram of a policy knowledge graph provided by an embodiment of the present invention;
FIG. 4 illustrates a schematic diagram of another policy knowledge graph provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a knowledge-graph-based policy document processing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of another knowledge-graph-based policy document processing apparatus according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
In one embodiment, as shown in fig. 1, a method for processing policy document based on knowledge graph is provided, which is described by taking the method as an example applied to a computer device such as a client or a server, and includes the following steps:
101. Acquire a target policy file and a comparison file of the target policy file.
A policy document is a document issued in standardized form by an official organization that sets out general steps and specific measures to be taken. In this embodiment, both the target policy document and its comparison document are policy documents in this sense.
Specifically, the computer device may search the database for relevant policy files according to search information entered by the user, such as a title or keywords, and the user may then select the target policy file and its comparison file from the search results. The target policy file is the policy file the user intends to analyze, and there is generally one of it. The comparison file is a policy file the user intends to compare against or refer to, and there may be one or more of them. Besides being retrieved by title or keywords, a comparison file can also be found through the policy knowledge graph, which returns at least one file similar to the target policy file. Through the policy knowledge graph, the computer device can more accurately find policy files that belong to the same field as the target policy file and whose content is closer to it.
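The retrieval step described above can be pictured with a minimal sketch. It is only an illustration under stated assumptions: the `PolicyFile` record, the `search_policy_files` function, and the sample titles are hypothetical, and the source does not specify a database schema or matching rule (a simple case-insensitive substring match is assumed here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyFile:
    # Hypothetical record layout; the source does not define a schema.
    title: str
    keywords: frozenset


def search_policy_files(database, query):
    """Return files whose title or keywords contain the query (case-insensitive)."""
    q = query.lower()
    return [
        f for f in database
        if q in f.title.lower() or any(q in k.lower() for k in f.keywords)
    ]


# Illustrative in-memory "database" of policy files.
database = [
    PolicyFile("City A power access reform scheme (trial)", frozenset({"power access"})),
    PolicyFile("City B power access reform scheme", frozenset({"power access"})),
    PolicyFile("City A small business support opinions", frozenset({"small business"})),
]

candidates = search_policy_files(database, "power access")
# The user would then pick one target file and one or more comparison files;
# here the first hit is taken as the target purely for illustration.
target, comparisons = candidates[0], candidates[1:]
```

In a real system the selection of target and comparison files is made by the user, as the text states; the last line only stands in for that choice.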
Optionally, after the target policy file and the comparison file of the target policy file are obtained, the computer device may display the file to the user for viewing through an output device such as a liquid crystal screen or a touch screen.
102. Obtain at least one piece of difference information between the target policy document and the comparison document according to the policy knowledge graph.
The knowledge graph (also called knowledge domain visualization or knowledge domain mapping) is a family of graphs that display the development process and structural relationships of knowledge. It uses visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interrelations among them.
Further, a knowledge graph can be represented in various ways, such as a semantic network, a frame, or a script. In this embodiment, the knowledge graph may be expressed with a semantic network model, that is, a network of concepts connected by semantic relations, made up of many triples of nodes and edges. The policy knowledge graph may be built from triples formed by policy entities and the relations between them: the nodes are policy entities, the edges are the relations between policy entities, and each triple is written as (policy entity 1, entity relation, policy entity 2). For example, as shown in the following diagram, (a certain city's implementation scheme for further optimizing the business environment and deepening the user power access reform (trial) [policy entity], handling procedures [entity relation], two items [digital (numeric) entity]) is a triple. In this embodiment, the key information in each policy document is stored in the policy knowledge graph in the form of such triples.
Further, the computer device may look up, in the pre-established policy knowledge graph, the policy entities and entity relations corresponding to the target policy file and to its comparison file. It then compares the policy entities and entity relations of the target policy file with those of the comparison file to find the entities and relations that differ, determines the difference triples of the two files, and extracts and processes those triples to obtain the difference information between the target policy file and the comparison file. It should be noted that both files are stored in the database in advance, and their key information has likewise been stored in the knowledge graph in advance after a series of processing steps.
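One way to picture the comparison of triples is the following minimal sketch. It assumes, for illustration only, that every triple's head is a policy file name and that two files differ on a relation when the sets of tail entities differ; the `Triple` type, the function names, and the sample data are hypothetical and not taken from the source.

```python
from collections import namedtuple

# Hypothetical triple layout: (policy entity 1, entity relation, policy entity 2).
Triple = namedtuple("Triple", ["head", "relation", "tail"])


def facts_of(graph, file_name):
    """Map relation -> set of tail entities for triples whose head is file_name."""
    facts = {}
    for t in graph:
        if t.head == file_name:
            facts.setdefault(t.relation, set()).add(t.tail)
    return facts


def diff_files(graph, target, comparison):
    """Return (relation, target_tails, comparison_tails) for relations that differ."""
    a, b = facts_of(graph, target), facts_of(graph, comparison)
    diffs = []
    for relation in sorted(set(a) | set(b)):
        ta, tb = a.get(relation, set()), b.get(relation, set())
        if ta != tb:
            diffs.append((relation, sorted(ta), sorted(tb)))
    return diffs


# Illustrative graph content; the values are made up.
graph = [
    Triple("Scheme A", "handling procedures", "2"),
    Triple("Scheme A", "file type", "notification"),
    Triple("Scheme B", "handling procedures", "4"),
    Triple("Scheme B", "file type", "notification"),
    Triple("Scheme B", "processing days", "10"),
]

differences = diff_files(graph, "Scheme A", "Scheme B")
```

Relations on which both files agree (here "file type") are dropped, so only the difference triples remain for the later suggestion step.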
103. Generate comparative analysis suggestion information for the target policy file according to the at least one piece of difference information and the labeling information of that difference information in the policy knowledge graph.
Specifically, after at least one piece of difference information between the target policy file and its comparison file has been extracted, the computer device may sort and summarize the difference information together with the corresponding labeling information in the knowledge graph, express both in text form using text templates, and finally generate the comparative analysis suggestion information for the target policy file. The labeling information of the difference information in the policy knowledge graph comprises positive relations and negative relations. Using this labeling information, the computer device can judge whether each difference between the target policy file and the comparison file is positive or negative for the target policy file, that is, whether it is an advantage or a disadvantage of the target policy file, which provides the basis for generating the comparative analysis suggestion information.
The scheme can be applied in many scenarios. For example, after an official functional department issues a policy, an intuitive way to evaluate it is to compare the policy file with policies of the same type, so as to find how it differs from other policy files and where the issued file needs improvement. Alternatively, the user group targeted by the policy files may want to analyze the differences, advantages, or disadvantages of two or more policy files to support a decision. In the prior art, the policy file comparison functions offered by related websites compare files only along dimensions such as industry; they do not mine the content of the policy files in depth, so the granularity is coarse. Moreover, these functions cannot generate comparative analysis suggestions, so users can find the differing content only by reading large amounts of similar text, which greatly reduces the efficiency of processing policy file information.
In the knowledge-graph-based policy document processing method provided by this embodiment, a target policy document and its comparison document are first obtained; the two documents are then analyzed according to the policy knowledge graph to obtain their difference information; and finally comparative analysis suggestion information for the target policy document is generated automatically from the difference information and its labeling information in the policy knowledge graph. Because the method processes the target policy document and the comparison document automatically according to the policy knowledge graph, it can effectively capture the key information in both documents and extract the differences between them, which improves the efficiency of analyzing policy documents and greatly reduces the user's workload. In addition, generating the comparative analysis suggestion information automatically from the labeling information in the knowledge graph can improve the accuracy of the analysis and provides a solid basis for comparing policy documents.
In addition, the knowledge-graph-based policy document processing method, device, storage medium, and computer equipment can be applied to fields such as smart cities and smart government affairs, thereby promoting their construction and, in turn, improving the daily life and work of urban residents and the performance of government functions.
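The use of positive and negative labels together with a text template might look like the sketch below. This is an assumption-laden illustration, not the patented implementation: the template wording, the function names, the "subsidy amount" relation, and the numeric values are all hypothetical; only the convention that a positive relation means "larger is better" and a negative relation means "smaller is better" comes from the description.

```python
# Hypothetical text template; the source only says that "some text templates" are used.
TEMPLATE = "{relation}: target {t} vs. comparison {c} ({verdict} for the target file)"


def judge(target_value, comparison_value, polarity):
    """Classify a numeric difference as an advantage or disadvantage of the target.

    polarity is the label on the relation: 1 = positive relation (larger is
    better), 0 = negative relation (smaller is better).
    """
    if target_value == comparison_value:
        return None
    target_is_larger = target_value > comparison_value
    better = target_is_larger if polarity == 1 else not target_is_larger
    return "advantage" if better else "disadvantage"


def generate_suggestions(differences, labels):
    """Render one template line per labeled numeric difference."""
    lines = []
    for relation, t, c in differences:
        verdict = judge(t, c, labels[relation])
        if verdict is not None:
            lines.append(TEMPLATE.format(relation=relation, t=t, c=c, verdict=verdict))
    return lines


# Illustrative labels and differences (made-up values).
labels = {"handling procedures": 0, "subsidy amount": 1}
differences = [("handling procedures", 2, 4), ("subsidy amount", 100, 300)]
suggestions = generate_suggestions(differences, labels)
```

Fewer handling procedures count as an advantage because that relation carries the negative label, while a smaller subsidy counts as a disadvantage because that relation carries the positive label.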
Further, as a refinement and an extension of the specific implementation of the above embodiment, in order to fully illustrate the implementation process of the embodiment, a method for processing a policy document based on a knowledge graph is provided, as shown in fig. 2, the method includes the following steps:
201. Collect sample policy files and establish a policy knowledge graph from the sample policy files.
In this embodiment, establishing the policy knowledge graph from sample policy files may include the following steps: first, sample policy files are collected in batches and preprocessed; then entity identification, entity disambiguation, and relation extraction are performed on the preprocessed files to obtain a number of triples composed of policy entities and entity relations; the policy knowledge graph is then built from these triples; and finally some of the entity relations in the policy knowledge graph are labeled, where the labeling information of an entity relation may be a positive relation or a negative relation.
Specifically, the computer device can use web crawler technology to collect policy file information in batches from government websites, specialized policy websites, and major portal websites, and then aggregate, clean, convert, analyze, summarize, and store the collected information in a database. A web crawler is a program or script that automatically captures internet information according to certain rules, and using it appropriately can greatly improve the efficiency of data collection. After a large number of policy documents have been collected, they can be used to build an initial policy knowledge graph. The construction process mainly comprises three steps, namely entity identification, entity disambiguation, and relation extraction, which are handled as follows.
Entity identification comes first. In this embodiment, the policy entity is the basic unit of the policy knowledge graph, and policy entities fall mainly into three categories: entities, times, and numbers. A policy entity may specifically be the title of the policy document, the issuing organization, the issue time, the information source, a keyword in the title, a keyword in the document body, a hierarchical structured heading in the document, or numeric information in the policy document.
Specifically, entities can be extracted from a policy document in various ways, including extraction based on preset rules and extraction by machine learning. Rule-based extraction first establishes a named-entity list in advance and then, following the entity names in that list, extracts keywords, key features, or text at key positions from the policy file as policy entities. Machine-learning extraction trains a language model on a pre-labeled corpus so that the model learns the probability that a word is, or forms part of, a named entity; the probability that a candidate field is a named entity is then computed, and the field is extracted as a policy entity if that probability exceeds a threshold.
After entity identification, entity disambiguation is performed on the extracted policy entities. Disambiguation is needed because the same entity name may refer to entirely different things in different document contexts. Specifically, entity disambiguation may use a clustering-based method or an entity-linking-based method. In the clustering-based method, no target entity list is given: the disambiguation system clusters the entity mentions so that all mentions pointing to the same target entity fall into the same cluster, and each cluster in the result corresponds to one target entity. In the entity-linking-based method, a target entity list is given, and each entity mention is linked to its corresponding entity in the list. Candidate entity pairs are then generated pairwise from the extracted entities in (left entity, right entity) order, that is, by taking the Cartesian product of the entity table with itself, in preparation for the subsequent relation extraction.
After entity identification and entity disambiguation, the relations between policy entities can be extracted. Relation extraction means identifying the semantic relations between policy entities, and its methods include sentence-level relation extraction, corpus-level relation extraction, restricted-domain relation extraction, open-domain relation extraction, and the like. In this embodiment, policy entities may be connected by various kinds of relations, such as classification, proximity, membership, affiliation, attribute, and hierarchical relations, and each kind may contain several specific relation names that indicate how the entities are related.
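The rule-based variant of entity extraction can be sketched as follows. The sketch covers only the preset-rule approach, not the machine-learning one; the named-entity list, the regular expression for numbers, and the sample sentence are hypothetical choices made for illustration.

```python
import re

# Hypothetical named-entity list; a real list would be curated in advance.
NAMED_ENTITIES = ["Ministry of Civil Affairs", "power access", "handling procedures"]

# Assumed pattern for numeric entities: integers or decimals.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def extract_entities(text):
    """Return (entity, kind) pairs found by list lookup and by the number pattern."""
    lowered = text.lower()
    found = [(name, "named") for name in NAMED_ENTITIES if name.lower() in lowered]
    found += [(number, "number") for number in NUMBER_PATTERN.findall(text)]
    return found


text = "The power access handling procedures are reduced to 2 steps within 10 days."
entities = extract_entities(text)
```

List lookup handles the entity class and the pattern handles the number class; time expressions would need their own rule, which is omitted here.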
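The generation of candidate entity pairs by Cartesian product can be illustrated with a short sketch. Dropping pairs of an entity with itself is an assumption made here, since the source does not say whether such pairs are kept; the sample entities are illustrative.

```python
from itertools import product


def candidate_pairs(entities):
    """Ordered (left, right) pairs from the Cartesian product, skipping self-pairs."""
    return [(left, right) for left, right in product(entities, repeat=2) if left != right]


# Illustrative entities extracted from one policy file.
entities = ["Policy file E", "delivery process", "3"]
pairs = candidate_pairs(entities)
```

Because order matters, both ("Policy file E", "delivery process") and ("delivery process", "Policy file E") are produced, and relation extraction can then decide which direction, if any, holds.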
A classification relation indicates that one entity is a type of another entity. For example, if policy file A is a notification-type file, the relation between policy file A and "notification-type file" is a classification relation, and the relation name is "file type".
A proximity relation indicates that entities are similar in form, content, and the like. For example, if policy file B and policy file C are both guidance opinions issued by local governments to promote the development of small and medium-sized enterprises, the relation between them is a proximity relation, and the relation name is "similar file".
A membership relation indicates that one entity is a member of another entity. For example, if policy file D is jointly issued by the General Office of the Ministry of Industry and Information Technology and the General Office of the Ministry of Civil Affairs, the relations between policy file D and each of these two offices are membership relations, and the relation name is "issuing organization".
An affiliation relation indicates that one entity is a part of another entity. For example, if one item of content recorded in policy file E is a delivery process, the relation between policy file E and the delivery process is an affiliation relation, and the relation name is "key content".
An attribute relation indicates that one entity has an attribute represented by another entity. For example, if the delivery process recorded in policy file E has 3 implementation steps, the relation between the delivery process and the number 3 is an attribute relation, and the relation name is "implementation steps".
A hierarchical relation indicates the level of one entity relative to another. For example, if policy file F is a policy file issued in city A and policy file G is a similar policy file issued in city B, the relation between policy file F and policy file G is a hierarchical relation, and the relation name is "same-level file".
In this embodiment, the knowledge graph can be expressed in various ways, such as semantic networks, frames and scripts; here, a semantic network model is used. Specifically, the policy knowledge graph may be constructed from triples formed by policy entities and the relationships between them, where the nodes of a triple are policy entities and the edge of a triple is the relationship between those policy entities. A triple is expressed as (policy entity 1, entity relationship, policy entity 2). As shown in fig. 3, ("XX City Implementation Scheme for Further Optimizing the Business Environment and Deepening User Power Access Reform (Trial)" [policy entity], handling procedure [entity relationship], two items [digital entity]) is one triple. In the present embodiment, the key information in each policy document is stored in the policy knowledge graph in the form of such triples.
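As a minimal illustrative sketch of the triple representation described above, the following Python code stores (policy entity 1, entity relationship, policy entity 2) facts in an in-memory structure. The names `Triple` and `PolicyGraph`, the shortened document title, and the second triple are hypothetical and are used only for illustration; the source does not prescribe any particular data structure.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One (policy entity 1, entity relationship, policy entity 2) fact."""
    head: str
    relation: str
    tail: str


class PolicyGraph:
    """Hypothetical in-memory graph: nodes are entities, edges are relationships."""

    def __init__(self):
        # Maps a head entity to the list of triples that start at it.
        self._outgoing = defaultdict(list)

    def add(self, head, relation, tail):
        self._outgoing[head].append(Triple(head, relation, tail))

    def triples_of(self, head):
        # .get avoids creating an empty entry for unknown entities.
        return list(self._outgoing.get(head, []))


graph = PolicyGraph()
plan = "XX city power access reform implementation scheme (trial)"  # shortened title
graph.add(plan, "handling procedure", "two items")          # triple from fig. 3
graph.add(plan, "document type", "implementation scheme")   # assumed example triple
```

Calling `graph.triples_of(plan)` returns the triples whose head node is that policy document, in insertion order.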
Further, after the policy knowledge graph is initially created, some of the relationships between named entities and digital entities (that is, numeric values) can be labeled. The Word2Vec vectors of the labeled relationships are input into a linear regression model for training, so that the association between relationships and numbers is learned and a binary classification model for relationships is obtained. The generated binary classification model can then be used to label the entity relationships in the policy knowledge graph that are associated with digital entities, thereby determining whether a number associated with a policy entity through a relationship stands in a positive relationship or a negative relationship, that is, whether the policy is more favorable when the digital entity is larger or when it is smaller. For example, some of the relationships in the policy knowledge graph may be labeled as follows: if the policy is more favorable when the digital entity is smaller, the relationship connected to the digital entity is determined to be a negative relationship and may be labeled "0"; if the policy is more favorable when the digital entity is larger, the relationship connected to the digital entity is determined to be a positive relationship and may be labeled "1". After the binary classification model is built, all relationships connected to digital entities in the policy knowledge graph can be labeled by this model.
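The labeling step above can be sketched as follows, under loudly stated assumptions: the two-dimensional vectors below are invented stand-ins for Word2Vec embeddings of relationship names, the relationship names are illustrative, and the 0.5 decision threshold is an assumption rather than something the source specifies. The sketch fits a linear regression by gradient descent on the labeled relationships and thresholds its output to obtain a positive ("1") or negative ("0") label.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear(vectors, labels, lr=0.1, epochs=500):
    """Fit y ~ w.x + b by full-batch gradient descent on squared error."""
    dim = len(vectors[0])
    w, b = [0.0] * dim, 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(vectors, labels):
            err = dot(w, x) + b - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_polarity(w, b, x, threshold=0.5):
    """Return 1 (larger is better) or 0 (smaller is better); threshold is assumed."""
    return 1 if dot(w, x) + b >= threshold else 0


# Toy stand-ins for Word2Vec vectors of manually labeled relationships.
labeled = {
    "subsidy amount": ([1.0, 0.1], 1),
    "reward ratio": ([0.9, 0.2], 1),
    "handling procedure": ([0.1, 1.0], 0),
    "processing days": ([0.2, 0.9], 0),
}
w, b = train_linear([v for v, _ in labeled.values()],
                    [y for _, y in labeled.values()])

# Unlabeled relationships, again with invented vectors.
approval_time_limit = predict_polarity(w, b, [0.15, 0.95])
funding_amount = predict_polarity(w, b, [0.95, 0.15])
```

In a real system the vectors would come from a trained embedding model and the classifier would be evaluated on held-out labeled relationships before being applied to the whole graph.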
In this embodiment, because new policy documents are issued continually, the created policy knowledge graph also needs to be updated and refined continuously. Specifically, the computer device may periodically collect the policy documents issued by each official body and functional department, add the key information of the collected policy documents to the policy knowledge graph, and label the newly added information, thereby obtaining the updated policy knowledge graph.
202. Divide the policy knowledge graph into a plurality of clusters according to a community discovery algorithm.
Specifically, as shown in fig. 4, the established policy knowledge graph may be divided into a plurality of clusters by using a community discovery algorithm, where each cluster includes a plurality of policy files having similar relationships with each other. A community is a dense group in the network: the connections between nodes in the same community are relatively tight, while the connections between different communities are relatively sparse. There are various community discovery algorithms, including graph partitioning algorithms, clustering algorithms and splitting algorithms. The essence of a graph partitioning algorithm is to regard communities as dense subgraph structures and to divide the nodes of the graph into groups of a preset size so that the number of edges between groups is minimized. A clustering algorithm assumes that communities have a hierarchical structure, calculates the similarity of each pair of nodes in the network, and then divides the clusters by an aggregation method or a splitting method. In the aggregation method, node pairs are connected in order of decreasing similarity to form a dendrogram, and the dendrogram is then cut horizontally as required to obtain the community structure. In the splitting method, the most weakly associated nodes are found and the edges between them are deleted; repeating this operation divides the network into smaller and smaller components, and each connected component forms a community. A splitting algorithm is similar to the splitting method of the clustering algorithm; the difference is that the splitting algorithm does not calculate node similarity when splitting the network, but directly deletes the edges connecting two communities, and the two endpoints of such an edge do not necessarily have low similarity.
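The aggregation method described above can be illustrated by the following sketch, which is only one possible realization and not the algorithm the source commits to. It measures the similarity of two policy files as the Jaccard overlap of their relationship names and merges every pair whose similarity reaches a cut-off, which corresponds to cutting a single-linkage dendrogram at that height. The similarity measure, the 0.5 threshold, and the file and relationship names are assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of relationship names that two policy files have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_policies(relations_by_file, threshold=0.5):
    """Single-linkage agglomeration: merge files whose similarity >= threshold."""
    parent = {name: name for name in relations_by_file}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(relations_by_file, 2):
        if jaccard(relations_by_file[a], relations_by_file[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in relations_by_file:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=sorted)


relations_by_file = {
    "F": {"handling procedure", "processing days", "issuing organization"},
    "G": {"handling procedure", "processing days"},
    "H": {"subsidy amount", "reward ratio"},
    "I": {"subsidy amount", "reward ratio", "issuing organization"},
}
clusters = cluster_policies(relations_by_file)
```

With this toy input, files F and G fall into one cluster and files H and I into another, because F and I share only the "issuing organization" relationship.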
In this embodiment, the purpose of dividing the graph into clusters is to place policies of the same type (such as business environment policies and industry support policies) in the same cluster. Policies in the same cluster share more relationships and are therefore more worth comparing, which efficiently narrows the set of policies for which a comparison is meaningful. By dividing the policy knowledge graph into a plurality of clusters and labeling positive and negative relationships in the knowledge graph, this embodiment makes the information in the policy knowledge graph richer and more targeted, so that each policy file can subsequently be analyzed to obtain a more targeted comparative analysis result.
Finally, after the construction and labeling of the policy knowledge graph are completed, the policy knowledge graph may be stored in a database of the computer device, and information retrieval may be performed on the stored data. Specifically, the knowledge graph may be stored either in table form or in graph form; table storage includes two types, namely a triple table and a type table. When the knowledge graph is used for information retrieval, the information in the policy knowledge graph can be queried through query languages such as SQL and SPARQL.
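A triple-table storage of the kind mentioned above might look like the following sketch, which uses Python's built-in `sqlite3` module and an in-memory database. The table name `triples`, its column names, the extra `polarity` column for the positive/negative label, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples ("
    " head TEXT NOT NULL,"      # policy entity 1
    " relation TEXT NOT NULL,"  # entity relationship
    " tail TEXT NOT NULL,"      # policy entity 2 (possibly a digital entity)
    " polarity INTEGER"         # 1 = positive, 0 = negative, NULL = unlabeled
    ")"
)
rows = [
    ("Policy F", "handling procedure", "2", 0),
    ("Policy F", "issuing organization", "City A government", None),
    ("Policy G", "handling procedure", "3", 0),
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?, ?)", rows)

# Retrieve every relationship recorded for one policy file.
result = conn.execute(
    "SELECT relation, tail FROM triples WHERE head = ? ORDER BY relation",
    ("Policy F",),
).fetchall()
```

A graph database queried with SPARQL or a similar language would serve the same purpose; the triple table is shown only because it needs no external dependency.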
203. Acquire a target policy file and a comparison file of the target policy file.
Specifically, the computer device may search the database for relevant policy files according to search information input by the user, such as a title or keywords, and the user may then select the target policy file and its comparison file from the policy files found. It should be noted that the target policy file is the policy file that the user intends to analyze, and there is generally one target policy file; the comparison file is a policy file that the user intends to compare against or refer to, and there may be one or more comparison files. In addition, the comparison file may be read according to search information such as a title or keywords, or at least one comparison file that has a similar relationship with the target policy file may be found through the policy knowledge graph. Through the policy knowledge graph, the computer device can more accurately find policy files that are in the same field as the target policy file and whose content is closer to it. Optionally, after the target policy file and its comparison file are obtained, the computer device may display them to the user through an output device such as a liquid crystal display or a touch screen.
204. Obtain at least one difference information of the target policy document and the comparison document according to the policy knowledge graph.
Specifically, the computer device may look up, in the pre-established policy knowledge graph, the policy entities and entity relationships corresponding to the target policy file and to its comparison file, and then compare the policy entities and entity relationships of the target policy file with those of the comparison file. In this way, the policy entities and entity relationships that differ between the target policy file and the comparison file are found and the difference triples of the two files are determined; the difference triples are then extracted and processed to obtain the difference information of the target policy file and the comparison file. It should be noted that the target policy document and the comparison document are both pre-stored in the database, and their key information has also been stored in the knowledge graph in advance after a series of processing steps.
In an alternative embodiment, the specific method for obtaining at least one difference information between the target policy document and the comparison document according to the policy knowledge graph may include the following steps. First, a first policy entity corresponding to the file title of the target policy file and a second policy entity corresponding to the file title of the comparison file are queried in the policy knowledge graph. Then, using the first policy entity and the second policy entity as meta nodes respectively, at least one difference policy entity that is not commonly associated with both the first policy entity and the second policy entity is found, that is, a policy entity that is associated with the second policy entity but not with the first, or one that is associated with the first policy entity but not with the second. Finally, the difference triples of the target policy file and the comparison file are obtained from the at least one difference policy entity found, and the difference triples are extracted and sorted to obtain the at least one difference information.
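The lookup of difference triples described above can be sketched with plain set operations. In this sketch the two title entities act as starting nodes, and a (relationship, entity) pair counts as a difference when it is attached to only one of them; the function name `difference_triples` and the sample triples are hypothetical.

```python
def difference_triples(triples, first, second):
    """Return triples attached to exactly one of the two title entities."""

    def outgoing(head):
        # (relationship, tail entity) pairs that start at the given node.
        return {(r, t) for h, r, t in triples if h == head}

    first_edges, second_edges = outgoing(first), outgoing(second)
    only_first = sorted((first, r, t) for r, t in first_edges - second_edges)
    only_second = sorted((second, r, t) for r, t in second_edges - first_edges)
    return only_first + only_second


triples = [
    ("Policy F", "handling procedure", "2"),
    ("Policy F", "issuing organization", "City A government"),
    ("Policy F", "document type", "implementation scheme"),
    ("Policy G", "handling procedure", "3"),
    ("Policy G", "issuing organization", "City B government"),
    ("Policy G", "document type", "implementation scheme"),
]
diff = difference_triples(triples, "Policy F", "Policy G")
```

Here the shared "document type" triple is dropped, while the differing "handling procedure" and "issuing organization" triples of both files are kept for the later comparison step.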
205. Generate comparative analysis suggestion information of the target policy file according to the at least one difference information and the labeling information of the at least one difference information in the policy knowledge graph.
Specifically, after at least one piece of difference information between the target policy file and its comparison file is extracted, the computer device may sort and summarize the difference information together with the corresponding labeling information in the knowledge graph, express both in text form by means of text templates, and finally generate the comparative analysis suggestion information of the target policy file. The labeling information of the difference information in the policy knowledge graph includes positive relationships and negative relationships. Using this labeling information, the computer device can determine whether a given difference between the target policy file and the comparison file is positive or negative for the target policy file, that is, whether the difference is an advantage or a disadvantage of the target policy file, which provides the basis for generating the comparative analysis suggestion information.
In an optional embodiment, a specific method for generating comparative analysis suggestion information of a document to be processed according to at least one difference information and the labeling information of the at least one difference information in a policy knowledge graph may include the following steps: generating suggestion information for each difference information through a preset template according to the labeling information of the entity relationship in that difference information, and then sorting and summarizing each difference information and its suggestion information to obtain the comparative analysis suggestion information of the document to be processed. In addition, the comparative analysis suggestion information can be displayed to the user, so that the user can see at a glance the differences, advantages and disadvantages of the policy file relative to similar policies, which improves the efficiency of policy file analysis.
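Template-based suggestion generation of this kind could be sketched as follows. The template wording, the `suggest` function, and the example values are assumptions; only the use of a preset template driven by the positive ("1") or negative ("0") label comes from the description above.

```python
TEMPLATES = {
    # Hypothetical preset templates, chosen by whether the difference is favorable.
    "advantage": "On '{relation}', {target} ({target_value}) compares favorably "
                 "with {reference} ({reference_value}).",
    "disadvantage": "On '{relation}', {target} ({target_value}) lags behind "
                    "{reference} ({reference_value}).",
}


def suggest(relation, polarity, target, target_value, reference, reference_value):
    """Fill a template for one numeric difference; None when there is no difference."""
    if target_value == reference_value:
        return None
    target_is_larger = target_value > reference_value
    # Positive relationship: larger is better. Negative relationship: smaller is better.
    favorable = target_is_larger if polarity == 1 else not target_is_larger
    key = "advantage" if favorable else "disadvantage"
    return TEMPLATES[key].format(
        relation=relation,
        target=target,
        target_value=target_value,
        reference=reference,
        reference_value=reference_value,
    )


# Assumed values: more handling procedures is worse (negative relationship).
message = suggest("handling procedure", 0, "City A", 3, "City B", 2)
```

The individual messages produced this way would then be collected into the comparative analysis suggestion information shown to the user.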
In this embodiment, there are two main ways of processing policy files. The first is single-policy suggestion generation, in which one policy file is compared with all policy files of the same type and comparative analysis suggestion information is generated, so as to determine where the policy file stands among files of the same type. The second is multi-policy comparison suggestion generation, in which a plurality of policy files are compared side by side, so that a more targeted comparative analysis suggestion is provided to the user.
Furthermore, when policy documents are analyzed and compared, the differences and similarities among a plurality of policy documents are obtained by using the policy triples in the knowledge graph and the labeling information of the entity relationships in those triples, and different levels can be set for the suggestion information according to the difference between the digital entities. For example, if the digital entity associated with a policy entity of policy document A through a positive relationship is smaller by 1 than the digital entity associated with the corresponding policy entity of policy document B through the same positive relationship, it can be determined that policy document A is slightly behind policy document B in that respect; if the difference is more than 5, policy document A is determined to lag far behind; if the associated numbers are the same, no suggestion needs to be output. The numeric threshold corresponding to each level can be changed according to the specific entity relationship. As shown in fig. 3, the two policy documents are the "Implementation Scheme of City A for Further Optimizing the Business Environment and Deepening User Power Access Reform (Trial)" and the "Implementation of Further Optimizing the Power Access Business Environment in City B (Trial)", both of which are policy documents for optimizing the business environment for power access. The entity relationship "handling procedure" exists in both policy documents; comparison shows that the digital entities corresponding to this entity relationship differ by 1 and that the entity relationship is a negative relationship, so the suggestion "the policy implementation in city B is better than that in city A" can be obtained.
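The grading of suggestions by numeric difference can be sketched as below. The thresholds 1 and 5 and the rule that equal numbers produce no suggestion come from the example above; the function name `grade`, the level wording, and in particular the intermediate "moderately" level for differences between those thresholds are assumptions, since the description does not say how such differences are graded.

```python
def grade(target_value, reference_value, polarity, slight=1, large=5):
    """Grade the target file against the reference file for one relationship.

    polarity: 1 if a larger number is better, 0 if a smaller number is better.
    Returns None when the numbers are equal (no suggestion is output).
    """
    gap = target_value - reference_value
    if polarity == 0:
        gap = -gap  # for a negative relationship, a smaller number is an advantage
    if gap == 0:
        return None
    side = "ahead" if gap > 0 else "behind"
    size = abs(gap)
    if size > large:
        degree = "greatly"
    elif size <= slight:
        degree = "slightly"
    else:
        degree = "moderately"  # assumed intermediate level, not stated in the source
    return f"{degree} {side}"


# Assumed numbers: 3 handling procedures in the target file versus 2 in the reference.
level = grade(3, 2, polarity=0)
```

Because the thresholds are parameters, they can be adjusted per entity relationship, as the description suggests.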
According to the knowledge-graph-based policy document processing method described above, the user neither needs to manually search for comparison documents that are similar to the target policy document and in the same field, nor needs to read extensively to obtain the detailed information in the policy documents.
Further, as a specific implementation of the method shown in fig. 1 to fig. 4, the present embodiment provides a policy document processing apparatus based on a knowledge graph, as shown in fig. 5, the apparatus includes: an information acquisition module 31, an information processing module 32, and an information generation module 33.
An information obtaining module 31, configured to obtain a target policy file and a comparison file of the target policy file;
an information processing module 32, configured to obtain at least one difference information between the target policy document and the comparison document according to the policy knowledge graph;
and the information generating module 33 is configured to generate comparative analysis suggestion information of the target policy file according to the at least one difference information and the label information of the at least one difference information in the policy knowledge graph.
In a specific application scenario, as shown in fig. 6, the apparatus further includes a map creating module 34, where the map creating module 34 is specifically configured to collect sample policy files in batches and preprocess the collected sample policy files; carrying out entity identification, entity disambiguation and relationship extraction on the preprocessed sample policy file to obtain a plurality of triples consisting of policy entities and entity relationships; obtaining a policy knowledge graph according to a plurality of triples consisting of policy entities and entity relations; and carrying out information annotation on part of entity relations in the policy knowledge graph, wherein annotation information of the entity relations comprises positive relations and negative relations.
In a specific application scenario, the map creating module 34 may be further configured to perform information labeling on the entity relationships associated with digital entities in the policy knowledge graph according to a pre-built binary classification model for relationships.
In a specific application scenario, as shown in fig. 6, the apparatus further includes a cluster dividing module 35, where the cluster dividing module 35 is specifically configured to divide the policy knowledge graph into a plurality of clusters according to a community discovery algorithm, where each cluster includes a plurality of policy files having a similar relationship with each other.
In a specific application scenario, the information obtaining module 31 may be specifically configured to read the comparison file of the target policy file according to the title information of the comparison file, or to search for at least one comparison file having a similar relationship with the target policy file according to the policy knowledge graph.
In a specific application scenario, the information processing module 32 may be specifically configured to query a policy knowledge graph for a first policy entity corresponding to a file title of a target policy file and a second policy entity corresponding to a file title of a comparison file; respectively taking the first policy entity and the second policy entity as meta nodes, and searching at least one difference policy entity which is not commonly associated with the first policy entity and the second policy entity; and obtaining at least one difference information of the target policy document and the comparison document according to at least one difference policy entity.
In a specific application scenario, the information generating module 33 is specifically configured to generate, according to the labeling information of the entity relationship in the at least one difference information, suggestion information of the at least one difference information through a preset template; and obtaining comparative analysis suggestion information of the file to be processed according to the at least one difference information and the suggestion information of the at least one difference information.
It should be noted that other corresponding descriptions of the functional units related to the policy file processing apparatus based on a knowledge graph provided in this embodiment may refer to the corresponding descriptions in fig. 1 to fig. 4, and are not repeated herein.
Based on the above-mentioned methods as shown in fig. 1 to 4, correspondingly, the present embodiment further provides a storage medium, on which a computer program is stored, and the program, when executed by a processor, implements the above-mentioned method for processing a policy document based on a knowledge graph as shown in fig. 1 to 4.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, or the like) and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the implementation scenarios of the present application.
Based on the method shown in fig. 1 to 4 and the embodiments of the policy document processing apparatus based on a knowledge graph shown in fig. 5 and 6, in order to achieve the above object, the present embodiment further provides an entity device for analyzing and processing a policy document based on a knowledge graph, which may be specifically a personal computer, a server, a smart phone, a tablet computer, a smart watch, or other network devices, and the entity device includes a storage medium and a processor; a storage medium for storing a computer program; a processor for executing a computer program to implement the above-described methods as shown in fig. 1-4.
Optionally, the entity device may further include a user interface, a network interface, a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a Wi-Fi module, and the like. The user interface may include a display screen and an input unit such as a keyboard, and optionally may also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a Wi-Fi interface), and the like.
Those skilled in the art will appreciate that the structure of the entity device for knowledge-graph-based policy document analysis and processing provided in the present embodiment does not limit the entity device, which may include more or fewer components, combine some components, or use a different arrangement of components.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the above-mentioned entity device and supports the operation of the information processing program and other software and/or programs. The network communication module is used for realizing communication among the components in the storage medium, and communication with other hardware and software in the information processing entity device.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general hardware platform, and can also be implemented by hardware. By applying the technical solution of the present application, the target policy file and its comparison file are first obtained; the target policy file and the comparison file are then analyzed and processed according to the policy knowledge graph to obtain their difference information; finally, the comparative analysis suggestion information of the target policy file is automatically generated according to the difference information and the labeling information of the difference information in the policy knowledge graph. Compared with the prior art, the target policy file and the comparison file are processed automatically according to the policy knowledge graph, so that the key information in both files can be effectively captured and the difference information between them can be extracted, which improves the efficiency of policy file analysis and processing and greatly reduces the workload of the user. In addition, the comparative analysis suggestion information is generated automatically from the labeling information in the knowledge graph, which can effectively improve the accuracy of policy file analysis and processing and provides a strong basis for the comparative analysis of policy files.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The above application serial numbers are for description purposes only and do not represent the superiority or inferiority of the implementation scenarios. The above disclosure is only a few specific implementation scenarios of the present application, but the present application is not limited thereto, and any variations that can be made by those skilled in the art are intended to fall within the scope of the present application.
Claims (10)
1. A method for processing policy documents based on a knowledge graph, the method comprising:
acquiring a target policy file and a comparison file of the target policy file;
obtaining at least one difference information of the target policy document and the comparison document according to a policy knowledge graph;
and generating comparative analysis suggestion information of the target policy file according to the at least one difference information and the labeling information of the at least one difference information in the policy knowledge graph.
2. The method of claim 1, wherein the method for creating the policy knowledge graph comprises:
collecting sample policy files in batches, and preprocessing the collected sample policy files;
carrying out entity identification, entity disambiguation and relationship extraction on the preprocessed sample policy file to obtain a plurality of triples consisting of policy entities and entity relationships;
obtaining the policy knowledge graph according to the multiple triples consisting of the policy entities and the entity relations;
and carrying out information annotation on at least part of entity relations in the policy knowledge graph, wherein the annotation information of the entity relations comprises positive relations and negative relations.
3. The method of claim 2, wherein the annotating information for at least some entity relationships in a policy knowledge graph comprises:
and carrying out information annotation on the entity relationships associated with digital entities in the policy knowledge graph according to a pre-established binary classification model for relationships.
4. The method of claim 2, further comprising:
and dividing the policy knowledge graph into a plurality of clusters according to a community discovery algorithm, wherein each cluster comprises a plurality of policy files with similar relations with each other.
5. The method according to any of claims 1-4, wherein obtaining the comparison document of the target policy document comprises:
reading the comparison file of the target policy file according to the title information of the comparison file; or
searching for at least one comparison document having a similar relationship with the target policy document according to the policy knowledge graph.
6. The method according to any one of claims 1-4, wherein the obtaining at least one difference information of the target policy document and the comparison document according to the policy knowledge graph comprises:
querying, in the policy knowledge graph, a first policy entity corresponding to the file title of the target policy file and a second policy entity corresponding to the file title of the comparison file;
searching at least one difference policy entity which is not commonly associated with the first policy entity and the second policy entity by respectively using the first policy entity and the second policy entity as meta nodes;
and obtaining at least one difference information of the target policy document and the comparison document according to the at least one difference policy entity.
7. The method according to any one of claims 1 to 4, wherein the generating of the comparative analysis suggestion information of the document to be processed according to the at least one difference information and the label information of the at least one difference information in the policy knowledge graph comprises:
generating suggested information of the at least one difference information through a preset template according to the labeling information of the entity relationship in the at least one difference information;
and obtaining comparative analysis suggestion information of the file to be processed according to the at least one difference information and the suggestion information of the at least one difference information.
8. A knowledge-graph-based policy document processing apparatus, the apparatus comprising:
the information acquisition module is used for acquiring a target policy file and a comparison file of the target policy file;
the information processing module is used for obtaining at least one difference information of the target policy file and the comparison file according to a policy knowledge graph;
and the information generation module is used for generating comparative analysis suggestion information of the target policy file according to the at least one difference information and the labeling information of the at least one difference information in the policy knowledge graph.
9. A storage medium having a computer program stored thereon, the computer program, when being executed by a processor, realizing the steps of the method of any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 7 when executed by the processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011117084.9A CN112214615A (en) | 2020-10-19 | 2020-10-19 | Policy document processing method and device based on knowledge graph and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011117084.9A CN112214615A (en) | 2020-10-19 | 2020-10-19 | Policy document processing method and device based on knowledge graph and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112214615A true CN112214615A (en) | 2021-01-12 |
Family
ID=74055734
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011117084.9A Pending CN112214615A (en) | 2020-10-19 | 2020-10-19 | Policy document processing method and device based on knowledge graph and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112214615A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107633005A (en) * | 2017-08-09 | 2018-01-26 | 广州思涵信息科技有限公司 | A kind of knowledge mapping structure, comparison system and method based on class teaching content |
CN109345006A (en) * | 2018-09-12 | 2019-02-15 | 张连祥 | A kind of trade and investment promotion policy analysis optimization method and system based on region development objective |
CN110297912A (en) * | 2019-05-20 | 2019-10-01 | 平安科技(深圳)有限公司 | Cheat recognition methods, device, equipment and computer readable storage medium |
US20200226133A1 (en) * | 2016-10-18 | 2020-07-16 | Hithink Financial Services Inc. | Knowledge map building system and method |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112800246A (en) * | 2021-04-09 | 2021-05-14 | 北京智源人工智能研究院 | Policy pedigree construction method and device and electronic equipment |
CN117708350A (en) * | 2024-02-06 | 2024-03-15 | 成都草根有智创新科技有限公司 | Enterprise policy information association method and device and electronic equipment |
CN117708350B (en) * | 2024-02-06 | 2024-05-14 | 成都草根有智创新科技有限公司 | Enterprise policy information association method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||