CN112463985A - Government affair map model construction method, device, equipment and computer readable medium

Publication number
CN112463985A
Authority
CN
China
Prior art keywords
data
knowledge
target
government affair
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011409775.6A
Other languages
Chinese (zh)
Other versions
CN112463985B (en)
Inventor
邓亮
王晓旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN202011409775.6A
Publication of CN112463985A
Application granted
Publication of CN112463985B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the technical field of knowledge graphs, and in particular to a government affair map model construction method, device, equipment and computer readable medium. The method comprises: acquiring target government affair data, wherein the target government affair data are collected from the Internet in the government affairs field and are used to represent legal person extended information; extracting graph knowledge matching the target government affair data from a preset legal person graph knowledge base, wherein the legal person graph knowledge base is obtained from the construction data of a general government affairs knowledge graph; and constructing a knowledge graph of legal person extended information from the target government affair data by using the graph knowledge. The application accumulates and retains the knowledge generated during construction and provides the necessary industry knowledge guidance for constructing the next government affairs legal person extended information graph model, so that the knowledge in the knowledge base is applied to iteration and non-specialist technicians can take part in constructing the graph model.

Description

Government affair map model construction method, device, equipment and computer readable medium
Technical Field
The application relates to the technical field of knowledge graphs, and in particular to a government affair map model construction method, device, equipment and computer readable medium.
Background
Systems for the government affairs industry provide regional industry analysis capability, guide local industry development, provide comprehensive enterprise information query services based on multi-dimensional enterprise big data, and monitor enterprise development trends and risk conditions. By constructing a legal person knowledge graph of enterprises, the complex network of relationships among enterprises, senior executives, legal persons, brands, products, regions and industrial chains is mined in depth. Construction of the legal person knowledge graph base for enterprises in the government affairs industry mainly focuses on gathering the relevant information that legal persons generate in social and economic activities.
At present, in the related art, governing the data of a legal person knowledge graph requires multiple iterative updates. Following the logic of knowledge acquisition, each iteration basically goes through the following three processes:
1. Information extraction: entities, attributes and the relationships among entities are extracted from various types of data sources, and an ontological knowledge representation is formed on that basis;
2. Knowledge fusion: after new knowledge is obtained, it must be integrated to eliminate contradictions and ambiguities; for example, an entity may have several expressions, and one name may correspond to several different entities;
3. Knowledge processing: newly fused knowledge undergoes quality evaluation (part of which requires manual screening), and only the qualified part is added to the knowledge base, so as to ensure the quality of the knowledge base.
Each iteration requires extensive manual screening by industry experts, and efficiency is low because a large amount of manual review and verification is needed when structured, unstructured and semi-structured data within an organization are extracted jointly. In addition, the quality of knowledge extraction is poor, the low data quality greatly hinders later-stage application of the government affairs knowledge graph, and the knowledge in the knowledge base cannot be applied to iteration.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The application provides a government affair map model construction method, device, equipment and computer readable medium, to solve the technical problem that knowledge in a knowledge base cannot be applied to iteration.
According to an aspect of the embodiments of the present application, there is provided a government affair map model construction method, including:
acquiring target government affair data, wherein the target government affair data are collected from the Internet in the government affairs field and are used to represent legal person extended information;
extracting graph knowledge matching the target government affair data from a preset legal person graph knowledge base, wherein the legal person graph knowledge base is obtained from the construction data of a general government affairs knowledge graph;
and constructing a knowledge graph of legal person extended information from the target government affair data by using the graph knowledge.
Optionally, acquiring the target government affair data includes at least one of the following:
sequentially crawling the target government affair data in each page of a first crawl link, starting from the start page of the first crawl link; when all pages of the first crawl link have been crawled and the end condition is not met, continuing to crawl sequentially the target government affair data in each page of a second crawl link, starting from the start page of the second crawl link, and stopping the crawl once the end condition is met;
crawling the target government affair data in a current page; when the end condition is not met, determining a target link from a plurality of links in the current page and crawling the target government affair data in the target page pointed to by the target link, and terminating the crawl once the end condition is met.
Optionally, extracting the graph knowledge matching the target government affair data from the preset legal person graph knowledge base includes:
extracting a model identifier of the knowledge graph to be constructed from the target government affair data;
and extracting, from the preset legal person graph knowledge base, at least one of a data classification label, a data coding standard and an entity association relation matching the model identifier.
Optionally, constructing the knowledge graph of legal person extended information from the target government affair data by using the graph knowledge includes:
classifying the target government affair data by using the data classification label, wherein the data classification label includes at least one of industry and commerce information, stockholder information, main personnel information, branch information, annual report information, tax rating, violation information, judicial assistance information and discredited person information;
encoding the classified target government affair data according to the data coding standard;
associating the encoded target government affair data according to the entity association relation;
and constructing the knowledge graph by using the associated target government affair data.
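The four steps above (classify, encode, associate, construct) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the keyword rules in `LABEL_KEYWORDS`, the short codes in `LABEL_CODES`, and the `Record` type are hypothetical stand-ins for whatever the legal person graph knowledge base actually supplies.

```python
from dataclasses import dataclass

# Hypothetical keyword rules standing in for the data classification labels.
LABEL_KEYWORDS = {
    "stockholder information": ("stockholder", "equity"),
    "tax rating": ("tax",),
    "branch information": ("branch",),
}
# Hypothetical data coding standard: one short code per label.
LABEL_CODES = {
    "stockholder information": "SH",
    "tax rating": "TX",
    "branch information": "BR",
}


@dataclass(frozen=True)
class Record:
    credit_code: str  # identifier of the legal person the record belongs to
    text: str         # raw crawled text


def classify(record):
    """Return the first label whose keywords occur in the text, else None."""
    lowered = record.text.lower()
    for label, keywords in LABEL_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return None


def encode(record, label):
    """Encode a classified record as '<label code>:<credit code>'."""
    return f"{LABEL_CODES[label]}:{record.credit_code}"


def build_graph(records):
    """Return edges (legal person, label, encoded record) for classified records."""
    edges = []
    for record in records:
        label = classify(record)
        if label is None:
            continue  # unclassified data is left out of the graph
        edges.append((record.credit_code, label, encode(record, label)))
    return edges
```

Records that match no label are simply skipped here; a production system would more likely route them to manual review.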
Optionally, constructing the knowledge graph by using the associated target government affair data includes:
determining a target legal person from the target government affair data;
extracting an ontology data set of the target legal person, wherein the data in the ontology data set express at least one of objects, enterprises, social organizations, roads, buildings and Internet texts associated with the target legal person;
and, taking the target legal person as the main entity and the business fields of the target legal person as sub-entities, constructing association edges between the main entity and the sub-entities and among the sub-entities according to the association relations, indicated by the ontology data set, among the objects, enterprises, social organizations, buildings, roads and Internet texts.
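As a minimal sketch of the edge construction described above, the function below connects a main entity to each sub-entity and then adds edges among sub-entities. The function name, the argument shapes and the example values are assumptions made for illustration; the text does not prescribe a data structure.

```python
def build_edges(main_entity, business_fields, relations):
    """Build association edges for one target legal person.

    main_entity: name of the target legal person (the main entity).
    business_fields: list of business fields treated as sub-entities.
    relations: iterable of (field_a, field_b) pairs taken from the
        ontology data set; pairs naming unknown fields are ignored.
    """
    # Edges between the main entity and every sub-entity.
    edges = [(main_entity, field) for field in business_fields]
    # Edges among sub-entities, restricted to known sub-entities.
    known = set(business_fields)
    edges += [(a, b) for a, b in relations if a in known and b in known]
    return edges
```

For example, `build_edges("Corp A", ["logistics", "retail"], [("logistics", "retail")])` yields two main-to-sub edges followed by one sub-to-sub edge.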
Optionally, before extracting the graph knowledge matching the target government affair data from the preset legal person graph knowledge base, the method further includes determining data classification labels in at least one of the following manners and storing the data classification labels in the legal person graph knowledge base:
acquiring a first reference data set; converting the data in the first reference data set into feature vectors; determining the cosine similarity among the feature vectors, and classifying the feature vectors with the cosine similarity smaller than a target threshold value into the same classification data set; determining the data classification labels of the different classification data sets, and storing the data classification labels and the classification data sets in the legal person graph knowledge base;
acquiring a second reference data set, wherein the second reference data set is stored in a table structure; performing semantic recognition on the table-structured second reference data set; classifying according to the recognition result; determining the data classification label of each class; and storing the data classification labels and the recognition results in the legal person graph knowledge base.
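The first manner above, grouping feature vectors by cosine similarity, can be sketched as follows. Two assumptions are made and should be read as such: the "smaller than a target threshold" comparison is applied to cosine distance (1 minus cosine similarity), since grouping items whose similarity is low would place unlike data together; and a simple greedy scheme compares each vector with the first member of every existing group. Neither choice is stated in the text.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_by_cosine(vectors, max_distance):
    """Greedily group vector indices whose cosine distance to a group's
    first member is below max_distance (assumed reading of the threshold)."""
    groups = []  # each group is a list of indices; groups[k][0] is its representative
    for index, vector in enumerate(vectors):
        for group in groups:
            representative = vectors[group[0]]
            if 1.0 - cosine_similarity(representative, vector) < max_distance:
                group.append(index)
                break
        else:
            groups.append([index])  # no close group found: start a new one
    return groups
```

Each resulting group would then be given a data classification label and stored alongside it in the knowledge base.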
Optionally, after constructing the knowledge graph of legal person extended information from the target government affair data by using the graph knowledge, the method further includes:
acquiring verification data;
verifying the knowledge graph of legal person extended information by using the verification data;
and determining that the verification is passed when the verification result indicates that the accuracy of the knowledge graph reaches a target threshold.
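The verification step can be illustrated with a short check. The definition of accuracy used here, the fraction of verification triples that also appear in the constructed graph, is an assumption; the text only requires that the accuracy reach a target threshold.

```python
def verify(graph_triples, verification_triples, target_threshold):
    """Return True when the share of verification triples found in the
    graph is at least target_threshold (assumed accuracy definition)."""
    if not verification_triples:
        return False  # nothing to verify against
    graph = set(graph_triples)
    hits = sum(1 for triple in verification_triples if triple in graph)
    accuracy = hits / len(verification_triples)
    return accuracy >= target_threshold
```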
According to another aspect of the embodiments of the present application, there is provided a government affairs map model building device, including:
a government affair data acquisition module, configured to acquire target government affair data, wherein the target government affair data are collected from the Internet in the government affairs field and are used to represent legal person extended information;
a graph knowledge extraction module, configured to extract graph knowledge matching the target government affair data from a preset legal person graph knowledge base, wherein the legal person graph knowledge base is obtained from the construction data of a general government affairs knowledge graph;
and a knowledge graph construction module, configured to construct a knowledge graph of legal person extended information from the target government affair data by using the graph knowledge.
Optionally, the government affair data obtaining module includes:
a depth traversal unit, configured to sequentially crawl the target government affair data in each page of a first crawl link, starting from the start page of the first crawl link; and, when all pages of the first crawl link have been crawled and the end condition is not met, to continue crawling sequentially the target government affair data in each page of a second crawl link, starting from the start page of the second crawl link, and to stop the crawl once the end condition is met;
a breadth traversal unit, configured to crawl the target government affair data in a current page; and, when the end condition is not met, to determine a target link from a plurality of links in the current page and crawl the target government affair data in the target page pointed to by the target link, and to terminate the crawl once the end condition is met.
Optionally, the graph knowledge extraction module includes:
a model identifier extraction unit, configured to extract a model identifier of the knowledge graph to be constructed from the target government affair data;
and a knowledge extraction unit, configured to extract, from the preset legal person graph knowledge base, at least one of a data classification label, a data coding standard and an entity association relation matching the model identifier.
Optionally, the knowledge graph construction module includes:
a data classification unit, configured to classify the target government affair data by using the data classification label, wherein the data classification label includes at least one of industry and commerce information, stockholder information, main personnel information, branch information, annual report information, tax rating, violation information, judicial assistance information and discredited person information;
a data coding unit, configured to encode the classified target government affair data according to the data coding standard;
a data association unit, configured to associate the encoded target government affair data according to the entity association relation;
and a graph construction unit, configured to construct the knowledge graph by using the associated target government affair data.
Optionally, the graph construction unit includes:
a legal person determining subunit, configured to determine a target legal person from the target government affair data;
an associated data extraction subunit, configured to extract an ontology data set of the target legal person, wherein the data in the ontology data set express at least one of objects, enterprises, social organizations, roads, buildings and Internet texts associated with the target legal person;
and a graph construction subunit, configured to take the target legal person as the main entity and the business fields of the target legal person as sub-entities, and to construct association edges between the main entity and the sub-entities and among the sub-entities according to the association relations, indicated by the ontology data set, among the objects, enterprises, social organizations, buildings, roads and Internet texts.
Optionally, the apparatus further includes a data classification label determination module, including:
a first determination unit, configured to acquire a first reference data set; convert the data in the first reference data set into feature vectors; determine the cosine similarity among the feature vectors, and classify the feature vectors with the cosine similarity smaller than a target threshold value into the same classification data set; determine the data classification labels of the different classification data sets, and store the data classification labels and the classification data sets in the legal person graph knowledge base;
a second determination unit, configured to acquire a second reference data set, wherein the second reference data set is stored in a table structure; perform semantic recognition on the table-structured second reference data set; classify according to the recognition result; determine the data classification label of each class; and store the data classification labels and the recognition results in the legal person graph knowledge base.
Optionally, the apparatus further comprises a verification module comprising:
a verification data acquisition unit for acquiring verification data;
a verification unit, configured to verify the knowledge graph of legal person extended information by using the verification data;
and a verification result determining unit, configured to determine that the verification is passed when the verification result indicates that the accuracy of the knowledge graph reaches a target threshold.
According to another aspect of the embodiments of the present application, there is provided an electronic device, including a memory, a processor, a communication interface, and a communication bus, where the memory stores a computer program executable on the processor, and the memory and the processor communicate with each other through the communication bus and the communication interface, and the processor implements the steps of the method when executing the computer program.
According to another aspect of embodiments of the present application, there is also provided a computer readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the above-mentioned method.
Compared with the related art, the technical scheme provided by the embodiment of the application has the following advantages:
In the technical solution of the application, target government affair data are acquired, wherein the target government affair data are collected from the Internet in the government affairs field and are used to represent legal person extended information; graph knowledge matching the target government affair data is extracted from a preset legal person graph knowledge base, wherein the legal person graph knowledge base is obtained from the construction data of a general government affairs knowledge graph; and a knowledge graph of legal person extended information is constructed from the target government affair data by using the graph knowledge. The application accumulates and retains the knowledge generated during construction and provides the necessary industry knowledge guidance for constructing the next government affairs legal person extended information graph model, so that the knowledge in the knowledge base is applied to iteration and non-specialist technicians can take part in constructing the graph model.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to illustrate the technical solutions in the embodiments of the present application or in the related art more clearly, the drawings needed for describing the embodiments or the related art are briefly introduced below; a person skilled in the art can obviously obtain other drawings from these drawings without creative effort.
FIG. 1 is a hardware environment diagram of an optional government affair map model construction method according to an embodiment of the present application;
FIG. 2 is a flow chart of an optional government affair map model construction method according to an embodiment of the present application;
FIG. 3 is a block diagram of an optional government affair map model construction device according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an optional electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the following description, suffixes such as "module", "component" or "unit" used to denote elements are used only to facilitate the description of the present application and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.
In the related art, governing the data of a legal person knowledge graph requires multiple iterative updates. Following the logic of knowledge acquisition, each iteration basically goes through the following three processes:
1. Information extraction: entities, attributes and the relationships among entities are extracted from various types of data sources, and an ontological knowledge representation is formed on that basis;
2. Knowledge fusion: after new knowledge is obtained, it must be integrated to eliminate contradictions and ambiguities; for example, an entity may have several expressions, and one name may correspond to several different entities;
3. Knowledge processing: newly fused knowledge undergoes quality evaluation (part of which requires manual screening), and only the qualified part is added to the knowledge base, so as to ensure the quality of the knowledge base.
Each iteration requires extensive manual screening by industry experts, and efficiency is low because a large amount of manual review and verification is needed when structured, unstructured and semi-structured data within an organization are extracted jointly. In addition, the quality of knowledge extraction is poor, the low data quality greatly hinders later-stage application of the government affairs knowledge graph, and the knowledge in the knowledge base cannot be applied to iteration.
To solve the problems mentioned in the background, according to an aspect of the embodiments of the present application, an embodiment of a government map model construction method is provided.
Optionally, in the embodiments of the present application, the above government affair map model construction method may be applied to a hardware environment formed by a terminal 101 and a server 103, as shown in FIG. 1. As shown in FIG. 1, the server 103 is connected to the terminal 101 through a network and may provide services for the terminal or for a client installed on the terminal. A database 105 may be provided on the server or separately from the server to provide data storage services for the server 103. The network includes but is not limited to a wide area network, a metropolitan area network or a local area network, and the terminal 101 includes but is not limited to a PC, a mobile phone, a tablet computer and the like.
The government affair map model construction method in the embodiments of the present application may be executed by the server 103, or jointly by the server 103 and the terminal 101. As shown in FIG. 2, the method may include the following steps:
Step S202: acquire target government affair data, wherein the target government affair data are collected from the Internet in the government affairs field and are used to represent legal person extended information.
In the embodiments of the present application, the Internet in the government affairs field includes the websites of government departments and statistical bureaus and the various large websites related to enterprise operations. The target government affair data are extended information on enterprise operations centered on a legal person (represented by a unified social credit code), including business information, stockholder information, main personnel, branches, annual report information, tax rating, serious violations, judicial assistance, discredited persons, public announcements, judgment documents, court announcements, judgment debtors, spot checks, environmental protection penalties, administrative penalties, engineering exception, operation exceptions, chattel mortgages, judicial auctions, equity pledges, underpinning information, patents, trademark information, outbound investment, bidding, website filing, copyright, financing history, administrative licenses, qualification certificates, software copyright, import and export credit, recruitment, and the like. These data provide the underlying data support for shared services such as information query and identity verification. Legal person extended information reflects the state attributes of a legal person at different stages of its life cycle, and because the business activities of a legal person are uncertain, this information changes frequently.
In the embodiments of the present application, the legal person (unified social credit code) may be taken as the core, and "one-code association" of legal person information may be realized by associating legal person extended information, such as enterprise registration information, enterprise change registration information, enterprise annual report information and tax registration information, with the unified social credit code.
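The "one-code association" idea can be sketched as a merge keyed on the unified social credit code. The record layout (dictionaries with a `credit_code` field), the source names and the placeholder codes are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def associate_by_credit_code(sources):
    """Merge records from several sources under one credit code.

    sources: mapping of source name -> list of dict records, each
        carrying a 'credit_code' key (hypothetical field name).
    Returns: credit code -> {source name: [record fields without the code]}.
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            code = record["credit_code"]
            fields = {k: v for k, v in record.items() if k != "credit_code"}
            merged[code].setdefault(source_name, []).append(fields)
    return dict(merged)
```

With this shape, everything known about one legal person, whichever source it came from, is reachable through a single lookup on its code.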
Optionally, acquiring the target government affair data includes at least one of the following:
sequentially crawling the target government affair data in each page of a first crawl link, starting from the start page of the first crawl link; when all pages of the first crawl link have been crawled and the end condition is not met, continuing to crawl sequentially the target government affair data in each page of a second crawl link, starting from the start page of the second crawl link, and stopping the crawl once the end condition is met;
crawling the target government affair data in a current page; when the end condition is not met, determining a target link from a plurality of links in the current page and crawling the target government affair data in the target page pointed to by the target link, and terminating the crawl once the end condition is met.
In the embodiments of the present application, data may be acquired by depth traversal under a depth-first traversal strategy, or by breadth traversal under a breadth-first traversal strategy.
Under the depth-first traversal strategy, one link is selected from a plurality of crawl links as the first link; the web crawler then starts from the start page of the first link and follows the links one by one along it. After the first link has been processed, the crawler moves to a second link and again crawls one by one along the second link, starting from that link's start page. During crawling, data crawling stops if an end condition is met; the end condition may be determined by the amount of data to be acquired or by the number of links.
Under the breadth-first traversal strategy, links found in a newly downloaded web page are inserted directly at the tail of the queue of addresses to be crawled. That is, the web crawler first crawls all web pages linked from the current web page, then selects one of those linked pages and continues to crawl all web pages linked from it. Likewise, data crawling stops when an end condition is met; the end condition may be determined by the amount of data to be acquired or by the lateral span of the links.
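The two traversal strategies differ only in which end of the frontier the next page is taken from, which the sketch below makes explicit. It runs over an in-memory link table rather than real web pages, and the end condition is assumed to be a page budget (`max_pages`); the function name and parameters are illustrative and do not come from the text.

```python
from collections import deque


def crawl(start, links, max_pages, breadth_first=True):
    """Visit pages reachable from start until max_pages is reached.

    links: mapping page -> list of pages it links to (stand-in for
        fetching a page and parsing its links).
    breadth_first: True uses a queue (new links go to the tail and
        pages are taken from the head); False uses a stack, so one
        chain of links is followed to its end before the next.
    """
    frontier = deque([start])
    seen = set()
    visited = []
    while frontier and len(visited) < max_pages:
        page = frontier.popleft() if breadth_first else frontier.pop()
        if page in seen:
            continue  # already crawled via another link
        seen.add(page)
        visited.append(page)
        outgoing = links.get(page, [])
        # For the stack, push in reverse so the first link is followed first.
        frontier.extend(outgoing if breadth_first else reversed(outgoing))
    return visited
```

With `links = {"home": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}`, the breadth-first order visits `a` and `b` before any of their children, while the depth-first order finishes everything under `a` before turning to `b`.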
Optionally, in the embodiments of the present application, data warehouse technology may also be used to align real-time and non-real-time data so as to acquire both full and incremental data. Data warehouse technology is mature and is not described further here.
Step S204: extract graph knowledge matching the target government affair data from a preset legal person graph knowledge base, wherein the legal person graph knowledge base is obtained from the construction data of a general government affairs knowledge graph.
In the embodiments of the present application, a knowledge graph may be constructed or iterated by using the graph knowledge in the legal person graph knowledge base. The graph knowledge is obtained from experience accumulated with the data, methods, relations, standards and the like involved in constructing the general government affairs knowledge graph.
Optionally, the extracting of the atlas knowledge matched with the target government data from the preset legal atlas knowledge base comprises:
extracting a model identifier of a knowledge graph to be constructed by the target government affair data;
and extracting at least one of a data classification label, a data coding standard and an entity incidence relation matched with the model identification from a preset legal person atlas knowledge base.
In the embodiment of the application, the knowledge graph to be constructed can be determined in advance before the knowledge graph is constructed, so that data acquisition is carried out, the acquired data is marked with the model identification, and the data classification label, the data coding standard and the entity incidence relation matched with the model identification can be extracted from the legal person graph knowledge base.
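As a minimal sketch of the lookup described above, the knowledge base can be pictured as a table keyed by model identifier. The dictionary layout, the key names, and the `model_id` field on each record are hypothetical and serve only to illustrate the matching step.

```python
# Hypothetical knowledge base keyed by model identifier (illustrative content).
KNOWLEDGE_BASE = {
    "legal_person_extended_info": {
        "classification_labels": ["tax rating", "shareholder information"],
        "coding_standard": {"date_format": "%Y-%m-%d"},
        "entity_relations": [("legal person", "shareholder", "person")],
    }
}


def extract_graph_knowledge(records, knowledge_base):
    """Read the model identifier marked on the data and fetch matching knowledge."""
    model_ids = {record["model_id"] for record in records}
    if len(model_ids) != 1:
        raise ValueError("expected exactly one model identifier in the data")
    model_id = model_ids.pop()
    if model_id not in knowledge_base:
        raise KeyError(f"no graph knowledge stored for model {model_id!r}")
    return model_id, knowledge_base[model_id]
```

Keying everything on the model identifier means the acquisition step only has to tag each record once; the later steps can then pull all three kinds of knowledge with a single lookup.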
Optionally, before extracting the graph knowledge matching the target government affair data from the preset legal person graph knowledge base, the method further comprises determining data classification labels in at least one of the following manners and storing the data classification labels in the legal person graph knowledge base:
acquiring a first reference data set; converting the data in the first reference data set into feature vectors; determining the cosine similarity between the feature vectors, and classifying feature vectors whose cosine similarity is smaller than a target threshold into the same classification data set; determining the data classification labels of the different classification data sets, and storing the data classification labels and the classification data sets in the legal person graph knowledge base;
acquiring a second reference data set, wherein the second reference data set is stored in a table structure; performing semantic recognition on the second reference data set of the table structure; classifying according to the recognition result; determining a data classification label for each class; and storing the data classification labels and the recognition results in the legal person graph knowledge base.
In the embodiments of the present application, the first reference data set is a data set stored in unstructured form. The content of the unstructured data can be recognized by natural language processing, after which the data is classified and marked with a corresponding class label. Specifically, the data in the first reference data set may be converted into feature vectors by an embedding method or by Word2Vec, and the data is classified by calculating the cosine similarity between the feature vectors, so that feature vectors whose cosine similarity is within a target threshold are classified into the same classification data set; finally, a data classification label is attached to each classification data set. The second reference data set is a data set stored in structured form. Its table structure can be parsed, semantic analysis is performed on the table names, column names, and sample data, and corresponding class labels are then attached. After the data classification labels are determined, the data classification labels and the corresponding data sets are saved in the legal person graph knowledge base.
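The cosine-similarity grouping can be sketched as follows. This sketch makes two assumptions that the text does not fix: it reads the target threshold as an upper bound on cosine distance (one minus cosine similarity), and it uses a simple greedy grouping against the first member of each group. The function names are hypothetical, and the feature vectors are assumed to have been produced beforehand.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_by_cosine(vectors, max_distance):
    """Greedily group vectors; returns lists of indices into `vectors`.

    A vector joins the first group whose representative (its first member)
    lies within `max_distance` in cosine distance; otherwise it starts a
    new group. Assumption: the threshold bounds 1 - cosine similarity.
    """
    groups = []
    for index, vector in enumerate(vectors):
        for group in groups:
            representative = vectors[group[0]]
            if 1.0 - cosine_similarity(vector, representative) < max_distance:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups
```

Each resulting group would then be given one data classification label and stored together with that label.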
In the embodiments of the present application, the data classification labels finally obtained include business registration information, shareholder information, main personnel information, branch information, annual report information, tax rating, violation information, judicial assistance information, distrusted person information, and the like.
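For the structured route, a much simplified stand-in for semantic analysis is keyword matching on the table name and column names. The keyword table and function name below are hypothetical; an actual implementation would likely use a real semantic model and also inspect sample data.

```python
# Hypothetical keyword sets per label; not taken from the application.
LABEL_KEYWORDS = {
    "shareholder information": {"shareholder", "investor", "equity"},
    "tax rating": {"tax", "rating"},
    "branch information": {"branch"},
}


def label_table(table_name, column_names, label_keywords=LABEL_KEYWORDS):
    """Pick the label whose keywords overlap most with the table's identifiers."""
    tokens = set()
    for name in [table_name, *column_names]:
        tokens.update(name.lower().replace("-", "_").split("_"))
    scores = {label: len(tokens & words) for label, words in label_keywords.items()}
    best = max(scores, key=scores.get)
    # Return None when no keyword matched at all.
    return best if scores[best] > 0 else None
```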
The data coding standard can be implemented through data cleaning, data integration, and data reduction. Data cleaning fills in missing values, identifies outliers, removes noise, and corrects data inconsistencies. Data integration consolidates data from different data sources into a consistent data store, for example through metadata conversion, correlation analysis, data conflict detection, and resolution of semantic heterogeneity. Data reduction covers techniques such as data cube aggregation, dimensionality reduction, data compression, numerosity reduction, and discretization, all of which can be used to obtain a reduced representation of the data with minimal loss of information content.
By unifying the data coding standard, the acquired data can be given a uniform format and uniform data identifiers.
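A small sketch of the cleaning and format-unification part is given below. The field names (`credit_code`, `established`, `region`), the accepted date formats, and the default values are assumptions chosen for illustration; the application does not specify them.

```python
from datetime import datetime

# Assumed input date formats; the unified output format is YYYY-MM-DD.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d")


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def clean_record(record, defaults):
    """Fill missing values, then unify identifier and date formats."""
    cleaned = {}
    for field_name, default in defaults.items():
        value = record.get(field_name)
        if value is None or str(value).strip() == "":
            value = default  # fill the missing value
        cleaned[field_name] = value
    # Unify the identifier: trimmed and upper-case.
    cleaned["credit_code"] = str(cleaned["credit_code"]).strip().upper()
    if cleaned["established"] is not None:
        cleaned["established"] = normalize_date(str(cleaned["established"]))
    return cleaned
```

Records from different sources that describe the same legal person then share the same identifier and date representation, which is what allows them to be associated later.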
The entity association relationship can be divided into three levels: entity, attribute, and relationship. At the entity level, the main entity type in the legal person extended information graph model is the legal person. At the attribute level, the attributes of the legal person include business registration information, annual report information, tax information, and the like. The relationship level mainly covers investors/shareholders, legal representatives, senior management, and branches, together with the logical relationships among them.
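The three levels map naturally onto a small data model. The class names, field names, and relation type strings in the sketch below are hypothetical and only mirror the levels listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Entity level: a node such as a legal person; attributes live on the node."""
    entity_id: str
    entity_type: str
    attributes: dict = field(default_factory=dict)  # attribute level


@dataclass(frozen=True)
class Relation:
    """Relationship level: a typed, directed edge between two entities."""
    source_id: str
    relation_type: str
    target_id: str


# Assumed relation types, following the relationship level described above.
RELATION_TYPES = {"shareholder", "legal_representative", "senior_manager", "branch"}


def make_relation(source, relation_type, target):
    """Create an edge, rejecting relation types outside the known set."""
    if relation_type not in RELATION_TYPES:
        raise ValueError(f"unknown relation type: {relation_type!r}")
    return Relation(source.entity_id, relation_type, target.entity_id)
```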
Step S206: constructing a knowledge graph of legal person extended information for the target government affair data using the graph knowledge.
In the embodiments of the present application, the association edges between entities can be constructed from the relationships at the relationship level, thereby forming the knowledge graph, and the information at the attribute level is stored in the entity nodes of the knowledge graph, which completes the construction of the knowledge graph of legal person extended information.
Optionally, constructing the knowledge graph of legal person extended information for the target government affair data using the graph knowledge comprises:
classifying the target government affair data using the data classification labels, wherein the data classification labels comprise at least one of business registration information, shareholder information, main personnel information, branch information, annual report information, tax rating, violation information, judicial assistance information, and distrusted person information;
encoding the classified target government affair data according to the data coding standard;
associating the encoded target government affair data according to the entity association relationship;
and constructing the knowledge graph using the associated target government affair data.
In the embodiments of the present application, the target government affair data can first be classified according to the data classification labels stored in the legal person graph knowledge base. The data is then encoded according to the data coding standard to eliminate inconsistent formats and incorrect data. Next, the complex network of relationships among enterprises, senior management, legal persons, brands, products, regions, and industrial chains indicated in the target government affair data is determined according to the entity association relationship. Finally, entities are created for the legal persons and association edges are created according to the association relationships, thereby constructing the knowledge graph.
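The four sub-steps (classify, encode, associate, construct) can be sketched as a single function that takes the classification and encoding rules as parameters. The record fields `legal_person` and `value`, the dictionary-based graph representation, and the function name are assumptions made for this sketch.

```python
def build_knowledge_graph(records, classify, encode, associations):
    """Classify, encode, and associate records, then assemble nodes and edges.

    nodes: entity name -> {classification label: [encoded values]}
    edges: list of (source, relation, target) tuples
    """
    nodes, edges = {}, []
    for record in records:
        label = classify(record)      # 1. classify with a data classification label
        encoded = encode(record)      # 2. encode to the unified standard
        attributes = nodes.setdefault(encoded["legal_person"], {})
        attributes.setdefault(label, []).append(encoded["value"])
    for source, relation, target in associations:  # 3. entity associations
        nodes.setdefault(source, {})
        nodes.setdefault(target, {})
        edges.append((source, relation, target))   # 4. association edges
    return nodes, edges
```

Passing `classify` and `encode` in as functions keeps the construction step independent of whichever labels and coding standard were extracted from the knowledge base.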
Optionally, constructing the knowledge graph using the associated target government affair data comprises:
determining a target legal person from the target government affair data;
extracting an ontology data set of the target legal person, wherein the data in the ontology data set represents at least one of objects, enterprises, social organizations, roads, buildings, and Internet texts associated with the target legal person;
and, taking the target legal person as the main entity and the business domains of the target legal person as sub-entities, constructing association edges between the main entity and the sub-entities and between the sub-entities according to the association relationships among the objects, enterprises, social organizations, buildings, roads, and Internet texts indicated by the ontology data set.
In the embodiments of the present application, a city contains a very large number of ontology objects of all types (including entities such as people, enterprises, social organizations, roads, and buildings, as well as events, related texts, and multimedia occurring in the city), and the number of entity types alone can reach tens of thousands. The knowledge graph is therefore implemented step by step, following a rapid-iteration approach with layered, domain-by-domain division: a target legal person is first determined as the main entity, the sub-entities are determined according to the business usage of the target legal person, and association edges between the main entity and the sub-entities and between the sub-entities are then added, forming a complete knowledge graph of legal person extended information.
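The layered construction around one main entity can be sketched as below. The edge type names `has_domain` and `related_to`, and the representation of the ontology data set as a list of associated pairs, are hypothetical simplifications.

```python
def build_layered_edges(main_entity, sub_entities, associations):
    """Link the main entity to every sub-entity, then link associated sub-entities.

    `associations` is an iterable of (left, right) pairs taken from the
    ontology data set; pairs naming unknown sub-entities are ignored.
    """
    edges = [(main_entity, "has_domain", sub) for sub in sub_entities]
    known = set(sub_entities)
    for left, right in associations:
        if left in known and right in known and left != right:
            edges.append((left, "related_to", right))
    return edges
```

Starting from a single legal person and its business domains keeps each iteration small; further legal persons or domains can be added in later passes without restructuring the edges already built.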
Optionally, after constructing the knowledge graph of legal person extended information for the target government affair data using the graph knowledge, the method further comprises:
acquiring verification data;
verifying the knowledge graph of legal person extended information using the verification data;
and passing the verification when the verification result indicates that the accuracy of the knowledge graph reaches a target threshold.
In the embodiments of the present application, the model designer needs to check the data content continuously to verify it against the business. Data content may be inconsistent with the actual business because the data is old or poorly maintained, and, owing to constraints such as the experience of the designers and of the business system personnel, many omissions may exist. Continuous verification is therefore needed during the model design stage to find problems and update the related results.
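One way to picture the threshold check is sketched below. The application does not define how accuracy is computed; this sketch assumes accuracy is the fraction of verification facts, given as (source, relation, target) triples, that are present in the constructed graph. The function name is hypothetical.

```python
def verify_graph(graph_edges, verification_edges, target_threshold):
    """Return (accuracy, passed) for a graph checked against verification data.

    Assumption: accuracy = share of verification triples found in the graph.
    """
    if not verification_edges:
        raise ValueError("verification data must not be empty")
    graph = set(graph_edges)
    correct = sum(1 for edge in verification_edges if edge in graph)
    accuracy = correct / len(verification_edges)
    return accuracy, accuracy >= target_threshold
```

A failed check would send the designer back to the classification, encoding, or association knowledge to find and correct the cause before verifying again.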
With the above technical solution, the knowledge generated during construction can be accumulated and retained, providing the necessary industry knowledge to guide the construction of the next government affair legal person extended information graph model. The knowledge in the knowledge base is thus applied iteratively, and non-specialist personnel can also take part in constructing the graph model.
According to still another aspect of the embodiments of the present application, as shown in fig. 3, there is provided a government affairs map model building apparatus including:
the government affair data acquisition module 301 is used for acquiring target government affair data, wherein the target government affair data is acquired from the Internet government affair field and is used for representing legal person extended information;
the graph knowledge extraction module 303 is used for extracting graph knowledge matching the target government affair data from a preset legal person graph knowledge base, wherein the legal person graph knowledge base is obtained from the construction data of the general government affair knowledge graph;
and the knowledge graph construction module 305 is used for constructing a knowledge graph of legal person extended information for the target government affair data using the graph knowledge.
It should be noted that the government affairs data acquiring module 301 in this embodiment may be configured to execute step S202 in this embodiment, the map knowledge extracting module 303 in this embodiment may be configured to execute step S204 in this embodiment, and the knowledge map constructing module 305 in this embodiment may be configured to execute step S206 in this embodiment.
It should be noted here that the modules described above are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the above embodiments. It should be noted that the modules described above as a part of the apparatus may operate in a hardware environment as shown in fig. 1, and may be implemented by software or hardware.
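When the modules are implemented in software, the correspondence between the three modules and steps S202 to S206 can be sketched as a simple composition. The class name and the use of plain callables for the modules are assumptions for illustration only.

```python
class GovernmentGraphModelBuilder:
    """Chains three modules in the order of steps S202, S204, and S206."""

    def __init__(self, acquire, extract_knowledge, construct):
        self.acquire = acquire                      # data acquisition module (S202)
        self.extract_knowledge = extract_knowledge  # graph knowledge extraction module (S204)
        self.construct = construct                  # knowledge graph construction module (S206)

    def build(self, source):
        """Run the three modules in sequence and return the constructed graph."""
        data = self.acquire(source)
        knowledge = self.extract_knowledge(data)
        return self.construct(data, knowledge)
```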
Optionally, the government affair data obtaining module includes:
the depth traversal unit is used for sequentially capturing the target government affair data in each page of a first capture link, starting from the start page of the first capture link; and, when all pages of the first capture link have been captured and the end condition is not met, continuing to sequentially capture the target government affair data in each page of a second capture link, starting from the start page of the second capture link, and stopping data capture once the end condition is met;
the breadth traversal unit is used for capturing the target government affair data in the current page; and, when the end condition is not met, determining a target link from a plurality of links in the current page and capturing the target government affair data in the target page pointed to by the target link, and terminating data capture once the end condition is met.
Optionally, the graph knowledge extraction module comprises:
the model identifier extraction unit is used for extracting the model identifier of the knowledge graph to be constructed from the target government affair data;
and the knowledge extraction unit is used for extracting, from the preset legal person graph knowledge base, at least one of a data classification label, a data coding standard, and an entity association relationship matching the model identifier.
Optionally, the knowledge-graph building module comprises:
the data classification unit is used for classifying the target government affair data using the data classification labels, wherein the data classification labels comprise at least one of business registration information, shareholder information, main personnel information, branch information, annual report information, tax rating, violation information, judicial assistance information, and distrusted person information;
the data coding unit is used for coding the classified target government affair data according to a data coding standard;
the data association unit is used for associating the coded target government affair data according to the entity association relation;
and the map construction unit is used for constructing a knowledge map by using the associated target government affair data.
Optionally, the graph construction unit comprises:
the legal person determining subunit is used for determining a target legal person from the target government affair data;
the associated data extraction subunit is used for extracting an ontology data set of the target legal person, wherein the data in the ontology data set represents at least one of objects, enterprises, social organizations, roads, buildings, and Internet texts associated with the target legal person;
and the graph construction subunit is used for, taking the target legal person as the main entity and the business domains of the target legal person as sub-entities, constructing association edges between the main entity and the sub-entities and between the sub-entities according to the association relationships among the objects, enterprises, social organizations, buildings, roads, and Internet texts indicated by the ontology data set.
Optionally, the apparatus further includes a data classification label determination module, including:
a first determination unit for acquiring a first reference data set; converting the data in the first reference data set into feature vectors; determining the cosine similarity between the feature vectors, and classifying feature vectors whose cosine similarity is smaller than a target threshold into the same classification data set; determining the data classification labels of the different classification data sets, and storing the data classification labels and the classification data sets in the legal person graph knowledge base;
a second determination unit for acquiring a second reference data set, wherein the second reference data set is stored in a table structure; performing semantic recognition on the second reference data set of the table structure; classifying according to the recognition result; determining a data classification label for each class; and storing the data classification labels and the recognition results in the legal person graph knowledge base.
Optionally, the apparatus further comprises a verification module comprising:
a verification data acquisition unit for acquiring verification data;
the verification unit is used for verifying the knowledge graph of the extended information of the legal person by using the verification data;
and the verification result determining unit is used for passing the verification when the verification result indicates that the accuracy of the knowledge graph reaches the target threshold.
According to another aspect of the embodiments of the present application, there is provided an electronic device, as shown in fig. 4, including a memory 401, a processor 403, a communication interface 405, and a communication bus 407, where the memory 401 stores a computer program that is executable on the processor 403, the memory 401 and the processor 403 communicate with each other through the communication interface 405 and the communication bus 407, and the processor 403 implements the steps of the method when executing the computer program.
The memory and the processor in the electronic device communicate with each other through the communication interface and the communication bus. The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
There is also provided, in accordance with yet another aspect of an embodiment of the present application, a computer-readable medium having non-volatile program code executable by a processor.
Optionally, in an embodiment of the present application, a computer readable medium is configured to store program code for the processor to perform the following steps:
acquiring target government affair data, wherein the target government affair data is acquired from the Internet government affair field and is used for representing legal person extended information;
extracting graph knowledge matching the target government affair data from a preset legal person graph knowledge base, wherein the legal person graph knowledge base is obtained from the construction data of the general government affair knowledge graph;
and constructing a knowledge graph of legal person extended information for the target government affair data using the graph knowledge.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
When the embodiments of the present application are specifically implemented, reference may be made to the above embodiments, and corresponding technical effects are achieved.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the Processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk. It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A government affair map model building method is characterized by comprising the following steps:
acquiring target government affair data, wherein the target government affair data is acquired from the Internet government affair field and is used for representing legal person extended information;
extracting graph knowledge matching the target government affair data from a preset legal person graph knowledge base, wherein the legal person graph knowledge base is obtained from the construction data of the general government affair knowledge graph;
and constructing a knowledge graph of the legal person extended information for the target government affair data using the graph knowledge.
2. The method of claim 1, wherein obtaining target government data comprises at least one of:
sequentially capturing the target government affair data in each page of a first capture link, starting from the start page of the first capture link; and, when all pages of the first capture link have been captured and the end condition is not met, continuing to sequentially capture the target government affair data in each page of a second capture link, starting from the start page of the second capture link, and stopping data capture once the end condition is met;
capturing the target government affair data in the current page; and, when the end condition is not met, determining a target link from a plurality of links in the current page and capturing the target government affair data in the target page pointed to by the target link, and terminating data capture once the end condition is met.
3. The method of claim 1, wherein extracting graph knowledge matching the target government affair data from a preset legal person graph knowledge base comprises:
extracting a model identifier of the knowledge graph to be constructed from the target government affair data;
and extracting, from the preset legal person graph knowledge base, at least one of a data classification label, a data coding standard, and an entity association relationship matching the model identifier.
4. The method of claim 3, wherein constructing a knowledge graph of the legal person extended information for the target government affair data using the graph knowledge comprises:
classifying the target government affair data using the data classification label, wherein the data classification label comprises at least one of business registration information, shareholder information, main personnel information, branch information, annual report information, tax rating, violation information, judicial assistance information, and distrusted person information;
encoding the classified target government affair data according to the data encoding standard;
associating the encoded target government affair data according to the entity association relation;
and constructing the knowledge graph by using the associated target government affair data.
5. The method according to claim 4, wherein constructing the knowledge graph using the associated target government affair data comprises:
determining a target legal person from the target government affair data;
extracting an ontology data set of the target legal person, wherein the data in the ontology data set represents at least one of objects, enterprises, social organizations, roads, buildings, and Internet texts associated with the target legal person;
and, taking the target legal person as the main entity and the business domains of the target legal person as sub-entities, constructing association edges between the main entity and the sub-entities and between the sub-entities according to the association relationships among the objects, enterprises, social organizations, buildings, roads, and Internet texts indicated by the ontology data set.
6. The method according to any one of claims 3 to 5, wherein before extracting graph knowledge matching the target government affair data from a preset legal person graph knowledge base, the method further comprises determining the data classification labels in at least one of the following manners and storing the data classification labels in the legal person graph knowledge base:
acquiring a first reference data set; converting the data in the first reference data set into feature vectors; determining the cosine similarity between the feature vectors, and classifying feature vectors whose cosine similarity is smaller than a target threshold into the same classification data set; determining the data classification labels of the different classification data sets, and storing the data classification labels and the classification data sets in the legal person graph knowledge base;
acquiring a second reference data set, wherein the second reference data set is stored in a table structure; performing semantic recognition on the second reference data set of the table structure; classifying according to the recognition result; determining the data classification label for each class; and storing the data classification labels and the recognition results in the legal person graph knowledge base.
7. The method according to any one of claims 1 to 5, wherein after constructing the knowledge graph of the legal person extended information for the target government affair data using the graph knowledge, the method further comprises:
acquiring verification data;
verifying the knowledge graph of the legal person extended information using the verification data;
and passing the verification when the verification result indicates that the accuracy of the knowledge graph reaches a target threshold.
8. A government affair map model construction device, characterized by comprising:
a government affair data acquisition module, used for acquiring target government affair data, wherein the target government affair data is acquired from the field of Internet government affairs and is used for representing legal person extension information;
a map knowledge extraction module, used for extracting map knowledge matched with the target government affair data from a preset legal person map knowledge base, wherein the legal person map knowledge base is obtained according to the construction data of a general government affair knowledge map;
and a knowledge graph construction module, used for constructing the knowledge graph of the legal person extension information for the target government affair data by using the map knowledge.
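The three modules of this device claim map naturally onto a three-stage pipeline. The sketch below is only one possible arrangement under stated assumptions: records are (legal person, text) pairs, the knowledge base is a dictionary from a keyword to a relation name, and matching is a plain substring test. None of these data structures, nor the names `Record`, `acquire_target_data`, `extract_graph_knowledge` and `construct_knowledge_graph`, come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    legal_person: str
    text: str


def acquire_target_data(raw_rows):
    """Acquisition module: wrap (name, text) pairs, skipping empty text."""
    return [Record(name, text) for name, text in raw_rows if text.strip()]


def extract_graph_knowledge(records, knowledge_base):
    """Extraction module: keep knowledge-base entries whose keyword
    occurs in at least one acquired record."""
    return {
        keyword: relation
        for keyword, relation in knowledge_base.items()
        if any(keyword in record.text.lower() for record in records)
    }


def construct_knowledge_graph(records, graph_knowledge):
    """Construction module: emit (legal person, relation, text) triples
    for every record that matches an extracted keyword."""
    triples = set()
    for record in records:
        for keyword, relation in graph_knowledge.items():
            if keyword in record.text.lower():
                triples.add((record.legal_person, relation, record.text))
    return triples
```

Chaining the three functions in order reproduces the acquire, extract, construct flow of the claim; each stage can be replaced independently, for example by swapping the substring test for a semantic matcher.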
9. An electronic device comprising a memory, a processor, a communication interface and a communication bus, wherein the memory stores a computer program operable on the processor, the memory and the processor communicate via the communication bus and the communication interface, and the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the method of any of claims 1 to 7.
CN202011409775.6A 2020-12-04 2020-12-04 Government map model construction method, government map model construction device, government map model construction equipment and computer readable medium Active CN112463985B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011409775.6A CN112463985B (en) 2020-12-04 2020-12-04 Government map model construction method, government map model construction device, government map model construction equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN112463985A (en) 2021-03-09
CN112463985B (en) 2024-07-19

Family

ID=74805846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011409775.6A Active CN112463985B (en) 2020-12-04 2020-12-04 Government map model construction method, government map model construction device, government map model construction equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN112463985B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241062A (en) * 2018-09-27 2019-01-18 国信优易数据有限公司 A kind of generation method and device of government data catalogue
CN110347894A (en) * 2019-05-31 2019-10-18 平安科技(深圳)有限公司 Knowledge mapping processing method, device, computer equipment and storage medium based on crawler
CN111078897A (en) * 2019-12-26 2020-04-28 国衡智慧城市科技研究院(北京)有限公司 System for generating six-dimensional knowledge map
CN111183421A (en) * 2017-10-06 2020-05-19 株式会社东芝 Service providing system, business analysis support system, method, and program

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114372125A (en) * 2021-12-03 2022-04-19 北京北明数科信息技术有限公司 Government affair knowledge base construction method, system, equipment and medium based on knowledge graph
CN117151429A (en) * 2023-10-27 2023-12-01 中电科大数据研究院有限公司 Government service flow arranging method and device based on knowledge graph
CN117151429B (en) * 2023-10-27 2024-01-26 中电科大数据研究院有限公司 Government service flow arranging method and device based on knowledge graph

Also Published As

Publication number Publication date
CN112463985B (en) 2024-07-19

Similar Documents

Publication Publication Date Title
CN112182246B (en) Method, system, medium, and application for creating an enterprise representation through big data analysis
CN106682150B (en) Information processing method and device
CN110597870A (en) Enterprise relation mining method
CN111899089A (en) Enterprise risk early warning method and system based on knowledge graph
CN109325019B (en) Data association relationship network construction method
CN112445875B (en) Data association and verification method and device, electronic equipment and storage medium
CN111125343A (en) Text analysis method and device suitable for human-sentry matching recommendation system
CN112036842B (en) Intelligent matching device for scientific and technological service
CN112463985A (en) Government affair map model construction method, device, equipment and computer readable medium
CN110288451B (en) Financial reimbursement method, system, equipment and storage medium
CN111709714A (en) Method and device for predicting lost personnel based on artificial intelligence
CN115545671A (en) Method and system for structured processing of laws and regulations
CN113220875A (en) Internet information classification method and system based on industry label and electronic equipment
CN111581447A (en) Judgment text and book evaluation method
CN113536070A (en) Address resolution method, system, computer equipment and storage medium
CN113722617A (en) Method and device for identifying actual office address of enterprise and electronic equipment
CN115080698A (en) Bidding analysis method, system, equipment and storage medium based on big data
Rana et al. Road accident prediction using machine learning algorithm
CN111782803B (en) Work order processing method and device, electronic equipment and storage medium
Tao et al. A traffic accident morphology diagnostic model based on a rough set decision tree
CN112036841A (en) Policy analysis system and method based on intelligent semantic recognition
CN107329956B (en) Project information standardization method and device
Haensch et al. A Multi-Method Data Science Pipeline for Analyzing Police Service
CN115526500A (en) Benefit-administration information pushing method, benefit-administration information pushing device, benefit-administration information pushing equipment, benefit-administration information pushing medium and program product
CN112115271B (en) Knowledge graph construction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant