CN112463984A - Database mode expansion method, device, equipment and computer readable medium - Google Patents

Info

Publication number
CN112463984A
Authority
CN
China
Prior art keywords
data
target
database
service domain
domain boundary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011408081.0A
Other languages
Chinese (zh)
Other versions
CN112463984B (en)
Inventor
邓亮
王晓旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN202011408081.0A
Publication of CN112463984A
Application granted
Publication of CN112463984B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the field of knowledge graph technology, and in particular to a database schema expansion method, apparatus, device, and computer readable medium. The method comprises: extracting a first service domain boundary parameter of a first database schema from a knowledge base; adjusting the first service domain boundary parameter by using a reference data set to obtain a second service domain boundary parameter, wherein the data in the reference data set is service data within a target application range; adding a target data source within a target range indicated by the second service domain boundary parameter; and expanding the first database schema by using data acquired from the target data source to obtain a second database schema. In the present application, knowledge gained while constructing the schema (database schema) is accumulated in the knowledge base, so that this previously accumulated knowledge can be reused when the schema is expanded. This improves the efficiency of schema expansion, allows non-specialist personnel to take part in the expansion work, and reduces personnel costs.

Description

Database mode expansion method, device, equipment and computer readable medium
Technical Field
The present application relates to the field of knowledge graph technology, and in particular, to a method, an apparatus, a device, and a computer readable medium for database schema expansion.
Background
In the government affairs industry, an intelligent knowledge extraction engine is constructed to gather and fuse various kinds of government data, extract entity types, and establish relations among the entity types according to their attribute relations, semantic relations, characteristic relations, and the like. This forms a government affairs knowledge graph, which provides basic data support and services for government service applications, macroscopic decision-making, and the like, and improves government administration capability. The schema of the government affairs knowledge graph provides the data support for the knowledge graph.
At present, in the related art, schema data for a knowledge graph is acquired by manual entry, and the data is scattered across individual service databases. As a result, a reusable knowledge system cannot be formed, and unified management is also difficult.
No effective solution to the above problems has yet been proposed.
Disclosure of Invention
The application provides a database schema expansion method, apparatus, device, and computer readable medium, to solve the technical problems that earlier experience cannot be reused and that expansion efficiency is low.
According to an aspect of an embodiment of the present application, there is provided a database schema extension method, including:
extracting a first service domain boundary parameter of a first database schema from a knowledge base, wherein the knowledge base is obtained according to construction parameters of the first database schema, the first database schema is used for constructing a government affairs graph, and the first service domain boundary parameter is used for determining a target application range of the government affairs graph;
adjusting the first service domain boundary parameter by using a reference data set to obtain a second service domain boundary parameter, wherein the data in the reference data set is service data within the target application range;
adding a target data source within a target range indicated by the second service domain boundary parameter;
and expanding the first database schema by using data acquired from the target data source to obtain a second database schema.
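The four steps above can be sketched as a small orchestration function. This is only an illustrative sketch: the function `expand_schema`, the dictionary keys `boundary_params` and `schema`, and the four callables passed in are hypothetical names that do not appear in the application, and the lambdas in the usage lines are placeholders for real adjustment, source selection, collection, and merging logic.

```python
def expand_schema(knowledge_base, reference_data, adjust, find_sources, collect, merge):
    """Run the four steps of the expansion method in order (hypothetical sketch)."""
    # Step 1: read the first boundary parameter and first schema from the knowledge base.
    first_params = knowledge_base["boundary_params"]
    first_schema = knowledge_base["schema"]
    # Step 2: adjust the boundary parameter with the reference data set.
    second_params = adjust(first_params, reference_data)
    # Step 3: add target data sources inside the range the new parameter indicates.
    target_sources = find_sources(second_params)
    # Step 4: collect data from those sources and merge it into the first schema.
    new_pairs = [pair for source in target_sources for pair in collect(source)]
    return merge(first_schema, new_pairs)


# Usage with placeholder callables (all values are made up for illustration).
kb = {
    "boundary_params": {"screening_words": {"enterprise"}},
    "schema": {("enterprise", "registration")},
}
result = expand_schema(
    kb,
    reference_data=["enterprise licence renewal"],
    adjust=lambda params, ref: {"screening_words": params["screening_words"] | {"licence"}},
    find_sources=lambda params: ["source-a"],
    collect=lambda source: [("enterprise", "licence renewal")],
    merge=lambda schema, pairs: schema | set(pairs),
)
```

Passing the steps in as callables keeps the sketch neutral about how each step is actually carried out, which the text leaves to the detailed embodiments.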
Optionally, before extracting the first service domain boundary parameter of the first database schema from the knowledge base, the method further includes constructing the first database schema as follows:
determining the first service domain boundary parameter according to the application range of the government affairs graph to be constructed;
determining a first data source within a first range indicated by the first service domain boundary parameter;
collecting data of the first data source;
extracting, from the data of the first data source, target data that indicates entities, events, and the association relations between the entities and the events;
dividing the target data into a plurality of association pairs in an entity-event association form, wherein the entity and the event in each association pair are combined according to the association relation between them;
and merging the plurality of association pairs into the first database schema.
Optionally, after obtaining the first database schema, the method further includes constructing a knowledge base as follows:
storing the first service domain boundary parameter, the first data source, and the first database schema in a database to obtain the knowledge base.
Optionally, after obtaining the first database schema, constructing the knowledge base further includes:
determining first target words whose occurrence frequency in the first database schema is greater than or equal to a frequency threshold;
deleting interference words from the first target words to obtain second target words;
and storing the second target words in the knowledge base as domain high-frequency feature words.
Optionally, the expanding the first database schema by using the data acquired from the target data source to obtain the second database schema includes:
collecting first target data of the target data source;
extracting, from the first target data, second target data that indicates entities, events, and the association relations between the entities and the events;
dividing the second target data into a plurality of association pairs in an entity-event association form, wherein the entity and the event in each association pair are combined according to the association relation between them;
and merging the plurality of association pairs into the first database schema to obtain the second database schema.
Optionally, acquiring the first target data of the target data source includes at least one of the following ways:
starting from the start page of a first crawl link, sequentially crawling the first target data in each page of the first crawl link; and, when all pages of the first crawl link have been crawled and an end condition is not met, continuing to sequentially crawl the first target data in each page of a second crawl link, starting from the start page of the second crawl link, and stopping the crawl once the end condition is met;
crawling the first target data in a current page; and, when the end condition is not met, determining a target link from a plurality of links in the current page and crawling the first target data in the target page pointed to by the target link, and stopping the crawl once the end condition is met.
According to another aspect of the embodiments of the present application, there is provided a database schema extension apparatus, including:
an initial parameter extraction module, configured to extract a first service domain boundary parameter of a first database schema from a knowledge base, wherein the knowledge base is obtained according to construction parameters of the first database schema, the first database schema is used for constructing a government affairs graph, and the first service domain boundary parameter is used for determining a target application range of the government affairs graph;
a parameter correction module, configured to adjust the first service domain boundary parameter by using a reference data set to obtain a second service domain boundary parameter, wherein the data in the reference data set is service data within the target application range;
an extended data source module, configured to add a target data source within a target range indicated by the second service domain boundary parameter;
and a data set expansion module, configured to expand the first database schema by using data acquired from the target data source to obtain a second database schema.
Optionally, the apparatus further comprises an initial data set building module, comprising:
an initial parameter determining unit, configured to determine the first service domain boundary parameter according to the application range of the government affairs graph to be constructed;
an initial data source determining unit, configured to determine a first data source within a first range indicated by the first service domain boundary parameter;
a first acquisition unit, configured to collect data of the first data source;
a first extraction unit, configured to extract, from the data of the first data source, target data that indicates entities, events, and the association relations between the entities and the events;
a first dividing unit, configured to divide the target data into a plurality of association pairs in an entity-event association form, wherein the entity and the event in each association pair are combined according to the association relation between them;
and a first merging unit, configured to merge the plurality of association pairs into the first database schema.
Optionally, the apparatus further includes a knowledge base building module, including:
a knowledge base construction unit, configured to store the first service domain boundary parameter, the first data source, and the first database schema in a database to obtain the knowledge base.
Optionally, the knowledge base building module further includes:
a high-frequency word determining unit, configured to determine first target words whose occurrence frequency in the first database schema is greater than or equal to a frequency threshold;
an interference word deleting unit, configured to delete interference words from the first target words to obtain second target words;
and a high-frequency feature word storage unit, configured to store the second target words in the knowledge base as domain high-frequency feature words.
Optionally, the data set extension module comprises:
a second acquisition unit, configured to collect first target data of the target data source;
a second extraction unit, configured to extract, from the first target data, second target data that indicates entities, events, and the association relations between the entities and the events;
a second dividing unit, configured to divide the second target data into a plurality of association pairs in an entity-event association form, wherein the entity and the event in each association pair are combined according to the association relation between them;
and a second merging unit, configured to merge the plurality of association pairs into the first database schema to obtain the second database schema.
Optionally, the second acquisition unit further comprises:
a first acquisition subunit, configured to sequentially crawl the first target data in each page of a first crawl link, starting from the start page of the first crawl link; and, when all pages of the first crawl link have been crawled and an end condition is not met, to continue sequentially crawling the first target data in each page of a second crawl link, starting from the start page of the second crawl link, and to stop the crawl once the end condition is met;
a second acquisition subunit, configured to crawl the first target data in a current page; and, when the end condition is not met, to determine a target link from a plurality of links in the current page and crawl the first target data in the target page pointed to by the target link, and to stop the crawl once the end condition is met.
According to another aspect of the embodiments of the present application, there is provided an electronic device, including a memory, a processor, a communication interface, and a communication bus, where the memory stores a computer program executable on the processor, and the memory and the processor communicate with each other through the communication bus and the communication interface, and the processor implements the steps of the method when executing the computer program.
According to another aspect of embodiments of the present application, there is also provided a computer readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the above-mentioned method.
Compared with the related art, the technical scheme provided by the embodiment of the application has the following advantages:
In the technical solution, a first service domain boundary parameter of a first database schema is extracted from a knowledge base, wherein the knowledge base is obtained according to construction parameters of the first database schema, the first database schema is used for constructing a government affairs graph, and the first service domain boundary parameter is used for determining a target application range of the government affairs graph; the first service domain boundary parameter is adjusted by using a reference data set to obtain a second service domain boundary parameter, wherein the data in the reference data set is service data within the target application range; a target data source is added within a target range indicated by the second service domain boundary parameter; and the first database schema is expanded by using data acquired from the target data source to obtain a second database schema. In the present application, knowledge gained while constructing the schema (database schema) is accumulated in the knowledge base, so that this previously accumulated knowledge can be reused when the schema is expanded. This improves the efficiency of schema expansion, allows non-specialist personnel to take part in the expansion work, and reduces personnel costs.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
To illustrate the technical solutions in the embodiments of the present application or in the related art more clearly, the drawings needed for describing the embodiments or the related art are briefly introduced below. Obviously, those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a diagram illustrating an alternative hardware environment for a database schema expansion method according to an embodiment of the present application;
FIG. 2 is a flow chart of an alternative database schema expansion method provided in accordance with an embodiment of the present application;
FIG. 3 is a block diagram of an alternative database schema extension apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an alternative electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for the convenience of description of the present application, and have no specific meaning in themselves. Thus, "module" and "component" may be used in a mixture.
In the related art, schema data for a knowledge graph is acquired by manual entry, and the data is scattered across individual service databases. As a result, a reusable knowledge system cannot be formed, and unified management is also difficult.
To solve the problems mentioned in the background, according to an aspect of the embodiments of the present application, an embodiment of a database schema extension method is provided.
Optionally, in this embodiment of the present application, the database schema expansion method described above may be applied to a hardware environment formed by a terminal 101 and a server 103, as shown in FIG. 1. The server 103 is connected to the terminal 101 through a network and may provide services for the terminal or for a client installed on the terminal. A database 105 may be provided on the server or separately from the server to provide data storage services for the server 103. The network includes but is not limited to a wide area network, a metropolitan area network, or a local area network, and the terminal 101 includes but is not limited to a PC, a mobile phone, a tablet computer, and the like.
The database schema expansion method in this embodiment may be executed by the server 103, or jointly by the server 103 and the terminal 101. As shown in FIG. 2, the method may include the following steps:
Step S202, extracting a first service domain boundary parameter of a first database schema from a knowledge base, wherein the knowledge base is obtained according to construction parameters of the first database schema, the first database schema is used for constructing a government affairs graph, and the first service domain boundary parameter is used for determining the target application range of the government affairs graph.
In this embodiment of the application, a database schema is the organization and structure of a database. It includes tables, columns, data types, views, stored procedures, relationships, primary keys, foreign keys, and the like. Expanding the schema therefore means expanding the data in the database and the relations among the data. The schema is the basis for constructing the government affairs graph: iterating and updating the graph relies on a database schema that is continuously updated and expanded, and the iteration and updating of the graph in turn lay the foundation for developing new services and building new systems.
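To make the schema elements listed above concrete, the following sketch builds a tiny relational schema with Python's standard `sqlite3` module. The table names `entities`, `events`, and `entity_event`, the view name, and the sample rows are assumptions chosen for illustration; the application does not prescribe any particular table layout.

```python
import sqlite3

# In-memory database; the layout below is a hypothetical example only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE entities (
    entity_id INTEGER PRIMARY KEY,          -- primary key
    name      TEXT NOT NULL UNIQUE          -- column with a data type
);
CREATE TABLE events (
    event_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE entity_event (                 -- relationship between entities and events
    entity_id INTEGER NOT NULL REFERENCES entities(entity_id),  -- foreign key
    event_id  INTEGER NOT NULL REFERENCES events(event_id),     -- foreign key
    PRIMARY KEY (entity_id, event_id)
);
CREATE VIEW entity_event_view AS            -- view over the relationship
    SELECT e.name AS entity_name, v.name AS event_name
    FROM entity_event ee
    JOIN entities e ON e.entity_id = ee.entity_id
    JOIN events v ON v.event_id = ee.event_id;
""")

# Insert one entity, one event, and the association between them.
conn.execute("INSERT INTO entities (name) VALUES ('enterprise')")
conn.execute("INSERT INTO events (name) VALUES ('registration')")
conn.execute("INSERT INTO entity_event (entity_id, event_id) VALUES (1, 1)")
rows = conn.execute("SELECT entity_name, event_name FROM entity_event_view").fetchall()
```

In this layout, expanding the schema amounts to adding rows (new entities, events, and associations) or new tables and relationships.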
Optionally, before extracting the first service domain boundary parameter of the first database schema from the knowledge base, the method further includes constructing the first database schema as follows:
Step 1, determining the first service domain boundary parameter according to the application range of the government affairs graph to be constructed;
Step 2, determining a first data source within a first range indicated by the first service domain boundary parameter;
Step 3, collecting data of the first data source;
Step 4, extracting, from the data of the first data source, target data that indicates entities, events, and the association relations between the entities and the events;
Step 5, dividing the target data into a plurality of association pairs in an entity-event association form, wherein the entity and the event in each association pair are combined according to the association relation between them;
Step 6, merging the plurality of association pairs into the first database schema.
In the embodiment of the application, the application range of the government affairs graph comprises the domain scope of the standard graph model for the government affairs industry, the business scenarios, the processing intent, and the data range. These ranges can be identified and sorted out according to the nationally published laws, regulations, and policy documents relevant to the government affairs industry; the fields, business scenarios, and data ranges can also be identified and sorted out in combination with the divisions that local governments have made from practical experience; and custom configuration can be performed according to standard specifications. In this way, screening words and filter words can be determined, along with parameters such as data acquisition websites, and these serve as the first service domain boundary parameter. The determined data acquisition website is the first data source.
The collected data can be business corpora, texts, tables, existing knowledge bases and the like of the government affairs industry, wherein important terms related to the field and existing field ontologies can be main collection objects. The domain ontology is a description of the subject concept, and comprises concepts in the subject, attributes of the concepts, relationships between the concepts, constraints of the attributes and the relationships, and the like. Since knowledge has significant domain characteristics, domain ontologies can represent knowledge more reasonably and efficiently. A domain ontology may represent specific knowledge within a particular domain. The specific field can be determined according to the requirements of an ontology builder, can be a subject field, can be a combination of several fields, and can also be a small range in one field.
After the data is collected, the effective data is extracted, that is, the data that reflects entities, events, and the association relations between them. The entities can be government-related objects such as enterprises and legal persons, and the events are the management activities, legal acts, and the like of those entities. In particular, the effective data may be extracted with a data analysis tool.
Finally, the effective data, namely the target data, is divided into a plurality of association pairs in an entity-event association form, and the association pairs are merged into one data set to obtain the first database schema.
After the schema is constructed, it needs to be updated and iterated continuously to keep the standard graph model correct. The schema can therefore be extended through related technologies such as big data and knowledge graph model construction.
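The dividing and merging steps can be sketched as follows. This is a minimal illustration assuming the extracted target data arrives as records with an `entity` field and a list of `events`; that record shape, the function names, and the sample values are hypothetical and are not defined by the application.

```python
from collections import defaultdict


def to_association_pairs(records):
    """Split extracted records into (entity, event) association pairs."""
    pairs = []
    for record in records:
        for event in record["events"]:
            pairs.append((record["entity"], event))
    return pairs


def merge_pairs(pairs):
    """Merge association pairs into one data set keyed by entity."""
    schema = defaultdict(set)
    for entity, event in pairs:
        schema[entity].add(event)  # duplicate pairs collapse in the set
    return dict(schema)


# Hypothetical target data extracted from the first data source.
records = [
    {"entity": "enterprise", "events": ["registration", "tax filing"]},
    {"entity": "legal person", "events": ["contract signing"]},
    {"entity": "enterprise", "events": ["registration"]},
]
first_schema = merge_pairs(to_association_pairs(records))
```

Using a set per entity means that the same association seen in several source documents is stored once.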
Optionally, after obtaining the first database schema, the method further includes constructing a knowledge base as follows:
storing the first service domain boundary parameter, the first data source, and the first database schema in a database to obtain the knowledge base.
In the embodiment of the application, the first service domain boundary parameter, the first data source, the first database schema, and other content sorted out earlier (data content, data logical relation structures, business terms, and industry standard knowledge) can be implemented programmatically, imported into the knowledge base, and made available for retrieval. Programmatic implementation here means that, when the knowledge base is constructed, the data logical relation structures, business terms, and industry standards used in the field are determined according to domain knowledge, and this knowledge is imported into the database to form the knowledge base. When the definition or description information of the database storage structure changes, the content of the knowledge base is updated promptly, so that the knowledge stays current.
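One possible way to store and update such a knowledge base is sketched below with `sqlite3` and JSON-encoded values. The table name `knowledge`, its columns, the helper functions, and the example URL are all assumptions for illustration; the application only states that the items are stored in a database and updated when they change.

```python
import json
import sqlite3


def open_knowledge_base(path=":memory:"):
    """Open (or create) a hypothetical single-table knowledge base."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS knowledge (name TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def store(conn, name, value):
    """Insert an item, or overwrite it if an item with this name already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO knowledge (name, payload) VALUES (?, ?)",
        (name, json.dumps(value)),
    )
    conn.commit()


def load(conn, name):
    """Return the stored item, or None if nothing is stored under this name."""
    row = conn.execute("SELECT payload FROM knowledge WHERE name = ?", (name,)).fetchone()
    return json.loads(row[0]) if row else None


kb = open_knowledge_base()
store(kb, "boundary_params", {"screening_words": ["enterprise"], "filter_words": ["advertisement"]})
store(kb, "data_sources", ["https://example.gov/notices"])
store(kb, "schema", {"enterprise": ["registration"]})
# A later change to the schema overwrites the stored copy, keeping the knowledge current.
store(kb, "schema", {"enterprise": ["registration", "tax filing"]})
```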
Optionally, after obtaining the first database schema, constructing the knowledge base further includes:
determining first target words whose occurrence frequency in the first database schema is greater than or equal to a frequency threshold;
deleting interference words from the first target words to obtain second target words;
and storing the second target words in the knowledge base as domain high-frequency feature words.
In the embodiment of the application, the high-frequency feature words of the field can be extracted through data analysis. First, the first target words, whose occurrence frequency is greater than or equal to a frequency threshold, are counted; the frequency threshold can be set according to actual needs. Then the interference words, such as "the", "each", and the like, are deleted from the first target words to obtain the second target words, namely the high-frequency feature words of the field. These high-frequency feature words can be accumulated in the knowledge base as knowledge.
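The counting and filtering described here can be sketched with `collections.Counter`. The function name, the sample word list, the threshold of 2, and the interference word set are illustrative assumptions only.

```python
from collections import Counter


def high_frequency_feature_words(words, frequency_threshold, interference_words):
    """Keep words at or above the threshold, then drop interference words."""
    counts = Counter(words)
    # First target words: occurrence frequency >= threshold.
    first_target_words = {w for w, n in counts.items() if n >= frequency_threshold}
    # Second target words: first target words minus interference words.
    return first_target_words - set(interference_words)


# Hypothetical words taken from the first database schema.
words = ["enterprise", "the", "registration", "enterprise", "the", "the", "registration", "permit"]
feature_words = high_frequency_feature_words(words, 2, {"the", "each"})
```

Here "permit" is dropped because it occurs only once, and "the" is dropped as an interference word even though it is the most frequent word.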
Step S204, adjusting the first service domain boundary parameter by using a reference data set to obtain a second service domain boundary parameter, wherein the data in the reference data set is service data within the target application range.
In the embodiment of the application, expanding the database schema requires that the domain scope, business scenarios, processing intent, and data range of the standard graph model for the government affairs industry be determined again. Actual business data can be obtained from the required field to form a reference data set, which is used to correct the first service domain boundary parameter and obtain the second service domain boundary parameter, so that the resulting data better matches the actual situation of the project or client.
In the embodiment of the application, the correction can use the same approach as for determining the first service domain boundary parameter: identification and sorting can be performed according to the nationally published laws, regulations, and policy documents relevant to the government affairs industry; the fields, business scenarios, and data ranges can be identified and sorted out in combination with the divisions that local governments have made from practical experience; and custom configuration can be performed according to standard specifications, to finally determine parameters such as screening words, filter words, and data acquisition websites.
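The application does not specify a concrete adjustment rule, so the following is only one hypothetical illustration: terms that occur in at least `min_count` reference documents, and that are not filter words, are added to the screening words. The parameter layout, the function name, and the rule itself are assumptions, not the method of the application.

```python
from collections import Counter


def adjust_boundary(first_params, reference_data, min_count=2):
    """Hypothetical rule: promote terms frequent in the reference data to screening words."""
    # Count in how many reference documents each term occurs.
    counts = Counter(term for doc in reference_data for term in set(doc.lower().split()))
    frequent = {term for term, n in counts.items() if n >= min_count}
    filter_words = set(first_params["filter_words"])
    return {
        "screening_words": set(first_params["screening_words"]) | (frequent - filter_words),
        "filter_words": filter_words,
        "source_urls": set(first_params["source_urls"]),
    }


# Made-up first boundary parameter and reference data set.
first_params = {
    "screening_words": {"enterprise"},
    "filter_words": {"the"},
    "source_urls": {"https://example.gov"},
}
reference_data = ["the enterprise licence renewal", "the licence fee", "enterprise tax"]
second_params = adjust_boundary(first_params, reference_data)
```

In practice the correction may equally be done by manual review against regulations and policy documents, as the text describes; the sketch only shows how a reference data set could feed such a correction.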
Step S206, adding a target data source in the target range indicated by the second service domain boundary parameter.
In the embodiment of the application, new data sources can be added. For example, business corpora, texts, tables, existing knowledge bases, and the like of the government affairs industry can be collected from enterprise websites, financial websites, and other sites related to the government affairs industry for which the schema is constructed. Important terms related to the field and existing domain ontologies can be the main collection objects.
In the embodiment of the application, the existing collected data cannot support the development of new services and the construction of new systems, so that the existing data needs to be supplemented based on the requirements of the new services and the new systems. According to specific requirements, the data collection range is expanded by adopting modes of a web crawler, a data reporting system, new data acquisition equipment and the like.
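As a rough sketch of adding data sources inside the target range, candidate sources can be checked against the second service domain boundary parameter. The matching rule (a source qualifies if its description shares a term with the screening words and none with the filter words), the function name, and all URLs are hypothetical.

```python
def select_target_sources(candidates, second_params, known_sources):
    """Pick candidate sources that fall inside the target range and are not yet known."""
    selected = []
    for url, description in candidates.items():
        terms = set(description.lower().split())
        in_range = bool(terms & second_params["screening_words"])
        excluded = bool(terms & second_params["filter_words"])
        if in_range and not excluded and url not in known_sources:
            selected.append(url)
    return sorted(selected)


# Made-up candidates, boundary parameter, and already-known first data sources.
candidates = {
    "https://example.com/enterprise-news": "enterprise licence announcements",
    "https://example.com/ads": "enterprise advertisement feed",
    "https://example.com/sports": "sports results",
    "https://example.gov/notices": "enterprise notices",
}
second_params = {"screening_words": {"enterprise", "licence"}, "filter_words": {"advertisement"}}
known_sources = {"https://example.gov/notices"}
target_sources = select_target_sources(candidates, second_params, known_sources)
```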
Step S208, expanding the first database schema by using the data acquired from the target data source to obtain a second database schema.
Optionally, the expanding the first database schema by using the data acquired from the target data source to obtain the second database schema includes:
step 1, collecting first target data of the target data source;
step 2, extracting, from the first target data, second target data which indicates entities, events and the association relationships between the entities and the events;
step 3, dividing the second target data into a plurality of association pairs according to an entity-event association form, wherein the entity and the event in each association pair are combined according to the association relationship between them;
step 4, merging the plurality of association pairs into the first database mode to obtain the second database mode.
In the embodiment of the application, the database mode can be expanded in the same way in which it was constructed. The first target data is collected from the newly added target data source, and the effective data, namely the second target data that reflects the entities, the events and the association relationships between them, is then extracted from the first target data. Finally, new entity-event association pairs are divided out of the second target data and used to expand the original database mode.
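The four steps above can be pictured with a minimal sketch, assuming that collected records are dictionaries and that a database mode is simply a set of entity-event association pairs. The names `Pair`, `extract_pairs` and `merge_into_schema`, and the record keys `entity`, `relation` and `event`, are hypothetical and chosen only for this example.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """One entity-event association pair (hypothetical representation)."""
    entity: str
    relation: str
    event: str


def extract_pairs(records):
    """Keep only records that carry an entity, a relation and an event."""
    pairs = []
    for rec in records:
        if all(rec.get(key) for key in ("entity", "relation", "event")):
            pairs.append(Pair(rec["entity"], rec["relation"], rec["event"]))
    return pairs


def merge_into_schema(schema, pairs):
    """Return a new schema: the old pairs plus the new ones, without duplicates."""
    return set(schema) | set(pairs)
```

Representing the mode as a set makes the merge idempotent, so collecting the same source twice does not duplicate association pairs. A real schema would likely carry types and attributes as well, which this sketch leaves out.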
Optionally, acquiring the first target data of the target data source includes at least one of the following ways:
sequentially grabbing the first target data in each page of a first grabbing link, starting from the starting page of the first grabbing link; when all pages of the first grabbing link have been grabbed and the end condition is not met, continuing to sequentially grab the first target data in each page of a second grabbing link, starting from the starting page of the second grabbing link, and stopping data grabbing once the end condition is met;
grabbing the first target data in a current page; when the end condition is not met, determining a target link from a plurality of links in the current page and grabbing the first target data in the target page pointed to by the target link, and terminating data grabbing once the end condition is met.
In the embodiment of the application, the data can be acquired by depth traversal under a depth-first traversal strategy, or by breadth traversal under a breadth-first traversal strategy.
Under the depth-first traversal strategy, one link, namely the first link, is selected from a plurality of grabbing links. The web crawler starts from the starting page of the first link and follows the links one by one along the first link. After the first link has been processed, the crawler moves to a second link and again crawls data page by page from the starting page of the second link. If the end condition is met during grabbing, crawling stops. The end condition may be determined according to the amount of data to be obtained or according to the number of links.
Under the breadth-first traversal strategy, links found in a newly downloaded webpage are inserted directly at the tail of the queue of addresses to be grabbed. That is, the web crawler may first crawl all web pages linked from the current web page, then select one of the linked web pages and continue to crawl all web pages linked from that page. Similarly, crawling stops when an end condition is met, which may be determined according to the amount of data to be acquired or according to the lateral spread of the links.
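The difference between the two strategies can be sketched over an in-memory link graph instead of real web pages. The function `crawl`, its parameters, and the use of a page budget `max_pages` as the end condition are assumptions made for this example. A real crawler would fetch and parse pages where this sketch only records the visit order.

```python
from collections import deque


def crawl(start_pages, links, max_pages, depth_first):
    """Visit pages of an in-memory link graph until `max_pages` are grabbed.

    `links` maps a page to the pages it links to. With depth_first=True the
    chain below each start page is followed to its end before the next start
    page is tried; otherwise newly found links join the tail of the queue.
    """
    frontier = deque(start_pages)
    seen = set()
    order = []
    while frontier and len(order) < max_pages:    # end condition: page budget
        page = frontier.popleft()
        if page in seen:
            continue
        seen.add(page)
        order.append(page)                        # stand-in for grabbing the page's data
        found = [p for p in links.get(page, []) if p not in seen]
        if depth_first:
            frontier.extendleft(reversed(found))  # newest links go to the head
        else:
            frontier.extend(found)                # newest links go to the tail
    return order
```

The only difference between the two modes is the end of the queue at which new links are inserted, which is why a single function with a flag is enough here.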
By adopting this technical scheme, the knowledge gained while building the schema is accumulated in the knowledge base, so that it can be reused when the schema is expanded. This improves the efficiency of schema expansion, allows non-specialist personnel to take part in the expansion work, and reduces personnel costs.
According to still another aspect of the embodiments of the present application, as shown in fig. 3, there is provided a database schema expansion apparatus including:
the initial parameter extraction module 301 is configured to extract a first service domain boundary parameter of a first database mode from a knowledge base, where the knowledge base is obtained according to a construction parameter of the first database mode, the first database mode is used to construct a government affair map, and the first service domain boundary parameter is used to determine a target application range of the government affair map;
the parameter modification module 303 is configured to adjust the first service domain boundary parameter by using a reference data set to obtain a second service domain boundary parameter, where data in the reference data set is service data in a target application range;
an extended data source module 305, configured to add a target data source within a target range indicated by the second service domain boundary parameter;
and a data set extension module 307, configured to extend the first database schema by using data acquired from the target data source, so as to obtain a second database schema.
It should be noted that the initial parameter extraction module 301 in this embodiment may be configured to execute step S202 in this embodiment, the parameter modification module 303 in this embodiment may be configured to execute step S204 in this embodiment, the extended data source module 305 in this embodiment may be configured to execute step S206 in this embodiment, and the data set extension module 307 in this embodiment may be configured to execute step S208 in this embodiment.
It should be noted here that the modules described above implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in the above embodiments. It should also be noted that the modules, as a part of the apparatus, may operate in a hardware environment as shown in fig. 1, and may be implemented by software or by hardware.
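One way to picture the correspondence between the four modules and steps S202 to S208 is a small pipeline object whose fields are the module functions. The class `SchemaExtensionApparatus`, its field names and the `run` method are hypothetical and serve only to illustrate the data flow. The application does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SchemaExtensionApparatus:
    """Hypothetical wiring of the four modules as plain callables."""
    extract_params: Callable   # module 301, step S202
    modify_params: Callable    # module 303, step S204
    add_sources: Callable      # module 305, step S206
    extend_schema: Callable    # module 307, step S208

    def run(self, knowledge_base, reference_data, first_schema):
        """Run steps S202 to S208 in order and return the second schema."""
        first_params = self.extract_params(knowledge_base)
        second_params = self.modify_params(first_params, reference_data)
        sources = self.add_sources(second_params)
        return self.extend_schema(first_schema, sources)
```

Because each module is passed in as a callable, any of them can be replaced independently, which matches the statement that the modules may be implemented in software or hardware.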
Optionally, the apparatus further comprises an initial data set building module, comprising:
the initial parameter determining unit is used for determining a first service domain boundary parameter according to the application range of the government affair map to be constructed;
an initial data source determining unit, configured to determine a first data source within a first range indicated by the first service domain boundary parameter;
the first acquisition unit is used for acquiring data of a first data source;
the first extraction unit is used for extracting, from the data of the first data source, target data which indicates entities, events and the association relationships between the entities and the events;
the first dividing unit is used for dividing the target data into a plurality of association pairs according to the association form of the entity-event, and the entity-event in each association pair is combined according to the association relationship between the entity and the event;
a first merging unit for merging the plurality of associated pairs into a first database schema.
Optionally, the apparatus further includes a knowledge base building module, including:
and the knowledge base construction unit is used for storing the first service domain boundary parameter, the first data source and the first database mode in a database to obtain a knowledge base.
Optionally, the knowledge base building module further includes:
the high-frequency word determining unit is used for determining a first target word of which the occurrence frequency is greater than or equal to a frequency threshold in the first database mode;
the interference word deleting unit is used for deleting the interference words from the first target words to obtain second target words;
and the high-frequency characteristic word storage unit is used for storing the second target word as a domain high-frequency characteristic word in the knowledge base.
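The three units above amount to a frequency filter followed by a removal of interference words. A minimal sketch follows, assuming the terms of the first database mode are available as a list of strings and the interference words as a collection. The function name `domain_feature_words` and its parameters are hypothetical.

```python
from collections import Counter


def domain_feature_words(schema_terms, freq_threshold, interference_words):
    """Return the domain high-frequency feature words.

    First target words: terms whose count is at least `freq_threshold`.
    Second target words: first target words minus the interference words.
    """
    counts = Counter(schema_terms)
    first_target = {word for word, count in counts.items() if count >= freq_threshold}
    return first_target - set(interference_words)
```

The returned set is what would be stored in the knowledge base as domain high-frequency feature words. How terms are tokenized and how the interference word list is obtained are left open here.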
Optionally, the data set extension module comprises:
the second acquisition unit is used for acquiring first target data of a target data source;
the second extraction unit is used for extracting, from the first target data, second target data which indicates entities, events and the association relationships between the entities and the events;
the second dividing unit is used for dividing the second target data into a plurality of association pairs according to the association form of the entity and the event, and the entity and the event in each association pair are combined according to the association relationship between the entity and the event;
and the second merging unit is used for merging the plurality of association pairs into the first database mode to obtain a second database mode.
Optionally, the second acquisition unit further comprises:
the first acquisition subunit is used for sequentially grabbing the first target data in each page of a first grabbing link, starting from the starting page of the first grabbing link; and, when all pages of the first grabbing link have been grabbed and the end condition is not met, continuing to sequentially grab the first target data in each page of a second grabbing link, starting from the starting page of the second grabbing link, and stopping data grabbing once the end condition is met;
the second acquisition subunit is used for grabbing the first target data in a current page; and, when the end condition is not met, determining a target link from a plurality of links in the current page and grabbing the first target data in the target page pointed to by the target link, and terminating data grabbing once the end condition is met.
According to another aspect of the embodiments of the present application, there is provided an electronic device, as shown in fig. 4, including a memory 401, a processor 403, a communication interface 405, and a communication bus 407, where the memory 401 stores a computer program that is executable on the processor 403, the memory 401 and the processor 403 communicate with each other through the communication interface 405 and the communication bus 407, and the processor 403 implements the steps of the method when executing the computer program.
The memory and the processor in the electronic device communicate with the communication interface through the communication bus. The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc.
The memory may include a Random Access Memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
There is also provided, in accordance with yet another aspect of an embodiment of the present application, a computer-readable medium having non-volatile program code executable by a processor.
Optionally, in an embodiment of the present application, a computer readable medium is configured to store program code for the processor to perform the following steps:
extracting a first service domain boundary parameter of a first database mode from a knowledge base, wherein the knowledge base is obtained according to a construction parameter of the first database mode, the first database mode is used for constructing a government affair map, and the first service domain boundary parameter is used for determining a target application range of the government affair map;
adjusting the boundary parameter of the first service domain by using the reference data set to obtain a boundary parameter of a second service domain, wherein the data in the reference data set is service data in a target application range;
adding a target data source in a target range indicated by the second service domain boundary parameter;
and expanding the first database mode by using the data acquired from the target data source to obtain a second database mode.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
When the embodiments of the present application are specifically implemented, reference may be made to the above embodiments, and corresponding technical effects are achieved.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the Processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is merely a logical division, and in actual implementation there may be other divisions; multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A database schema extension method, comprising:
extracting a first service domain boundary parameter of a first database mode from a knowledge base, wherein the knowledge base is obtained according to a construction parameter of the first database mode, the first database mode is used for constructing a government affair map, and the first service domain boundary parameter is used for determining a target application range of the government affair map;
adjusting the first service domain boundary parameter by using a reference data set to obtain a second service domain boundary parameter, wherein data in the reference data set is service data in the target application range;
adding a target data source in a target range indicated by the second service domain boundary parameter;
and expanding the first database mode by using the data acquired from the target data source to obtain a second database mode.
2. The method of claim 1, wherein prior to extracting the first business domain boundary parameters of the first database schema from the knowledge base, the method further comprises constructing the first database schema as follows:
determining the first service domain boundary parameter according to the application range of the government affair map to be constructed;
determining a first data source in a first range indicated by the first service domain boundary parameter;
collecting data of the first data source;
extracting, from the data of the first data source, target data which indicates entities, events and the association relationships between the entities and the events;
dividing the target data into a plurality of association pairs according to an entity-event association form, wherein the entity-event in each association pair is combined according to the association relationship between the entity and the event;
merging a plurality of the associated pairs into the first database schema.
3. The method of claim 2, wherein after obtaining the first database schema, the method further comprises building the knowledge base as follows:
and storing the first service domain boundary parameter, the first data source and the first database mode in a database to obtain the knowledge base.
4. The method of claim 3, wherein after obtaining the first database schema, constructing the knowledge base further comprises:
determining a first target word with a frequency of occurrence greater than or equal to a frequency threshold in the first database schema;
deleting the interference words from the first target words to obtain second target words;
and storing the second target word as a domain high-frequency feature word in the knowledge base.
5. The method of any of claims 1 to 4, wherein extending the first database schema with data collected from the target data source to obtain a second database schema comprises:
collecting first target data of the target data source;
extracting, from the first target data, second target data which indicates entities, events and the association relationships between the entities and the events;
dividing the second target data into a plurality of association pairs according to an entity-event association form, wherein the entity-event in each association pair is combined according to the association relationship between the entity and the event;
and merging the plurality of association pairs into the first database mode to obtain the second database mode.
6. The method of claim 5, wherein acquiring first target data of the target data source comprises at least one of:
sequentially grabbing the first target data in each page of a first grabbing link, starting from a starting page of the first grabbing link; when all pages of the first grabbing link have been grabbed and the end condition is not met, continuing to sequentially grab the first target data in each page of a second grabbing link, starting from the starting page of the second grabbing link, and stopping data grabbing once the end condition is met;
grabbing the first target data in a current page; when the end condition is not met, determining a target link from a plurality of links in the current page and grabbing the first target data in the target page pointed to by the target link, and terminating data grabbing once the end condition is met.
7. A database schema extension apparatus, comprising:
an initial parameter extraction module, configured to extract a first service domain boundary parameter of a first database mode from a knowledge base, wherein the knowledge base is obtained according to construction parameters of the first database mode, the first database mode is used for constructing a government affair map, and the first service domain boundary parameter is used for determining a target application range of the government affair map;
the parameter correction module is used for adjusting the first service domain boundary parameter by using a reference data set to obtain a second service domain boundary parameter, wherein the data in the reference data set is service data in the target application range;
the extended data source module is used for adding a target data source in a target range indicated by the second service domain boundary parameter;
and the data set expansion module is used for expanding the first database mode by using the data acquired from the target data source to acquire a second database mode.
8. The apparatus of claim 7, further comprising an initial data set construction module comprising:
the initial parameter determining unit is used for determining the first service domain boundary parameter according to the application range of the government affair map to be constructed;
an initial data source determining unit, configured to determine a first data source within a first range indicated by the first service domain boundary parameter;
the first acquisition unit is used for acquiring data of the first data source;
the first extraction unit is used for extracting, from the data of the first data source, target data which indicates entities, events and the association relationships between the entities and the events;
the first dividing unit is used for dividing the target data into a plurality of association pairs according to an entity-event association form, wherein the entity-event in each association pair is combined according to the association relationship between the entity and the event;
a first merging unit, configured to merge the plurality of association pairs into the first database schema.
9. An electronic device comprising a memory, a processor, a communication interface and a communication bus, wherein the memory stores a computer program operable on the processor, and the memory and the processor communicate via the communication bus and the communication interface, wherein the processor implements the steps of the method according to any of the claims 1 to 6 when executing the computer program.
10. A computer-readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the method of any of claims 1 to 6.
CN202011408081.0A 2020-12-04 2020-12-04 Database schema extension method, device, equipment and computer readable medium Active CN112463984B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011408081.0A CN112463984B (en) 2020-12-04 2020-12-04 Database schema extension method, device, equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011408081.0A CN112463984B (en) 2020-12-04 2020-12-04 Database schema extension method, device, equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN112463984A true CN112463984A (en) 2021-03-09
CN112463984B CN112463984B (en) 2024-02-27

Family

ID=74806551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011408081.0A Active CN112463984B (en) 2020-12-04 2020-12-04 Database schema extension method, device, equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN112463984B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446368A (en) * 2018-03-15 2018-08-24 湖南工业大学 A kind of construction method and equipment of Packaging Industry big data knowledge mapping
CN109446341A (en) * 2018-10-23 2019-03-08 国家电网公司 The construction method and device of knowledge mapping
CN109597855A (en) * 2018-11-29 2019-04-09 北京邮电大学 Domain knowledge map construction method and system based on big data driving
WO2019165456A1 (en) * 2018-02-26 2019-08-29 Fractal Industries, Inc. Automated scalable contextual data collection and extraction system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
姬慎达: ""关系数据库中基于知识库的Top-N关键词查询"", 《中国优秀硕士学位论文全文数据库 信息科技辑》, pages 1 - 43 *

Also Published As

Publication number Publication date
CN112463984B (en) 2024-02-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant