CN112463984B - Database schema extension method, device, equipment and computer readable medium - Google Patents
Database schema extension method, device, equipment and computer readable medium
- Publication number
- CN112463984B CN202011408081.0A
- Authority
- CN
- China
- Prior art keywords
- data
- target
- association
- database
- entity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
Abstract
The present disclosure relates to the field of knowledge graph technologies, and in particular to a database schema extension method, apparatus, device, and computer-readable medium. The method comprises: extracting first business domain boundary parameters of a first database schema from a knowledge base; adjusting the first business domain boundary parameters by using a reference data set to obtain second business domain boundary parameters, wherein the data in the reference data set is business data within a target application range; adding a target data source within a target range indicated by the second business domain boundary parameters; and expanding the first database schema by using data acquired from the target data source to obtain a second database schema. In the present application, the knowledge produced while constructing the schema (database schema) is accumulated in the knowledge base, so that this previously accumulated knowledge can be reused when the schema is expanded. This improves the efficiency of schema expansion and allows non-specialist personnel to take part in the expansion work, which reduces personnel costs.
Description
Technical Field
The present disclosure relates to the field of knowledge graph technologies, and in particular, to a method, an apparatus, a device, and a computer readable medium for extending a database schema.
Background
In the government affairs sector, an intelligent knowledge extraction engine is constructed to aggregate and fuse various kinds of government data, extract entity types, and establish relations among them according to the attribute relations, semantic relations, feature relations and the like within the entity types, thereby forming a government affairs knowledge graph. The graph provides basic data support and services for government service applications, macro-level decision-making and the like, and improves government governance capability. The schema of the government affairs knowledge graph provides data support for the knowledge graph.
At present, in the related art, the schema data of the knowledge graph is entered manually and the data is scattered across individual business databases, so that no reusable knowledge system is formed and unified management is difficult.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The application provides a database schema extension method, apparatus, device, and computer-readable medium, to solve the technical problems that earlier experience cannot be reused and that extension efficiency is low.
According to an aspect of an embodiment of the present application, there is provided a database schema extension method, including:
extracting first business domain boundary parameters of a first database schema from a knowledge base, wherein the knowledge base is obtained according to construction parameters of the first database schema, the first database schema is used for constructing a government affairs graph, and the first business domain boundary parameters are used for determining a target application range of the government affairs graph;
adjusting the first business domain boundary parameters by using a reference data set to obtain second business domain boundary parameters, wherein the data in the reference data set is business data within the target application range;
adding a target data source within a target range indicated by the second business domain boundary parameters;
and expanding the first database schema by using data acquired from the target data source to obtain a second database schema.
Optionally, before extracting the first business domain boundary parameters of the first database schema from the knowledge base, the method further comprises constructing the first database schema as follows:
determining the first business domain boundary parameters according to the application range of the government affairs graph to be constructed;
determining a first data source within a first range indicated by the first business domain boundary parameters;
collecting data of the first data source;
extracting, from the data of the first data source, target data indicating entities, events, and association relationships between the entities and the events;
dividing the target data into a plurality of association pairs in the entity-event association form, wherein the entity and the event in each association pair are combined according to the association relationship between them;
and merging the plurality of association pairs into the first database schema.
Optionally, after obtaining the first database schema, the method further comprises building the knowledge base in the following manner:
storing the first business domain boundary parameters, the first data source and the first database schema in a database to obtain the knowledge base.
Optionally, after obtaining the first database schema, constructing the knowledge base further includes:
determining first target words whose number of occurrences in the first database schema is greater than or equal to a count threshold;
deleting interference words from the first target words to obtain second target words;
and saving the second target words in the knowledge base as domain high-frequency feature words.
Optionally, expanding the first database schema by using data acquired from the target data source to obtain the second database schema includes:
collecting first target data of the target data source;
extracting, from the first target data, second target data indicating entities, events, and association relationships between the entities and the events;
dividing the second target data into a plurality of association pairs in the entity-event association form, wherein the entity and the event in each association pair are combined according to the association relationship between them;
and merging the plurality of association pairs into the first database schema to obtain the second database schema.
Optionally, collecting the first target data of the target data source includes at least one of:
sequentially crawling the first target data in each page under a first crawl link, starting from the initial page of the first crawl link; when all pages under the first crawl link have been crawled and an end condition is not met, continuing to crawl, in sequence, the first target data in each page under a second crawl link, starting from the initial page of the second crawl link, until the end condition is met, and then ending the crawl;
crawling the first target data in a current page; when the end condition is not met, determining a target link from a plurality of links in the current page and crawling the first target data in the target page pointed to by the target link, until the end condition is met, and then ending the crawl.
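As a non-limiting illustration, the two collection strategies above can be sketched in Python. The sketch runs against a hypothetical in-memory site rather than a real network; the names `crawl_links_in_order`, `crawl_following_links`, `choose` and `done`, the page layout, and the choice to test the end condition after every page are all assumptions made for the example, not details given in this application.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory site: page id -> {"data": [...], "links": [...]}.
Site = Dict[str, Dict[str, List[str]]]


def crawl_links_in_order(crawl_links: List[List[str]], site: Site,
                         done: Callable[[List[str]], bool]) -> List[str]:
    """Strategy 1: walk every page of the first crawl link in order,
    then move on to the next crawl link, until `done` holds."""
    collected: List[str] = []
    for pages in crawl_links:
        for page_id in pages:
            collected.extend(site[page_id]["data"])
            if done(collected):  # assumed: end condition tested per page
                return collected
    return collected


def crawl_following_links(start: str, site: Site,
                          choose: Callable[[List[str]], str],
                          done: Callable[[List[str]], bool]) -> List[str]:
    """Strategy 2: grab the current page, then follow one link chosen
    from that page, until `done` holds or no unvisited link remains."""
    collected: List[str] = []
    visited = set()
    current: Optional[str] = start
    while current is not None:
        visited.add(current)
        collected.extend(site[current]["data"])
        if done(collected):
            break
        candidates = [l for l in site[current]["links"] if l not in visited]
        current = choose(candidates) if candidates else None
    return collected
```

In this sketch, `done` stands in for the end condition and `choose` for whatever rule selects the target link; both would be supplied by the caller.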
According to another aspect of the embodiments of the present application, there is provided a database schema extension apparatus, including:
an initial parameter extraction module, configured to extract first business domain boundary parameters of a first database schema from a knowledge base, wherein the knowledge base is obtained according to construction parameters of the first database schema, the first database schema is used for constructing a government affairs graph, and the first business domain boundary parameters are used for determining a target application range of the government affairs graph;
a parameter correction module, configured to adjust the first business domain boundary parameters by using a reference data set to obtain second business domain boundary parameters, wherein the data in the reference data set is business data within the target application range;
a data source extension module, configured to add a target data source within a target range indicated by the second business domain boundary parameters;
and a data set expansion module, configured to expand the first database schema by using data acquired from the target data source to obtain a second database schema.
Optionally, the apparatus further comprises an initial dataset construction module comprising:
an initial parameter determining unit, configured to determine the first business domain boundary parameters according to the application range of the government affairs graph to be constructed;
an initial data source determining unit, configured to determine a first data source within a first range indicated by the first business domain boundary parameters;
a first acquisition unit, configured to collect data of the first data source;
a first extraction unit, configured to extract, from the data of the first data source, target data indicating entities, events, and association relationships between the entities and the events;
a first dividing unit, configured to divide the target data into a plurality of association pairs in the entity-event association form, wherein the entity and the event in each association pair are combined according to the association relationship between them;
and a first merging unit, configured to merge the plurality of association pairs into the first database schema.
Optionally, the apparatus further includes a knowledge base construction module, including:
a knowledge base construction unit, configured to store the first business domain boundary parameters, the first data source and the first database schema in a database to obtain the knowledge base.
Optionally, the knowledge base construction module further includes:
a high-frequency word determining unit, configured to determine first target words whose number of occurrences in the first database schema is greater than or equal to a count threshold;
an interference word deleting unit, configured to delete interference words from the first target words to obtain second target words;
and a high-frequency feature word storage unit, configured to save the second target words in the knowledge base as domain high-frequency feature words.
Optionally, the data set expansion module includes:
a second acquisition unit, configured to collect first target data of the target data source;
a second extraction unit, configured to extract, from the first target data, second target data indicating entities, events, and association relationships between the entities and the events;
a second dividing unit, configured to divide the second target data into a plurality of association pairs in the entity-event association form, wherein the entity and the event in each association pair are combined according to the association relationship between them;
and a second merging unit, configured to merge the plurality of association pairs into the first database schema to obtain the second database schema.
Optionally, the second acquisition unit further comprises:
a first acquisition subunit, configured to sequentially crawl the first target data in each page under a first crawl link, starting from the initial page of the first crawl link, and, when all pages under the first crawl link have been crawled and an end condition is not met, to continue crawling, in sequence, the first target data in each page under a second crawl link, starting from the initial page of the second crawl link, until the end condition is met, and then end the crawl;
a second acquisition subunit, configured to crawl the first target data in a current page and, when the end condition is not met, to determine a target link from a plurality of links in the current page and crawl the first target data in the target page pointed to by the target link, until the end condition is met, and then end the crawl.
According to another aspect of the embodiments of the present application, there is provided an electronic device including a memory, a processor, a communication interface, and a communication bus, where the memory stores a computer program executable on the processor, the memory and the processor communicate with each other through the communication bus and the communication interface, and the processor, when executing the computer program, implements the steps of the above method.
According to another aspect of embodiments of the present application, there is also provided a computer readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the above-described method.
Compared with the related art, the technical scheme provided by the embodiment of the application has the following advantages:
In the technical solution of the present application, first business domain boundary parameters of a first database schema are extracted from a knowledge base, wherein the knowledge base is obtained according to construction parameters of the first database schema, the first database schema is used for constructing a government affairs graph, and the first business domain boundary parameters are used for determining a target application range of the government affairs graph; the first business domain boundary parameters are adjusted by using a reference data set to obtain second business domain boundary parameters, wherein the data in the reference data set is business data within the target application range; a target data source is added within a target range indicated by the second business domain boundary parameters; and the first database schema is expanded by using data acquired from the target data source to obtain a second database schema. In the present application, the knowledge produced while constructing the schema (database schema) is accumulated in the knowledge base, so that this previously accumulated knowledge can be reused when the schema is expanded. This improves the efficiency of schema expansion and allows non-specialist personnel to take part in the expansion work, which reduces personnel costs.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described, and it is obvious to those skilled in the art that other drawings can be obtained according to these drawings without inventive effort.
FIG. 1 is a schematic diagram of an alternative hardware environment for a database schema extension method according to an embodiment of the present application;
FIG. 2 is a flowchart of an alternative database schema extension method provided in accordance with an embodiment of the present application;
FIG. 3 is a block diagram of an alternative database schema extension apparatus provided in accordance with an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an alternative electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present application based on the embodiments herein.
In the following description, suffixes such as "module", "component", or "unit" for representing elements are used only for facilitating the description of the present application, and are not of specific significance per se. Thus, "module" and "component" may be used in combination.
In the related art, the schema data of the knowledge graph is manually input, and the data are scattered in each business database, so that a knowledge system which can be repeatedly utilized is not formed, and unified management becomes a difficult problem.
To solve the problems mentioned in the background art, according to an aspect of the embodiments of the present application, an embodiment of a database schema extension method is provided.
Optionally, in the embodiment of the present application, the above database schema extension method may be applied to a hardware environment formed by the terminal 101 and the server 103, as shown in FIG. 1. As shown in FIG. 1, the server 103 is connected to the terminal 101 through a network and may be used to provide services to the terminal or to a client installed on the terminal. A database 105 may be provided on the server, or independently of the server, to provide data storage services for the server 103. The network includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network, and the terminal 101 includes, but is not limited to, a PC, a mobile phone, a tablet computer, and the like.
The database schema extension method in the embodiments of the present application may be executed by the server 103, or by the server 103 and the terminal 101 together. As shown in FIG. 2, the method may include the following steps:
Step S202, extracting first business domain boundary parameters of a first database schema from a knowledge base, wherein the knowledge base is obtained according to construction parameters of the first database schema, the first database schema is used for constructing a government affairs graph, and the first business domain boundary parameters are used for determining a target application range of the government affairs graph.
In this embodiment of the present application, the database schema, i.e., the schema, is the organization and structure of a database, including tables, columns, data types, views, stored procedures, relationships, primary keys, foreign keys, and the like. Extending the schema in effect extends the data of the database and the relationships among the data. The schema is the basis for constructing the government affairs graph. Iterating and updating the government affairs graph relies on a database schema that is continuously updated and expanded, and the iteration and updating of the government affairs graph in turn lay a foundation for developing new business and building new systems.
Optionally, before extracting the first business domain boundary parameters of the first database schema from the knowledge base, the method further comprises constructing the first database schema as follows:
step 1, determining the first business domain boundary parameters according to the application range of the government affairs graph to be constructed;
step 2, determining a first data source within a first range indicated by the first business domain boundary parameters;
step 3, collecting data of the first data source;
step 4, extracting, from the data of the first data source, target data indicating entities, events, and association relationships between the entities and the events;
step 5, dividing the target data into a plurality of association pairs in the entity-event association form, wherein the entity and the event in each association pair are combined according to the association relationship between them;
step 6, merging the plurality of association pairs into the first database schema.
In this embodiment of the present application, the application range of the government affairs graph includes the domain of the government-industry standard graph model, the business scenarios, the processing intent, and the data range. These ranges can be identified and sorted out from the nationally published laws, regulations and policy documents relating to the government industry; they can also be identified and sorted out in combination with the domains, business scenarios and data ranges that local governments have delineated from practical experience; or they can be configured in a customized manner according to standard specifications. In this way, screening words and filter words can be determined, together with parameters such as data collection websites, and these serve as the first business domain boundary parameters. The determined data collection websites are the first data source.
Data collection may obtain business corpora, texts, tables, existing knowledge bases and the like of the government industry, where important terms related to the domain and existing domain ontologies may be the main collection objects. A domain ontology is a description of the concepts of a discipline, including the concepts themselves, the attributes of the concepts, the relationships between the concepts, and the constraints on the attributes and relationships. Since knowledge has pronounced domain characteristics, a domain ontology can represent knowledge more reasonably and efficiently, and can represent the specific knowledge within a given domain. The specific domain can be determined according to the needs of the ontology constructor: it can be a discipline, a combination of several domains, or a small range within one domain.
After the data is collected, valid data is extracted, namely data that reflects entities, events, and the association relationships between the entities and the events. An entity may be an object related to government affairs, such as an enterprise or a legal person, and an event is a business activity, legal action or the like of the entity. In particular, the valid data may be extracted by a data analysis tool.
Finally, the valid data, namely the target data, is divided into a plurality of association pairs in the entity-event association form, and the plurality of association pairs are merged into one data set to obtain the first database schema.
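As a non-limiting illustration, the division into entity-event association pairs and their merging into one data set can be sketched as follows. The record layout (dictionaries with `entity`, `relation` and `event` keys) and the names `AssociationPair` and `build_schema` are assumptions made for the example; this application does not prescribe a concrete representation.

```python
from typing import Dict, Iterable, NamedTuple, Set


class AssociationPair(NamedTuple):
    """One entity-event pair joined by its association relationship."""
    entity: str
    relation: str
    event: str


def build_schema(records: Iterable[Dict[str, str]]) -> Set[AssociationPair]:
    """Keep only records that carry an entity, an event and the relation
    between them, and merge the resulting pairs into a single set."""
    schema: Set[AssociationPair] = set()
    for rec in records:
        # Records missing any of the three fields are not valid data.
        if all(rec.get(key) for key in ("entity", "relation", "event")):
            schema.add(AssociationPair(rec["entity"], rec["relation"], rec["event"]))
    return schema
```

Using a set means that a pair extracted twice from different sources is stored once, which is one simple way to read "merging" here.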
After the schema is constructed, it needs to be updated continuously to keep the standard graph model correct, and it can be expanded by means of related technologies such as big data and knowledge graph model construction.
Optionally, after obtaining the first database schema, the method further comprises building the knowledge base in the following manner:
storing the first business domain boundary parameters, the first data source and the first database schema in a database to obtain the knowledge base.
In the embodiment of the application, data content such as the first business domain boundary parameters, the first data source and the first database schema, together with the pre-sorted data logical relationship structures, business terms and industry standard knowledge, can be implemented programmatically by technical means and imported into the knowledge base, where it can be invoked. Implementing it programmatically by technical means refers to determining, when the knowledge base is constructed, the data logical relationship structures, business terms and industry standards used in the domain according to domain knowledge, and importing this knowledge into a database to form the knowledge base. When the definition and description information of the database storage structure changes, the content of the knowledge base is updated in a timely manner to keep the knowledge current.
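As a non-limiting illustration, storing the boundary parameters, data sources and schema in a database and later retrieving them can be sketched with Python's built-in `sqlite3` and `json` modules. The table name `knowledge`, the JSON payload layout and the function names are assumptions made for the example; this application does not specify a storage engine or table design.

```python
import json
import sqlite3
from typing import Any, Dict, List, Optional


def save_knowledge(conn: sqlite3.Connection, name: str,
                   boundary_params: Dict[str, Any],
                   data_sources: List[str],
                   schema_pairs: List[List[str]]) -> None:
    """Store (or refresh) one schema's construction knowledge."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS knowledge "
        "(name TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    payload = json.dumps(
        {"boundary_params": boundary_params,
         "data_sources": data_sources,
         "schema": schema_pairs},
        ensure_ascii=False,
    )
    # INSERT OR REPLACE overwrites the row when the knowledge changes,
    # so the stored copy stays current.
    conn.execute(
        "INSERT OR REPLACE INTO knowledge (name, payload) VALUES (?, ?)",
        (name, payload),
    )
    conn.commit()


def load_knowledge(conn: sqlite3.Connection, name: str) -> Optional[Dict[str, Any]]:
    """Return the stored knowledge for `name`, or None if absent."""
    row = conn.execute(
        "SELECT payload FROM knowledge WHERE name = ?", (name,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Calling `save_knowledge` again under the same name replaces the earlier entry, which corresponds to updating the knowledge base when the stored definitions change.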
Optionally, after obtaining the first database schema, constructing the knowledge base further includes:
determining first target words whose number of occurrences in the first database schema is greater than or equal to a count threshold;
deleting interference words from the first target words to obtain second target words;
and saving the second target words in the knowledge base as domain high-frequency feature words.
In the embodiment of the application, the high-frequency feature words of the domain can be extracted through data analysis. First, the first target words whose number of occurrences is greater than or equal to the count threshold are counted; the count threshold can be set according to actual needs. Then, interference words, such as generic words like "of", "actual" and "each", are deleted from the first target words to obtain the second target words, namely the high-frequency feature words of the domain, which can be accumulated in the knowledge base as knowledge.
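As a non-limiting illustration, the two-stage selection of domain high-frequency feature words can be sketched with `collections.Counter`. The function name `domain_feature_words` and the assumption that the schema text has already been split into tokens are choices made for the example.

```python
from collections import Counter
from typing import Iterable, List


def domain_feature_words(tokens: Iterable[str], threshold: int,
                         interference_words: Iterable[str]) -> List[str]:
    """Return words that occur at least `threshold` times, minus the
    interference words, sorted for a stable result."""
    counts = Counter(tokens)
    # First target words: frequency at or above the count threshold.
    first_target = {word for word, n in counts.items() if n >= threshold}
    # Second target words: first target words with interference removed.
    return sorted(first_target - set(interference_words))
```

The threshold and the interference word list are left to the caller, in line with the statement above that the threshold is set according to actual needs.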
Step S204, adjusting the first business domain boundary parameters by using a reference data set to obtain second business domain boundary parameters, wherein the data in the reference data set is business data within the target application range.
In the embodiment of the present application, when the database schema is extended, the domain of the government-industry standard graph model, the business scenarios, the processing intent and the data range need to be determined again. Because real business data can be obtained from the required domain to form the reference data set, the first business domain boundary parameters can be corrected against it to obtain the second business domain boundary parameters, which yield data that better matches the actual situation of the project or customer.
In the embodiment of the application, the correction can follow the same approach used to determine the first business domain boundary parameters. That is, the ranges can be identified and sorted out from the nationally published laws, regulations and policy documents relating to the government industry, identified and sorted out in combination with the domains, business scenarios and data ranges that local governments have delineated from practical experience, or configured in a customized manner according to standard specifications, so that parameters such as the screening words, filter words and data collection websites are finally determined.
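As a non-limiting illustration, one possible way of correcting the boundary parameters against a reference data set is sketched below. This application does not fix a correction algorithm; the heuristic shown (promoting terms that are frequent in the reference data to screening words, unless they are filter words), the `BoundaryParams` structure and the `min_count` parameter are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class BoundaryParams:
    """Hypothetical container for business domain boundary parameters."""
    screening_words: FrozenSet[str]
    filter_words: FrozenSet[str]
    sites: FrozenSet[str]


def adjust_boundary(first: BoundaryParams,
                    reference_docs: Iterable[List[str]],
                    min_count: int) -> BoundaryParams:
    """Derive second boundary parameters from the first ones by adding
    terms that occur at least `min_count` times in the reference data."""
    counts = Counter(token for doc in reference_docs for token in doc)
    frequent = {token for token, n in counts.items() if n >= min_count}
    # Filter words stay excluded even when they are frequent.
    screening = (first.screening_words | frequent) - first.filter_words
    return BoundaryParams(frozenset(screening), first.filter_words, first.sites)
```

The first parameters are left untouched and a new object is returned, so both the first and the second boundary parameters remain available for storage in the knowledge base.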
Step S206, adding a target data source within a target range indicated by the second business domain boundary parameters.
In the embodiment of the application, new data sources can be added, such as business corpora, texts, tables and existing knowledge bases of the government industry, for example enterprise websites and financial websites of the government industry that relate to the schema being constructed. Important terms related to the domain and existing domain ontologies may be the main collection objects.
In the embodiment of the present application, where the data already collected cannot support the development of new business or the construction of new systems, the existing data needs to be supplemented according to the requirements of the new business and the new systems. Depending on the specific requirements, the data collection range is enlarged by means such as web crawlers, data reporting systems, and newly added data collection equipment.
Step S208: expand the first database schema by using data collected from the target data source to obtain a second database schema.
Optionally, expanding the first database schema by using data collected from the target data source to obtain the second database schema includes:
step 1, collecting first target data from the target data source;
step 2, extracting, from the first target data, second target data indicating entities, events and the association relationships between the entities and the events;
step 3, dividing the second target data into a plurality of association pairs in the entity-event association form, where the entity and the event in each association pair are combined according to the association relationship between them;
step 4, merging the plurality of association pairs into the first database schema to obtain the second database schema.
In this embodiment of the present application, the database schema may be extended in the same way in which it was constructed. First target data is collected from the newly added target data source. Valid data, namely second target data that reflects entities, events and the association relationships between them, is then extracted from the first target data. Finally, new entity-event association pairs are divided out of the second target data and merged into the original database schema, thereby extending it.
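Steps 1 to 4 above can be sketched as follows. This is a minimal sketch under stated assumptions: the schema is modelled as a set of `(entity, relation, event)` tuples, collected records are plain dictionaries, and the function names and sample values are hypothetical rather than taken from the source.

```python
def extract_second_target_data(first_target_data):
    """Step 2: keep only records that name an entity, an event and their relation."""
    return [
        r for r in first_target_data
        if r.get("entity") and r.get("event") and r.get("relation")
    ]


def to_association_pairs(second_target_data):
    """Step 3: one entity-event pair per record, joined by its relation."""
    return {(r["entity"], r["relation"], r["event"]) for r in second_target_data}


def extend_schema(first_schema, first_target_data):
    """Step 4: merge the new pairs into a copy of the first schema."""
    pairs = to_association_pairs(extract_second_target_data(first_target_data))
    return set(first_schema) | pairs


first_schema = {("enterprise", "applies for", "business licence")}
collected = [  # step 1 output, e.g. from a crawler (sample values are made up)
    {"entity": "citizen", "relation": "submits", "event": "tax filing"},
    {"entity": "citizen", "relation": None, "event": "unrelated news"},
    {"entity": "enterprise", "relation": "applies for", "event": "business licence"},
]
second_schema = extend_schema(first_schema, collected)
```

Using a set means that a pair already present in the first schema is not duplicated, and the record with no usable relation is dropped at the extraction step.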
Optionally, collecting the first target data from the target data source includes at least one of the following:
starting from the start page of a first grabbing link, grabbing the first target data in each page of the first grabbing link in sequence; when all pages of the first grabbing link have been grabbed and the end condition is not met, continuing from the start page of a second grabbing link to grab the first target data in each page of the second grabbing link in sequence, until the end condition is met and grabbing ends;
grabbing the first target data in the current page; when the end condition is not met, determining a target link from the plurality of links in the current page and grabbing the first target data in the target page that the target link points to, until the end condition is met and grabbing ends.
In this embodiment of the present application, data may be obtained by traversing according to a depth-first traversal strategy or according to a breadth-first traversal strategy.
Under the depth-first traversal strategy, one link, namely the first link, is selected from a plurality of grabbing links. The web crawler starts from the start page of the first link and follows it link after link. After the first link has been processed, the crawler moves to the second link, starts from its start page and again crawls link after link. If the end condition is met while data is being grabbed, crawling stops. The end condition may be determined by the amount of data to be obtained or by the number of links.
Under the breadth-first traversal strategy, links found in a newly downloaded web page are appended directly to the end of the queue of addresses to be crawled. That is, the web crawler first grabs all web pages linked from the current page, then selects one of those linked pages and grabs all web pages linked from it. Likewise, crawling stops when the end condition is met; the end condition may be determined by the amount of data to be obtained or by the lateral extent of the links.
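The two traversal strategies differ only in where newly found links are placed in the crawl frontier. The sketch below shows this on a small in-memory link graph instead of real HTTP requests; the `PAGES` dictionary, the `crawl` function and the "stop after `max_items` records" end condition are illustrative assumptions.

```python
from collections import deque

# Hypothetical stand-in for the web: page -> (data on the page, outgoing links).
PAGES = {
    "a": ("data-a", ["b", "c"]),
    "b": ("data-b", ["d"]),
    "c": ("data-c", []),
    "d": ("data-d", []),
}


def crawl(start_pages, max_items, depth_first=True):
    """Collect page data until max_items records are gathered (the end condition)."""
    frontier = deque(start_pages)
    seen = set(start_pages)
    collected = []
    while frontier and len(collected) < max_items:
        page = frontier.popleft()
        data, links = PAGES[page]
        collected.append(data)
        new_links = []
        for link in links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                new_links.append(link)
        if depth_first:
            # Put new links at the front so the chain just found is followed first.
            frontier.extendleft(reversed(new_links))
        else:
            # Breadth-first: append new links to the end of the queue.
            frontier.extend(new_links)
    return collected


depth_order = crawl(["a"], max_items=10, depth_first=True)
breadth_order = crawl(["a"], max_items=10, depth_first=False)
```

Here the depth-first run visits `d` (reached through `b`) before `c`, while the breadth-first run finishes both pages linked from `a` before moving one level deeper. Lowering `max_items` ends either run early, which corresponds to an end condition based on the amount of data to be obtained.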
With the above technical solution, the knowledge produced while constructing the schema is accumulated in the knowledge base, so that this previously accumulated knowledge can be reused when the schema is extended. This improves the efficiency of schema extension and allows non-specialist personnel to take part in the extension work, thereby reducing personnel costs.
According to still another aspect of the embodiments of the present application, as shown in Fig. 3, a database schema extension apparatus is provided, including:
an initial parameter extraction module 301, configured to extract a first service domain boundary parameter of a first database schema from a knowledge base, where the knowledge base is obtained according to construction parameters of the first database schema, the first database schema is used to construct a government map, and the first service domain boundary parameter is used to determine a target application range of the government map;
a parameter correction module 303, configured to adjust the first service domain boundary parameter by using a reference data set to obtain a second service domain boundary parameter, where the data in the reference data set is service data within the target application range;
an extended data source module 305, configured to add a target data source within the target range indicated by the second service domain boundary parameter;
a data set expansion module 307, configured to expand the first database schema by using data collected from the target data source to obtain a second database schema.
It should be noted that, in this embodiment, the initial parameter extraction module 301 may be used to perform step S202, the parameter correction module 303 may be used to perform step S204, the extended data source module 305 may be used to perform step S206, and the data set expansion module 307 may be used to perform step S208 of the embodiments of the present application.
It should be noted that the examples and application scenarios implemented by the above modules are the same as those of the corresponding steps, but are not limited to what is disclosed in the above embodiments. As part of the apparatus, the above modules may run in the hardware environment shown in Fig. 1 and may be implemented in software or in hardware.
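For a software implementation, the four modules can be viewed as stages of one pipeline, each stage consuming the output of the previous one. The sketch below only shows that wiring; the `run_extension` function, the dictionary keys and the stub stages are hypothetical stand-ins, not the apparatus's actual interfaces.

```python
def run_extension(knowledge_base, reference_data, extract, correct, add_sources, expand):
    """Chain the four stages in the order of steps S202 to S208."""
    first_params = extract(knowledge_base)                 # module 301, step S202
    second_params = correct(first_params, reference_data)  # module 303, step S204
    sources = add_sources(second_params)                   # module 305, step S206
    return expand(knowledge_base["schema"], sources)       # module 307, step S208


# Stub stages with made-up behaviour, just enough to exercise the wiring.
def extract(kb):
    return set(kb["boundary"])


def correct(params, reference):
    return params | set(reference)


def add_sources(params):
    return [f"https://example.invalid/{word}" for word in sorted(params)]


def expand(schema, sources):
    return set(schema) | {("page", "found at", s) for s in sources}


kb = {"boundary": {"tax"}, "schema": {("enterprise", "files", "tax return")}}
second_schema = run_extension(kb, ["subsidy"], extract, correct, add_sources, expand)
```

Passing the stages in as callables mirrors the note above that the module division is only a logical one: each stage can be replaced or combined without changing the overall flow.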
Optionally, the apparatus further includes an initial data set construction module, including:
an initial parameter determining unit, configured to determine the first service domain boundary parameter according to the application range of the government map to be constructed;
an initial data source determining unit, configured to determine a first data source within a first range indicated by the first service domain boundary parameter;
a first acquisition unit, configured to collect data from the first data source;
a first extraction unit, configured to extract, from the data of the first data source, target data indicating entities, events and the association relationships between the entities and the events;
a first dividing unit, configured to divide the target data into a plurality of association pairs in the entity-event association form, where the entity and the event in each association pair are combined according to the association relationship between them;
a first merging unit, configured to merge the plurality of association pairs into the first database schema.
Optionally, the apparatus further includes a knowledge base construction module, including:
a knowledge base construction unit, configured to store the first service domain boundary parameter, the first data source and the first database schema in a database to obtain the knowledge base.
Optionally, the knowledge base construction module further includes:
a high-frequency word determining unit, configured to determine first target words whose number of occurrences in the first database schema is greater than or equal to a frequency threshold;
an interference word deleting unit, configured to delete interference words from the first target words to obtain second target words;
a high-frequency feature word storage unit, configured to store the second target words in the knowledge base as domain high-frequency feature words.
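The three units above amount to a count, a threshold and a set difference. A minimal sketch follows, assuming the schema's terms are available as a flat list of strings; the function name, the sample terms and the threshold value are hypothetical.

```python
from collections import Counter


def domain_feature_words(schema_terms, frequency_threshold, interference_words):
    """Return the domain high-frequency feature words for the knowledge base."""
    counts = Counter(schema_terms)
    # First target words: occurrence count >= the frequency threshold.
    first_target = {w for w, n in counts.items() if n >= frequency_threshold}
    # Second target words: first target words with interference words deleted.
    return first_target - set(interference_words)


terms = ["licence", "of", "tax", "licence", "of", "tax", "of", "subsidy"]
feature_words = domain_feature_words(
    terms, frequency_threshold=2, interference_words={"of"}
)
```

With these sample terms, "licence" and "tax" each occur twice and are kept, "subsidy" occurs once and falls below the threshold, and "of" is frequent but removed as an interference word.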
Optionally, the data set expansion module includes:
a second acquisition unit, configured to collect first target data from the target data source;
a second extraction unit, configured to extract, from the first target data, second target data indicating entities, events and the association relationships between the entities and the events;
a second dividing unit, configured to divide the second target data into a plurality of association pairs in the entity-event association form, where the entity and the event in each association pair are combined according to the association relationship between them;
a second merging unit, configured to merge the plurality of association pairs into the first database schema to obtain the second database schema.
Optionally, the second acquisition unit further includes:
a first acquisition subunit, configured to grab, starting from the start page of a first grabbing link, the first target data in each page of the first grabbing link in sequence, and, when all pages of the first grabbing link have been grabbed and the end condition is not met, to continue from the start page of a second grabbing link to grab the first target data in each page of the second grabbing link in sequence, until the end condition is met and grabbing ends;
a second acquisition subunit, configured to grab the first target data in the current page, and, when the end condition is not met, to determine a target link from the plurality of links in the current page and grab the first target data in the target page that the target link points to, until the end condition is met and grabbing ends.
According to another aspect of the embodiments of the present application, as shown in Fig. 4, the present application provides an electronic device including a memory 401, a processor 403, a communication interface 405 and a communication bus 407. The memory 401 stores a computer program executable on the processor 403, the memory 401 and the processor 403 communicate through the communication interface 405 and the communication bus 407, and the processor 403 implements the steps of the above method when executing the computer program.
The memory and the processor in the electronic device communicate with the communication interface through the communication bus. The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on.
The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like. It may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
According to yet another aspect of the embodiments of the present application, a computer-readable medium having non-volatile program code executable by a processor is also provided.
Optionally, in this embodiment of the present application, the computer-readable medium is configured to store program code for the processor to perform the following steps:
extracting a first service domain boundary parameter of a first database schema from a knowledge base, where the knowledge base is obtained according to construction parameters of the first database schema, the first database schema is used to construct a government map, and the first service domain boundary parameter is used to determine a target application range of the government map;
adjusting the first service domain boundary parameter by using a reference data set to obtain a second service domain boundary parameter, where the data in the reference data set is service data within the target application range;
adding a target data source within the target range indicated by the second service domain boundary parameter;
expanding the first database schema by using data collected from the target data source to obtain a second database schema.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments; details are not repeated here.
For the specific implementation of this embodiment, reference may be made to the above embodiments, with corresponding technical effects.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into modules is merely a division by logical function, and other divisions are possible in an actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, or the like.
It should be noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely a specific embodiment of the application to enable one skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (7)
1. A database schema extension method, comprising:
extracting a first service domain boundary parameter of a first database schema from a knowledge base, wherein the knowledge base is obtained according to construction parameters of the first database schema, the first database schema is used for constructing a government map, and the first service domain boundary parameter is used for determining a target application range of the government map;
adjusting the first service domain boundary parameter by using a reference data set to obtain a second service domain boundary parameter, wherein the data in the reference data set is service data within the target application range;
adding a target data source within the target application range indicated by the second service domain boundary parameter;
expanding the first database schema by using data collected from the target data source to obtain a second database schema;
wherein, before extracting the first service domain boundary parameter of the first database schema from the knowledge base, the method further comprises constructing the first database schema as follows: determining the first service domain boundary parameter according to the application range of the government map to be constructed; determining a first data source within a first range indicated by the first service domain boundary parameter; collecting data from the first data source; extracting, from the data of the first data source, target data indicating entities, events and association relationships between the entities and the events; dividing the target data into a plurality of association pairs in the entity-event association form, wherein the entity and the event in each association pair are combined according to the association relationship between them; and merging the plurality of association pairs into the first database schema;
and wherein expanding the first database schema by using data collected from the target data source to obtain the second database schema comprises: collecting first target data from the target data source; extracting, from the first target data, second target data indicating entities, events and association relationships between the entities and the events; dividing the second target data into a plurality of association pairs in the entity-event association form, wherein the entity and the event in each association pair are combined according to the association relationship between them; and merging the plurality of association pairs into the first database schema to obtain the second database schema.
2. The method of claim 1, wherein after obtaining the first database schema, the method further comprises building the knowledge base as follows:
storing the first service domain boundary parameter, the first data source and the first database schema in a database to obtain the knowledge base.
3. The method of claim 2, wherein building the knowledge base after obtaining the first database schema further comprises:
determining first target words whose number of occurrences in the first database schema is greater than or equal to a frequency threshold;
deleting interference words from the first target words to obtain second target words;
storing the second target words in the knowledge base as domain high-frequency feature words.
4. The method of claim 1, wherein collecting the first target data from the target data source comprises at least one of:
starting from the start page of a first grabbing link, grabbing the first target data in each page of the first grabbing link in sequence; when all pages of the first grabbing link have been grabbed and the end condition is not met, continuing from the start page of a second grabbing link to grab the first target data in each page of the second grabbing link in sequence, until the end condition is met and grabbing ends;
grabbing the first target data in the current page; when the end condition is not met, determining a target link from the plurality of links in the current page and grabbing the first target data in the target page that the target link points to, until the end condition is met and grabbing ends.
5. A database schema extension apparatus, comprising:
an initial parameter extraction module, configured to extract a first service domain boundary parameter of a first database schema from a knowledge base, wherein the knowledge base is obtained according to construction parameters of the first database schema, the first database schema is used for constructing a government map, and the first service domain boundary parameter is used for determining a target application range of the government map;
a parameter correction module, configured to adjust the first service domain boundary parameter by using a reference data set to obtain a second service domain boundary parameter, wherein the data in the reference data set is service data within the target application range;
an extended data source module, configured to add a target data source within the target application range indicated by the second service domain boundary parameter;
a data set expansion module, configured to expand the first database schema by using data collected from the target data source to obtain a second database schema;
an initial data set construction module, configured to: determine the first service domain boundary parameter according to the application range of the government map to be constructed; determine a first data source within a first range indicated by the first service domain boundary parameter; collect data from the first data source; extract, from the data of the first data source, target data indicating entities, events and association relationships between the entities and the events; divide the target data into a plurality of association pairs in the entity-event association form, wherein the entity and the event in each association pair are combined according to the association relationship between them; and merge the plurality of association pairs into the first database schema;
wherein the data set expansion module is specifically configured to: collect first target data from the target data source; extract, from the first target data, second target data indicating entities, events and association relationships between the entities and the events; divide the second target data into a plurality of association pairs in the entity-event association form, wherein the entity and the event in each association pair are combined according to the association relationship between them; and merge the plurality of association pairs into the first database schema to obtain the second database schema.
6. An electronic device comprising a memory, a processor, a communication interface and a communication bus, said memory storing a computer program executable on said processor, said memory, said processor communicating with said communication interface via said communication bus, characterized in that said processor, when executing said computer program, implements the steps of the method of any of the preceding claims 1 to 4.
7. A computer readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method of any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011408081.0A CN112463984B (en) | 2020-12-04 | 2020-12-04 | Database schema extension method, device, equipment and computer readable medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112463984A | 2021-03-09 |
CN112463984B | 2024-02-27 |
Family
ID=74806551
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011408081.0A Active CN112463984B (en) | 2020-12-04 | 2020-12-04 | Database schema extension method, device, equipment and computer readable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112463984B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108446368A (en) * | 2018-03-15 | 2018-08-24 | 湖南工业大学 | A kind of construction method and equipment of Packaging Industry big data knowledge mapping |
CN109446341A (en) * | 2018-10-23 | 2019-03-08 | 国家电网公司 | The construction method and device of knowledge mapping |
CN109597855A (en) * | 2018-11-29 | 2019-04-09 | 北京邮电大学 | Domain knowledge map construction method and system based on big data driving |
WO2019165456A1 (en) * | 2018-02-26 | 2019-08-29 | Fractal Industries, Inc. | Automated scalable contextual data collection and extraction system |
Non-Patent Citations (1)
Title |
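---|
"Knowledge-base-based Top-N keyword query in relational databases"; Ji Shenda (姬慎达); China Master's Theses Full-text Database, Information Science and Technology; 1-43 * |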
Also Published As
Publication number | Publication date |
---|---|
CN112463984A (en) | 2021-03-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||