CN113821554A - Method for realizing data acquisition of heterogeneous database

Info

Publication number: CN113821554A
Application number: CN202110941795.6A
Authority: CN
Inventors: 和雄伟; 师丹华; 杨光华; 魏专利; 梁晓霞
Original assignee: Taiyuan Great Times Technology Co ltd
Current assignee: Taiyuan Great Times Technology Co ltd
Priority date: 2021-08-17
Filing date: 2021-08-17
Publication date: 2021-12-21
Anticipated expiration: 2041-08-17
Also published as: CN113821554B

Abstract

The invention provides a method for realizing data acquisition of a heterogeneous database, which comprises the following steps: configuring database configuration information and trigger rules for each branch company database according to the basic information of each branch company database; selecting a database driver according to the database configuration information of each branch company, and determining a data acquisition interface based on the database driver; logging in a branch company database according to the trigger rule, and acquiring data by using the data acquisition interface; and processing the acquired data and summarizing the data to a head office database. By directly collecting the data of each branch company database and summarizing it to the head office database, the invention avoids the inconsistent data formats that greatly inconvenience later data summarization; and by configuring database information, trigger rules and a database driver for each branch company database, it realizes one-time development that adapts to every branch company.

Description

Method for realizing data acquisition of heterogeneous database

Technical Field

The invention relates to the technical field of data acquisition, in particular to a method for realizing data acquisition of a heterogeneous database.

Background

Data has become a top priority for businesses of all sizes. As technologies for collecting and analyzing data have proliferated, the ability of enterprises to place data in context and derive new insights from it has also grown. To predict consumer behavior more accurately, enterprises need to collect information every day and to store and analyze the data. Without data support, an enterprise's marketing may be blind, and a brand's attempt to reach its target audience with its products and services is likely to fall short. However, many companies do not have such requirements in the early stage of a project, or consider them incompletely, so that the head office and the branch companies use different systems with different databases and tables, which greatly inconveniences later data summarization.

If instead a separate data acquisition system is developed for each branch company and the data are then transmitted to the head office, the development work is cumbersome and manual maintenance is needed later, which consumes time and labor.

Disclosure of Invention

The invention provides a method for realizing data acquisition of a heterogeneous database, which directly collects the data of each branch company database and summarizes it to the head office database, so as to avoid the inconsistent data formats that greatly inconvenience later data summarization.

The invention provides a method for realizing data acquisition of a heterogeneous database, which comprises the following steps:

step 1: configuring database configuration information and trigger rules for each branch company database according to the basic information of each branch company database;

step 2: selecting a database driver according to database configuration information of each branch company, and determining a data acquisition interface based on the database driver;

step 3: logging in a branch company database according to the trigger rule, and acquiring data by using the data acquisition interface;

step 4: processing the collected data and summarizing the data to a head office database.
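Steps 1 to 4 can be sketched as follows. This is a minimal illustration under assumptions, not the implementation described by the invention: the `BranchConfig` fields, the `DRIVERS` registry, the `sales` table and the use of SQLite for both the branch company databases and the head office database are all hypothetical choices made so that the sketch runs with the Python standard library.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class BranchConfig:
    """Hypothetical per-branch configuration (step 1)."""
    name: str
    db_type: str  # used to pick a driver in step 2
    dsn: str      # connection target understood by the driver
    query: str    # what to collect; must yield (item, amount) rows


# Step 2: driver registry (assumed). Only SQLite is registered here; other
# database types would register their own connect function.
DRIVERS: Dict[str, Callable[[str], sqlite3.Connection]] = {
    "sqlite": sqlite3.connect,
}


def collect(cfg: BranchConfig, conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Step 3: run the configured query and normalize each row."""
    return [(str(item), float(amount)) for item, amount in conn.execute(cfg.query)]


def summarize(head: sqlite3.Connection, branch: str,
              rows: List[Tuple[str, float]]) -> None:
    """Step 4: store the normalized rows in one head office table."""
    head.execute(
        "CREATE TABLE IF NOT EXISTS sales (branch TEXT, item TEXT, amount REAL)"
    )
    head.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(branch, item, amount) for item, amount in rows],
    )
    head.commit()


def run_once(configs: List[BranchConfig], head: sqlite3.Connection) -> None:
    """Collect from every configured branch and summarize to the head office."""
    for cfg in configs:
        connect = DRIVERS[cfg.db_type]   # step 2: select the driver
        conn = connect(cfg.dsn)          # step 3: log in
        try:
            rows = collect(cfg, conn)
        finally:
            conn.close()
        summarize(head, cfg.name, rows)  # step 4
```

In this sketch, adapting to a new branch company means adding one `BranchConfig` entry (and, if needed, one `DRIVERS` entry) rather than writing a new acquisition system.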

In one possible implementation,

before step 1, the method further comprises the following steps: acquiring basic information of each branch company database, wherein the process comprises the following steps:

obtaining key values of all branch company databases and determining the data types of the key values;

determining a preset analysis rule corresponding to the data type, and analyzing the key value by using the preset analysis rule to obtain characteristic data corresponding to the key value;

determining a port range of each branch company database service based on the characteristic data, and determining a database type of each branch company database according to a mapping relation between the port range and the database type;

and determining a characteristic extension rule of the database type, and scanning each branch company database by using the characteristic extension rule to acquire the basic information of each branch company database.
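The mapping relation between the port range and the database type can be illustrated with a simple lookup. This is a sketch under assumptions: the table name `PORT_RANGES` and the function name are hypothetical, and only the range 0-1023 for the hierarchical database is stated in the embodiments; the other two boundaries are placeholders chosen so the example is complete.

```python
from typing import List, Optional, Tuple

# (low, high, database type). Only the 0-1023 range comes from the text;
# the remaining boundaries are assumptions made for this sketch.
PORT_RANGES: List[Tuple[int, int, str]] = [
    (0, 1023, "hierarchical"),
    (1024, 49151, "network"),      # upper bound assumed
    (49152, 65535, "relational"),  # whole range assumed
]


def database_type_for_port(port: int) -> Optional[str]:
    """Return the database type whose port range contains `port`, if any."""
    for low, high, db_type in PORT_RANGES:
        if low <= port <= high:
            return db_type
    return None
```

The returned type would then select the characteristic extension rule used to scan the branch company database.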

In one possible implementation,

in step 1, according to the basic information of each branch company database, configuring database configuration information and triggering rules for each branch company database comprises:

determining a configuration server and configuration information attributes based on the basic information of each branch database;

determining a configuration transmission starting point and a configuration transmission end point based on the configuration server;

acquiring a dynamic configuration process based on the configuration information attribute, and determining a configuration transmission process point;

establishing a configuration path based on the fixed configuration transmission starting point, the configuration transmission process point and the configuration transmission end point;

according to the configuration information attribute, acquiring database information from the configuration server, completing the transmission of the database information by using the configuration path, and configuring the database configuration information to a corresponding branch company database;

determining trigger information based on the database configuration information, and generating a trigger strategy according to the trigger information;

determining trigger resources corresponding to each trigger object in the trigger strategy, and establishing an object-resource mapping relation;

and constructing a trigger rule according to the object-resource mapping relation.

In one possible implementation,

in step 2, selecting a database driver according to the database configuration information of each branch company comprises:

step 201: acquiring identification information corresponding to each database driver, and matching the identification information with the database configuration information;

step 202: if the matching is successful, determining a database driver corresponding to the branch database;

step 203: otherwise, customizing a driver based on the database configuration information, and establishing a database driver based on the driver.
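Steps 201 to 203 amount to a lookup with a fallback. The following is a minimal sketch, assuming that the database configuration information carries a `driver_id` key and that a driver can be represented by a small `Driver` record; both are hypothetical and stand in for whatever identification scheme is actually used.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Driver:
    """Hypothetical stand-in for a database driver."""
    identification: str
    custom: bool = False


def select_driver(config: Dict[str, str], available: Dict[str, Driver]) -> Driver:
    """Match the configuration against known drivers, else build a custom one."""
    wanted = config["driver_id"]    # hypothetical configuration key
    driver = available.get(wanted)  # step 201: match identification information
    if driver is not None:          # step 202: the matching succeeded
        return driver
    # step 203: no match, so a driver is customized from the configuration
    return Driver(identification=wanted, custom=True)
```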

In one possible implementation,

acquiring identification information corresponding to each database driver, and matching the identification information with the database configuration information comprises:

determining the configuration layout of a branch company database based on the database configuration information, and performing hierarchical division on the configuration layout according to configuration attributes to obtain a plurality of hierarchical layouts;

acquiring the nodes of the plurality of hierarchical layouts, and judging whether each hierarchical layout is matched with each node in the rest hierarchical layouts;

if yes, not performing any operation on the hierarchical layout;

otherwise, if the number of the nodes in the current hierarchical layout is less than that of any one of the nodes in the remaining hierarchical layout, determining excessive nodes in the remaining hierarchical layout, and performing node supplementation on the current hierarchical layout according to the positions of the excessive nodes and the attributes of the current hierarchical layout;

acquiring node information in each hierarchical layout, and determining a first constraint relation between nodes according to the node information;

determining a second constraint relation between corresponding nodes in the parallel hierarchical layout according to the node information;

setting a two-dimensional identification set for the nodes in the multiple hierarchical layouts according to the first constraint relation and the second constraint relation;

the two-dimensional identification comprises a first identification set and a second identification set, the first identification set reflects the relationship between nodes in one hierarchical layout, and the second identification set reflects the relationship between corresponding nodes in a plurality of hierarchical layouts;

analyzing the identification information corresponding to each database drive to obtain sub-identification information corresponding to each sub-drive in the database drive;

matching each identifier in the second identifier set with identifier information corresponding to each database driver, acquiring a first matching degree, and judging whether the first matching degree is smaller than a first preset matching degree;

if yes, judging that all database drivers do not meet the database requirements of the branch companies;

otherwise, further matching detection is carried out on the database driver meeting the requirements;

matching each identifier in the first identifier set with sub-identifier information in a database driver meeting requirements to obtain a second matching degree, and judging whether the second matching degree is smaller than a second preset matching degree;

if yes, judging that the database driver meeting the requirements does not conform to the database driver requirements of the branch company;

otherwise, determining that database driver to be the database driver corresponding to the branch company database.
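The two-stage matching against the first and second preset matching degrees can be sketched as below. The text does not say how a matching degree is computed, so this example assumes it is the fraction of required identifiers found in the driver's identifiers; the function names and the dictionary layout of `drivers` are likewise hypothetical.

```python
from typing import Dict, Optional, Set


def match_degree(required: Set[str], offered: Set[str]) -> float:
    """Fraction of required identifiers present in `offered` (assumed metric)."""
    if not required:
        return 1.0
    return len(required & offered) / len(required)


def choose_driver(
    first_ids: Set[str],
    second_ids: Set[str],
    drivers: Dict[str, Dict[str, Set[str]]],
    first_threshold: float,
    second_threshold: float,
) -> Optional[str]:
    """Return the name of the first driver that passes both stages, or None."""
    # Stage 1: the second identifier set (relations across hierarchical
    # layouts) is matched against each driver's identification information.
    candidates = [
        name
        for name, info in drivers.items()
        if match_degree(second_ids, info["ids"]) >= first_threshold
    ]
    # Stage 2: the first identifier set (relations inside one layout) is
    # matched against the sub-identification information of each candidate.
    for name in candidates:
        if match_degree(first_ids, drivers[name]["sub_ids"]) >= second_threshold:
            return name
    return None
```

A result of `None` corresponds to the case in which no existing database driver meets the branch company's requirements.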

In one possible implementation,

in step 3, determining a data acquisition interface based on the database driver includes:

extracting a driving program related to the database driving and data acquisition, and determining the number and the type of data acquisition interfaces based on the driving program;

setting interface format parameters for the data acquisition interface based on the type of the data acquisition interface;

based on the number and the type of the data acquisition interfaces, sequencing the data acquisition interfaces to obtain an interface arrangement sequence;

establishing an interface set for the data acquisition interface with the interface parameters according to the interface arrangement sequence;

testing each data acquisition interface in the interface set based on an operation testing tool, and judging whether each data acquisition interface can normally operate or not;

if yes, saving the interface set;

otherwise, determining an error point of the data acquisition interface based on the test result, and correcting the error point according to a preset correction scheme to obtain a final interface set.
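The sequencing, testing and correction of the interface set can be sketched as follows. The interface representation (a dictionary with `name` and `type` keys), the sort key, and the `test` and `correct` callables are assumptions for illustration; they stand in for the operation testing tool and the preset correction scheme.

```python
from typing import Callable, Dict, List

Interface = Dict[str, object]


def build_interface_set(
    interfaces: List[Interface],
    test: Callable[[Interface], bool],
    correct: Callable[[Interface], Interface],
) -> List[Interface]:
    """Order the interfaces, test each one, and correct those that fail."""
    # Arrangement sequence: by type, then by name (an assumed ordering).
    ordered = sorted(interfaces, key=lambda i: (str(i["type"]), str(i["name"])))
    final: List[Interface] = []
    for iface in ordered:
        if not test(iface):
            iface = correct(iface)  # apply the preset correction scheme
            if not test(iface):
                raise ValueError(f"interface {iface['name']!r} still fails after correction")
        final.append(iface)
    return final
```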

In one possible implementation,

in step 3, according to the trigger rule, logging in the branch database comprises:

matching an IP address of a rule analyzer based on the development environment of the branch company database, analyzing the trigger rule based on the IP address, and generating a trigger description language;

performing semantic analysis on the trigger description language to generate one or more corresponding semantic results;

when a plurality of semantic results are available, determining a trigger sub-event corresponding to each semantic result according to the object information of the trigger description language, determining the priority information of the trigger sub-event, and selecting the semantic result with the highest priority as a final semantic result;

dividing the trigger description language based on the final semantic result and generating a plurality of trigger sub-events;

acquiring a first trigger corresponding to the plurality of trigger sub-events based on a preset trigger linked list;

acquiring second triggers provided by logging in the branch database, and selecting a third trigger matched with the first trigger from the second triggers;

acquiring a historical trigger record of the third trigger, and determining the activation correlation degree and the controllability of the third trigger according to the historical trigger record;

judging whether the activation correlation degree and the controllability of the third trigger meet preset requirements or not;

if so, taking the third trigger as a target trigger;

otherwise, correcting the third trigger based on the preset requirement, and taking the corrected third trigger as a target trigger;

analyzing the target trigger to obtain an abstract syntax tree corresponding to the target trigger, and traversing the abstract syntax tree according to a preset execution sequence to obtain a trigger statement set;

setting a trigger path based on the trigger statement set, and realizing the login of the branch database according to the trigger path.

In one possible implementation,

in step 3, the data acquisition by using the data acquisition interface comprises:

acquiring a data acquisition instruction, analyzing the data acquisition instruction and determining a data acquisition type;

and selecting a corresponding data acquisition interface according to the data acquisition type to acquire data from the branch database.
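Selecting an interface from the data acquisition type is essentially a dispatch. A minimal sketch follows, assuming a hypothetical instruction format `"collect:<type>"` and two hypothetical acquisition types, `full` and `incremental`; none of these names come from the text.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def parse_instruction(instruction: str) -> str:
    """Extract the acquisition type from an instruction such as 'collect:full'."""
    _, _, kind = instruction.partition(":")
    return kind.strip().lower()


def collect_full(rows: List[Row]) -> List[Row]:
    """Return every row."""
    return list(rows)


def collect_changed(rows: List[Row]) -> List[Row]:
    """Return only rows flagged as changed."""
    return [row for row in rows if row.get("changed")]


# Acquisition type -> data acquisition interface (assumed registry).
INTERFACES: Dict[str, Callable[[List[Row]], List[Row]]] = {
    "full": collect_full,
    "incremental": collect_changed,
}


def acquire(instruction: str, rows: List[Row]) -> List[Row]:
    """Analyze the instruction and collect data through the matching interface."""
    kind = parse_instruction(instruction)
    try:
        interface = INTERFACES[kind]
    except KeyError:
        raise ValueError(f"no data acquisition interface for type {kind!r}") from None
    return interface(rows)
```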

In one possible implementation,

in step 4, the step of processing the collected data and summarizing the data to a head office database comprises the following steps:

step 401: receiving data collected from each branch company database, performing noise reduction processing on the data, and performing standardization processing on the data to obtain standard collected data;

step 402: classifying the standard collected data according to data types to obtain a plurality of groups of data to be stored;

step 403: storing the groups of data to be stored in different storage units in the head office database, group by group.
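Steps 401 to 403 can be sketched as three small functions. The text does not specify the noise reduction or standardization rules, so this example assumes that records with empty values are noise and that standardization means normalizing field names and trimming strings; the `type` field and the dictionary used as the head office store are also hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, object]


def standardize(records: List[Record]) -> List[Record]:
    """Step 401: drop records with empty values and normalize the rest."""
    clean: List[Record] = []
    for rec in records:
        if any(value is None or value == "" for value in rec.values()):
            continue  # treated as noise in this sketch
        clean.append(
            {
                key.strip().lower(): value.strip() if isinstance(value, str) else value
                for key, value in rec.items()
            }
        )
    return clean


def classify(records: List[Record], type_field: str = "type") -> Dict[str, List[Record]]:
    """Step 402: group the standard collected data by data type."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        groups[str(rec.get(type_field, "unknown"))].append(rec)
    return dict(groups)


def store(head_office: Dict[str, List[Record]], groups: Dict[str, List[Record]]) -> None:
    """Step 403: append each group to its own storage unit."""
    for data_type, recs in groups.items():
        head_office.setdefault(data_type, []).extend(recs)
```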

In one possible implementation,

in step 402, classifying the standard collected data according to data types to obtain a plurality of groups of data to be stored includes:

preliminarily dividing the standard data according to data types to obtain a plurality of groups of data sets;

inputting each group of data set into a data test model, and obtaining the dividing accuracy of each group of data set according to a test result;

the data testing model samples a current data set to obtain sampling data, and tests the sampling data to obtain biased measurement variance and unbiased measurement variance of the sampling data;

calculating to obtain the accuracy of the current data set division according to the biased measurement variance and the unbiased measurement variance;

judging whether the accuracy meets a preset requirement;

if so, obtaining a plurality of groups of storage data according to the plurality of groups of data sets;

otherwise, the data set which does not meet the requirements is divided again until the accuracy requirements are met.
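The check-and-redivide loop can be sketched independently of how the accuracy is computed. In this sketch the accuracy computation and the re-division rule are passed in as functions (`accuracy`, `redivide`), both hypothetical; it does not attempt to reproduce the data test model or its variance-based formula, and the `max_rounds` guard is an added assumption so that the loop always terminates.

```python
from typing import Callable, Dict, List, Sequence


def divide_until_accurate(
    groups: Dict[str, Sequence[object]],
    accuracy: Callable[[Sequence[object]], float],
    redivide: Callable[[str, Sequence[object]], Dict[str, Sequence[object]]],
    required: float,
    max_rounds: int = 10,
) -> Dict[str, List[object]]:
    """Accept data sets that meet `required`; re-divide the ones that do not."""
    pending = dict(groups)
    accepted: Dict[str, List[object]] = {}
    for _ in range(max_rounds):
        if not pending:
            break
        retry: Dict[str, Sequence[object]] = {}
        for name, data in pending.items():
            if accuracy(data) >= required:
                accepted[name] = list(data)        # division is accurate enough
            else:
                retry.update(redivide(name, data))  # divide this data set again
        pending = retry
    if pending:
        raise RuntimeError("accuracy requirement not met within max_rounds")
    return accepted
```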

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a flowchart of a method for implementing data collection of a heterogeneous database according to an embodiment of the present invention;

FIG. 2 is a flow chart of determining database drivers in an embodiment of the present invention;

FIG. 3 is a flow chart of data processing summary according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

Example 1

The embodiment of the invention provides a method for realizing data acquisition of a heterogeneous database, which comprises the following steps of:

step 1: configuring database information and triggering rules for each branch company database according to the basic information of each branch company database;

In this embodiment, the database driver is essentially a driver for implementing the startup and various operations of the database.

In this embodiment, the trigger rule is used to wake up the branch database and provide a basis for subsequent data collection.

The beneficial effect of above-mentioned design is: the data of each branch company database is directly collected and summarized to the head office database, which solves the problem that data formats are inconsistent, and later data summarization is greatly inconvenienced, when the head office and the branch companies use different systems to process and analyze data. By configuring database information, trigger rules and a database driver for each branch company database, one-time development adapted to every branch company is realized, so that a separate data acquisition system need not be developed for each branch company and the development work is reduced. Determining the trigger rules lets the head office control the startup of each branch company database; determining the data acquisition interface enables the data in each branch company database to be collected; and processing the collected data before summarizing it to the head office database unifies the data, which ensures that the head office database obtains consistent data and facilitates later analysis.

Example 2

Based on embodiment 1, an embodiment of the present invention provides a method for implementing data acquisition of a heterogeneous database, where before step 1, the method further includes: acquiring basic information of each branch company database, wherein the process comprises the following steps:

In this embodiment, the key value of each branch database contains actual configuration information and data used when the current database is executed.

In this embodiment, the data types of the key value include a string value type, a binary value type, and a DWORD value type, and different data types correspond to different parsing rules.

In this embodiment, the feature data corresponding to the key value includes a branch database identifier, and different identifiers correspond to different service port ranges.

In this embodiment, the database types of the affiliates include a hierarchical database, a network database and a relational database, wherein the port service range of the hierarchical database is 0-1023, the port service range of the network database is 1024-.

In this embodiment, the feature extension rule is used to set a branch database scanning method based on the branch database type.

In this embodiment, the basic information of the branch database includes IP address information, reading mode information, index information, and data storage tree structure information.

The beneficial effect of above-mentioned design is: the database type is obtained according to the key value of the branch company database, and then different scanning methods are selected according to different types to scan the branch company database to obtain the basic information of the database, so that a basis is provided for configuring the information of the database and triggering rules.

Example 3

Based on embodiment 1, an embodiment of the present invention provides a method for acquiring data of a heterogeneous database, where, in step 1, configuring database information and trigger rules for each branch company database according to basic information of each branch company database includes:

according to the configuration information attribute, acquiring database information from the configuration server, completing the transmission of the database information by using the configuration path, and configuring the database information to a corresponding branch company database;

determining trigger information based on the database information, and generating a trigger strategy according to the trigger information;

In this embodiment, the configuration server refers to a server used for configuring database information for a branch company.

In this embodiment, the configuration information attribute includes a configuration route attribute, a configuration search attribute, and the like, which are used to indicate transmission and acquisition of the configuration.

In this embodiment, the trigger information includes a plurality of trigger points (trigger objects) required to open the branch database and trigger resources related to the trigger objects.

The beneficial effect of above-mentioned design is: the configuration path is determined according to the acquired dynamic configuration process and accurate configuration information is acquired, so that the configuration can be completed efficiently and accurately; configuring the database information and the trigger rule for each branch company database provides a basis for development, so that one-time development adapted to every branch company is realized.

Example 4

Based on embodiment 1, an embodiment of the present invention provides a method for implementing data acquisition of a heterogeneous database, where as shown in fig. 2, in step 2, selecting a database driver according to database configuration information of each branch company includes:

In this embodiment, the identification information is used to identify the database driver, and each database driver corresponds to one piece of identification information.

The beneficial effect of above-mentioned design is: the database drive is selected for the database of each branch company, so that the normal operation of the database of each branch company is ensured, one-time development is realized, and the method is suitable for each branch company.

Example 5

Based on embodiment 4, an embodiment of the present invention provides a method for acquiring data of heterogeneous databases, where acquiring identification information corresponding to each database driver, and matching the identification information with the database configuration information includes:

if yes, not performing any operation on the hierarchical layout;

In this embodiment, the configuration layout includes a comprehensive configuration layout of the structure of the database, the number of sub-databases, the data input/output manner, and the like.

In this embodiment, the configuration attributes include a resource attribute, a driver attribute, a data type attribute, and an ip address attribute, each attribute corresponds to a hierarchical layout, and nodes in the hierarchical layout are obtained by analyzing a branch database.

In this embodiment, supplementing the nodes of the hierarchy may facilitate comparison between each hierarchy, so that the determined second constraint relationship is more accurate.

In this embodiment, the first constraint relationship is used to represent a configuration relationship inside the hierarchical layout, and the second constraint relationship is used to represent a configuration relationship between the hierarchical layouts.

In this embodiment, the database driver is substantially a driver, and is used to implement the startup and various operations of the database.

In this embodiment, the first matching degree is used to represent the matching condition of the database driver with the branch database as a whole.

In this embodiment, the second matching degree is used to represent the matching of the database driver with the branch database on each configuration attribute.

The beneficial effect of above-mentioned design is: the identification information is matched with the database configuration information, the database driver is configured for each branch company database, one-time development is realized, the database driver is adapted to each branch company, the development times are reduced, and a foundation is provided for data acquisition.

Example 6

Based on embodiment 1, an embodiment of the present invention provides a method for implementing data acquisition of a heterogeneous database, where in step 3, determining a data acquisition interface based on the database driver includes:

if yes, saving the interface set;

The beneficial effect of above-mentioned design is: the operation capability of the data acquisition interface is tested and corrected in the process of determining the data acquisition interface according to the database drive, the normal operation of the determined data acquisition interface is ensured, the data acquisition interface can be selected according to the interface set during data acquisition, the efficiency of data interface selection is improved, and a foundation is provided for data acquisition.

Example 7

Based on embodiment 1, an embodiment of the present invention provides a method for acquiring data of a heterogeneous database, where, in step 3, according to the trigger rule, logging in a branch database includes:

if so, taking the third trigger as a target trigger;

setting a trigger path based on the trigger statement set, and realizing the login of the branch database according to the trigger path.

In this embodiment, based on the development environment of the branch database, the IP address of the rule resolver is matched, so that the matched rule resolver is more suitable for the development environment of the branch database, and the completeness and speed of resolution are ensured.

In this embodiment, the trigger description language may be, for example, a programming language.

In this embodiment, the trigger sub-events are the events that need to be triggered when logging in the branch database, and the branch database can be logged in only after all the trigger sub-events have been triggered.

In this embodiment, the preset trigger linked list is used to represent the correspondence between trigger events and triggers.

In this embodiment, the third trigger is corrected based on the preset requirement; specifically, the clock precision of the trigger is adjusted.

In this embodiment, the trigger path provides support for logging into the branch database.

The beneficial effect of above-mentioned design is: the branch company database is logged in according to the trigger rule, which reduces the amount of repeated development; a suitable trigger is selected according to the trigger rule and a trigger path is set according to the trigger, which ensures the stability and speed of logging in the branch company database and provides a foundation for data acquisition.

Example 8

Based on embodiment 1, an embodiment of the present invention provides a method for implementing data acquisition of a heterogeneous database, where, in step 3, acquiring data by using the data acquisition interface includes:

The beneficial effect of above-mentioned design is: selecting the corresponding data acquisition interface according to the data acquisition instruction ensures the accuracy of data acquisition.

Example 9

Based on embodiment 1, an embodiment of the present invention provides a method for acquiring data of a heterogeneous database, and as shown in fig. 3, in step 4, processing and summarizing acquired data to a head office database includes:

The beneficial effect of above-mentioned design is: the collected data are stored in different storage units in the head office database according to the data types of the collected data, so that the analysis of the data in the later period is facilitated.

Example 10

Based on embodiment 9, an embodiment of the present invention provides a method for implementing data acquisition of a heterogeneous database, where in step 402, classifying the standard acquisition data according to data types to obtain multiple sets of data to be stored includes:

the biased measurement variance and the unbiased measurement variance of the sampling data are calculated as follows:

[formula not reproduced in the source text]

wherein the first quantity represents the biased measurement variance, $n$ represents the number of samples, $G_i$ represents the biased measurement value of the $i$-th sampling data, $k_i$ represents the biased reference value of the $i$-th sampling data, the second quantity represents the unbiased measurement variance, $R_i$ represents the unbiased measurement value of the $i$-th sampling data, and $s_i$ represents the unbiased reference value of the $i$-th sampling data;

the accuracy of the current data set division is calculated as follows:

[formula not reproduced in the source text]

where $P$ denotes the accuracy of the current data set division, $Z_a$ denotes the biased measurement interval index value, and $Z_b$ denotes the unbiased measurement interval index value;

judging whether the accuracy meets a preset requirement;

In this embodiment, the biased measurement variance is used to represent the quality of the data test model: the better the model, the smaller the biased measurement variance.

In this embodiment, the unbiased measurement variance is used to represent the systematic error of the data test model: the smaller the systematic error, the smaller the unbiased measurement variance.

In this embodiment, the unbiased measurement value and the biased measurement value are used to represent the reliability of the sample data under the sample type, and the greater the reliability, the greater the value.

In this embodiment, the biased reference value and the unbiased reference value are used to represent reference values of biased measurement and unbiased measurement, and different sampling data correspond to different values and are related to sampling time and the like of the sampling data.

In this embodiment, the biased measurement interval index value and the unbiased measurement interval index value are used to evaluate the accuracy of the biased measurement and the unbiased measurement of the data test model, and the higher the accuracy, the larger the value.

In this embodiment, for the biased measurement variance, for example, the value range of $G_i$ is (0, 1) and $G_i = 0.8$ is taken, and the value range of $k_i$ is (0.8, 1.2) and $k_i = 1$ is taken; the corresponding biased measurement variance is then approximately 0.55. For the unbiased measurement variance, for example, the value range of $R_i$ is (0, 1) and $R_i = 0.9$ is taken, and the value range of $s_i$ is (0.5, 1) and $s_i = 0.5$ is taken; the corresponding unbiased measurement variance is then approximately 0.46. For the accuracy $P$, for example, with $Z_a = 1.2$ and $Z_b$ [value missing in the source text], if the preset required accuracy is 2, the accuracy does not meet the requirement, and the data set should be divided again.

The beneficial effect of above-mentioned design is: the accuracy of data division is calculated according to the data test model, and biased measurement and unbiased measurement are combined in the calculation, which ensures that the obtained accuracy is reliable, provides a basis for classified storage of the data, and facilitates later analysis of the data.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method for realizing data acquisition of a heterogeneous database is characterized by comprising the following steps:

2. The method for realizing data collection of the heterogeneous database according to claim 1, wherein step 1 is preceded by further comprising: acquiring basic information of each branch company database, wherein the process comprises the following steps:

3. The method of claim 1, wherein the step 2 of selecting the database driver according to the database configuration information of each branch company comprises:

4. The method for realizing data acquisition of the heterogeneous database according to claim 1, wherein in step 3, the data acquisition by using the data acquisition interface includes:

5. The method for realizing data collection of heterogeneous databases according to claim 1, wherein in step 4, the processing and summarizing the collected data to the head office database comprises:

6. The method for realizing data collection of heterogeneous databases according to claim 1, wherein in step 3, determining a data collection interface based on the database driver comprises:

if yes, saving the interface set;

7. The method of claim 1, wherein the step 3 of logging in the branch database according to the trigger rule comprises:

if so, taking the third trigger as a target trigger;

8. The method for realizing data collection of heterogeneous databases according to claim 1, wherein in step 1, configuring database configuration information and triggering rules for each branch database according to the basic information of each branch database includes:

9. The method of claim 4, wherein obtaining identification information corresponding to each database driver, and matching the identification information with the database configuration information comprises:

if yes, not performing any operation on the hierarchical layout;

10. The method of claim 9, wherein in step 402, classifying the standard collected data according to data types to obtain a plurality of sets of data to be stored comprises:

judging whether the accuracy meets a preset requirement;