CN113821554A - Method for realizing data acquisition of heterogeneous database - Google Patents

Info

Publication number
CN113821554A
Authority
CN
China
Prior art keywords
database
data
trigger
determining
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110941795.6A
Other languages
Chinese (zh)
Other versions
CN113821554B (en)
Inventor
和雄伟
师丹华
杨光华
魏专利
梁晓霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiyuan Great Times Technology Co ltd
Original Assignee
Taiyuan Great Times Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiyuan Great Times Technology Co ltd
Priority to CN202110941795.6A
Publication of CN113821554A
Application granted
Publication of CN113821554B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for realizing data acquisition of a heterogeneous database, which comprises the following steps: configuring database configuration information and trigger rules for each branch company database according to the basic information of each branch company database; selecting a database driver according to the database configuration information of each branch company, and determining a data acquisition interface based on the database driver; logging in to a branch company database according to the trigger rule, and acquiring data by using the data acquisition interface; and processing the acquired data and summarizing the data to a head office database. By collecting the data of each branch company database directly and summarizing it to the head office database, the invention avoids the inconsistent data formats that make later data summarization very inconvenient. By configuring database information, trigger rules and a database driver for each branch company database, the method is developed once and adapted to every branch company.

Description

Method for realizing data acquisition of heterogeneous database
Technical Field
The invention relates to the technical field of data acquisition, in particular to a method for realizing data acquisition of a heterogeneous database.
Background
Data has become a major priority for businesses of all sizes. As technologies for collecting and analyzing data have proliferated, the ability of enterprises to place data in context and draw new insights from it has also increased. To predict consumer behavior more accurately, enterprises need to collect information every day and to store and analyze the resulting data. Without data support, an enterprise's marketing may be blind, and a brand's intention of reaching its target audience with its products and services is likely to fall short. However, many companies have no such requirement in the early stage of a project, or consider it incompletely, so that the head office and the branch companies each use different systems with different databases and tables, which makes later data summarization very inconvenient.
If a data acquisition system is instead developed separately for each branch company and the data are then transmitted to the head office, the development work is complicated, later manual maintenance is required, and time and labor are consumed.
Disclosure of Invention
The invention provides a method for realizing data acquisition of a heterogeneous database, which directly acquires the data of each branch company database and summarizes it to the head office database, thereby avoiding the inconsistent data formats that make later data summarization very inconvenient.
The invention provides a method for realizing data acquisition of a heterogeneous database, which comprises the following steps:
step 1: configuring database configuration information and trigger rules for each branch company database according to the basic information of each branch company database;
step 2: selecting a database driver according to database configuration information of each branch company, and determining a data acquisition interface based on the database driver;
step 3: logging in to a branch company database according to the trigger rule, and acquiring data by using the data acquisition interface;
step 4: processing the collected data and summarizing the data to a head office database.
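The four steps above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: in-memory SQLite databases stand in for the heterogeneous branch databases, and `BranchConfig`, `make_branch`, `collect` and the `sales` table are hypothetical names introduced only for the example.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class BranchConfig:
    """Hypothetical per-branch configuration produced by step 1."""
    name: str
    connect: Callable[[], sqlite3.Connection]  # stands in for the database driver (step 2)
    query: str                                 # stands in for the data acquisition interface (step 2)
    should_run: Callable[[], bool]             # stands in for the trigger rule (step 1)


def make_branch(table: str, column: str, values: Iterable[float]):
    """Return a connect() factory for a throwaway in-memory branch database."""
    def connect() -> sqlite3.Connection:
        conn = sqlite3.connect(":memory:")
        conn.execute(f"CREATE TABLE {table} ({column} REAL)")
        conn.executemany(f"INSERT INTO {table} VALUES (?)", [(v,) for v in values])
        return conn
    return connect


def collect(branches: List[BranchConfig], head: sqlite3.Connection) -> None:
    """Steps 3 and 4: log in per trigger rule, acquire rows, summarize to the head office."""
    head.execute("CREATE TABLE IF NOT EXISTS sales (branch TEXT, amount REAL)")
    for cfg in branches:
        if not cfg.should_run():  # the trigger rule decides whether to log in
            continue
        conn = cfg.connect()
        try:
            rows = conn.execute(cfg.query).fetchall()
        finally:
            conn.close()
        # Processing is reduced here to casting every value to one common type.
        head.executemany(
            "INSERT INTO sales VALUES (?, ?)",
            [(cfg.name, float(r[0])) for r in rows],
        )
    head.commit()
```

Each branch keeps its own table and column names; only the configuration differs, which is the sense in which the method is developed once and adapted to every branch company.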
In one possible way of realisation,
before step 1, the method further comprises the following steps: acquiring basic information of each branch company database, wherein the process comprises the following steps:
obtaining key values of all branch company databases and determining the data types of the key values;
determining a preset analysis rule corresponding to the data type, and analyzing the key value by using the preset analysis rule to obtain characteristic data corresponding to the key value;
determining a port range of each branch company database service based on the characteristic data, and determining a database type of each branch company database according to a mapping relation between the port range and the database type;
and determining a characteristic extension rule of the database type, and scanning each branch company database by using the characteristic extension rule to acquire the basic information of each branch company database.
In one possible way of realisation,
in step 1, according to the basic information of each branch company database, configuring database configuration information and triggering rules for each branch company database comprises:
determining a configuration server and configuration information attributes based on the basic information of each branch database;
determining a configuration transmission starting point and a configuration transmission end point based on the configuration server;
acquiring a dynamic configuration process based on the configuration information attribute, and determining a configuration transmission process point;
establishing a configuration path based on the fixed configuration transmission starting point, the configuration transmission process point and the configuration transmission end point;
according to the configuration information attribute, acquiring database information from the configuration server, completing the transmission of the database information by using the configuration path, and configuring the database configuration information to a corresponding branch company database;
determining trigger information based on the database configuration information, and generating a trigger strategy according to the trigger information;
determining trigger resources corresponding to each trigger object in the trigger strategy, and establishing an object-resource mapping relation;
and constructing a trigger rule according to the object-resource mapping relation.
In one possible way of realisation,
in step 2, selecting a database driver according to the database configuration information of each branch company comprises:
step 201: acquiring identification information corresponding to each database driver, and matching the identification information with the database configuration information;
step 202: if the matching is successful, determining a database driver corresponding to the branch database;
step 203: otherwise, customizing a driver based on the database configuration information, and establishing a database driver based on the driver.
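Steps 201 to 203 amount to a lookup with a fallback, which may be sketched as below. The `Driver` class, the `REGISTERED` list and the `db_type` configuration key are assumptions made for illustration; the text does not specify how identification information is represented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Driver:
    identification: str   # identification information of the driver
    custom: bool = False  # True when the driver was customized in step 203


# Hypothetical set of already available database drivers.
REGISTERED: List[Driver] = [Driver("mysql"), Driver("postgresql"), Driver("oracle")]


def select_driver(config: Dict[str, str], registered: List[Driver] = REGISTERED) -> Driver:
    """Step 201: match identification against the configuration.
    Step 202: return the matching driver.
    Step 203: otherwise build a customized driver from the configuration."""
    wanted = config["db_type"].strip().lower()
    for driver in registered:
        if driver.identification == wanted:
            return driver
    return Driver(identification=wanted, custom=True)
```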
In one possible way of realisation,
acquiring identification information corresponding to each database driver, and matching the identification information with the database configuration information comprises:
determining the configuration layout of a branch company database based on the database configuration information, and performing hierarchical division on the configuration layout according to configuration attributes to obtain a plurality of hierarchical layouts;
acquiring the nodes of the plurality of hierarchical layouts, and judging whether each hierarchical layout is matched with each node in the rest hierarchical layouts;
if yes, not performing any operation on the hierarchical layout;
otherwise, if the number of the nodes in the current hierarchical layout is less than that of any one of the nodes in the remaining hierarchical layout, determining excessive nodes in the remaining hierarchical layout, and performing node supplementation on the current hierarchical layout according to the positions of the excessive nodes and the attributes of the current hierarchical layout;
acquiring node information in each hierarchical layout, and determining a first constraint relation between nodes according to the node information;
determining a second constraint relation between corresponding nodes in the parallel hierarchical layout according to the node information;
setting a two-dimensional identification set for the nodes in the multiple hierarchical layouts according to the first constraint relation and the second constraint relation;
the two-dimensional identification comprises a first identification set and a second identification set, the first identification set reflects the relationship between nodes in one hierarchical layout, and the second identification set reflects the relationship between corresponding nodes in a plurality of hierarchical layouts;
analyzing the identification information corresponding to each database drive to obtain sub-identification information corresponding to each sub-drive in the database drive;
matching each identifier in the second identifier set with identifier information corresponding to each database driver, acquiring a first matching degree, and judging whether the first matching degree is smaller than a first preset matching degree;
if yes, judging that all database drivers do not meet the database requirements of the branch companies;
otherwise, further matching detection is carried out on the database driver meeting the requirements;
matching each identifier in the first identifier set with sub-identifier information in a database driver meeting requirements to obtain a second matching degree, and judging whether the second matching degree is smaller than a second preset matching degree;
if yes, judging that the database driver meeting the requirements does not conform to the database driver requirements of the branch company;
otherwise, determining that database driver as the database driver corresponding to the branch database.
In one possible way of realisation,
in step 3, determining a data acquisition interface based on the database driver includes:
extracting a driving program related to the database driving and data acquisition, and determining the number and the type of data acquisition interfaces based on the driving program;
setting interface format parameters for the data acquisition interface based on the type of the data acquisition interface;
based on the number and the type of the data acquisition interfaces, sequencing the data acquisition interfaces to obtain an interface arrangement sequence;
establishing an interface set for the data acquisition interface with the interface parameters according to the interface arrangement sequence;
testing each data acquisition interface in the interface set based on an operation testing tool, and judging whether each data acquisition interface can normally operate or not;
if yes, saving the interface set;
otherwise, determining an error point of the data acquisition interface based on the test result, and correcting the error point according to a preset correction scheme to obtain a final interface set.
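The ordering, testing and correction of the interface set can be sketched as follows. The sort key (type, then name) and the correction by substituting a known-good callable are assumptions; the text leaves the ordering criterion, the operation testing tool and the preset correction scheme unspecified.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Interface:
    name: str
    kind: str                  # type of the data acquisition interface
    run: Callable[[], object]  # invoking the interface; raises on failure


def build_interface_set(interfaces: List[Interface],
                        fallback: Callable[[], object]) -> List[Interface]:
    """Order the interfaces, test each one, and correct those that fail."""
    ordered = sorted(interfaces, key=lambda i: (i.kind, i.name))  # interface arrangement sequence
    final: List[Interface] = []
    for itf in ordered:
        try:
            itf.run()          # stand-in for the operation test
            final.append(itf)  # runs normally: keep as is
        except Exception:
            # Stand-in for the preset correction scheme: swap in a known-good callable.
            final.append(Interface(itf.name, itf.kind, fallback))
    return final
```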
In one possible way of realisation,
in step 3, according to the trigger rule, logging in the branch database comprises:
matching an IP address of a rule analyzer based on the development environment of the branch company database, analyzing the trigger rule based on the IP address, and generating a trigger description language;
performing semantic analysis on the trigger description language to generate one or more corresponding semantic results;
when a plurality of semantic results are available, determining a trigger sub-event corresponding to each semantic result according to the object information of the trigger description language, determining the priority information of the trigger sub-event, and selecting the semantic result with the highest priority as a final semantic result;
dividing the trigger description language based on the final semantic result and generating a plurality of trigger sub-events;
acquiring a first trigger corresponding to the plurality of trigger sub-events based on a preset trigger linked list;
acquiring second triggers provided by logging in the branch database, and selecting a third trigger matched with the first trigger from the second triggers;
acquiring a historical trigger record of the third trigger, and determining the activation correlation degree and the controllability of the third trigger according to the historical trigger record;
judging whether the activation correlation degree and the controllability of the third trigger meet preset requirements or not;
if so, taking the third trigger as a target trigger;
otherwise, correcting the third trigger based on the preset requirement, and taking the corrected third trigger as a target trigger;
analyzing the target trigger to obtain an abstract syntax tree corresponding to the target trigger, and traversing the abstract syntax tree according to a preset execution sequence to obtain a trigger statement set;
setting a trigger path based on the trigger statement set, and realizing the login of the branch database according to the trigger path.
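The traversal of the abstract syntax tree into a trigger statement set can be illustrated with a small sketch. It assumes that the preset execution sequence is a pre-order walk (parent before children, children left to right); the `Node` class and the statement strings used with it are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    statement: str
    children: List["Node"] = field(default_factory=list)


def trigger_statements(root: Node) -> List[str]:
    """Collect statements in pre-order, assumed here to be the preset execution sequence."""
    ordered: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        ordered.append(node.statement)
        # Push children reversed so that the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return ordered
```

The resulting ordered list plays the role of the trigger statement set from which the trigger path is set.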
In one possible way of realisation,
in step 3, the data acquisition by using the data acquisition interface comprises:
acquiring a data acquisition instruction, analyzing the data acquisition instruction and determining a data acquisition type;
and selecting a corresponding data acquisition interface according to the data acquisition type to acquire data from the branch database.
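Selecting an interface by data acquisition type is essentially a dispatch table. In the sketch below the two acquisition types (`full` and `incremental`), the instruction format and the row layout are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


def collect_full(rows: List[Row], since: int) -> List[Row]:
    """Hypothetical interface that returns every row."""
    return list(rows)


def collect_incremental(rows: List[Row], since: int) -> List[Row]:
    """Hypothetical interface that returns only rows newer than `since`."""
    return [r for r in rows if r["id"] > since]


# Mapping from data acquisition type to data acquisition interface.
INTERFACES: Dict[str, Callable[[List[Row], int], List[Row]]] = {
    "full": collect_full,
    "incremental": collect_incremental,
}


def acquire(instruction: str, rows: List[Row], since: int = 0) -> List[Row]:
    """Parse the instruction into an acquisition type and call the matching interface."""
    kind = instruction.strip().lower()
    try:
        interface = INTERFACES[kind]
    except KeyError:
        raise ValueError(f"unknown data acquisition type: {kind!r}") from None
    return interface(rows, since)
```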
In one possible way of realisation,
in step 4, the step of processing the collected data and summarizing the data to a head office database comprises the following steps:
step 401: receiving data collected from each branch company database, performing noise reduction processing on the data, and performing standardization processing on the data to obtain standard collected data;
step 402: classifying the standard collected data according to data types to obtain a plurality of groups of data to be stored;
step 403: and respectively storing the data to be stored in different storage units in the head company database according to groups.
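Steps 401 to 403 can be sketched as one pass over the collected records. Here noise reduction is taken to mean dropping empty values and duplicates, and standardization to mean trimming text and casting values to a common type; both are assumptions, as are the record fields `branch`, `kind` and `value`. A dictionary of lists stands in for the separate storage units.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def standardize(record: dict) -> dict:
    """Step 401 (standardization): trim text, lowercase the type, cast the value."""
    return {
        "branch": record["branch"].strip(),
        "kind": record["kind"].strip().lower(),
        "value": float(record["value"]),
    }


def summarize(records: Iterable[dict]) -> Dict[str, List[dict]]:
    seen = set()
    groups: Dict[str, List[dict]] = defaultdict(list)
    for raw in records:
        if raw.get("value") in (None, ""):  # step 401 (noise reduction): drop empty values
            continue
        rec = standardize(raw)
        key = (rec["branch"], rec["kind"], rec["value"])
        if key in seen:                     # step 401 (noise reduction): drop duplicates
            continue
        seen.add(key)
        groups[rec["kind"]].append(rec)     # steps 402 and 403: one group per data type
    return dict(groups)
```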
In one possible way of realisation,
in step 402, classifying the standard collected data according to data types to obtain a plurality of groups of data to be stored includes:
preliminarily dividing the standard data according to data types to obtain a plurality of groups of data sets;
inputting each group of data set into a data test model, and obtaining the dividing accuracy of each group of data set according to a test result;
the data testing model samples a current data set to obtain sampling data, and tests the sampling data to obtain biased measurement variance and unbiased measurement variance of the sampling data;
calculating to obtain the accuracy of the current data set division according to the biased measurement variance and the unbiased measurement variance;
judging whether the accuracy meets a preset requirement;
if so, obtaining a plurality of groups of storage data according to the plurality of groups of data sets;
otherwise, the data set which does not meet the requirements is divided again until the accuracy requirements are met.
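The two variances used by the data test model can be computed as in the sketch below, which assumes that "biased" and "unbiased" carry their usual statistical meaning (division by n and by n - 1 respectively). How the accuracy is derived from the two values is not specified in this passage, so the sketch stops at the variances; `sample_variances` and its parameters are hypothetical.

```python
import random
import statistics
from typing import Sequence, Tuple


def sample_variances(data: Sequence[float], k: int, seed: int = 0) -> Tuple[float, float]:
    """Sample k items without replacement and return (biased, unbiased) variance."""
    rng = random.Random(seed)                # fixed seed keeps the sketch reproducible
    sample = rng.sample(list(data), k)
    biased = statistics.pvariance(sample)    # divides by n
    unbiased = statistics.variance(sample)   # divides by n - 1; requires k >= 2
    return biased, unbiased
```

For a sample of size n the two values differ by the factor n / (n - 1), so the gap between them shrinks as the sample grows.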
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of a method for implementing data collection of a heterogeneous database according to an embodiment of the present invention;
FIG. 2 is a flow chart of determining database drivers in an embodiment of the present invention;
FIG. 3 is a flow chart of data processing summary according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Example 1
The embodiment of the invention provides a method for realizing data acquisition of a heterogeneous database, which comprises the following steps of:
step 1: configuring database information and triggering rules for each branch company database according to the basic information of each branch company database;
step 2: selecting a database driver according to database configuration information of each branch company, and determining a data acquisition interface based on the database driver;
step 3: logging in to a branch company database according to the trigger rule, and acquiring data by using the data acquisition interface;
step 4: processing the collected data and summarizing the data to a head office database.
In this embodiment, the database driver is essentially a driver for implementing the startup and various operations of the database.
In this embodiment, the trigger rule is used to wake up the branch database and provide a basis for subsequent data collection.
The beneficial effects of the above design are as follows. The data of each branch company database is directly collected and summarized to the head office database, which solves the problem that data formats are inconsistent, and later data summarization is very inconvenient, when the head office and the branch companies each use different systems to process and analyze data. By configuring database information, trigger rules and a database driver for each branch company database, the method is developed once and adapted to every branch company, so that no separate data collection system has to be developed for each branch company and the development work is reduced. Determining the trigger rules lets the head office control the startup of each branch company database, and determining the data acquisition interface enables data collection from the branch company database. The collected data is processed, unified and summarized to the head office database, which ensures that the head office database obtains consistent data and facilitates later data analysis.
Example 2
Based on embodiment 1, an embodiment of the present invention provides a method for implementing data acquisition of a heterogeneous database, where before step 1, the method further includes: acquiring basic information of each branch company database, wherein the process comprises the following steps:
obtaining key values of all branch company databases and determining the data types of the key values;
determining a preset analysis rule corresponding to the data type, and analyzing the key value by using the preset analysis rule to obtain characteristic data corresponding to the key value;
determining a port range of each branch company database service based on the characteristic data, and determining a database type of each branch company database according to a mapping relation between the port range and the database type;
and determining a characteristic extension rule of the database type, and scanning each branch company database by using the characteristic extension rule to acquire the basic information of each branch company database.
In this embodiment, the key value of each branch database contains actual configuration information and data used when the current database is executed.
In this embodiment, the data types of the key value include a string value type, a binary value type, and a DWORD value type, and different data types correspond to different parsing rules.
In this embodiment, the feature data corresponding to the key value includes a branch database identifier, and different identifiers correspond to different service port ranges.
In this embodiment, the database types of the affiliates include a hierarchical database, a network database and a relational database, wherein the port service range of the hierarchical database is 0-1023, the port service range of the network database is 1024-.
In this embodiment, the feature extension rule is used to set a branch database scanning method based on the branch database type.
In this embodiment, the basic information of the branch database includes IP address information, reading mode information, index information, and data storage tree structure information.
The beneficial effects of the above design are as follows: the database type is obtained from the key value of the branch company database, and a scanning method chosen according to that type is then used to scan the branch company database and obtain its basic information, which provides a basis for configuring the database information and trigger rules.
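Choosing a parsing rule by the data type of the key value can be sketched as a dispatch table. The sketch assumes that the three value types are a text string, raw bytes, and a 32-bit little-endian unsigned integer (DWORD); the rule functions and the `PARSE_RULES` table are hypothetical, and the actual preset analysis rules are not given in the text.

```python
import struct
from typing import Callable, Dict, Union

Parsed = Union[str, int]


def parse_string(raw: bytes) -> str:
    """Decode a text value and drop any trailing NUL terminator."""
    return raw.decode("utf-8").rstrip("\x00")


def parse_binary(raw: bytes) -> str:
    """Render raw bytes as hexadecimal text."""
    return raw.hex()


def parse_dword(raw: bytes) -> int:
    """Interpret exactly four bytes as a little-endian unsigned 32-bit integer."""
    (value,) = struct.unpack("<I", raw)
    return value


# One preset parsing rule per data type (assumed mapping).
PARSE_RULES: Dict[str, Callable[[bytes], Parsed]] = {
    "string": parse_string,
    "binary": parse_binary,
    "dword": parse_dword,
}


def parse_key_value(data_type: str, raw: bytes) -> Parsed:
    try:
        rule = PARSE_RULES[data_type]
    except KeyError:
        raise ValueError(f"no parsing rule for data type {data_type!r}") from None
    return rule(raw)
```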
Example 3
Based on embodiment 1, an embodiment of the present invention provides a method for acquiring data of a heterogeneous database, where, in step 1, configuring database information and trigger rules for each branch company database according to basic information of each branch company database includes:
determining a configuration server and configuration information attributes based on the basic information of each branch database;
determining a configuration transmission starting point and a configuration transmission end point based on the configuration server;
acquiring a dynamic configuration process based on the configuration information attribute, and determining a configuration transmission process point;
establishing a configuration path based on the fixed configuration transmission starting point, the configuration transmission process point and the configuration transmission end point;
according to the configuration information attribute, acquiring database information from the configuration server, completing the transmission of the database information by using the configuration path, and configuring the database information to a corresponding branch company database;
determining trigger information based on the database information, and generating a trigger strategy according to the trigger information;
determining trigger resources corresponding to each trigger object in the trigger strategy, and establishing an object-resource mapping relation;
and constructing a trigger rule according to the object-resource mapping relation.
In this embodiment, the configuration server refers to a server used for configuring database information for a branch company.
In this embodiment, the configuration information attribute includes a configuration route attribute, a configuration search attribute, and the like, which are used to indicate transmission and acquisition of the configuration.
In this embodiment, the trigger information includes a plurality of trigger points (trigger objects) required to open the branch database and trigger resources related to the trigger objects.
The beneficial effects of the above design are as follows: the configuration path is determined according to the acquired dynamic configuration process and accurate configuration information is acquired, so that configuration can be completed efficiently and accurately; configuring database information and trigger rules for each branch company database provides a basis for development, so that the method is developed once and adapted to every branch company.
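The construction of a trigger rule from the object-resource mapping can be sketched as below. It assumes that a trigger strategy is an ordered list of trigger object names and that a trigger rule is the same order paired with each object's resource; `build_trigger_rule` and the example names used with it are hypothetical.

```python
from typing import Dict, List, Tuple


def build_trigger_rule(strategy: List[str],
                       resources: Dict[str, str]) -> List[Tuple[str, str]]:
    """Map each trigger object in the strategy to its resource, preserving order."""
    missing = [obj for obj in strategy if obj not in resources]
    if missing:
        raise KeyError(f"no trigger resource for: {missing}")
    mapping = {obj: resources[obj] for obj in strategy}  # object-resource mapping relation
    return [(obj, mapping[obj]) for obj in strategy]     # trigger rule, in strategy order
```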
Example 4
Based on embodiment 1, an embodiment of the present invention provides a method for implementing data acquisition of a heterogeneous database, where as shown in fig. 2, in step 2, selecting a database driver according to database configuration information of each branch company includes:
step 201: acquiring identification information corresponding to each database driver, and matching the identification information with the database configuration information;
step 202: if the matching is successful, determining a database driver corresponding to the branch database;
step 203: otherwise, customizing a driver based on the database configuration information, and establishing a database driver based on the driver.
In this embodiment, the identification information is used to identify the database driver, and each database driver corresponds to one piece of identification information.
The beneficial effects of the above design are as follows: a database driver is selected for each branch company database, which ensures the normal operation of each branch company database, so that the method is developed once and adapted to every branch company.
Example 5
Based on embodiment 4, an embodiment of the present invention provides a method for acquiring data of heterogeneous databases, where acquiring identification information corresponding to each database driver, and matching the identification information with the database configuration information includes:
determining the configuration layout of a branch company database based on the database configuration information, and performing hierarchical division on the configuration layout according to configuration attributes to obtain a plurality of hierarchical layouts;
acquiring the nodes of the plurality of hierarchical layouts, and judging whether each hierarchical layout is matched with each node in the rest hierarchical layouts;
if yes, not performing any operation on the hierarchical layout;
otherwise, if the number of the nodes in the current hierarchical layout is less than that of any one of the nodes in the remaining hierarchical layout, determining excessive nodes in the remaining hierarchical layout, and performing node supplementation on the current hierarchical layout according to the positions of the excessive nodes and the attributes of the current hierarchical layout;
acquiring node information in each hierarchical layout, and determining a first constraint relation between nodes according to the node information;
determining a second constraint relation between corresponding nodes in the parallel hierarchical layout according to the node information;
setting a two-dimensional identification set for the nodes in the multiple hierarchical layouts according to the first constraint relation and the second constraint relation;
the two-dimensional identification comprises a first identification set and a second identification set, the first identification set reflects the relationship between nodes in one hierarchical layout, and the second identification set reflects the relationship between corresponding nodes in a plurality of hierarchical layouts;
analyzing the identification information corresponding to each database drive to obtain sub-identification information corresponding to each sub-drive in the database drive;
matching each identifier in the second identifier set with identifier information corresponding to each database driver, acquiring a first matching degree, and judging whether the first matching degree is smaller than a first preset matching degree;
if yes, judging that all database drivers do not meet the database requirements of the branch companies;
otherwise, further matching detection is carried out on the database driver meeting the requirements;
matching each identifier in the first identifier set with sub-identifier information in a database driver meeting requirements to obtain a second matching degree, and judging whether the second matching degree is smaller than a second preset matching degree;
if yes, judging that the database driver meeting the requirements does not conform to the database driver requirements of the branch company;
otherwise, determining that database driver as the database driver corresponding to the branch database.
In this embodiment, the configuration layout includes a comprehensive configuration layout of the structure of the database, the number of sub-databases, the data input/output manner, and the like.
In this embodiment, the configuration attributes include a resource attribute, a driver attribute, a data type attribute, and an ip address attribute, each attribute corresponds to a hierarchical layout, and nodes in the hierarchical layout are obtained by analyzing a branch database.
In this embodiment, supplementing the nodes of the hierarchy may facilitate comparison between each hierarchy, so that the determined second constraint relationship is more accurate.
In this embodiment, the first constraint relationship is used to represent a configuration relationship inside the hierarchical layout, and the second constraint relationship is used to represent a configuration relationship between the hierarchical layouts.
In this embodiment, the database driver is essentially a driver program used to start the database and perform its various operations.
In this embodiment, the first matching degree is used to represent the matching condition of the database driver with the branch database as a whole.
In this embodiment, the second matching degree is used to represent the matching of the database driver with the branch database on each configuration attribute.
The beneficial effect of the above design is: by matching the identification information against the database configuration information, a database driver is configured for each branch company database, so that a single development effort adapts to every branch company, which reduces the number of development cycles and lays a foundation for data acquisition.
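The two-stage matching in this embodiment can be illustrated with a minimal sketch. The patent does not define how a matching degree is computed, so the sketch assumes it is the share of required identifiers that a driver offers; the `Driver` class, the function names, and the 0.8 thresholds are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Driver:
    """A candidate database driver and the identifiers it advertises (hypothetical model)."""
    name: str
    identifiers: set[str]                                    # driver-level identification information
    sub_identifiers: set[str] = field(default_factory=set)   # identifiers of its sub-drivers


def matching_degree(required: set[str], offered: set[str]) -> float:
    """Assumed metric: share of required identifiers that the driver offers."""
    if not required:
        return 1.0
    return len(required & offered) / len(required)


def select_driver(first_set, second_set, drivers,
                  first_threshold=0.8, second_threshold=0.8):
    """Return the first driver that passes both stages, or None if none qualifies."""
    # Stage 1: coarse match of the second identification set against each driver.
    candidates = [d for d in drivers
                  if matching_degree(second_set, d.identifiers) >= first_threshold]
    # Stage 2: fine match of the first identification set against sub-identifiers.
    for d in candidates:
        if matching_degree(first_set, d.sub_identifiers) >= second_threshold:
            return d
    return None
```

A `None` result corresponds to the case in which no existing driver fits, where embodiment 1 falls back to customizing a driver from the database configuration information.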
Example 6
Based on embodiment 1, an embodiment of the present invention provides a method for implementing data acquisition of a heterogeneous database, where in step 3, determining a data acquisition interface based on the database driver includes:
extracting, from the database driver, the driver program related to data acquisition, and determining the number and the type of data acquisition interfaces based on the driver program;
setting interface format parameters for each data acquisition interface based on the type of the data acquisition interface;
sorting the data acquisition interfaces based on their number and type to obtain an interface arrangement sequence;
establishing an interface set from the data acquisition interfaces with their interface format parameters according to the interface arrangement sequence;
testing each data acquisition interface in the interface set based on an operation testing tool, and judging whether each data acquisition interface can normally operate or not;
if yes, saving the interface set;
otherwise, determining an error point of the data acquisition interface based on the test result, and correcting the error point according to a preset correction scheme to obtain a final interface set.
The beneficial effect of the above design is: while the data acquisition interfaces are being determined from the database driver, their ability to run is tested and corrected, which ensures that the determined interfaces operate normally; during data acquisition, an interface can be selected from the interface set, which improves the efficiency of interface selection and lays a foundation for data acquisition.
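The steps of this embodiment can be sketched as follows. The interface types, the format parameters in `FORMAT_BY_KIND`, the sort order, and the `probe` and `fix` callables (standing in for the operation testing tool and the preset correction scheme) are all assumptions made for illustration; the patent does not specify them.

```python
from dataclasses import dataclass


@dataclass
class AcquisitionInterface:
    name: str
    kind: str    # interface type, e.g. "table" or "stream" (hypothetical labels)
    fmt: dict    # interface format parameters


# Hypothetical format parameters per interface type.
FORMAT_BY_KIND = {
    "table": {"encoding": "utf-8", "batch": 500},
    "stream": {"encoding": "utf-8", "batch": 1},
}


def build_interface_set(specs, probe, fix):
    """Build, order, test, and if necessary correct a set of interfaces.

    specs: iterable of (name, kind) pairs derived from the driver program.
    probe: callable returning True when an interface runs normally.
    fix:   callable returning a corrected interface.
    """
    interfaces = [AcquisitionInterface(n, k, dict(FORMAT_BY_KIND[k])) for n, k in specs]
    # Assumed arrangement sequence: by type, then by name.
    interfaces.sort(key=lambda i: (i.kind, i.name))
    # Keep interfaces that pass the test; replace failing ones with corrected versions.
    return [i if probe(i) else fix(i) for i in interfaces]
```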
Example 7
Based on embodiment 1, an embodiment of the present invention provides a method for realizing data acquisition of a heterogeneous database, wherein, in step 3, logging in to the branch database according to the trigger rule includes:
matching an IP address of a rule analyzer based on the development environment of the branch company database, analyzing the trigger rule based on the IP address, and generating a trigger description language;
performing semantic analysis on the trigger description language to generate one or more corresponding semantic results;
when a plurality of semantic results are available, determining a trigger sub-event corresponding to each semantic result according to the object information of the trigger description language, determining the priority information of the trigger sub-event, and selecting the semantic result with the highest priority as a final semantic result;
dividing the trigger description language based on the final semantic result and generating a plurality of trigger sub-events;
acquiring a first trigger corresponding to the plurality of trigger sub-events based on a preset trigger linked list;
acquiring the second triggers provided for logging in to the branch database, and selecting, from the second triggers, a third trigger that matches the first trigger;
acquiring a historical trigger record of the third trigger, and determining the activation correlation degree and the controllability of the third trigger according to the historical trigger record;
judging whether the activation correlation degree and the controllability of the third trigger meet preset requirements or not;
if so, taking the third trigger as a target trigger;
otherwise, correcting the third trigger based on the preset requirement, and taking the corrected third trigger as a target trigger;
analyzing the target trigger to obtain an abstract syntax tree corresponding to the target trigger, and traversing the abstract syntax tree according to a preset execution sequence to obtain a trigger statement set;
and setting a trigger path based on the trigger statement set, and realizing the login of the branch database according to the trigger path.
In this embodiment, based on the development environment of the branch database, the IP address of the rule resolver is matched, so that the matched rule resolver is more suitable for the development environment of the branch database, and the completeness and speed of resolution are ensured.
In this embodiment, the trigger description language may be, for example, a programming language.
In this embodiment, the trigger sub-events are the events that must be triggered when logging in to the branch database; the branch database can be logged in to only after all trigger sub-events have been triggered.
In this embodiment, the preset trigger linked list is used to represent the correspondence between trigger events and triggers.
In this embodiment, the third trigger is corrected based on the preset requirement, specifically by adjusting the clock precision of the trigger.
In this embodiment, the trigger path provides support for logging into the branch database.
The beneficial effect of the above design is: the branch database is logged in to according to the trigger rule, which reduces the number of development cycles; a suitable trigger is selected according to the trigger rule and a trigger path is set according to that trigger, which ensures the stability and speed of logging in to the branch database and lays a foundation for data acquisition.
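Two parts of this embodiment, choosing the final semantic result and resolving trigger sub-events to target triggers, can be sketched as below. This is only an illustration under stated assumptions: priorities are modeled as numbers where larger means higher, the preset trigger linked list is modeled as a dictionary, and `meets_requirements` and `correct` are hypothetical callables standing in for the activation correlation and controllability check and for the correction step.

```python
def pick_semantic_result(results):
    """Choose the semantic result with the highest priority.

    results: list of (semantic_result, priority) pairs; larger numbers mean
    higher priority (an assumption, the source does not fix a scale).
    """
    return max(results, key=lambda r: r[1])[0]


def resolve_target_triggers(sub_events, trigger_table, available,
                            meets_requirements, correct):
    """Map trigger sub-events to target triggers.

    trigger_table:      sub-event -> first trigger (stands in for the preset trigger linked list)
    available:          second triggers offered for logging in to the branch database
    meets_requirements: predicate over a trigger (activation correlation, controllability)
    correct:            correction applied to a trigger that fails the predicate
    """
    targets = []
    for event in sub_events:
        first = trigger_table[event]
        if first not in available:   # no third trigger matches this first trigger
            raise LookupError(f"no available trigger matches {first!r}")
        third = first
        targets.append(third if meets_requirements(third) else correct(third))
    return targets
```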
Example 8
Based on embodiment 1, an embodiment of the present invention provides a method for implementing data acquisition of a heterogeneous database, where, in step 3, acquiring data by using the data acquisition interface includes:
acquiring a data acquisition instruction, analyzing the data acquisition instruction and determining a data acquisition type;
and selecting a corresponding data acquisition interface according to the data acquisition type to acquire data from the branch database.
The beneficial effect of the above design is: selecting the data acquisition interface that corresponds to the data acquisition instruction ensures the accuracy of data acquisition.
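The selection of an interface by acquisition type amounts to a lookup, as the following sketch shows. The instruction syntax `ACQUIRE <type> ...` and the function names are hypothetical; the patent does not specify an instruction format.

```python
def parse_instruction(instruction: str) -> str:
    """Extract the acquisition type from an instruction of the assumed form 'ACQUIRE <type> ...'."""
    parts = instruction.split()
    if len(parts) < 2 or parts[0].upper() != "ACQUIRE":
        raise ValueError(f"unrecognized instruction: {instruction!r}")
    return parts[1].lower()


def acquire(instruction, interfaces):
    """Route the instruction to the interface registered for its acquisition type."""
    kind = parse_instruction(instruction)
    try:
        interface = interfaces[kind]
    except KeyError:
        raise ValueError(f"no interface for acquisition type {kind!r}") from None
    return interface(instruction)
```

Here `interfaces` is a dictionary from acquisition type to a callable, for example the interface set saved in embodiment 6 keyed by type.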
Example 9
Based on embodiment 1, an embodiment of the present invention provides a method for acquiring data of a heterogeneous database, and as shown in fig. 3, in step 4, processing and summarizing acquired data to a head office database includes:
step 401: receiving data collected from each branch company database, performing noise reduction processing on the data, and performing standardization processing on the data to obtain standard collected data;
step 402: classifying the standard collected data according to data types to obtain a plurality of groups of data to be stored;
step 403: storing the data to be stored, group by group, in different storage units of the head office database.
The beneficial effect of the above design is: the collected data are stored in different storage units of the head office database according to their data types, which facilitates later analysis of the data.
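Steps 401 to 403 can be sketched as a small pipeline. The noise rule (drop records with a missing value or type), the standard form (trimmed, lower-case type names), and the record layout with `branch`, `type`, and `value` keys are assumptions chosen for illustration; a per-type list stands in for a storage unit.

```python
from collections import defaultdict


def denoise(records):
    """Assumed noise rule: drop records with a missing value or missing type."""
    return [r for r in records if r.get("value") is not None and r.get("type")]


def standardize(record):
    """Assumed standard form: trimmed lower-case type, trimmed string values."""
    value = record["value"]
    if isinstance(value, str):
        value = value.strip()
    return {"branch": record.get("branch"),
            "type": record["type"].strip().lower(),
            "value": value}


def summarize(records):
    """Steps 401-403: clean, standardize, and group by data type into storage units."""
    storage_units = defaultdict(list)   # one list per data type stands in for a storage unit
    for record in map(standardize, denoise(records)):
        storage_units[record["type"]].append(record)
    return dict(storage_units)
```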
Example 10
Based on embodiment 9, an embodiment of the present invention provides a method for implementing data acquisition of a heterogeneous database, where in step 402, classifying the standard acquisition data according to data types to obtain multiple sets of data to be stored includes:
preliminarily dividing the standard data according to data types to obtain a plurality of groups of data sets;
inputting each group of data set into a data test model, and obtaining the dividing accuracy of each group of data set according to a test result;
the data testing model samples a current data set to obtain sampling data, and tests the sampling data to obtain biased measurement variance and unbiased measurement variance of the sampling data;
the calculation process is as follows:
[formula image BDA0003215320310000161]
[formula image BDA0003215320310000162]
where [symbol image BDA0003215320310000163] denotes the biased measurement variance, n denotes the number of samples, G_i denotes the biased measurement value of the i-th sample data, k_i denotes the biased reference value of the i-th sample data, [symbol image BDA0003215320310000164] denotes the unbiased measurement variance, R_i denotes the unbiased measurement value of the i-th sample data, and s_i denotes the unbiased reference value of the i-th sample data;
calculating to obtain the accuracy of the current data set division according to the biased measurement variance and the unbiased measurement variance;
the calculation process is as follows:
[formula image BDA0003215320310000171]
where P denotes the accuracy of the current data set division, Z_a denotes the biased measurement interval index value, and Z_b denotes the unbiased measurement interval index value;
judging whether the accuracy meets a preset requirement;
if so, obtaining a plurality of groups of storage data according to the plurality of groups of data sets;
otherwise, the data set which does not meet the requirements is divided again until the accuracy requirements are met.
In this embodiment, the biased measurement variance is used to represent the goodness of the data test model, the better the goodness, the smaller the biased measurement variance.
In this embodiment, the unbiased measurement variance is used to represent the systematic error of the data test model; the smaller the systematic error, the smaller the unbiased measurement variance.
In this embodiment, the unbiased measurement value and the biased measurement value are used to represent the reliability of the sample data under the sample type, and the greater the reliability, the greater the value.
In this embodiment, the biased reference value and the unbiased reference value are used to represent reference values of biased measurement and unbiased measurement, and different sampling data correspond to different values and are related to sampling time and the like of the sampling data.
In this embodiment, the biased measurement interval index value and the unbiased measurement interval index value are used to evaluate the accuracy of the biased measurement and the unbiased measurement of the data test model, and the higher the accuracy, the larger the value.
In this embodiment, for [formula image BDA0003215320310000172], for example, G_i takes values in (0, 1), and G_i = 0.8 is taken; k_i takes values in (0.8, 1.2), and k_i = 1 is taken; the corresponding [symbol image BDA0003215320310000173] is then approximately 0.55. For [formula image BDA0003215320310000181], for example, R_i takes values in (0, 1), and R_i = 0.9 is taken; s_i takes values in (0.5, 1), and s_i = 0.5 is taken; the corresponding [symbol image BDA0003215320310000182] is then approximately 0.46. For [formula image BDA0003215320310000183],
for example, with Z_a = 1.2 and the corresponding Z_b, if the preset required accuracy is 2 and the resulting P falls below it, the accuracy does not meet the requirement, and the data should be divided again.
The beneficial effect of the above design is: the accuracy of the data division is calculated with the data test model, and the calculation combines biased and unbiased measurements, which ensures that the computed accuracy is reliable, provides a basis for the classified storage of data, and facilitates later analysis of the data.
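The control flow of this embodiment, dividing again until every data set meets the accuracy requirement, can be sketched as follows. The sketch does not reproduce the variance and accuracy formulas of the embodiment: `accuracy` is a caller-supplied callable standing in for the data test model, `redivide` is a hypothetical re-division step, and `max_rounds` is an added safeguard against endless looping that the patent does not mention.

```python
def divide_until_accurate(data_sets, accuracy, redivide, required, max_rounds=10):
    """Re-divide data sets whose division accuracy is below `required`.

    accuracy: callable(data_set) -> float, standing in for the data test model.
    redivide: callable(data_set) -> list of data sets (hypothetical re-division step).
    """
    accepted, pending = [], list(data_sets)
    for _ in range(max_rounds):
        if not pending:
            break
        next_pending = []
        for ds in pending:
            if accuracy(ds) >= required:
                accepted.append(ds)                  # division is accurate enough, keep it
            else:
                next_pending.extend(redivide(ds))    # divide again and re-test next round
        pending = next_pending
    if pending:
        raise RuntimeError("accuracy requirement not met within max_rounds")
    return accepted
```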
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method for realizing data acquisition of a heterogeneous database is characterized by comprising the following steps:
step 1: configuring database configuration information and trigger rules for each branch company database according to the basic information of each branch company database;
step 2: selecting a database driver according to database configuration information of each branch company, and determining a data acquisition interface based on the database driver;
step 3: logging in to the branch company database according to the trigger rule, and acquiring data by using the data acquisition interface;
step 4: processing the collected data and summarizing the data to a head office database.
2. The method for realizing data collection of the heterogeneous database according to claim 1, further comprising, before step 1: acquiring basic information of each branch company database, wherein the process comprises the following steps:
obtaining key values of all branch company databases and determining the data types of the key values;
determining a preset analysis rule corresponding to the data type, and analyzing the key value by using the preset analysis rule to obtain characteristic data corresponding to the key value;
determining a port range of each branch company database service based on the characteristic data, and determining a database type of each branch company database according to a mapping relation between the port range and the database type;
and determining a characteristic extension rule of the database type, and scanning each branch company database by using the characteristic extension rule to acquire the basic information of each branch company database.
3. The method of claim 1, wherein the step 2 of selecting the database driver according to the database configuration information of each branch company comprises:
step 201: acquiring identification information corresponding to each database driver, and matching the identification information with the database configuration information;
step 202: if the matching is successful, determining a database driver corresponding to the branch database;
step 203: otherwise, customizing a driver based on the database configuration information, and establishing a database driver based on the driver.
4. The method for realizing data acquisition of the heterogeneous database according to claim 1, wherein in step 3, the data acquisition by using the data acquisition interface includes:
acquiring a data acquisition instruction, analyzing the data acquisition instruction and determining a data acquisition type;
and selecting a corresponding data acquisition interface according to the data acquisition type to acquire data from the branch database.
5. The method for realizing data collection of heterogeneous databases according to claim 1, wherein in step 4, the processing and summarizing the collected data to the head office database comprises:
step 401: receiving data collected from each branch company database, performing noise reduction processing on the data, and performing standardization processing on the data to obtain standard collected data;
step 402: classifying the standard collected data according to data types to obtain a plurality of groups of data to be stored;
step 403: storing the data to be stored, group by group, in different storage units of the head office database.
6. The method for realizing data collection of heterogeneous databases according to claim 1, wherein in step 3, determining a data collection interface based on the database driver comprises:
extracting a driving program related to the database driving and data acquisition, and determining the number and the type of data acquisition interfaces based on the driving program;
setting interface format parameters for the data acquisition interface based on the type of the data acquisition interface;
based on the number and the type of the data acquisition interfaces, sequencing the data acquisition interfaces to obtain an interface arrangement sequence;
establishing an interface set for the data acquisition interface with the interface parameters according to the interface arrangement sequence;
testing each data acquisition interface in the interface set based on an operation testing tool, and judging whether each data acquisition interface can normally operate or not;
if yes, saving the interface set;
otherwise, determining an error point of the data acquisition interface based on the test result, and correcting the error point according to a preset correction scheme to obtain a final interface set.
7. The method for realizing data collection of heterogeneous databases according to claim 1, wherein in step 3, logging in to the branch database according to the trigger rule comprises:
matching an IP address of a rule analyzer based on the development environment of the branch company database, analyzing the trigger rule based on the IP address, and generating a trigger description language;
performing semantic analysis on the trigger description language to generate one or more corresponding semantic results;
when a plurality of semantic results are available, determining a trigger sub-event corresponding to each semantic result according to the object information of the trigger description language, determining the priority information of the trigger sub-event, and selecting the semantic result with the highest priority as a final semantic result;
dividing the trigger description language based on the final semantic result and generating a plurality of trigger sub-events;
acquiring a first trigger corresponding to the plurality of trigger sub-events based on a preset trigger linked list;
acquiring the second triggers provided for logging in to the branch database, and selecting, from the second triggers, a third trigger that matches the first trigger;
acquiring a historical trigger record of the third trigger, and determining the activation correlation degree and the controllability of the third trigger according to the historical trigger record;
judging whether the activation correlation degree and the controllability of the third trigger meet preset requirements or not;
if so, taking the third trigger as a target trigger;
otherwise, correcting the third trigger based on the preset requirement, and taking the corrected third trigger as a target trigger;
analyzing the target trigger to obtain an abstract syntax tree corresponding to the target trigger, and traversing the abstract syntax tree according to a preset execution sequence to obtain a trigger statement set;
setting a trigger path based on the trigger statement set, and realizing the login of the branch database according to the trigger path.
8. The method for realizing data collection of heterogeneous databases according to claim 1, wherein in step 1, configuring database configuration information and triggering rules for each branch database according to the basic information of each branch database includes:
determining a configuration server and configuration information attributes based on the basic information of each branch database;
determining a configuration transmission starting point and a configuration transmission end point based on the configuration server;
acquiring a dynamic configuration process based on the configuration information attribute, and determining a configuration transmission process point;
establishing a configuration path based on the fixed configuration transmission starting point, the configuration transmission process point and the configuration transmission end point;
according to the configuration information attribute, acquiring database information from the configuration server, completing the transmission of the database information by using the configuration path, and configuring the database configuration information to a corresponding branch company database;
determining trigger information based on the database configuration information, and generating a trigger strategy according to the trigger information;
determining trigger resources corresponding to each trigger object in the trigger strategy, and establishing an object-resource mapping relation;
and constructing a trigger rule according to the object-resource mapping relation.
9. The method of claim 4, wherein obtaining identification information corresponding to each database driver, and matching the identification information with the database configuration information comprises:
determining the configuration layout of a branch company database based on the database configuration information, and performing hierarchical division on the configuration layout according to configuration attributes to obtain a plurality of hierarchical layouts;
acquiring the nodes of the plurality of hierarchical layouts, and judging whether each hierarchical layout is matched with each node in the rest hierarchical layouts;
if yes, not performing any operation on the hierarchical layout;
otherwise, if the number of the nodes in the current hierarchical layout is less than that of any one of the nodes in the remaining hierarchical layout, determining excessive nodes in the remaining hierarchical layout, and performing node supplementation on the current hierarchical layout according to the positions of the excessive nodes and the attributes of the current hierarchical layout;
acquiring node information in each hierarchical layout, and determining a first constraint relation between nodes according to the node information;
determining a second constraint relation between corresponding nodes in the parallel hierarchical layout according to the node information;
setting a two-dimensional identification set for the nodes in the multiple hierarchical layouts according to the first constraint relation and the second constraint relation;
the two-dimensional identification comprises a first identification set and a second identification set, the first identification set reflects the relationship between nodes in one hierarchical layout, and the second identification set reflects the relationship between corresponding nodes in a plurality of hierarchical layouts;
analyzing the identification information corresponding to each database driver to obtain sub-identification information corresponding to each sub-driver in the database driver;
matching each identifier in the second identification set with the identification information corresponding to each database driver to obtain a first matching degree, and judging whether the first matching degree is smaller than a first preset matching degree;
if yes, judging that none of the database drivers meets the database requirements of the branch company;
otherwise, performing further matching detection on the database drivers that meet the requirement;
matching each identifier in the first identification set with the sub-identification information of each database driver that meets the requirement to obtain a second matching degree, and judging whether the second matching degree is smaller than a second preset matching degree;
if yes, judging that the database driver does not conform to the database driver requirements of the branch company;
otherwise, determining that database driver as the database driver corresponding to the branch database.
10. The method of claim 9, wherein in step 402, classifying the standard collected data according to data types to obtain a plurality of sets of data to be stored comprises:
preliminarily dividing the standard data according to data types to obtain a plurality of groups of data sets;
inputting each group of data set into a data test model, and obtaining the dividing accuracy of each group of data set according to a test result;
the data testing model samples a current data set to obtain sampling data, and tests the sampling data to obtain biased measurement variance and unbiased measurement variance of the sampling data;
calculating to obtain the accuracy of the current data set division according to the biased measurement variance and the unbiased measurement variance;
judging whether the accuracy meets a preset requirement;
if so, obtaining a plurality of groups of storage data according to the plurality of groups of data sets;
otherwise, the data set which does not meet the requirements is divided again until the accuracy requirements are met.
CN202110941795.6A 2021-08-17 2021-08-17 Method for realizing heterogeneous database data acquisition Active CN113821554B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110941795.6A CN113821554B (en) 2021-08-17 2021-08-17 Method for realizing heterogeneous database data acquisition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110941795.6A CN113821554B (en) 2021-08-17 2021-08-17 Method for realizing heterogeneous database data acquisition

Publications (2)

Publication Number Publication Date
CN113821554A true CN113821554A (en) 2021-12-21
CN113821554B CN113821554B (en) 2023-10-13

Family

ID=78913183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110941795.6A Active CN113821554B (en) 2021-08-17 2021-08-17 Method for realizing heterogeneous database data acquisition

Country Status (1)

Country Link
CN (1) CN113821554B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5721904A (en) * 1993-12-20 1998-02-24 Hitachi, Ltd. Database access system and method of controlling access management to a database access system for a plurality of heterogeneous database servers using SQL
CN101082996A (en) * 2007-07-09 2007-12-05 北京邮电大学 Work attendance management system based on mobile terminal and realizing method thereof
CN104346377A (en) * 2013-07-31 2015-02-11 克拉玛依红有软件有限责任公司 Method for integrating and exchanging data on basis of unique identification
US20160342655A1 (en) * 2015-05-20 2016-11-24 Commvault Systems, Inc. Efficient database search and reporting, such as for enterprise customers having large and/or numerous files
CN110266677A (en) * 2019-06-13 2019-09-20 广州中国科学院沈阳自动化研究所分所 A kind of edge calculations intelligent gateway and implementation method towards industry manufacture
CN112667697A (en) * 2020-12-30 2021-04-16 北京来也网络科技有限公司 Method and device for acquiring real estate information by combining RPA and AI
CN113254519A (en) * 2021-05-28 2021-08-13 北京奇岱松科技有限公司 Access method, device, equipment and storage medium of multi-source heterogeneous database

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙健: "分布式信息共享平台技术及其实现", 《中国优秀博硕士学位论文全文数据库 (硕士)》, pages 139 - 146 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115118487A (en) * 2022-06-24 2022-09-27 山东旗帜信息有限公司 SSH data acquisition method and system
CN115116224A (en) * 2022-06-24 2022-09-27 山东旗帜信息有限公司 Edge terminal data acquisition and transmission system and method
CN115116224B (en) * 2022-06-24 2023-08-18 山东旗帜信息有限公司 Edge end data acquisition and transmission system and method
CN115118487B (en) * 2022-06-24 2023-08-25 山东旗帜信息有限公司 SSH data acquisition method and system

Also Published As

Publication number Publication date
CN113821554B (en) 2023-10-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant