CN113434486A - Heterogeneous data integration system design method based on virtual table - Google Patents

Heterogeneous data integration system design method based on virtual table Download PDF

Info

Publication number
CN113434486A
CN113434486A CN202110807659.8A CN202110807659A CN113434486A CN 113434486 A CN113434486 A CN 113434486A CN 202110807659 A CN202110807659 A CN 202110807659A CN 113434486 A CN113434486 A CN 113434486A
Authority
CN
China
Prior art keywords
virtual
real
fields
sql
tables
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110807659.8A
Other languages
Chinese (zh)
Inventor
薛晓龙
汤家平
宁延硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN202110807659.8A priority Critical patent/CN113434486A/en
Publication of CN113434486A publication Critical patent/CN113434486A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/211Schema design and management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations

Abstract

The invention discloses a virtual table-based heterogeneous data integration system design method, which comprises the following specific steps: firstly, a data source is connected through a database type, an IP address, a port, a user name and a password; then, the mapping relation from the virtual table, the virtual field to the real table and the real field and the reflection relation from the real table, the real field to the virtual table and the virtual field are configured; then, for a global SQL statement, decomposing the global SQL statement into a plurality of sub SQL statements according to virtual fields by SQL statement decomposition; then, converting the virtual tables and the virtual fields related to the sub SQL into the sub SQL of the real tables and the real fields by utilizing the mapping relation; then merging the sub SQL of the same table to reduce the access times to the database; and finally, executing the SQL sentence to obtain a result set, merging the result set, and converting the real table and the real field into a virtual table and a virtual field by using the reflection relation.

Description

Heterogeneous data integration system design method based on virtual table
Technical Field
The invention relates to a design method of a heterogeneous data integration system based on a virtual table, and belongs to the technical field of multi-source heterogeneous data integration.
Background
With the popularization of computer networks, the sharing of data resources has become a hot topic. Many enterprises need to integrate the DBMS, MIS and OA data to form a network information platform for management decision of the enterprises.
The new capital construction becomes the focus of attention of recent advices, and emerging technologies or concepts such as big data, artificial intelligence, industrial internet and the like enter an accelerated development and application stage. The new infrastructure provides services such as digital transformation, intelligent upgrading, fusion innovation and the like for enterprises facing high-quality development. How an enterprise utilizes a new generation technology to realize the self digitization and intelligent transformation of the enterprise is very important.
Enterprises are usually composed of different departments, each department is usually responsible for different businesses, and data information generated by each department is usually stored in a database of each department. However, for most business segments, which often do not have data processing or analysis functionality, the large amount of data generated historically cannot be effectively utilized. In addition, the enterprise decision maker often needs to grasp the information of each department in time to be able to clearly understand the development condition of the enterprise, so as to make the best decision, and the information interaction between each department often appears lag or error in the transmission, which is not beneficial for the decision maker to grasp the first-hand data of the enterprise in time.
Therefore, in the era of digital economy, the enterprise side hopes to break the information barrier between departments by using the emerging technology; data interconnection and intercommunication are realized, data of all departments are integrated rapidly, and unified management of the data is realized; value information is mined by utilizing data on an integrated platform and combining technologies such as big data and artificial intelligence.
Disclosure of Invention
The invention provides a design method of a heterogeneous data integration system based on a virtual table, which can solve the problems disclosed in the background technology.
In order to solve the technical problem of the complaint, the technical scheme adopted by the invention comprises data source access, mapping relation configuration, reflection relation configuration, global SQL statement decomposition, same-table sub SQL statement combination, sub SQL query, sub result set combination and real field to virtual field replacement.
Data source access: the data source is a heterogeneous data source to be integrated by the system and can be a relational database such as MySQL, SQLserver and the like.
And (3) mapping relation configuration: in the system, a field for facing a user is called a virtual field, and a table for facing the user is called a virtual table; the fields that actually exist in the data source are called real fields, and the tables that actually exist in the data source are called real tables. Mapping relationship configuration means that for a virtual field, a mapping relationship from the virtual field to a field in a plurality of database entities needs to be configured.
Reflection mapping relation configuration: the reflective mapping relation configuration is that the mapping relation of each virtual field to the real field needs to be recorded in the form of key value pairs.
And (3) decomposing the global SQL statement: for an SQL statement containing virtual fields and virtual tables, an SQL parser is needed to extract the virtual fields, the virtual tables and the query conditions of the query. And then converting the data into a plurality of sub SQL queries of real fields in the database according to the preset mapping relation.
Merging with the child SQL statement: considering that multiple sub-SQL may perform query processing on the same database, in order to reduce multiple accesses to the same database, it is necessary to combine the accesses to the same database into one SQL for access, so that the access pressure of the database can be reduced.
And (5) sub SQL query: and acquiring database connection, and then executing the SQL statement to obtain a query result set.
And combining the sub-result sets: since one global SQL is decomposed into a plurality of sub-SQL queries at the time of query, after the sub-SQL queries, a plurality of sub-SQL query result sets need to be merged.
Real field to virtual field replacement: the meaning represented by each field in the database is transparent to the user. The user only knows what each virtual field represents, so it is necessary to translate the real fields in the database into the virtual fields familiar to the user through reflection.
Drawings
FIG. 1 is a flow chart of the whole heterogeneous data integration system
FIG. 2 is a flow of query decomposition of global SQL statement and merging of sub-SQL statements of the same table
FIG. 3 is a sub-result set and flow diagram
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the aspects of the present invention more clearly, and should not be taken as limiting the scope of the present invention.
As shown in fig. 1, a method for designing a heterogeneous data integration system based on a virtual table includes the following steps:
step 1: and accessing a data source, wherein the data source can be a relational database such as MySQL, SQLServer and the like.
The data source access can be accessed in an automatic configuration mode or can be input in an interface manual mode. When the automatic configuration mode is adopted, the data source access information needs to be written into a data source configuration file DB.xml.
The format is as follows:
<?xml version="1.0"encoding="UTF-8"?>
<databases>
<database>
<type>mysql</type>
<ip>0.0.0.0</ip>
<port>3306</port>
<userName>xxx</userName>
<password>xxx</password>
<name>xxx</name>
<alias>xxx</alias>
</database>
</databases>
for each data source accessed, a 16 character length identifier is required to uniquely identify it. Defining: UUID ═ MD5(type + ip + port + userName + databae). toString (16) is a unique identifier for each data source.
Step 2: and configuring the mapping relation, namely configuring the mapping relation from the virtual field to the real field in the database.
When the mapping relation is configured, a plurality of virtual fields can be packaged into one virtual table, so that the same virtual field can be multiplexed in a plurality of different virtual tables without generating conflict. The mapping relation configuration can be configured in an automatic configuration mode or an interface manual mode. When the automatic configuration mode is adopted, the configuration relationship needs to be written into the mapping relationship configuration file mapper.
The format is as follows:
Figure BDA0003167187740000051
and step 3: and (3) configuring the reflection relation, and configuring the reflection relation from the real field to the virtual field on the basis of the step (2).
And 4, step 4: and data query, namely decomposing a global query SQL statement containing a virtual field and a virtual table into a plurality of sub SQL sentences for query through a configured mapping relation, and then returning a query result to a user. For the data query step, the data query can be refined into global SQL statement analysis, global SQL statement decomposition, merging with table sub SQL statements, sub SQL query, sub result set merging and real field to virtual field replacement.
As shown in fig. 2, the figure shows the general processing steps from one global SQL statement to a plurality of sub-SQL statements.
And 5: and (3) analyzing the global SQL statement, and extracting a query table, a query field and a query condition contained in the SQL statement by using an open source tool sqlparser analyzer.
Step 6: and (3) decomposing the global SQL statement, obtaining the query fields contained in the SQL by analyzing the global SQL statement, then grouping each query field, and reversely restoring each query field into a sub SQL query statement by using a character string splicing method. And then, taking the query table as a key, and inquiring the mapping relation configured in the step 2 to obtain the mapping relation from the virtual field to the real field related in the query table. And then converting the virtual fields and the virtual tables in the sub SQL statements into real fields and real tables in the data source through the mapping relation. At the same time, the uuid involved in each sub-SQL query needs to be saved, which is a unique ID that identifies the different tables in the data source.
And 7: and combining with the table sub SQL statement, in order to reduce the access pressure of the system to the database, multiple accesses to the same table in the same database can be combined into one access. When merging, whether the tables are the same table in the same database can be judged through the uuid carried by each sub SQL. If yes, the multiple sub SQL queries of the same table of the same database are merged into one sub SQL query with more query fields. This may reduce the number of accesses to the database, thereby reducing the concurrent pressure on the database.
And 8: and (3) sub SQL query, for each sub SQL, obtaining database Connection pool (DataSource) information through the uuid carried by the sub SQL, then obtaining a database Connection (Connection), and then executing an SQL statement to obtain a result set. For subsequent processing, the primary key of each Record (Record) is also queried and stored in the result set by default at the time of each sub-SQL query.
The general steps of the merging of multiple result sets are shown in fig. 3.
And step 9: the result set is merged, and since the result set contains the primary key of each record in step 7, it is only necessary to determine whether to add the result set to the existing result set or to add a result set again according to whether the primary keys of each record are equal during merging.
Step 10: real field to virtual field replacement, for each real field, it is necessary to replace each real field with a virtual field using an inverse mapping relationship.

Claims (6)

1. A method for designing a heterogeneous data integration system based on a virtual table is characterized by comprising the following steps: accessing a data source according to a database type, an IP, a PORT, a user name and a password; the mapping relation from the virtual field and the virtual table to the real field and the real table is configured; reflecting the relationship from a real field, a real table to a virtual field and a virtual table by configuration; analyzing the global SQL statement by adopting an SQLParse analyzer; decomposing the global SQL statement into a plurality of sub SQL statements by utilizing the mapping relation; merging a plurality of sub SQL sentences of the same database and the same table; and replacing the real tables and the real fields in the result set with the virtual tables and the virtual fields by utilizing the reflection relation.
2. The method of claim 1, wherein mapping relationships between virtual fields and virtual tables to real fields and real tables are configured.
3. The method of claim 1, wherein the real fields, real tables to virtual fields, and virtual tables are configured in a reflective relationship.
4. The method according to claim 1, wherein a configured mapping relationship is used to perform query decomposition on a global SQL statement containing virtual tables and virtual fields to obtain a plurality of sub SQL containing real tables and real fields.
5. The method of claim 1, wherein the sub-SQL of the same table in the same database is merged to reduce the access pressure of the database.
6. The method as claimed in claim 1, wherein the real tables and real fields in the result set are replaced by virtual tables and virtual fields by reflection relations.
CN202110807659.8A 2021-07-16 2021-07-16 Heterogeneous data integration system design method based on virtual table Pending CN113434486A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110807659.8A CN113434486A (en) 2021-07-16 2021-07-16 Heterogeneous data integration system design method based on virtual table

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110807659.8A CN113434486A (en) 2021-07-16 2021-07-16 Heterogeneous data integration system design method based on virtual table

Publications (1)

Publication Number Publication Date
CN113434486A true CN113434486A (en) 2021-09-24

Family

ID=77760656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110807659.8A Pending CN113434486A (en) 2021-07-16 2021-07-16 Heterogeneous data integration system design method based on virtual table

Country Status (1)

Country Link
CN (1) CN113434486A (en)

Similar Documents

Publication Publication Date Title
Corcho et al. Towards a new generation of ontology based data access
US7502807B2 (en) Defining and extracting a flat list of search properties from a rich structured type
CN110837492B (en) Method for providing data service by multi-source data unified SQL
US7526503B2 (en) Interactive schema translation with instance-level mapping
CN106934062A (en) A kind of realization method and system of inquiry elasticsearch
CN110688397B (en) SQL-based distributed data unified access system and method
CN102708203A (en) Database dynamic management method based on XML metadata
CN113468149B (en) Data model development platform
CN113157723B (en) SQL access method for Hyperridge Fabric
KR100417569B1 (en) Search method of distributed/heterogeneous gis databases using metadata interchange standard
CN107526790A (en) A kind of implementation based on the database language technology for realizing data unified standard
CN113434486A (en) Heterogeneous data integration system design method based on virtual table
CN112783836A (en) Information exchange method, device and computer storage medium
Athanasopoulos et al. Service substitution revisited
CN115495440A (en) Data migration method, device and equipment of heterogeneous database and storage medium
US20200356570A1 (en) Interactive user interface having transform operators for schema transformation
Namdeo et al. A Middleware Model for SQL to NoSQL Query Translation
US10956419B2 (en) Enhanced search functions against custom indexes
Stoimenov et al. Development of GIS interoperability infrastructure in local community environment
Ghawi Ontology-based cooperation of information systems: contributions to database-to-ontology mapping and XML-to-ontology mapping
CN111309708B (en) Database read-only user data query method and device
US11755620B1 (en) Invoking supported non-relational database operations to provide results consistent with commands specified in a relational query language
Shekhovtsov et al. Implementing tool support for effective stakeholder communication in software development–a project report
CN113254419A (en) Internet of things cloud platform management system and method based on big data micro-service
KR20230030281A (en) Apparatus and method for converting of common information model based on ontology

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication