CN116414902A - Quick data source access method - Google Patents

Publication number
CN116414902A
Authority
CN
China
Prior art keywords
data
source
database
target
synchronization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310342953.5A
Other languages
Chinese (zh)
Inventor
杨铭
戚红建
韩硕
王宇飞
李伟
刘誉杰
邓旭楠
唐鑫湄
陈璐
张明涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Bidding Branch Of China Huaneng Group Co ltd
Huaneng Information Technology Co Ltd
Original Assignee
Beijing Bidding Branch Of China Huaneng Group Co ltd
Huaneng Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Bidding Branch Of China Huaneng Group Co ltd, Huaneng Information Technology Co Ltd
Priority to CN202310342953.5A
Publication of CN116414902A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477 Temporal data queries
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a rapid data source access method in the technical field of data access. The method comprises: dividing data tables into types according to data attributes in a source database, and exporting the data tables to be accessed as files in a preset format; importing the stock (existing) data from the preset-format files into a transfer database, and then importing the data in the transfer database into a target database; synchronously copying incremental data in the source database into a cache database at a preset synchronization rate, adding an insert/delete/update flag and a timestamp field, parsing and converting the incremental data in the cache database, and loading it into the target database at a preset access frequency; while the source database synchronizes data to the cache database, updating the synchronization rate in real time according to synchronization information; and while data is parsed from the cache database and loaded into the target database, updating the access frequency in real time according to parsing access information. Because the synchronization rate and the access frequency are updated in real time, the data access speed is improved and the security and stability of the data are ensured.

Description

Quick data source access method
Technical Field
The present application relates to the field of data access technologies, and in particular, to a fast data source access method.
Background
The structured business data of an existing enterprise includes work order data, archive data, operation management data, telephone traffic data, marketing data, common knowledge data, and the like. Few data centers at home or abroad operate at such a massive data scale, so integrating the data is both challenging and innovative work. When data is accessed from a source system and stored in a data warehouse, its correctness and integrity must be ensured, and the data must still meet the usage requirements of every service once it is in the new environment. Furthermore, to optimize storage, data should not be stored more than once except where redundant backup requires it.
In the prior art, because the data volume is large and the data types are numerous, data access conditions are complex. A database that transmits data with fixed transmission parameters cannot meet these varying requirements, which results in a low access speed.
Therefore, how to increase the data access speed is a technical problem that currently needs to be solved.
Disclosure of Invention
The invention provides a rapid data source access method to solve the technical problem of slow data access in the prior art. The method is applied to a system comprising a source database, a transfer database, a cache database and a target database, and comprises the following steps:
in the source database, dividing data tables into types according to data attributes, exporting the data tables to be accessed as files in a preset format, recording the stock data range, and distinguishing stock data from incremental data;
importing the stock data from the preset-format files into the transfer database, importing the data in the transfer database into the target database, and deleting the data from the transfer database after the import succeeds;
synchronously copying the incremental data in the source database into the cache database at a preset synchronization rate, adding an insert/delete/update flag and a timestamp field, parsing and converting the incremental data in the cache database, and loading the incremental data into the target database at a preset access frequency;
while the source database synchronizes data to the cache database, acquiring synchronization information and updating the synchronization rate in real time according to the synchronization information; and while data is parsed from the cache database and loaded into the target database, acquiring parsing access information and updating the access frequency in real time according to the parsing access information.
In some embodiments of the present application, the synchronization information comprises the average synchronization time per data record, and after the synchronization information is acquired, the method further includes:
if the average synchronization time per data record is greater than a first preset time threshold, updating the synchronization rate in real time according to the synchronization information;
if the average synchronization time per data record is not greater than the first preset time threshold, leaving the synchronization rate unchanged.
In some embodiments of the present application, the parsing access information comprises the total parsing completion time for all the data, and after the parsing access information is acquired, the method further includes:
if the total parsing completion time is greater than a second preset time threshold, updating the access frequency in real time according to the parsing access information;
if the total parsing completion time is not greater than the second preset time threshold, leaving the access frequency unchanged.
In some embodiments of the present application, updating the synchronization rate in real time according to the synchronization information includes:
the synchronization information further comprises source-end and target-end server performance information, insert/delete/update time, and source-end database inflow data, wherein the source-end server performance information comprises the source-end average CPU utilization and the source-end memory consumption, and the target-end server performance information comprises the target-end average CPU utilization and the target-end memory consumption;
establishing a first source-end correction array according to the source-end average CPU utilization, the source-end memory consumption, the insert/delete/update time, and the source-end database inflow data, and obtaining a first influence value from the first source-end correction array;
establishing a first target-end correction array according to the target-end average CPU utilization and the target-end memory consumption, and obtaining a second influence value from the first target-end correction array;
determining a synchronization process influence value based on the first influence value and the second influence value, and updating the synchronization rate based on the synchronization process influence value and the data table type.
In some embodiments of the present application, establishing the first source-end correction array according to the source-end average CPU utilization, the source-end memory consumption, the insert/delete/update time, and the source-end database inflow data, and obtaining the first influence value from the first source-end correction array, includes:
obtaining a plurality of influence scores based on the source-end average CPU utilization, the source-end memory consumption, the insert/delete/update time, the source-end database inflow data, and a first preset weight;
determining the position order of the first source-end correction array based on the relative magnitudes of the influence scores, and constructing the first source-end correction array according to that position order;
obtaining the first influence value based on the first source-end correction array and the local factors corresponding to the positions in it.
In some embodiments of the present application, establishing the first target-end correction array according to the target-end average CPU utilization and the target-end memory consumption, and obtaining the second influence value from the first target-end correction array, includes:
obtaining a plurality of influence scores based on the target-end average CPU utilization, the target-end memory consumption, and a second preset weight;
determining the position order of the first target-end correction array based on the relative magnitudes of the influence scores, and constructing the first target-end correction array according to that position order;
obtaining the second influence value based on the first target-end correction array and the local factors corresponding to the positions in it.
In some embodiments of the present application, updating the synchronization rate based on the synchronization process influence value and the data table type includes:
determining the two endpoint values of the synchronization process influence interval according to the data table type, and selecting a plurality of preset synchronization process influence values within that interval;
determining an update coefficient based on the relation between the synchronization process influence value and the plurality of preset synchronization process influence values, and updating the preset synchronization rate based on the update coefficient.
In some embodiments of the present application, updating the access frequency in real time according to the parsing access information includes:
the parsing access information further comprises source-end and target-end server performance information and source-end database inflow data, wherein the source-end server performance information comprises the source-end average CPU utilization and the source-end memory consumption, and the target-end server performance information comprises the target-end average CPU utilization and the target-end memory consumption;
establishing a second source-end correction array according to the source-end average CPU utilization, the source-end memory consumption, and the source-end database inflow data, and obtaining a third influence value;
establishing a second target-end correction array according to the target-end average CPU utilization and the target-end memory consumption, and obtaining a fourth influence value;
determining a parsing access process influence value based on the third influence value and the fourth influence value, and updating the access frequency based on the parsing access process influence value and the data table type.
In some embodiments of the present application, establishing the second source-end correction array according to the source-end average CPU utilization, the source-end memory consumption, and the source-end database inflow data and obtaining the third influence value, and establishing the second target-end correction array according to the target-end average CPU utilization and the target-end memory consumption and obtaining the fourth influence value, includes:
obtaining a plurality of influence scores based on the source-end average CPU utilization, the source-end memory consumption, the source-end database inflow data, and a third preset weight;
determining the position order of the second source-end correction array based on the relative magnitudes of the influence scores, and constructing the second source-end correction array according to that position order;
obtaining the third influence value based on the second source-end correction array and the local factors corresponding to the positions in it;
obtaining a plurality of influence scores based on the target-end average CPU utilization, the target-end memory consumption, and a fourth preset weight;
determining the position order of the second target-end correction array based on the relative magnitudes of the influence scores, and constructing the second target-end correction array according to that position order;
obtaining the fourth influence value based on the second target-end correction array and the local factors corresponding to the positions in it.
In some embodiments of the present application, updating the access frequency based on the parsing access process influence value and the data table type includes:
determining the two endpoint values of the parsing access process influence interval according to the data table type, and selecting a plurality of preset parsing access process influence values within that interval;
determining an update coefficient based on the relation between the parsing access process influence value and the plurality of preset parsing access process influence values, and updating the preset access frequency based on the update coefficient.
With the above technical scheme, data tables are divided into types in the source database according to data attributes, the data tables to be accessed are exported as files in a preset format, the stock data range is recorded, and stock data is distinguished from incremental data. The stock data from the preset-format files is imported into the transfer database, the data in the transfer database is imported into the target database, and the data is deleted from the transfer database after the import succeeds. The incremental data in the source database is synchronously copied into the cache database at a preset synchronization rate, an insert/delete/update flag and a timestamp field are added, and the incremental data in the cache database is parsed, converted, and loaded into the target database at a preset access frequency. While the source database synchronizes data to the cache database, synchronization information is acquired and the synchronization rate is updated in real time according to it; while data is parsed from the cache database and loaded into the target database, parsing access information is acquired and the access frequency is updated in real time according to it. When incremental data is accessed, the method thus judges against preset criteria whether the access parameters need updating and updates the synchronization rate and the access frequency in real time, which improves the data access speed and ensures the security and stability of the data.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed to describe the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 shows a flow diagram of a fast data source access method according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of stock data access according to another embodiment of the present invention;
Fig. 3 shows a schematic diagram of incremental data access according to another embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The embodiment of the application provides a rapid data source access method, which is applied to a system comprising a source database, a transfer database, a cache database and a target database, as shown in fig. 1, and comprises the following steps:
Step S101, in the source database, dividing data tables into types according to data attributes, exporting the data tables to be accessed as files in a preset format, recording the stock data range, and distinguishing stock data from incremental data.
In this embodiment, the data attributes fall into three categories: simple growth, full update, and short-term archiving. Each category corresponds to one data table type, so there are three data table types.
Step S102, importing stock data of the file in the preset format into a transfer database, importing data in the transfer database into a target database, and deleting the data in the transfer database after the importing is successful.
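The stock import in step S102 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the preset format is CSV, uses in-memory SQLite databases to stand in for the transfer and target databases, and the table name `orders`, its columns, and the function `import_stock_data` are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical table layout shared by the transfer and target databases.
SCHEMA = "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, payload TEXT)"


def import_stock_data(csv_text, transfer, target):
    """Load exported rows into the transfer DB, copy them to the target DB,
    and clear the transfer DB only after the target import has committed."""
    transfer.execute(SCHEMA)
    target.execute(SCHEMA)

    # 1. Import the exported file (assumed to be CSV) into the transfer database.
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(r["id"]), r["payload"]) for r in reader]
    with transfer:  # commits on success, rolls back on error
        transfer.executemany("INSERT INTO orders VALUES (?, ?)", rows)

    # 2. Import the staged data into the target database.
    staged = transfer.execute("SELECT id, payload FROM orders ORDER BY id").fetchall()
    with target:
        target.executemany("INSERT INTO orders VALUES (?, ?)", staged)

    # 3. Reached only if step 2 committed: delete the staged data.
    with transfer:
        transfer.execute("DELETE FROM orders")
    return len(staged)
```

If the target import raises, the exception propagates before step 3, so the staged rows remain in the transfer database and the import can be retried.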
Step S103, synchronously copying the incremental data in the source database into the cache database at a preset synchronization rate, adding an insert/delete/update flag and a timestamp field, parsing and converting the incremental data in the cache database, and loading the incremental data into the target database at a preset access frequency.
In this embodiment, the preset synchronization rate and the preset access frequency start as fixed values and are updated according to the subsequent judgments.
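One way to picture the change-type flag and timestamp field added in step S103 is the sketch below. It is illustrative only: the one-letter flag encoding, the `ChangeRecord` type, and the use of a plain dict as the target are assumptions, not details given in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

VALID_OPS = {"I", "U", "D"}  # insert, update, delete (hypothetical encoding)


@dataclass(frozen=True)
class ChangeRecord:
    op: str        # change-type flag
    row: dict      # copy of the source row
    ts: datetime   # timestamp field


def tag_change(op, row, ts=None):
    """Wrap a source row with a change-type flag and a timestamp."""
    if op not in VALID_OPS:
        raise ValueError(f"unknown change type: {op!r}")
    return ChangeRecord(op, dict(row), ts or datetime.now(timezone.utc))


def apply_changes(target, changes, key="id"):
    """Replay cached changes into the target (a dict keyed by primary key)
    in timestamp order."""
    for change in sorted(changes, key=lambda c: c.ts):
        if change.op == "D":
            target.pop(change.row[key], None)
        else:  # "I" and "U" both upsert the row
            target[change.row[key]] = change.row
    return target
```

Ordering by timestamp before replaying means the target ends in the same state as the source even if the cached changes arrive out of order.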
Step S104, while the source database synchronizes data to the cache database, acquiring synchronization information and updating the synchronization rate in real time according to the synchronization information; and while data is parsed from the cache database and loaded into the target database, acquiring parsing access information and updating the access frequency in real time according to the parsing access information.
In this embodiment, updating the synchronization rate and the access frequency in real time ensures rapid data access.
In some embodiments of the present application, the synchronization information comprises the average synchronization time per data record, and after the synchronization information is acquired, the method further includes:
if the average synchronization time per data record is greater than a first preset time threshold, updating the synchronization rate in real time according to the synchronization information;
if the average synchronization time per data record is not greater than the first preset time threshold, leaving the synchronization rate unchanged.
In some embodiments of the present application, the parsing access information comprises the total parsing completion time for all the data, and after the parsing access information is acquired, the method further includes:
if the total parsing completion time is greater than a second preset time threshold, updating the access frequency in real time according to the parsing access information;
if the total parsing completion time is not greater than the second preset time threshold, leaving the access frequency unchanged.
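Both threshold checks follow the same pattern: a parameter is recomputed only when a measured time exceeds its preset threshold. A minimal sketch, in which the function name `maybe_update` and the `updater` callback are hypothetical:

```python
from typing import Callable


def maybe_update(current: float, measured_time: float, threshold: float,
                 updater: Callable[[float], float]) -> float:
    """Return an updated parameter only if the measured time exceeds the threshold.

    For the synchronization rate, measured_time is the average synchronization
    time per record and threshold is the first preset time threshold; for the
    access frequency, measured_time is the total parsing completion time and
    threshold is the second preset time threshold.
    """
    if measured_time > threshold:
        return updater(current)
    return current  # "not greater than" the threshold: leave unchanged
```

For instance, `maybe_update(100.0, 0.5, 0.2, lambda v: v * 0.5)` applies the updater, while a measured time equal to the threshold leaves the value untouched.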
To increase the data access rate, in some embodiments of the present application, updating the synchronization rate in real time according to the synchronization information includes: the synchronization information further comprises source-end and target-end server performance information, insert/delete/update time, and source-end database inflow data, wherein the source-end server performance information comprises the source-end average CPU utilization and the source-end memory consumption, and the target-end server performance information comprises the target-end average CPU utilization and the target-end memory consumption; establishing a first source-end correction array according to the source-end average CPU utilization, the source-end memory consumption, the insert/delete/update time, and the source-end database inflow data, and obtaining a first influence value from the first source-end correction array; establishing a first target-end correction array according to the target-end average CPU utilization and the target-end memory consumption, and obtaining a second influence value from the first target-end correction array; determining a synchronization process influence value based on the first influence value and the second influence value, and updating the synchronization rate based on the synchronization process influence value and the data table type.
In this embodiment, synchronization information is acquired while data is synchronized from the source database to the cache database. The synchronization information consists of factors that affect the synchronization speed, including source-end and target-end server performance information, insert/delete/update time, and source-end database inflow data. These factors fall into two groups: the source-end influence (source-end server performance information, insert/delete/update time, and source-end database inflow data) and the target-end influence (target-end server performance information). A first source-end correction array is established from the source-end influence and a first target-end correction array from the target-end influence, yielding the first influence value and the second influence value. The synchronization process influence value (the total influence) is determined from these two values, and the synchronization rate is updated based on the synchronization process influence value and the data table type.
In order to further improve data synchronization efficiency, in some embodiments of the present application, establishing the first source-end correction array according to the source-end average CPU utilization, the source-end memory consumption, the insert/delete/update time, and the source-end database inflow data, and obtaining the first influence value from the first source-end correction array, includes: obtaining a plurality of influence scores based on the source-end average CPU utilization, the source-end memory consumption, the insert/delete/update time, the source-end database inflow data, and a first preset weight; determining the position order of the first source-end correction array based on the relative magnitudes of the influence scores, and constructing the first source-end correction array according to that position order; and obtaining the first influence value based on the first source-end correction array and the local factors corresponding to the positions in it.
In this embodiment, for example, let the influence scores corresponding to the source-end average CPU utilization (a percentage), the source-end memory consumption (a percentage), the insert/delete/update time, and the source-end database inflow data be S1, S2, S3 and S4, respectively. If the scores decrease in that order, that is, S1 > S2 > S3 > S4, then S1 takes the first position, S2 the second, S3 the third, and S4 the fourth. The first source-end correction array is (S1, S2, S3, S4), the corresponding local factors are (α1, α2, α3, α4), and the first influence value = α1·S1 + α2·S2 + α3·S3 + α4·S4.
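The scoring and weighted sum in this example can be sketched as below. The text does not say how an influence score is derived from a metric and its preset weight, so the sketch assumes score = normalized metric × weight; the function name `influence_value` and all numeric values are hypothetical.

```python
def influence_value(metrics, weights, local_factors):
    """Compute an influence value from raw metrics.

    metrics       : measured factors, assumed already normalized to [0, 1]
    weights       : preset weights, one per metric
    local_factors : (alpha1, alpha2, ...) tied to positions, not to metrics
    """
    if not (len(metrics) == len(weights) == len(local_factors)):
        raise ValueError("metrics, weights and local_factors must have equal length")

    # Assumption: each influence score is the metric times its preset weight.
    scores = [m * w for m, w in zip(metrics, weights)]

    # Position order: the largest score takes the first position.
    correction_array = sorted(scores, reverse=True)

    # Influence value = alpha1*S1 + alpha2*S2 + ... over the ordered array.
    value = sum(a * s for a, s in zip(local_factors, correction_array))
    return value, correction_array
```

Because the local factors attach to positions rather than to specific metrics, whichever factor currently scores highest receives α1, so the dominant bottleneck always carries the first-position factor.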
It should be noted that the corresponding technical features described later work in the same way, so their description is not repeated.
In some embodiments of the present application, establishing the first target-end correction array according to the target-end average CPU utilization and the target-end memory consumption, and obtaining the second influence value from the first target-end correction array, includes: obtaining a plurality of influence scores based on the target-end average CPU utilization, the target-end memory consumption, and a second preset weight; determining the position order of the first target-end correction array based on the relative magnitudes of the influence scores, and constructing the first target-end correction array according to that position order; and obtaining the second influence value based on the first target-end correction array and the local factors corresponding to the positions in it.
In this embodiment, the total influence value (the synchronization process influence value) is determined based on the first influence value and the second influence value. The specific way of combining them is not limited here; any approach that reflects the overall influence falls within the scope of protection of the present application.
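Since the combination rule is left open, the following is only one possible choice: a convex combination of the two influence values. The function name and the `source_share` parameter are hypothetical and do not come from the text.

```python
def synchronization_influence(first: float, second: float,
                              source_share: float = 0.5) -> float:
    """Combine the source-end and target-end influence values.

    Assumption: a weighted average, where source_share is the fraction of the
    total attributed to the source end. Other combinations (a plain sum, a
    maximum) would equally satisfy the description.
    """
    if not 0.0 <= source_share <= 1.0:
        raise ValueError("source_share must be within [0, 1]")
    return source_share * first + (1.0 - source_share) * second
```

A weighted average keeps the total on the same scale as its inputs, which is convenient when the result is later compared against preset interval endpoints.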
In order to improve the reliability of synchronization, in some embodiments of the present application, updating the synchronization rate based on the synchronization process influence value and the data table type includes: determining the two endpoint values of the synchronization process influence interval according to the data table type, and selecting a plurality of preset synchronization process influence values within that interval; and determining an update coefficient based on the relation between the synchronization process influence value and the plurality of preset synchronization process influence values, and updating the preset synchronization rate based on the update coefficient.
In this embodiment, each of the three data table types corresponds to its own pair of endpoint values for the synchronization process influence, and the two endpoint values define an interval. Specifically:
for example, the endpoint values corresponding to the simple-growth data table are A11 and A22, so that A11 to A22 is an interval;
let the synchronization process influence value be A, and preset an array of synchronization process influence values A0 = (A1, A2, A3, A4), where A1, A2, A3 and A4 are all preset values and A11 < A1 < A2 < A3 < A4 < A22;
let the preset synchronization rate be V, and preset an array of update coefficients F0 = (F1, F2, F3, F4), where F1, F2, F3 and F4 are all preset values and 0.6 < F1 < F2 < F3 < F4 < 1.4;
the update coefficient is determined from the relation between the synchronization process influence value and each preset synchronization process influence value, which gives the updated synchronization rate:
if A < A1, the first preset update coefficient F1 is selected as the update coefficient, and the updated synchronization rate is V × F1;
if A1 ≤ A < A2, the second preset update coefficient F2 is selected as the update coefficient, and the updated synchronization rate is V × F2;
if A2 ≤ A < A3, the third preset update coefficient F3 is selected as the update coefficient, and the updated synchronization rate is V × F3;
if A3 ≤ A < A4, the fourth preset update coefficient F4 is selected as the update coefficient, and the updated synchronization rate is V × F4.
It can be appreciated that the following technical features are the same, and the following description is omitted.
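As an illustration only, the interval lookup described above can be sketched in Python. The function name `updated_sync_rate` and its parameters are hypothetical, and the behaviour for A ≥ A4 (leaving V unchanged) is an assumption, since the embodiment does not state what happens in that case.

```python
from bisect import bisect_right


def updated_sync_rate(a, thresholds, coefficients, rate):
    """Return the updated synchronization rate V * Fi.

    a            -- synchronization process influence value A
    thresholds   -- preset values (A1, A2, A3, A4), strictly increasing
    coefficients -- preset update coefficients (F1, F2, F3, F4)
    rate         -- preset synchronization rate V
    """
    if len(thresholds) != len(coefficients):
        raise ValueError("need exactly one coefficient per threshold")
    # Count the thresholds that are <= a:
    # 0 means a < A1, 1 means A1 <= a < A2, and so on.
    idx = bisect_right(thresholds, a)
    if idx == len(thresholds):
        # a >= A4 is not covered by the text; keeping V is an assumption.
        return rate
    return rate * coefficients[idx]
```

The same lookup could serve for the access frequency, with the preset values and coefficients chosen per data table type.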
In some embodiments of the present application, updating the access frequency in real time according to the parsing access information includes:
the parsing access information further comprises source-side and target-side server performance information and source database inflow data, where the source-side server performance information comprises the source-side average CPU utilization and the source-side memory consumption, and the target-side server performance information comprises the target-side average CPU utilization and the target-side memory consumption;
establishing a second source-side correction array according to the source-side average CPU utilization, the source-side memory consumption and the source database inflow data, and obtaining a third influence value;
establishing a second target-side correction array according to the target-side average CPU utilization and the target-side memory consumption, and obtaining a fourth influence value;
and determining a parsing access process influence value based on the third influence value and the fourth influence value, and updating the access frequency based on the parsing access process influence value and the data table type.
In some embodiments of the present application, establishing the second source-side correction array according to the source-side average CPU utilization, the source-side memory consumption and the source database inflow data and obtaining the third influence value, and establishing the second target-side correction array according to the target-side average CPU utilization and the target-side memory consumption and obtaining the fourth influence value, includes:
obtaining a plurality of influence scores based on the source-side average CPU utilization, the source-side memory consumption, the source database inflow data and a third preset weight;
determining the position order of the second source-side correction array based on the magnitude relation among the influence scores, and constructing the second source-side correction array according to that position order;
obtaining the third influence value based on the second source-side correction array and the local factors corresponding to the position order in that array;
obtaining the corresponding influence scores based on the target-side average CPU utilization, the target-side memory consumption and a fourth preset weight;
determining the position order of the second target-side correction array based on the magnitude relation among the influence scores, and constructing the second target-side correction array according to that position order;
and obtaining the fourth influence value based on the second target-side correction array and the local factors corresponding to the position order in that array.
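The following Python sketch shows one possible reading of the correction-array steps above. This passage does not give the formulas, so everything numeric here is an assumption made for illustration: each influence score is taken as metric × weight, the array is ordered from the largest score to the smallest, and the influence value is a sum of scores multiplied by per-position factors. The names `correction_array` and `influence_value` are hypothetical.

```python
def correction_array(metrics, weights):
    """Score each metric and order the scores by magnitude.

    metrics -- e.g. {"cpu": 0.8, "mem": 0.5, "inflow": 0.2} (normalized values)
    weights -- preset weight per metric, same keys as metrics
    Assumption: score = metric * weight, ordered largest first.
    """
    scores = {name: metrics[name] * weights[name] for name in metrics}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def influence_value(array, position_factors):
    """Combine the ordered scores with one factor per array position.

    Assumption: the influence value is the factor-weighted sum of the scores.
    """
    if len(position_factors) < len(array):
        raise ValueError("need one factor per array position")
    return sum(f * score for f, (_, score) in zip(position_factors, array))
```

Under these assumptions the source-side and target-side arrays are built the same way and differ only in which metrics and preset weights are supplied.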
In some embodiments of the present application, updating the access frequency based on the parsing access process influence value and the data table type includes:
determining the two endpoint values of the parsing access process influence according to the data table type, and selecting a plurality of preset parsing access process influence values based on those two endpoint values;
and determining an update coefficient based on the relation between the parsing access process influence value and the plurality of preset parsing access process influence values, and updating the preset access frequency based on the update coefficient.
By applying the above technical solution: in the source database, data table types are divided according to data attributes, the data tables to be accessed are exported as files in a preset format, and the stock data range is recorded to distinguish stock data from incremental data; the stock data in the preset-format files is imported into the transit database, the data in the transit database is imported into the target database, and the data in the transit database is deleted after the import succeeds; the incremental data in the source database is synchronously copied to the cache database at a preset synchronization rate, with an insert/delete/update identifier and a timestamp field added, and the incremental data in the cache database is parsed, converted and accessed into the target database at a preset access frequency; when the source database synchronizes data to the cache database, synchronization information is acquired and the synchronization rate is updated in real time according to it, and when data in the cache database is parsed and accessed into the target database, parsing access information is acquired and the access frequency is updated in real time according to it. In the present application, when incremental data is accessed, whether the access parameters need to be updated is judged against preset criteria, and the synchronization rate and the access frequency are updated in real time, which improves data access speed and helps ensure the security and stability of the data.
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention may be implemented in hardware, or in software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention may be embodied as a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and includes several instructions for causing a computer device (such as a personal computer, a server or a network device) to execute the method described in each implementation scenario of the present invention.
To further explain the technical idea of the invention, the technical solution is described below with reference to a specific application scenario.
The data to be processed falls mainly into the following three types:
1) Simple growth class: such tables have only insert operations. Once data is inserted into the database, it never changes. Such tables mainly include traffic tables that grow in real time, archive tables that grow on a schedule, and regular tables that grow irregularly or not at all.
2) Full update class: such tables have all of the insert, delete and modify operations. Inserts and deletes may occur in real time, on a schedule or at irregular times, whereas modifications occur only in real time or at irregular times; there are no scheduled modification operations.
3) Short-term archive class: such tables undergo scheduled batch insert and delete operations, with no modify operations. Taking work order tables as an example, the records of a given service work order may be stored in three tables: the online work order table, which has real-time insert and modify operations and scheduled delete operations, belongs to the full update class; the long-term archive table, which has only scheduled insert operations, belongs to the simple growth class; and the short-term archive table has only scheduled insert and delete operations. Work order records completed in the online table are deleted from it and inserted into the short-term archive table, while records in the short-term archive table that exceed the time limit are deleted and inserted into the long-term archive table.
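The three classes above can be told apart by the set of operations a table receives. The Python sketch below is illustrative only: the enum `TableType`, the function `classify_table` and the operation labels are hypothetical names, and treating any table that receives modifications as a full update table is an assumption.

```python
from enum import Enum


class TableType(Enum):
    SIMPLE_GROWTH = "simple growth"
    FULL_UPDATE = "full update"
    SHORT_TERM_ARCHIVE = "short-term archive"


def classify_table(operations):
    """Classify a table from the operations observed on it.

    operations -- iterable drawn from {"insert", "delete", "update"}
    """
    ops = frozenset(operations)
    unknown = ops - {"insert", "delete", "update"}
    if unknown:
        raise ValueError(f"unknown operations: {sorted(unknown)}")
    if ops == {"insert"}:
        return TableType.SIMPLE_GROWTH       # insert only, rows never change
    if "update" in ops:
        return TableType.FULL_UPDATE         # assumption: any modify => full update
    if ops == {"insert", "delete"}:
        return TableType.SHORT_TERM_ARCHIVE  # batch insert/delete, no modify
    raise ValueError("operation set does not match any of the three classes")
```

In the work order example, the online table maps to `FULL_UPDATE`, the long-term archive table to `SIMPLE_GROWTH`, and the short-term archive table to `SHORT_TERM_ARCHIVE`.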
The method is divided into two stages: the access of stock data and the access of incremental data.
Access of stock data:
The stock data only needs to be accessed once. Because the volume of stock data is very large, using ETL directly would be too inefficient and too unstable when transmitting data between different sites. We therefore export the stock data from the source side as files and then import them into the local system where the data warehouse is located. Because the source and the target are heterogeneous databases, and the operating rules of the production environment allow the source side to export only dmp files, a small Oracle database must be deployed locally alongside the data warehouse to serve as a data transit station. The dmp files exported by the source are imported into this Oracle database, and the full data is then extracted from it into the data warehouse by ETL.
The process is shown in fig. 2, and the steps are as follows:
1) The source side exports the tables to be accessed as a dmp file and records the range of the stock data, which serves as the basis for distinguishing stock data from incremental data.
2) The exported dmp file is sent by file transfer to the server hosting the target-side transit database.
3) The dmp file is imported into the target-side transit database.
4) The data in the target-side transit database is imported into the MPP data warehouse by ETL.
5) After the import completes successfully, the data in the transit database is deleted.
6) Steps 1) to 5) are repeated until all stock data has been accessed.
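The six steps above amount to a per-table loop. The Python sketch below only fixes the order of operations; the five callables are hypothetical placeholders for the real export, file-transfer, import, ETL and cleanup tooling, none of which is specified here.

```python
def access_stock_data(tables, export_dmp, transfer, import_to_transit,
                      etl_to_warehouse, purge_transit):
    """Run steps 1) to 5) for each table and return the recorded stock ranges.

    export_dmp(table)       -> (dmp_path, stock_range)   step 1
    transfer(dmp_path)      -> path on the transit host  step 2
    import_to_transit(path)                              step 3
    etl_to_warehouse(table)                              step 4
    purge_transit(table)                                 step 5
    """
    stock_ranges = {}
    for table in tables:  # step 6: repeat until every table is done
        dmp_path, stock_range = export_dmp(table)
        remote_path = transfer(dmp_path)
        import_to_transit(remote_path)
        etl_to_warehouse(table)
        # Reached only if the ETL step did not raise, so transit data
        # is deleted only after a successful import.
        purge_transit(table)
        stock_ranges[table] = stock_range
    return stock_ranges
```

The returned ranges are what the incremental stage would later use to tell stock data from incremental data.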
ETL is a data extraction, transformation and loading approach. In the extraction stage, data is read from the source using a method suited to the source database; commonly used structured databases such as Oracle and MySQL are generally read via JDBC. In the transformation stage, the source data is processed and converted as needed so that it meets the storage requirements of the target. Finally, in the loading stage, the processed data is loaded into the target database.
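A minimal sketch of the three ETL stages is given below for illustration. It uses Python's built-in `sqlite3` module as a stand-in for both databases (the scenario itself reads Oracle or MySQL via JDBC), and the table names `src_orders` and `dw_orders`, their columns and the cleanup rules are all hypothetical.

```python
import sqlite3


def extract(source):
    """Extraction stage: read rows from the source database."""
    return source.execute("SELECT id, name, amount FROM src_orders").fetchall()


def transform(rows):
    """Transformation stage: normalize values to fit the target schema."""
    return [(row_id, name.strip().upper(), round(float(amount), 2))
            for row_id, name, amount in rows]


def load(target, rows):
    """Loading stage: write the converted rows into the target database."""
    target.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", rows)
    target.commit()


def run_etl(source, target):
    """Run extract, transform and load in order; return the row count."""
    rows = transform(extract(source))
    load(target, rows)
    return len(rows)
```

Keeping the three stages as separate functions lets the extraction step be swapped for a different reader without touching the transformation or loading logic.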
Incremental data access must take access frequency into account. Transactional queries need data in real time or near real time (on the order of minutes), but the time range they query is small, typically no more than one year. For offline computing scenarios such as BI analysis, report computation, query analysis, data mining and business prediction, incremental data only needs to arrive within the computation interval of the offline job, which is usually measured in days or months, but the time range queried is larger and can extend to the full data set. Given these requirements, we use MySQL as a cache database that incremental data enters before the MPP data warehouse, and cache one year of data in it to support transactional queries. We synchronize data from Oracle to MySQL using OGG (Oracle GoldenGate), and then use ETL to extract incremental data from MySQL into the MPP data warehouse at the minimum time interval of the offline computing scenarios.
The incremental data access process, shown in fig. 3, is as follows:
1) Data in the source Oracle database is synchronously copied to MySQL using OGG, and an insert/delete/update identifier and a timestamp field are added during the OGG process.
2) The data is periodically parsed and converted by ETL and stored in the MPP data warehouse.
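The two steps above can be sketched in Python with in-memory structures standing in for the cache database and the warehouse. This is an illustration under stated assumptions, not OGG or any real ETL tool: the `ChangeRecord` type, the "I"/"U"/"D" codes and the watermark-based batch selection are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ChangeRecord:
    key: int
    payload: dict
    op: str                # "I" insert, "U" update, "D" delete (assumed codes)
    changed_at: datetime   # timestamp field added during synchronization


def tag_change(key, payload, op, changed_at):
    """Step 1: attach the operation identifier and timestamp to a change."""
    if op not in ("I", "U", "D"):
        raise ValueError(f"unknown operation code: {op!r}")
    return ChangeRecord(key, dict(payload), op, changed_at)


def extract_increment(cache, watermark):
    """Step 2a: select changes newer than the last processed timestamp."""
    batch = sorted((r for r in cache if r.changed_at > watermark),
                   key=lambda r: r.changed_at)
    new_watermark = batch[-1].changed_at if batch else watermark
    return batch, new_watermark


def apply_batch(warehouse, batch):
    """Step 2b: replay the changes, in time order, against the warehouse."""
    for record in batch:
        if record.op == "D":
            warehouse.pop(record.key, None)
        else:
            warehouse[record.key] = record.payload
```

Calling `extract_increment` once per access interval and carrying the returned watermark forward means each change is replayed exactly once.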
It will be appreciated that the specific tools or databases described above may be adjusted or replaced as appropriate to the particular situation.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, one of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A fast data source access method, applied to a system comprising a source database, a transit database, a cache database and a target database, characterized by comprising the following steps:
in the source database, dividing data table types according to data attributes, exporting the data tables to be accessed as files in a preset format, and recording the stock data range to distinguish stock data from incremental data;
importing the stock data in the preset-format files into the transit database, importing the data in the transit database into the target database, and deleting the data in the transit database after the import succeeds;
synchronously copying the incremental data in the source database to the cache database at a preset synchronization rate while adding an insert/delete/update identifier and a timestamp field, parsing and converting the incremental data in the cache database, and accessing it into the target database at a preset access frequency;
when the source database synchronizes data to the cache database, acquiring synchronization information and updating the synchronization rate in real time according to the synchronization information; and when data in the cache database is parsed and accessed into the target database, acquiring parsing access information and updating the access frequency in real time according to the parsing access information.
2. The method of claim 1, wherein after the synchronization information is acquired, the method further comprises:
the synchronization information comprises the average synchronization time per piece of data;
if the average synchronization time per piece of data is greater than a first preset time threshold, updating the synchronization rate in real time according to the synchronization information;
if the average synchronization time per piece of data is not greater than the first preset time threshold, not updating the synchronization rate.
3. The method of claim 1, wherein after the parsing access information is acquired, the method further comprises:
the parsing access information comprises the parsing completion time for all data;
if the parsing completion time for all data is greater than a second preset time threshold, updating the access frequency in real time according to the parsing access information;
if the parsing completion time for all data is not greater than the second preset time threshold, not updating the access frequency.
4. The method of claim 2, wherein updating the synchronization rate in real time according to the synchronization information comprises:
the synchronization information further comprises source-side and target-side server performance information, the insert/delete/update time and source database inflow data, where the source-side server performance information comprises the source-side average CPU utilization and the source-side memory consumption, and the target-side server performance information comprises the target-side average CPU utilization and the target-side memory consumption;
establishing a first source-side correction array according to the source-side average CPU utilization, the source-side memory consumption, the insert/delete/update time and the source database inflow data, and obtaining a first influence value according to the first source-side correction array;
establishing a first target-side correction array according to the target-side average CPU utilization and the target-side memory consumption, and obtaining a second influence value according to the first target-side correction array;
determining a synchronization process influence value based on the first influence value and the second influence value, and updating the synchronization rate based on the synchronization process influence value and the data table type.
5. The method of claim 4, wherein establishing the first source-side correction array according to the source-side average CPU utilization, the source-side memory consumption, the insert/delete/update time and the source database inflow data, and obtaining the first influence value according to the first source-side correction array, comprises:
obtaining a plurality of influence scores based on the source-side average CPU utilization, the source-side memory consumption, the insert/delete/update time, the source database inflow data and a first preset weight;
determining the position order of the first source-side correction array based on the magnitude relation among the influence scores, and constructing the first source-side correction array according to that position order;
and obtaining the first influence value based on the first source-side correction array and the local factors corresponding to the position order in that array.
6. The method of claim 4, wherein establishing the first target-side correction array according to the target-side average CPU utilization and the target-side memory consumption, and obtaining the second influence value according to the first target-side correction array, comprises:
obtaining the corresponding influence scores based on the target-side average CPU utilization, the target-side memory consumption and a second preset weight;
determining the position order of the first target-side correction array based on the magnitude relation among the influence scores, and constructing the first target-side correction array according to that position order;
and obtaining the second influence value based on the first target-side correction array and the local factors corresponding to the position order in that array.
7. The method of claim 5 or 6, wherein updating the synchronization rate based on the synchronization process influence value and the data table type comprises:
determining the two endpoint values of the synchronization process influence according to the data table type, and selecting a plurality of preset synchronization process influence values based on those two endpoint values;
and determining an update coefficient based on the relation between the synchronization process influence value and the plurality of preset synchronization process influence values, and updating the preset synchronization rate based on the update coefficient.
8. The method of claim 1, wherein updating the access frequency in real time according to the parsing access information comprises:
the parsing access information further comprises source-side and target-side server performance information and source database inflow data, where the source-side server performance information comprises the source-side average CPU utilization and the source-side memory consumption, and the target-side server performance information comprises the target-side average CPU utilization and the target-side memory consumption;
establishing a second source-side correction array according to the source-side average CPU utilization, the source-side memory consumption and the source database inflow data, and obtaining a third influence value;
establishing a second target-side correction array according to the target-side average CPU utilization and the target-side memory consumption, and obtaining a fourth influence value;
and determining a parsing access process influence value based on the third influence value and the fourth influence value, and updating the access frequency based on the parsing access process influence value and the data table type.
9. The method of claim 8, wherein establishing the second source-side correction array according to the source-side average CPU utilization, the source-side memory consumption and the source database inflow data and obtaining the third influence value, and establishing the second target-side correction array according to the target-side average CPU utilization and the target-side memory consumption and obtaining the fourth influence value, comprises:
obtaining a plurality of influence scores based on the source-side average CPU utilization, the source-side memory consumption, the source database inflow data and a third preset weight;
determining the position order of the second source-side correction array based on the magnitude relation among the influence scores, and constructing the second source-side correction array according to that position order;
obtaining the third influence value based on the second source-side correction array and the local factors corresponding to the position order in that array;
obtaining the corresponding influence scores based on the target-side average CPU utilization, the target-side memory consumption and a fourth preset weight;
determining the position order of the second target-side correction array based on the magnitude relation among the influence scores, and constructing the second target-side correction array according to that position order;
and obtaining the fourth influence value based on the second target-side correction array and the local factors corresponding to the position order in that array.
10. The method of claim 9, wherein updating the access frequency based on the parsing access process influence value and the data table type comprises:
determining the two endpoint values of the parsing access process influence according to the data table type, and selecting a plurality of preset parsing access process influence values based on those two endpoint values;
and determining an update coefficient based on the relation between the parsing access process influence value and the plurality of preset parsing access process influence values, and updating the preset access frequency based on the update coefficient.
CN202310342953.5A 2023-03-31 2023-03-31 Quick data source access method Pending CN116414902A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310342953.5A CN116414902A (en) 2023-03-31 2023-03-31 Quick data source access method

Publications (1)

Publication Number Publication Date
CN116414902A true CN116414902A (en) 2023-07-11

Family

ID=87052587

Country Status (1)

Country Link
CN (1) CN116414902A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination