CN111259025A - Self-adaptive frequency conversion increment updating method for multi-source heterogeneous data - Google Patents

Info

Publication number
CN111259025A
Authority
CN
China
Prior art keywords
data
source
database
updating
constructing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010036197.XA
Other languages
Chinese (zh)
Other versions
CN111259025B (en)
Inventor
朱跃龙
丁昱凯
冯钧
陆佳民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN202010036197.XA priority Critical patent/CN111259025B/en
Publication of CN111259025A publication Critical patent/CN111259025A/en
Application granted granted Critical
Publication of CN111259025B publication Critical patent/CN111259025B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a self-adaptive frequency conversion increment updating method for multi-source heterogeneous data, comprising the following steps: determining the data sources and a core database cluster; constructing a data updating model; deploying and initializing the data updating model; acquiring data from each data source through the data updating model; comparing the timestamps of the acquired data to judge whether an update is needed; loading the updated data into the core database cluster; and refreshing the frequency configuration table and the timestamp record table according to the updated data. The invention can update data dynamically according to the data source and data structure, can adaptively adjust the update frequency of each data source, and offers good flexibility, convenient configuration, high update speed, and strong extensibility.

Description

Self-adaptive frequency conversion increment updating method for multi-source heterogeneous data
Technical Field
The invention belongs to the field of data mining and application, and particularly relates to a self-adaptive frequency conversion increment updating method for multi-source heterogeneous data.
Background
With the development of the economy and of data acquisition technologies, every industry generates large volumes of data. These include structured and semi-structured data as well as large amounts of unstructured data such as text, images, and video. Data storage and processing technologies have evolved alongside data acquisition technologies. Multi-source heterogeneous data is data that comes from multiple sources and whose structure often differs, even within the same source. Two situations are common: "one datum, multiple sources" and "one source, multiple data". On the one hand, because data collectors and data managers divide their work differently, data of one kind, such as precipitation data, may be collected by the equipment of several organizations, which causes redundancy. On the other hand, because different services place different requirements on data, the frequency of data processing and updating differs, so a single data source may supply several data items at different frequencies. Because current data storage relies mainly on structured databases, storing unstructured data such as text, images, audio, and video is difficult. Meanwhile, different data sources, such as network data sources, database data sources, and manually filled data sources, refresh at different frequencies, yet most multi-source update schemes update at a fixed frequency, which results in low update efficiency and an inflexible update structure. Storing, processing, and migrating multi-source heterogeneous data therefore remains very difficult.
Disclosure of Invention
Purpose of the invention: to address the prior-art problems that multi-source heterogeneous data is difficult to process and that its update frequency varies and is difficult to determine, a self-adaptive frequency conversion increment updating method for multi-source heterogeneous data is provided. The method offers high update efficiency, stable performance, convenient deployment, and good extensibility.
Technical scheme: to achieve the above purpose, the invention provides a self-adaptive frequency conversion increment updating method for multi-source heterogeneous data, comprising the following steps:
S1: determining the data sources and a core database cluster;
S2: constructing a data updating model;
S3: deploying and initializing the data updating model constructed in step S2;
S4: acquiring data from each data source through the data updating model;
S5: comparing the timestamps of the acquired data and judging whether an update is needed; if so, continuing with the update, and if not, repeating step S4;
S6: loading the updated data into the core database cluster;
S7: refreshing the frequency configuration table and the timestamp record table according to the updated data.
Further, step S1 specifically comprises:
S1-1: determining the data source type Data_Source_Type, which is one of: manually filled data source, network data source, and whole-database source;
S1-2: determining the data source access method according to the data source type;
S1-3: determining the type of the core database cluster and its access, read, and write methods;
S1-4: creating a data source basic information table SIT, whose fields comprise: a data source name snm, a data source IP address sip, a port number spt, a data source type stp, a target database IP address tip, a target database port number tpt, a target database user name tusnm, a target database name tnm, a target database schema name tpnm, and a target database connection password tkw.
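The SIT fields listed in step S1-4 can be sketched as a relational schema. The following is a minimal sketch, assuming SQLite as a stand-in storage engine (the source does not name one). The column names follow the source; the column types, the composite primary key, the encoding of stp as text, and the sample row are all assumptions made for illustration.

```python
import sqlite3

# Column names follow step S1-4; the types and the primary key are assumptions.
SIT_DDL = """
CREATE TABLE IF NOT EXISTS SIT (
    snm   TEXT NOT NULL,   -- data source name
    sip   TEXT NOT NULL,   -- data source IP address
    spt   INTEGER,         -- data source port number
    stp   TEXT NOT NULL,   -- data source type (assumed: 'manual', 'network', 'database')
    tip   TEXT,            -- target database IP address
    tpt   INTEGER,         -- target database port number
    tusnm TEXT,            -- target database user name
    tnm   TEXT,            -- target database name
    tpnm  TEXT,            -- target database schema name
    tkw   TEXT,            -- target database connection password
    PRIMARY KEY (snm, sip)
)
"""


def create_sit(conn):
    """Create the data source basic information table if it does not exist."""
    conn.execute(SIT_DDL)
    conn.commit()


# Hypothetical sample row; the addresses come from the documentation range 192.0.2.0/24.
conn = sqlite3.connect(":memory:")
create_sit(conn)
conn.execute(
    "INSERT INTO SIT VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("rain_gauge_a", "192.0.2.10", 5432, "database",
     "192.0.2.20", 5432, "loader", "core", "public", "secret"),
)
row = conn.execute("SELECT snm, stp FROM SIT").fetchone()
```

Keeping the connection password tkw in a plain column mirrors the field list in the source; a deployment might prefer to hold it in a separate secret store.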
Further, the data updating model in step S2 comprises a network resource acquisition unit NAU, a manual filling data acquisition unit HAU, a general database data extraction unit GDEU, an update frequency control unit FCU, a general data specification unit GDTU, and a general data loading unit GDLU.
The network resource acquisition unit NAU is constructed by the following steps:
S2a-1: constructing an IP address resolution and access module, which accesses the specified network resource address according to the entry IP address of the network resource;
S2a-2: constructing a network resource download module, which downloads the data that a link points to onto the local computer;
S2a-3: constructing a data dump module, which names and organizes the network resources in a simple way and stores them on a designated disk of the computer where the NAU is located;
S2a-4: constructing a termination condition judgment module, which terminates the NAU program according to the input termination condition C;
The manual filling data acquisition unit HAU is constructed by the following steps:
S2b-1: constructing a path index module, which checks whether any new data file has appeared under the specified file path;
S2b-2: constructing a file type judgment module, which determines the data type of each newly added file;
S2b-3: constructing a data storage module, which chooses the storage form according to the data file type and stores the data on a designated disk of the computer where the HAU is located;
Manually filled data refers to structured or unstructured data files that are collected or filled in manually. Manually collected or filled structured data refers to data files with a clear, standard organizational structure, such as xls, csv, and xlsx files; their structure and content are left unchanged during storage. Manually collected or filled unstructured data refers to data files without a clear, standard data structure, such as text, image, and audio files; for these, only the file name FileName, the file size FileSize, and the file location FileLoca are stored, and this information for all unstructured data is kept together in a file named datainfo.xls.
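The three HAU modules (path index, file type judgment, data storage) can be illustrated with a short sketch. It assumes the watched path is a flat directory and that a file counts as "new" when its name has not been seen before; both are assumptions. Because the Python standard library cannot write xls files, the index of unstructured files is written as CSV here instead of the datainfo.xls file named in the text. The function names are hypothetical.

```python
import csv
from pathlib import Path

# Structured file types named in the text; anything else is treated as unstructured.
STRUCTURED_SUFFIXES = {".xls", ".xlsx", ".csv"}


def scan_new_files(watch_dir, seen):
    """Path index module: return files not yet in `seen`, then mark them as seen."""
    new_files = sorted(
        p for p in Path(watch_dir).iterdir() if p.is_file() and p.name not in seen
    )
    seen.update(p.name for p in new_files)
    return new_files


def is_structured(path):
    """File type judgment module: classify by file extension (an assumption)."""
    return path.suffix.lower() in STRUCTURED_SUFFIXES


def record_unstructured(files, index_path):
    """Data storage module for unstructured files: append FileName, FileSize, FileLoca."""
    index_path = Path(index_path)
    write_header = not index_path.exists()
    with index_path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if write_header:
            writer.writerow(["FileName", "FileSize", "FileLoca"])
        for p in files:
            writer.writerow([p.name, p.stat().st_size, str(p.resolve())])
```

Structured files would simply be copied unchanged to the HAU storage location, since the text states that their structure and content are not altered.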
The general database data extraction unit GDEU is constructed by the following steps:
S2c-1: creating a database basic information table DBIT, whose fields comprise: a database IP address dbip, a port number dbpt, a user name usnm, a database name dbnm, a schema name pnm, a database connection password dbkw, and a database type dbtp;
S2c-2: obtaining a connection driver, or writing one manually, according to the type of the source database;
S2c-3: extracting test cases and testing the database connection;
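Steps S2c-1 to S2c-3 amount to selecting a driver by database type and running a trivial query to confirm that the connection works. The sketch below assumes a driver registry keyed by dbtp and registers only SQLite, because it ships with Python; drivers for other database types would be registered the same way. The names `DBITRow`, `DRIVERS`, and `check_connection` are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DBITRow:
    dbip: str  # database IP address
    dbpt: int  # port number
    usnm: str  # user name
    dbnm: str  # database name
    pnm: str   # schema name
    dbkw: str  # database connection password
    dbtp: str  # database type, used to select the driver


def _connect_sqlite(row):
    # SQLite ignores host, port, and credentials; dbnm is used as the file path.
    return sqlite3.connect(row.dbnm)


# Driver registry keyed by dbtp (step S2c-2); only SQLite is registered in this sketch.
DRIVERS = {"sqlite": _connect_sqlite}


def check_connection(row):
    """Step S2c-3: open a connection and run a trivial test query."""
    connect = DRIVERS.get(row.dbtp)
    if connect is None:
        raise ValueError(f"no driver registered for database type {row.dbtp!r}")
    conn = connect(row)
    try:
        return conn.execute("SELECT 1").fetchone() == (1,)
    finally:
        conn.close()
```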
The update frequency control unit FCU is constructed by the following steps:
S2d-1: creating an update timestamp record table TRT, whose fields comprise: data source name snm, data source IP address sip, and update timestamp uts;
S2d-2: creating a data source update frequency configuration table FCT, whose fields comprise: data source name snm, data source IP address sip, and update frequency suf;
S2d-3: constructing a module for reading the update timestamp record table;
S2d-4: constructing a module for calling the network resource acquisition unit NAU, the manual filling data acquisition unit HAU, and the general database data extraction unit GDEU;
S2d-5: constructing an update frequency calculation module, which calculates the update frequency of each data source;
S2d-6: constructing a module for refreshing the data source update frequency configuration table, which writes the latest frequency into the configuration table;
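The two FCU tables and the modules that read and refresh them can be sketched as follows. The sketch assumes SQLite, Unix-second timestamps, and a frequency expressed in updates per second; the source specifies none of these. The table and column names follow the source, and the function names are hypothetical.

```python
import sqlite3


def create_fcu_tables(conn):
    """Create the timestamp record table and the frequency configuration table."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS TRT (
            snm TEXT NOT NULL,   -- data source name
            sip TEXT NOT NULL,   -- data source IP address
            uts REAL NOT NULL    -- update timestamp (assumed: Unix seconds)
        );
        CREATE TABLE IF NOT EXISTS FCT (
            snm TEXT NOT NULL,   -- data source name
            sip TEXT NOT NULL,   -- data source IP address
            suf REAL NOT NULL,   -- update frequency (assumed: updates per second)
            PRIMARY KEY (snm, sip)
        );
    """)


def record_timestamp(conn, snm, sip, uts):
    """Append a timestamp; earlier rows are kept so previous timestamps stay available."""
    conn.execute("INSERT INTO TRT VALUES (?, ?, ?)", (snm, sip, uts))


def last_two_timestamps(conn, snm, sip):
    """Timestamp reading module: return up to two most recent timestamps, newest first."""
    rows = conn.execute(
        "SELECT uts FROM TRT WHERE snm = ? AND sip = ? ORDER BY uts DESC LIMIT 2",
        (snm, sip),
    ).fetchall()
    return [r[0] for r in rows]


def write_frequency(conn, snm, sip, suf):
    """Configuration table refreshing module: overwrite the stored frequency."""
    conn.execute("INSERT OR REPLACE INTO FCT VALUES (?, ?, ?)", (snm, sip, suf))
```

Making TRT append-only is one way to keep both the current and the previous timestamp available to the frequency calculation; a two-column design (current and previous) would serve equally well.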
The general data specification unit GDTU is constructed by the following steps:
S2e-1: constructing a data reading module;
S2e-2: constructing a data merging, editing, and sorting module;
S2e-3: constructing a data writing module;
The general data loading unit GDLU is constructed by the following steps:
S2f-1: constructing a core database cluster access module;
S2f-2: constructing a standardized data reading module;
S2f-3: constructing a core database data loading module.
Further, the specific process of step S3 is as follows:
S3-1: deploying the data updating model, specifically:
S3a-1: deploying the network resource acquisition unit NAU, the manual filling data acquisition unit HAU, the general database data extraction unit GDEU, and the general data specification unit GDTU on a single computer according to the data sources in use, and testing them;
S3a-2: deploying and testing the update frequency control unit FCU;
S3a-3: deploying and testing the general data loading unit GDLU;
S3-2: initializing the parameters of the data updating model, specifically:
S3b-1: initializing the data source basic information table SIT, whose fields comprise: a data source name snm, a data source IP address sip, a port number spt, a data source type stp, a target database IP address tip, a target database port number tpt, a target database user name tusnm, a target database name tnm, a target database schema name tpnm, and a target database connection password tkw;
S3b-2: initializing the update timestamp record table TRT, whose fields comprise: data source name snm, data source IP address sip, and update timestamp uts;
S3b-3: initializing the update frequency configuration table FCT, whose fields comprise: data source name snm, data source IP address sip, and update frequency suf;
S3b-4: initializing the network resource entry IP address and the termination condition C of the network resource acquisition unit NAU, and the storage location NSL for downloaded network resources;
S3b-5: initializing the datainfo.xls file of the manual filling data acquisition unit HAU and its storage location HSL;
S3b-6: initializing the database basic information table DBIT, whose fields comprise: a database IP address dbip, a port number dbpt, a user name usnm, a database name dbnm, a schema name pnm, a database connection password dbkw, and a database type dbtp.
Further, the specific process of step S4 is as follows: the update frequency control unit FCU queries the initialized update frequency configuration table FCT, calls the NAU, the HAU, and the GDEU at the corresponding update frequencies, and acquires data from each data source, including network data resources, manually filled data, and database data.
The specific steps for acquiring network data resources are as follows:
S4a-1: inputting the entry IP address and the termination condition C into the network resource acquisition unit NAU, where the termination condition is a time interval T, a link hop count H, or a termination IP address A;
S4a-2: the network resource acquisition unit NAU continuously indexes resource links starting from the IP address and downloads the required resources to the designated disk location NSL of the computer where the NAU is located;
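The termination condition C of step S4a-1 can be sketched as a small predicate that the NAU would evaluate before following each link. The text lists three alternative conditions (T, H, or A); the sketch allows any combination to be set at once and stops when any one is met, which is an assumption. The class and function names are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TerminationCondition:
    max_seconds: Optional[float] = None  # time interval T
    max_hops: Optional[int] = None       # link hop count H
    stop_address: Optional[str] = None   # termination IP address A


def should_terminate(cond, started_at, hops, current_address, now=None):
    """Return True as soon as any configured condition is satisfied."""
    now = time.monotonic() if now is None else now
    if cond.max_seconds is not None and now - started_at >= cond.max_seconds:
        return True
    if cond.max_hops is not None and hops >= cond.max_hops:
        return True
    if cond.stop_address is not None and current_address == cond.stop_address:
        return True
    return False
```

The optional `now` argument exists only so the predicate can be exercised with fixed times; a running crawler would omit it and rely on the monotonic clock.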
The specific steps for acquiring manually filled data are as follows:
S4b-1: inputting the specified storage path of the manual data files;
S4b-2: judging whether data to be updated exists under the specified path and, if so, further judging its data type;
S4b-3: storing the data, according to its type, to the designated disk location HSL of the computer where the HAU is located;
The specific steps for acquiring database data are as follows:
S4c-1: reading the database basic information table DBIT to obtain the database connection information and establishing the connection;
S4c-2: acquiring data according to the query conditions.
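The way the FCU decides which unit to call in step S4 can be sketched in two functions: one selects the sources that are due, and one dispatches by source type. The sketch assumes that suf is expressed in updates per second, so a source is due once 1 / suf seconds have passed since its last run, and that sources with a non-positive frequency are skipped. These conventions and the function names are assumptions.

```python
def due_sources(fct_rows, last_run, now):
    """Return the names of sources whose update interval (1 / suf) has elapsed.

    fct_rows: iterable of (snm, suf) pairs read from the FCT table.
    last_run: dict mapping snm to the time of its last acquisition.
    """
    due = []
    for snm, suf in fct_rows:
        if suf <= 0:
            continue  # assumed convention: a non-positive frequency disables the source
        interval = 1.0 / suf
        if now - last_run.get(snm, float("-inf")) >= interval:
            due.append(snm)
    return due


def dispatch(snm, stp, units):
    """Call the acquisition unit (NAU, HAU, or GDEU) registered for source type stp."""
    return units[stp](snm)
```

For example, with `units = {"network": nau, "manual": hau, "database": gdeu}`, where each value is a callable taking the source name, the FCU loop would call `dispatch` once per name returned by `due_sources`.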
The specific process of step S5 is as follows:
S5-1: comparing the time of the latest data in the acquired data against the update timestamp record, and judging whether the data needs to be updated;
S5-2: if so, continuing with the update; otherwise, repeating step S4.
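The check in step S5 can be sketched as a comparison between the timestamps of the acquired records and the last recorded update timestamp. The sketch assumes each record is a (timestamp, value) pair and that only strictly newer records form the increment; the function names are hypothetical.

```python
def needs_update(records, last_uts):
    """Return True if any acquired record is newer than the recorded timestamp."""
    return any(ts > last_uts for ts, _ in records)


def incremental_slice(records, last_uts):
    """Keep only the records newer than the recorded timestamp (the increment)."""
    return [(ts, value) for ts, value in records if ts > last_uts]
```

When `needs_update` returns False the cycle goes back to step S4; otherwise only the output of `incremental_slice` needs to be passed on to step S6.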
Further, the specific process of step S6 is as follows:
S6-1: calling the general data specification unit GDTU to standardize the network data source data and the manually filled data;
S6-2: calling the general data loading unit GDLU to load the database data obtained in step S4, together with the standardized manually filled data and network data resources, into the core database cluster.
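Step S6 can be illustrated with a merge-and-sort pass standing in for the GDTU and an idempotent insert standing in for the GDLU. The sketch assumes every record has already been reduced to a (timestamp, source, value) tuple and uses a single SQLite table named `core_data` in place of the core database cluster; the record shape, the table, and the function names are hypothetical.

```python
import sqlite3


def normalize(*batches):
    """GDTU stand-in: merge batches, drop exact duplicates, and sort by timestamp."""
    merged = {row for batch in batches for row in batch}
    return sorted(merged)


def load(conn, rows):
    """GDLU stand-in: insert rows, skipping ones already present; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS core_data ("
        "ts REAL, source TEXT, value TEXT, UNIQUE (ts, source, value))"
    )
    conn.executemany("INSERT OR IGNORE INTO core_data VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM core_data").fetchone()[0]
```

The UNIQUE constraint makes a repeated load harmless, which fits an incremental scheme where the same record might be acquired in two consecutive cycles.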
Further, the specific process of step S7 is as follows:
S7-1: updating the data update timestamp of each data source, taking the time of the latest data in the acquired data as the new timestamp and retaining the original timestamp;
S7-2: refreshing the update frequency configuration table: according to the current update frequency f_t of a data source, its current timestamp TS_t, and its previous timestamp TS_{t-1}, calculating the new update frequency f_{t+1} of the data source and writing it into the update frequency configuration table.
f_{t+1} is calculated as follows:
[Formula given as an image in the original publication (Figure BDA0002366114640000051); not reproduced in this text.]
where α is the refresh rate, with range [0,1]; a larger α indicates a faster change in the update frequency.
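The formula for f_{t+1} appears in the source only as an image, so its exact form cannot be recovered from this text. Purely as an illustration, one update rule that is consistent with the stated properties (α in [0,1], with a larger α producing a faster change) is exponential smoothing of the observed update rate 1 / (TS_t - TS_{t-1}). This rule is an assumption and should not be read as the patented formula; the function name is hypothetical.

```python
def next_frequency(f_t, ts_t, ts_prev, alpha):
    """Assumed rule: f_{t+1} = alpha * 1 / (TS_t - TS_{t-1}) + (1 - alpha) * f_t.

    This is an illustrative exponential-smoothing form, not the formula
    from the original publication, which is available only as an image.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if ts_t <= ts_prev:
        return f_t  # no positive interval observed; keep the current frequency
    observed = 1.0 / (ts_t - ts_prev)
    return alpha * observed + (1.0 - alpha) * f_t
```

Under this assumed rule, α = 0 leaves the frequency unchanged and α = 1 replaces it with the most recently observed rate, so intermediate values trade stability against responsiveness.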
Advantageous effects: compared with the prior art, the invention has the following advantages:
1. To address the high difficulty and complicated steps of processing multi-source heterogeneous data, the invention provides a unified solution for automatically updating multi-source (network, manual, and database) heterogeneous (numerical, text, image, audio, and video) data; it has good extensibility and can be deployed on a single machine.
2. To address the problems that the update frequency of multi-source heterogeneous data is not fixed and that fixed-frequency methods process it inefficiently, the invention provides a self-adaptive variable frequency increment updating method. Because different data sources update at different frequencies, an update frequency control unit dynamically calculates the future update frequency of each data source from its current and historical data update timestamps, and the system maintains an update frequency configuration table and a timestamp record table. The incremental update mode also reduces the volume of data transmitted and the communication overhead, which further improves the speed and efficiency of updating multi-source heterogeneous data. In addition, the invention stores the updated data in a database cluster, which gives higher data security and more stable system performance.
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention;
FIG. 2 is a diagram of a specific update framework of the method of the present invention.
Detailed Description
The invention is further elucidated with reference to the drawings and the embodiments.
Referring to FIG. 1, the invention provides a self-adaptive frequency conversion increment updating method for multi-source heterogeneous data, comprising the following steps:
S1: determining the data sources and a core database cluster:
S1-1: determining the data source type Data_Source_Type, which is one of: manually filled data source, network data source, and whole-database source;
S1-2: determining the data source access method according to the data source type;
S1-3: determining the type of the core database cluster and its access, read, and write methods;
S1-4: creating a data source basic information table SIT, whose fields comprise: a data source name snm, a data source IP address sip, a port number spt, a data source type stp, a target database IP address tip, a target database port number tpt, a target database user name tusnm, a target database name tnm, a target database schema name tpnm, and a target database connection password tkw.
S2: constructing a data updating model:
As shown in FIG. 2, the data updating model comprises a network resource acquisition unit NAU, a manual filling data acquisition unit HAU, a general database data extraction unit GDEU, an update frequency control unit FCU, a general data specification unit GDTU, and a general data loading unit GDLU.
The network resource acquisition unit NAU is constructed by the following steps:
S2a-1: constructing an IP address resolution and access module, which accesses the specified network resource address according to the entry IP address of the network resource;
S2a-2: constructing a network resource download module, which downloads the data that a link points to onto the local computer;
S2a-3: constructing a data dump module, which names and organizes the network resources in a simple way and stores them on a designated disk of the computer where the NAU is located;
S2a-4: constructing a termination condition judgment module, which terminates the NAU program according to the input termination condition C;
The manual filling data acquisition unit HAU is constructed by the following steps:
S2b-1: constructing a path index module, which checks whether any new data file has appeared under the specified file path;
S2b-2: constructing a file type judgment module, which determines the data type of each newly added file;
S2b-3: constructing a data storage module, which chooses the storage form according to the data file type and stores the data on a designated disk of the computer where the HAU is located;
Manually filled data refers to structured or unstructured data files that are collected or filled in manually. Manually collected or filled structured data refers to data files with a clear, standard organizational structure, such as xls, csv, and xlsx files; their structure and content are left unchanged during storage. Manually collected or filled unstructured data refers to data files without a clear, standard data structure, such as text, image, and audio files; for these, only the file name FileName, the file size FileSize, and the file location FileLoca are stored, and this information for all unstructured data is kept together in a file named datainfo.xls.
The general database data extraction unit GDEU is constructed by the following steps:
S2c-1: creating a database basic information table DBIT, whose fields comprise: a database IP address dbip, a port number dbpt, a user name usnm, a database name dbnm, a schema name pnm, a database connection password dbkw, and a database type dbtp;
S2c-2: obtaining a connection driver, or writing one manually, according to the type of the source database;
S2c-3: extracting test cases and testing the database connection;
The update frequency control unit FCU is constructed by the following steps:
S2d-1: creating an update timestamp record table TRT, whose fields comprise: data source name snm, data source IP address sip, and update timestamp uts;
S2d-2: creating a data source update frequency configuration table FCT, whose fields comprise: data source name snm, data source IP address sip, and update frequency suf;
S2d-3: constructing a module for reading the update timestamp record table;
S2d-4: constructing a module for calling the network resource acquisition unit NAU, the manual filling data acquisition unit HAU, and the general database data extraction unit GDEU;
S2d-5: constructing an update frequency calculation module, which calculates the update frequency of each data source;
S2d-6: constructing a module for refreshing the data source update frequency configuration table, which writes the latest frequency into the configuration table;
The general data specification unit GDTU is constructed by the following steps:
S2e-1: constructing a data reading module;
S2e-2: constructing a data merging, editing, and sorting module;
S2e-3: constructing a data writing module;
The general data loading unit GDLU is constructed by the following steps:
S2f-1: constructing a core database cluster access module;
S2f-2: constructing a standardized data reading module;
S2f-3: constructing a core database data loading module.
S3: deploying and initializing the data updating model:
S3-1: deploying the data updating model, specifically:
S3a-1: deploying the network resource acquisition unit NAU, the manual filling data acquisition unit HAU, the general database data extraction unit GDEU, and the general data specification unit GDTU on a single computer according to the data sources in use, and testing them;
S3a-2: deploying and testing the update frequency control unit FCU;
S3a-3: deploying and testing the general data loading unit GDLU;
S3-2: initializing the parameters of the data updating model, specifically:
S3b-1: initializing the data source basic information table SIT, whose fields comprise: a data source name snm, a data source IP address sip, a port number spt, a data source type stp, a target database IP address tip, a target database port number tpt, a target database user name tusnm, a target database name tnm, a target database schema name tpnm, and a target database connection password tkw;
S3b-2: initializing the update timestamp record table TRT, whose fields comprise: data source name snm, data source IP address sip, and update timestamp uts;
S3b-3: initializing the update frequency configuration table FCT, whose fields comprise: data source name snm, data source IP address sip, and update frequency suf;
S3b-4: initializing the network resource entry IP address and the termination condition C of the network resource acquisition unit NAU, and the storage location NSL for downloaded network resources;
S3b-5: initializing the datainfo.xls file of the manual filling data acquisition unit HAU and its storage location HSL;
S3b-6: initializing the database basic information table DBIT, whose fields comprise: a database IP address dbip, a port number dbpt, a user name usnm, a database name dbnm, a schema name pnm, a database connection password dbkw, and a database type dbtp.
S4: acquiring data from each data source through the data updating model:
The update frequency control unit FCU queries the initialized update frequency configuration table FCT, calls the NAU, the HAU, and the GDEU at the corresponding update frequencies, and acquires data from each data source, including network data resources, manually filled data, and database data.
The specific steps for acquiring network data resources in this embodiment are as follows:
S4a-1: inputting the entry IP address and the termination condition C into the network resource acquisition unit NAU, where the termination condition is a time interval T, a link hop count H, or a termination IP address A;
S4a-2: the network resource acquisition unit NAU continuously indexes resource links starting from the IP address and downloads the required resources to the designated disk location NSL of the computer where the NAU is located;
The specific steps for acquiring manually filled data are as follows:
S4b-1: inputting the specified storage path of the manual data files;
S4b-2: judging whether data to be updated exists under the specified path and, if so, further judging its data type;
S4b-3: storing the data, according to its type, to the designated disk location HSL of the computer where the HAU is located;
The specific steps for acquiring database data are as follows:
S4c-1: reading the database basic information table DBIT to obtain the database connection information and establishing the connection;
S4c-2: acquiring data according to the query conditions.
S5: comparing the time of the latest data in the acquired data and judging whether the data needs to be updated; if so, continuing with the update; otherwise, repeating step S4;
S6: loading the updated data into the core database cluster:
S6-1: calling the general data specification unit GDTU to standardize the network data source data and the manually filled data;
S6-2: calling the general data loading unit GDLU to load the database data obtained in step S4, together with the standardized manually filled data and network data resources, into the core database cluster.
S7: refreshing the frequency configuration table and the timestamp record table according to the updated data:
S7-1: updating the data update timestamp of each data source, taking the time of the latest data in the acquired data as the new timestamp and retaining the original timestamp;
S7-2: refreshing the update frequency configuration table: according to the current update frequency f_t of a data source, its current timestamp TS_t, and its previous timestamp TS_{t-1}, calculating the new update frequency f_{t+1} of the data source and writing it into the update frequency configuration table.
f_{t+1} is calculated as follows:
[Formula given as an image in the original publication (Figure BDA0002366114640000091); not reproduced in this text.]
where α is the refresh rate, with range [0,1]; a larger α indicates a faster change in the update frequency.
As can be seen from FIG. 2, the specific update framework obtained in this embodiment can be deployed on a single computer and comprises a network resource acquisition unit NAU, a manual filling data acquisition unit HAU, a general database data extraction unit GDEU, a general data specification unit GDTU, a general data loading unit GDLU, and an update frequency control unit FCU.
The network resource acquisition unit NAU, the manual filling data acquisition unit HAU, and the general database data extraction unit GDEU are mainly used to acquire data from the corresponding data sources. Data of different types is handled in different ways: structured data keeps its original structure, and unstructured data is converted into structured data after processing.
The general data specification unit GDTU further standardizes and processes the data to be updated from the manually filled data sources and the network resource data sources, converting it into a form that the general data loading unit GDLU can load into a database; the data is finally loaded by the general data loading unit GDLU.
The update frequency control unit FCU is mainly used to query the timestamp record table and the update frequency configuration table and, according to the query results, to call the network resource acquisition unit NAU, the manual filling data acquisition unit HAU, and the general database data extraction unit GDEU at the corresponding frequencies. Meanwhile, according to the current update frequency f_t of a data source, its current timestamp TS_t, and its previous timestamp TS_{t-1}, it calculates the new update frequency f_{t+1} of the data source and writes it into the update frequency configuration table, thereby implementing the refresh operation.

Claims (10)

1. A self-adaptive frequency conversion increment updating method for multi-source heterogeneous data, characterized by comprising the following steps:
S1: determining the data sources and a core database cluster;
S2: constructing a data updating model;
S3: deploying and initializing the data updating model constructed in step S2;
S4: acquiring data from each data source through the data updating model;
S5: comparing the timestamps of the acquired data and judging whether an update is needed; if so, continuing with the update, and if not, repeating step S4;
S6: loading the updated data into the core database cluster;
S7: refreshing the frequency configuration table and the timestamp record table according to the updated data.
2. The adaptive frequency conversion increment updating method for the multi-source heterogeneous data according to claim 1, characterized in that: step S1 specifically comprises:
S1-1: determining the data source type, which is one of: manually filled data source, network data source, and whole-database source;
S1-2: determining the data source access method according to the data source type;
S1-3: determining the type of the core database cluster and its access, read, and write methods;
S1-4: creating a data source basic information table.
3. The adaptive frequency conversion increment updating method for the multi-source heterogeneous data according to claim 1, characterized in that: the data update model in step S2 includes a network resource acquisition unit NAU, a manual filling data acquisition unit HAU, a general database data extraction unit GDEU, an update frequency control unit FCU, a general data specification unit GDTU, and a general data loading unit GDLU.
4. The adaptive frequency conversion increment updating method for the multi-source heterogeneous data according to claim 3, characterized in that: the network resource acquisition unit NAU in step S2 is constructed as follows:
S2a-1: constructing an IP address resolution and access module, which accesses the specified network resource address according to the IP address of the network resource entry;
S2a-2: constructing a network resource downloading module, which downloads the data pointed to by a link to the local computer;
S2a-3: constructing a data dump module, which performs simple naming and organizing of the network resources and stores them on the specified disk of the computer where the NAU is located;
S2a-4: constructing a termination condition judgment module, which terminates the NAU program according to the input termination condition C;
the manual filling data acquisition unit HAU is constructed as follows:
S2b-1: constructing a path index module, which checks whether a new data file exists under the specified file path;
S2b-2: constructing a file type judgment module, which judges the data type of the newly added file;
S2b-3: constructing a data storage module, which determines the storage form according to the type of the data file and stores the data on the specified disk of the computer where the HAU is located;
the general database data extraction unit GDEU is constructed as follows:
S2c-1: creating a database basic information table;
S2c-2: obtaining a connection driver, or writing one manually, according to the type of the source database;
S2c-3: extracting test cases and testing the database connection;
the update frequency control unit FCU is constructed as follows:
S2d-1: creating an update timestamp record table;
S2d-2: creating a data source update frequency configuration table;
S2d-3: constructing a reading module for the update timestamp record table;
S2d-4: constructing a calling module for the network resource acquisition unit NAU, the manual filling data acquisition unit HAU and the general database data extraction unit GDEU;
S2d-5: constructing an update frequency calculation module, which calculates the update frequency of each data source;
S2d-6: constructing a refreshing module for the data source update frequency configuration table, which writes the latest frequency into the configuration table;
the general data specification unit GDTU is constructed as follows:
S2e-1: constructing a data reading module;
S2e-2: constructing a data merging, editing and sorting module;
S2e-3: constructing a data writing module;
the general data loading unit GDLU is constructed as follows:
S2f-1: constructing a core database cluster access module;
S2f-2: constructing a standard data reading module;
S2f-3: constructing a data loading module for the core database.
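The path index and file type judgment modules of the HAU (S2b-1 and S2b-2) might look roughly like the sketch below. The `FILE_TYPES` map and the `find_new_files` function are hypothetical; the source does not list which file types are supported, so the extensions shown are placeholders.

```python
from pathlib import Path
from typing import Dict, Set

# Hypothetical extension-to-type map; the source does not list file types.
FILE_TYPES = {".csv": "table", ".xls": "table", ".xlsx": "table", ".txt": "text"}


def find_new_files(folder: Path, seen: Set[str]) -> Dict[str, str]:
    """Return {file name: judged type} for files not seen on earlier calls.

    `seen` is updated in place so that a later call reports only files
    added since this one.
    """
    found = {}
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name not in seen:
            found[path.name] = FILE_TYPES.get(path.suffix.lower(), "unknown")
            seen.add(path.name)
    return found
```

Tracking file names in a set is the simplest choice for a sketch; a real deployment would more likely compare modification times or keep the index in a table.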
5. The adaptive frequency conversion increment updating method for the multi-source heterogeneous data according to claim 1, characterized in that the specific process of step S3 is as follows:
S3-1: deploying the data updating model, with the following specific steps:
S3a-1: deploying the network resource acquisition unit NAU, the manual filling data acquisition unit HAU, the general database data extraction unit GDEU and the general data specification unit GDTU on a single computer according to the data source conditions, and testing them;
S3a-2: deploying and testing the update frequency control unit FCU;
S3a-3: deploying and testing the general data loading unit GDLU;
S3-2: initializing the parameters of the data updating model, with the following specific steps:
S3b-1: initializing the data source basic information;
S3b-2: initializing the update timestamp record table;
S3b-3: initializing the update frequency configuration table;
S3b-4: initializing the network resource access IP address and the termination condition C of the network resource acquisition unit NAU, and the storage location NSL for downloaded network resources;
S3b-5: initializing the datainfo.xls file of the manual filling data acquisition unit HAU and its storage location HSL;
S3b-6: initializing the database basic information table DBIT.
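Steps S3b-2 and S3b-3 can be pictured with a small SQLite sketch. The table names, column names and the unit of the initial frequency are assumptions; the source does not specify a schema or a storage engine for these tables.

```python
import sqlite3


def init_tables(conn: sqlite3.Connection, sources: dict) -> None:
    """Create and seed the timestamp record and frequency configuration tables.

    `sources` maps a source name to its initial update frequency; table
    and column names are assumptions made for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS timestamp_record ("
        "source TEXT PRIMARY KEY, ts_prev REAL, ts_curr REAL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS frequency_config ("
        "source TEXT PRIMARY KEY, frequency REAL NOT NULL)"
    )
    for name, freq in sources.items():
        # INSERT OR IGNORE keeps existing rows, so re-running is harmless.
        conn.execute(
            "INSERT OR IGNORE INTO timestamp_record VALUES (?, NULL, NULL)",
            (name,),
        )
        conn.execute(
            "INSERT OR IGNORE INTO frequency_config VALUES (?, ?)",
            (name, freq),
        )
    conn.commit()
```

Keeping both the previous and the current timestamp per source matches the later requirement that the original timestamp be retained when a new one is recorded.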
6. The adaptive frequency conversion increment updating method for the multi-source heterogeneous data according to claim 1, characterized in that the specific process of step S4 is as follows: the update frequency control unit FCU queries the initialized update frequency configuration table FCT, calls the NAU, the HAU and the GDEU at the corresponding update frequencies, and acquires data from each data source, including network data resources, manual filling data and database data.
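A minimal sketch of how the FCU could decide which units to call is given below. It assumes that a frequency is expressed in runs per hour and that a source is due once the interval 1/f has elapsed since its last run; neither the unit nor this rule is stated in the source, and `due_sources` is a hypothetical helper.

```python
from typing import Dict, List


def due_sources(
    fct: Dict[str, float], last_run: Dict[str, float], now: float
) -> List[str]:
    """Return the sources whose update interval has elapsed.

    fct maps source name -> update frequency in runs per hour (assumed unit);
    last_run maps source name -> time of the last acquisition, in hours.
    """
    due = []
    for name, freq in fct.items():
        if freq <= 0:
            continue  # a non-positive frequency disables the source
        interval = 1.0 / freq
        # A source that has never run is always due.
        if now - last_run.get(name, float("-inf")) >= interval:
            due.append(name)
    return sorted(due)
```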
7. The adaptive frequency conversion increment updating method for the multi-source heterogeneous data according to claim 6, characterized in that the specific steps of acquiring the network data resources in step S4 are as follows:
S4a-1: inputting the entry IP address and the termination condition C into the network resource acquisition unit NAU, wherein the termination condition is a time interval T, a link hop count H, or a termination IP address A;
S4a-2: the network resource acquisition unit NAU continuously follows resource links starting from the IP address and downloads the required resources to the specified disk location NSL of the computer where the NAU is located;
the specific steps of acquiring the manual filling data are as follows:
S4b-1: inputting the specified storage path of the manual data files;
S4b-2: judging whether data to be updated exists under the specified path, and if so, further judging the type of the data;
S4b-3: storing the data to the specified disk location HSL of the computer where the HAU is located according to the data type;
the specific steps of acquiring the database data are as follows:
S4c-1: querying the database basic information table DBIT to obtain the database connection information and establishing a connection;
S4c-2: acquiring data according to the query conditions.
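The network acquisition steps S4a-1 and S4a-2 can be sketched as a breadth-first walk over links that stops on the termination condition C. The example below models the network as an in-memory dictionary and covers only the hop count H and the termination address A; the time interval T is omitted for brevity. The `crawl` function and its parameters are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def crawl(
    links: Dict[str, List[str]],
    entry: str,
    max_hops: Optional[int] = None,
    stop_address: Optional[str] = None,
) -> List[str]:
    """Breadth-first walk from `entry`; return addresses in discovery order.

    `links` stands in for the network: address -> addresses it links to.
    Links are not followed beyond `max_hops` hops (H), and the walk ends
    as soon as `stop_address` (A) is reached.
    """
    found = [entry]
    queue = deque([(entry, 0)])
    while queue:
        address, hops = queue.popleft()
        if address == stop_address:
            break
        if max_hops is not None and hops >= max_hops:
            continue  # do not follow links beyond H hops
        for nxt in links.get(address, []):
            if nxt not in found:
                found.append(nxt)
                queue.append((nxt, hops + 1))
    return found
```

In a real NAU each discovered address would be downloaded to the storage location NSL; here the function only returns the addresses so that the termination logic stays visible.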
8. The adaptive frequency conversion increment updating method for the multi-source heterogeneous data according to claim 6, characterized in that the specific process of step S6 is as follows:
S6-1: calling the general data specification unit GDTU to perform the normalization operation on the network data source data and the manual filling data;
S6-2: calling the general data loading unit GDLU to load the database data obtained in step S4, together with the normalized manual filling data and network data resources, into the core database cluster.
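The two calls in step S6 could be sketched as a normalize-then-load pair. The row layout, the `core_data` table and both function names are assumptions; an in-memory SQLite database stands in for the core database cluster, whose actual type the source leaves open.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float, str]  # (source, timestamp, value); layout assumed


def normalize(source: str, raw: Iterable[Dict[str, str]]) -> List[Row]:
    """GDTU sketch: trim values, coerce timestamps, drop duplicates, sort."""
    rows = {
        (source, float(item["ts"]), item["value"].strip())
        for item in raw
    }
    return sorted(rows)


def load(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """GDLU sketch: append normalized rows to a core table; return the count."""
    batch = list(rows)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS core_data (source TEXT, ts REAL, value TEXT)"
    )
    conn.executemany("INSERT INTO core_data VALUES (?, ?, ?)", batch)
    conn.commit()
    return len(batch)
```

Database data from the GDEU would skip `normalize` and go straight to `load`, matching the claim's statement that only network and manual filling data are normalized.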
9. The adaptive frequency conversion increment updating method for the multi-source heterogeneous data according to claim 1, characterized in that the specific process of step S7 is as follows:
S7-1: updating the data update timestamp of each data source, taking the time of the latest data among the acquired data as the timestamp, and retaining the original timestamp;
S7-2: refreshing the update frequency configuration table: according to the current update frequency f_t of a data source, the current timestamp TS_t and the previous timestamp TS_{t-1}, calculating the new update frequency f_{t+1} of the data source and writing it into the update frequency configuration table.
10. The adaptive frequency conversion increment updating method for the multi-source heterogeneous data according to claim 9, characterized in that f_{t+1} in step S7 is calculated as follows:
Figure FDA0002366114630000041
where α is the refresh rate, with a value range of [0, 1].
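The claimed formula is available only as an image and is not restated here. Purely as an illustrative assumption, and not as the claimed calculation, one common way to combine a current frequency, two timestamps and a rate α in [0, 1] is exponential smoothing, in which the frequency implied by the latest interval is blended with the current one. The function name and the blended form below are hypothetical.

```python
def next_frequency(f_t: float, ts_t: float, ts_prev: float, alpha: float) -> float:
    """Blend the current frequency with the rate implied by the last interval.

    Assumed form (NOT taken from the source):
        f_next = (1 - alpha) * f_t + alpha / (ts_t - ts_prev)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    gap = ts_t - ts_prev
    if gap <= 0:
        return f_t  # no usable interval; keep the current frequency
    return (1.0 - alpha) * f_t + alpha / gap
```

Under this assumed form, α = 0 leaves the frequency unchanged and α = 1 replaces it with the rate observed over the most recent interval.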
CN202010036197.XA 2020-01-14 2020-01-14 Self-adaptive frequency conversion increment updating method for multi-source heterogeneous data Active CN111259025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010036197.XA CN111259025B (en) 2020-01-14 2020-01-14 Self-adaptive frequency conversion increment updating method for multi-source heterogeneous data

Publications (2)

Publication Number Publication Date
CN111259025A (en) 2020-06-09
CN111259025B (en) 2022-09-23

Family

ID=70946876

Country Status (1)

Country Link
CN (1) CN111259025B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117240711A (en) * 2023-09-15 2023-12-15 合芯科技有限公司 Automatic updating method, device and equipment for cluster management tool configuration file

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090006318A1 (en) * 2007-06-29 2009-01-01 Tobias Lehtipalo Multi-source data visualization system
US20140286497A1 (en) * 2013-03-15 2014-09-25 Broadcom Corporation Multi-microphone source tracking and noise suppression
CN109508000A (en) * 2018-12-16 2019-03-22 西南电子技术研究所(中国电子科技集团公司第十研究所) Isomery multi-sensor multi-target tracking method
CN110245719A (en) * 2019-03-27 2019-09-17 中国海洋大学 A kind of Feature fusion of entity-oriented and user's portrait

Also Published As

Publication number Publication date
CN111259025B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
CN111723221B (en) Mass remote sensing data processing method and system based on distributed architecture
CN102906751A (en) Method and device for data storage and data query
CN1503168A (en) Automatic test method for system products
CN102902762B (en) A kind of methods, devices and systems of deleting duplicated data
US9069823B2 (en) Method for managing a relational database of the SQL type
CN106503158A (en) Method of data synchronization and device
CN106776378A (en) It is a kind of to clear up data cached method and device
CN103440282B (en) A kind of test data storage system and method
CN111259025B (en) Self-adaptive frequency conversion increment updating method for multi-source heterogeneous data
CN102508919A (en) Data processing method and system
CN112181960A (en) Intelligent operation and maintenance framework system based on AIOps
CN111523004B (en) Storage method and system for edge computing gateway data
CN113612970A (en) Safety event intelligent analysis management and control platform for industrial monitoring video
CN113382071B (en) Link creation method and device based on hybrid cloud architecture
CN101917282B (en) Method, device and system for processing alarm shielding rules
CN104182470B (en) A kind of mobile terminal application class system and method based on SVM
CN112052248A (en) Audit big data processing method and system
CN102567432B (en) Intelligent information adaptation method and device for the same
CN105681895A (en) BS (Browser/Server) end video online playing method based on cloud computing
CN115033646A (en) Method for constructing real-time warehouse system based on Flink and Doris
CN105653719A (en) Method and system for rapidly acquiring files of external storage equipment and router
CN102253853B (en) Virtual instrument and method with online reconstruction and evolution functions
CN111143280B (en) Data scheduling method, system, device and storage medium
CN111427896A (en) Big data storage platform based on block chain
CN116644039B (en) Automatic acquisition and analysis method for online capacity operation log based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant