CN108446383A - Data task redistribution method based on geographically distributed data query - Google Patents
- Publication number
- CN108446383A (application CN201810233064.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- task
- data center
- query
- cost
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data task redistribution method based on geographically distributed data query. The tasks and data of a distributed system are redistributed using the MFHC algorithm: the data held at different data centers are rearranged, the data centers that execute the query tasks are reassigned, and a single task may be executed across different data centers. This reduces the total cost consumed by analytical query processing and addresses the problem that some data cannot be transferred to other data centers because of privacy or other restrictions. It also improves the query speed of tasks and saves cost.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a data task redistribution method based on geographically distributed data query.
Background
With the development of networks in the era of big data, data is ubiquitous. To improve the speed at which users access servers and to reduce the bandwidth occupied by data transmission during data queries, many companies have built data centers around the world. Microsoft and Amazon, for example, have established dozens of data centers worldwide. These data centers, located in different regions, continuously generate large amounts of data, such as user activity logs, server management logs, and performance logs. Analyzing the data distributed across data centers in different regions is an important job. For example, analyzing user query logs can inform advertising decisions, analyzing network logs can detect DoS attacks, and analyzing system logs can be used to build prediction models.
However, analyzing data distributed across different data centers incurs a cost, which mainly consists of the link cost of moving the data and the cost of computing the tasks at the data centers. For a company, minimizing this cost is highly desirable.
The prior art mainly uses a centralized approach: the data to be analyzed are transmitted from the data centers in different regions to a single data center, where the analysis is then carried out. This approach has the following drawbacks: 1) for today's large-scale data applications, the centralized approach must transmit a large amount of data, which wastes considerable bandwidth and takes a long time; 2) the data stored at some data centers involve privacy concerns and cannot be freely transferred to other data centers.
Summary of the invention
In view of the above drawbacks or deficiencies, the purpose of the present invention is to provide a data task redistribution method based on geographically distributed data query, which rearranges the data at different data centers, reassigns the data centers that execute the query tasks, and executes a task across different data centers, thereby reducing the total cost consumed by analytical query processing.
To achieve the above purpose, the technical solution of the present invention is:
A data task redistribution method based on geographically distributed data query, comprising:
1) Obtain multiple query tasks;
2) According to the state of the query tasks and the data at the current time, predict future information using a statistical method, and use a fixed horizon control (FHC) algorithm to minimize the cost over a future period, thereby obtaining the data processing center for the future time;
3) Assign tasks using the MFHC algorithm, and move the data from their storage locations to the data processing center according to the task assignment;
4) Perform SQL operations on the data according to the task assignment, so that the data can be moved when the next time step is processed;
5) Repeat steps 1) to 4) until all query tasks are completed.
Step 2) specifically includes:
2.1. Obtain the bandwidth cost C_or occupied by exchanging the raw data, the computation cost C_com of running the tasks, and the switching cost C_sw incurred when the data center executing a task is reassigned:
where D is the set of data centers, each of which contains several regions; P is the set of regions; G is the set of tasks, where each task g belongs to an analytical query and each analytical query can be represented as a DAG; O_p is the volume of raw data in region p; l_{DC(p),d} is the link cost from the data center where region p is located to data center d; c_{k,g} is the unit computation cost of task g at data center k; b_{k,g} is the total volume of data that task g processes at data center k; I_{i,g} is the volume of intermediate data that task g processes at data center i; x_{p,d} is a binary variable equal to 1 if the raw data of region p are to be transferred to data center d, and 0 otherwise; y_{d,g} is a binary variable equal to 1 if task g is to be executed at data center d, and 0 otherwise;
2.2. Sum the costs:
2.3. Minimize the total cost using the FHC algorithm:
subject to: for t = τ, ..., τ+ω
The two symbols X and Y involved are defined as:
where f_p is the minimum number of replicas of the data of region p.
The FHC algorithm minimizes the total cost over the period [t, t+w] to obtain y, and then derives x from the value of y.
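The cost formulas referred to in steps 2.1 to 2.3 do not appear in this text. Purely as an illustration, one formulation consistent with the symbol definitions above is sketched below. The time index t on x and y, the form of each sum, and the replica constraint are assumptions rather than the patent's own equations, and C_sw is left abstract because its form cannot be recovered from the definitions alone.

```latex
\begin{aligned}
C_{or}(t)  &= \sum_{p \in P} \sum_{d \in D} O_p \, l_{DC(p),d} \, x_{p,d}(t), \\
C_{com}(t) &= \sum_{g \in G} \sum_{k \in D} c_{k,g} \, b_{k,g} \, y_{k,g}(t), \\
\min_{x,\,y} \; & \sum_{t=\tau}^{\tau+\omega} \bigl( C_{or}(t) + C_{com}(t) + C_{sw}(t) \bigr)
\quad \text{subject to} \quad \sum_{d \in D} x_{p,d}(t) \ge f_p \quad \forall p \in P .
\end{aligned}
```

Under this reading, C_or charges each raw-data transfer by volume times link cost, C_com charges each task by unit cost times data volume at the center where it runs, and the constraint keeps at least f_p copies of the data of region p.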
Step 3) specifically includes:
3.1. At time t, the MFHC algorithm combines the results of (w+1) FHC algorithms and uses a majority voting algorithm to obtain the final value of y; the value of x is then determined from the value of y;
3.2. According to the obtained x values, the data are redistributed and transferred to the designated data centers; then, according to the obtained y values, each data center executes its tasks by performing SQL operations on the data.
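The majority-voting step in 3.1 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `majority_vote`, the dictionary representation of each FHC result, and the tie-breaking rule are all assumptions introduced here.

```python
from collections import Counter

def majority_vote(fhc_assignments):
    """Combine task-to-data-center choices from several FHC runs.

    fhc_assignments: list of dicts, one per FHC run, mapping task id -> data center id.
    Returns a dict mapping each task to the data center chosen by most runs.
    Ties are broken by the smallest data center id (an arbitrary, hypothetical rule).
    """
    final = {}
    tasks = set().union(*fhc_assignments)
    for g in sorted(tasks):
        votes = Counter(run[g] for run in fhc_assignments if g in run)
        best = max(votes.values())
        final[g] = min(d for d, n in votes.items() if n == best)
    return final

# Example with w = 2, i.e. w + 1 = 3 FHC results (made-up values).
runs = [
    {"g1": "A", "g2": "D"},
    {"g1": "A", "g2": "E"},
    {"g1": "D", "g2": "E"},
]
final_y = majority_vote(runs)
```

Here task g1 ends up at data center A and task g2 at data center E, because each of those choices is backed by two of the three runs.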
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention provides a data task redistribution method based on geographically distributed data query. The tasks and data of a distributed system are redistributed using the MFHC algorithm: the data at different data centers are rearranged, the data centers that execute the query tasks are reassigned, and a task is executed across different data centers. This reduces the total cost consumed by analytical query processing and addresses the problem that some data cannot be transferred to other data centers because of privacy or other restrictions. It also improves the query speed of tasks and saves cost.
Description of the drawings
Fig. 1 is a flow chart of the data task redistribution method based on geographically distributed data query of the present invention;
Fig. 2 is a control flow block diagram of the present invention;
Fig. 3 is a flow chart of the MFHC algorithm of the present invention.
Detailed description
The present invention is described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, the present invention provides a data task redistribution method based on geographically distributed data query, comprising:
1) Obtain multiple query tasks;
2) According to the state of the query tasks and the data at the current time, predict future information using a statistical method, and use a fixed horizon control (FHC) algorithm to minimize the cost over a future period, thereby obtaining the data processing center for the future time;
Step 2) specifically includes:
2.1. Obtain the bandwidth cost C_or occupied by exchanging the raw data, the computation cost C_com of running the tasks, and the switching cost C_sw incurred when the data center executing a task is reassigned:
where D is the set of data centers, each of which contains several regions; P is the set of regions; G is the set of tasks, where each task g belongs to an analytical query and each analytical query can be represented as a DAG; O_p is the volume of raw data in region p; l_{DC(p),d} is the link cost from the data center where region p is located to data center d; c_{k,g} is the unit computation cost of task g at data center k; b_{k,g} is the total volume of data that task g processes at data center k; I_{i,g} is the volume of intermediate data that task g processes at data center i; x_{p,d} is a binary variable equal to 1 if the raw data of region p are to be transferred to data center d, and 0 otherwise; y_{d,g} is a binary variable equal to 1 if task g is to be executed at data center d, and 0 otherwise;
2.2. Sum the costs:
2.3. Minimize the total cost using the FHC algorithm:
subject to: for t = τ, ..., τ+ω
The two symbols X and Y involved are defined as:
where f_p is the minimum number of replicas of the data of region p.
The FHC algorithm minimizes the total cost over the period [t, t+w] to obtain y, and then derives x from the value of y.
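To make the windowed minimization concrete, the following Python sketch picks the cheapest execution sites for a single task over the window [t, t+w] by exhaustive search. It is a toy under stated assumptions: the function name `fhc_window`, the cost tables, the restriction to one task, and the use of brute-force enumeration in place of whatever solver the method actually employs are all hypothetical.

```python
from itertools import product

def fhc_window(centers, run_cost, switch_cost, start, t, w):
    """Brute-force the cheapest execution sites for one task over [t, t+w].

    run_cost[s][d]    : predicted cost of running the task at center d in step s
    switch_cost[a][b] : cost of moving execution from center a to center b
    start             : center used just before step t
    """
    best_plan, best_cost = None, float("inf")
    for plan in product(centers, repeat=w + 1):
        cost, prev = 0.0, start
        for offset, d in enumerate(plan):
            # Each step pays its running cost plus the cost of switching centers.
            cost += run_cost[t + offset][d] + switch_cost[prev][d]
            prev = d
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan, best_cost

# Made-up predictions for three time steps and two data centers.
centers = ["A", "D"]
run_cost = [{"A": 1, "D": 4}, {"A": 5, "D": 1}, {"A": 5, "D": 1}]
switch_cost = {"A": {"A": 0, "D": 2}, "D": {"A": 2, "D": 0}}
plan, cost = fhc_window(centers, run_cost, switch_cost, start="A", t=0, w=2)
```

With these numbers the search stays at A for the first step and switches to D afterwards, since paying the switching cost of 2 once is cheaper than running at A for the remaining steps. Enumeration grows exponentially with w and is only meant to show what is being minimized.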
3) Assign tasks using the MFHC algorithm, and move the data from their storage locations to the data processing center according to the task assignment;
As shown in Fig. 3, step 3) specifically includes:
3.1. At time t, the MFHC algorithm combines the results of (w+1) FHC algorithms and uses a majority voting algorithm to obtain the final value of y; the value of x is then determined from the value of y;
3.2. According to the obtained x values, the data are redistributed and transferred to the designated data centers; then, according to the obtained y values, each data center executes its tasks by performing SQL operations on the data.
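Step 3.2 can be illustrated with Python's standard `sqlite3` module, using one in-memory database as a stand-in for each data center. Everything concrete here is an assumption made for the sketch: the `logs` table, the region names, the dictionary encodings of x and y, and the aggregate query are hypothetical and do not come from the patent.

```python
import sqlite3

# Hypothetical setup: one in-memory SQLite database stands in for each data center.
centers = {name: sqlite3.connect(":memory:") for name in ("A", "D")}
for db in centers.values():
    db.execute("CREATE TABLE logs (region TEXT, bytes INTEGER)")
centers["A"].executemany("INSERT INTO logs VALUES (?, ?)", [("p1", 10), ("p1", 30)])
centers["D"].executemany("INSERT INTO logs VALUES (?, ?)", [("p2", 5)])

# x: the raw data of region p1 (stored at A) is to be transferred to D.
x = {("p1", "D"): 1}
home = {"p1": "A", "p2": "D"}
for (region, target), flag in x.items():
    if flag:
        rows = centers[home[region]].execute(
            "SELECT region, bytes FROM logs WHERE region = ?", (region,)
        ).fetchall()
        centers[target].executemany("INSERT INTO logs VALUES (?, ?)", rows)

# y: the query task named "total_bytes" is executed at data center D.
y = {"total_bytes": "D"}
result = centers[y["total_bytes"]].execute("SELECT SUM(bytes) FROM logs").fetchone()[0]
```

The data are moved first, according to x, and only then is the SQL query run at the center selected by y, which mirrors the order described in step 3.2.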
As an example, in Fig. 2, A represents a data center executing a task, and the outer ring around A represents the data center transferring data. Initially, the algorithm decides to execute the task at data center A. At the next time step it decides to execute tasks at D and E, so the intermediate data generated by data center A are transferred to data centers D and E. After these two data centers finish executing their tasks, the last task is to be executed at data center D, so the intermediate results of data center E are transferred to data center D, and the task is finally executed at data center D.
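The transfers in this example can be traced with a short Python sketch. The helper name `intermediate_transfers` and the list-of-stages representation are assumptions introduced for illustration; the sketch simply assumes that every center in one stage sends its intermediate data to every different center in the next stage.

```python
def intermediate_transfers(stages):
    """List (source, destination) transfers of intermediate data between consecutive stages.

    stages: list of lists; stages[i] holds the data centers executing at step i.
    Data already located at a destination center is not transferred.
    """
    transfers = []
    for current, nxt in zip(stages, stages[1:]):
        for src in current:
            for dst in nxt:
                if src != dst:
                    transfers.append((src, dst))
    return transfers

# Execution sites from the Fig. 2 walk-through: A, then D and E, then D.
stages = [["A"], ["D", "E"], ["D"]]
moves = intermediate_transfers(stages)
```

For the schedule above this yields the transfers A to D, A to E, and E to D. No transfer is generated from D to itself, which matches the description that only the intermediate results of E need to be moved before the last task runs.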
4) Perform SQL operations on the data according to the task assignment, so that the data can be moved when the next time step is processed;
In the present invention, the data are initially distributed across different data centers, and multiple query tasks are waiting to be executed. For each task and the data distribution, the system decides, with the goal of minimizing the overall system cost over a given period, at which data center the task is executed and to which data centers the data to be processed should be copied. The process is repeated in the next period.
The present invention redistributes the tasks and data of the distributed system using the FHC algorithm.
5) Repeat steps 1) to 4) until all query tasks are completed.
It should be noted that, in the description of the present application, unless otherwise stated, "multiple" means two or more.
Any process or method described in a flow chart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
It should be understood that each part of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one of the following techniques known in the art, or a combination of them, may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination of them.
It will be apparent to those skilled in the art that the specific embodiments above are preferred solutions of the present invention. Improvements and variations that those skilled in the art may make to certain parts of the present invention still embody the principle of the present invention and achieve its purpose, and therefore fall within the scope protected by the present invention.
Claims (3)
1. A data task redistribution method based on geographically distributed data query, characterized by comprising:
1) obtaining multiple query tasks;
2) according to the state of the query tasks and the data at the current time, predicting future information using a statistical method, and using a fixed horizon control (FHC) algorithm to minimize the cost over a future period, thereby obtaining the data processing center for the future time;
3) assigning tasks using the MFHC algorithm, and moving the data from their storage locations to the data processing center according to the task assignment;
4) performing SQL operations on the data according to the task assignment, so that the data can be moved when the next time step is processed;
5) repeating steps 1) to 4) until all query tasks are completed.
2. The data task redistribution method based on geographically distributed data query according to claim 1, characterized in that step 2) specifically includes:
2.1. obtaining the bandwidth cost C_or occupied by exchanging the raw data, the computation cost C_com of running the tasks, and the switching cost C_sw incurred when the data center executing a task is reassigned:
where D is the set of data centers, each of which contains several regions; P is the set of regions; G is the set of tasks, where each task g belongs to an analytical query and each analytical query can be represented as a DAG; O_p is the volume of raw data in region p; l_{DC(p),d} is the link cost from the data center where region p is located to data center d; c_{k,g} is the unit computation cost of task g at data center k; b_{k,g} is the total volume of data that task g processes at data center k; I_{i,g} is the volume of intermediate data that task g processes at data center i; x_{p,d} is a binary variable equal to 1 if the raw data of region p are to be transferred to data center d, and 0 otherwise; y_{d,g} is a binary variable equal to 1 if task g is to be executed at data center d, and 0 otherwise;
2.2. summing the costs:
2.3. minimizing the total cost using the FHC algorithm:
subject to: for t = τ, ..., τ+ω
The two symbols X and Y involved are defined as:
where f_p is the minimum number of replicas of the data of region p;
the FHC algorithm minimizes the total cost over the period [t, t+w] to obtain y, and then derives x from the value of y.
3. The data task redistribution method based on geographically distributed data query according to claim 2, characterized in that step 3) specifically includes:
3.1. at time t, the MFHC algorithm combines the results of (w+1) FHC algorithms and uses a majority voting algorithm to obtain the final value of y, and the value of x is then determined from the value of y;
3.2. according to the obtained x values, the data are redistributed and transferred to the designated data centers; then, according to the obtained y values, each data center executes its tasks by performing SQL operations on the data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810233064.4A CN108446383B (en) | 2018-03-21 | 2018-03-21 | Data task redistribution method based on geographic distributed data query |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108446383A (en) | 2018-08-24 |
CN108446383B (en) | 2021-12-10 |
Family
ID=63195914
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810233064.4A (Active; CN108446383B (en)) | 2018-03-21 | 2018-03-21 | Data task redistribution method based on geographic distributed data query |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108446383B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112256720A (en) * | 2020-10-21 | 2021-01-22 | 平安科技(深圳)有限公司 | Data cost calculation method, system, computer device and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090083098A1 (en) * | 2007-09-24 | 2009-03-26 | Yahoo! Inc. | System and method for an online auction with optimal reserve price |
CN103646073A (en) * | 2013-12-11 | 2014-03-19 | 浪潮电子信息产业股份有限公司 | Condition query optimizing method based on HBase table |
CN107273184A (en) * | 2017-06-14 | 2017-10-20 | 沈阳师范大学 | A kind of optimized algorithm migrated based on high in the clouds big data with processing cost |
CN107341820A (en) * | 2017-07-03 | 2017-11-10 | 郑州轻工业学院 | A kind of fusion Cuckoo search and KCF mutation movement method for tracking target |
2018
- 2018-03-21: CN application CN201810233064.4A filed; patent CN108446383B (en), status Active
Non-Patent Citations (1)
Title |
---|
王春凯等: "分布式数据流关系查询技术研究", 《计算机学报》 * |
Also Published As
Publication number | Publication date |
---|---|
CN108446383B (en) | 2021-12-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |