CN108446383A - A data task redistribution method based on geographically distributed data query

A data task redistribution method based on geographically distributed data query

Info

Publication number
CN108446383A
Authority
CN
China
Prior art keywords
data
task
data center
query
cost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810233064.4A
Other languages
Chinese (zh)
Other versions
CN108446383B (en)
Inventor
黄晶
黄蛟
高尚
杨博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University
Priority to CN201810233064.4A
Publication of CN108446383A
Application granted
Publication of CN108446383B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/242: Query formulation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471: Distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data task redistribution method based on geographically distributed data query. The MFHC algorithm is used to redistribute the tasks and data of a distributed system: the data held at the different data centers is rearranged, the data centers that execute query tasks are reassigned, and a single task may be executed at different data centers. This reduces the total cost consumed by analytical query processing and addresses the problem that some data cannot be transferred to other data centers because of privacy or other restrictions. The method also improves the query speed of tasks.

Description

A data task redistribution method based on geographically distributed data query
Technical field
The present invention relates to the field of communication technology, and in particular to a data task redistribution method based on geographically distributed data query.
Background
With the development of networks and the arrival of the big data era, data is everywhere. To speed up user access to servers and reduce the bandwidth consumed by data transmission, many companies have built data centers around the world. Microsoft and Amazon, for example, have established dozens of data centers worldwide. These data centers in different regions continuously generate large amounts of data, such as user activity logs, server management logs and performance logs. Analyzing the data distributed across data centers in different regions is important work: analyzing user query logs can inform advertising decisions, analyzing network logs can detect DoS attacks, and analyzing system logs can support building prediction models. However, analyzing data distributed across different data centers incurs a cost, which mainly consists of the link cost of moving data and the computation cost of running tasks at the data centers. For a company, minimizing this cost is essential.
The prior art mainly uses a centralized approach: the data to be analyzed is transferred from the data centers in different regions to a single data center, where the analysis is then performed. This approach has the following drawbacks: 1) for today's large-scale data applications, the centralized approach must transfer large amounts of data, which wastes considerable bandwidth and takes a long time; 2) the data held by some data centers involves privacy concerns and cannot be freely transferred to other data centers.
Summary of the invention
In view of the above defects and deficiencies, the purpose of the present invention is to provide a data task redistribution method based on geographically distributed data query. By rearranging the data at different data centers and reassigning the data centers that execute query tasks, so that a task can be executed at different data centers, the method reduces the total cost consumed by analytical query processing.
To achieve the above purpose, the technical solution of the present invention is as follows:
A data task redistribution method based on geographically distributed data query, comprising:
1) obtaining multiple query tasks;
2) according to the state of the query tasks and the data at the current time, predicting future information with a statistical method, and using the fixed horizon control (FHC) algorithm to minimize the cost over a future period of time, thereby obtaining the data processing centers for future time slots;
3) performing task allocation with the MFHC algorithm, and moving the data from its storage location to the data processing center according to the task allocation;
4) performing SQL operations on the data according to the task allocation, so that the data can be moved and processed at the next time slot;
5) repeating steps 1) to 4) until all query tasks are completed.
Step 2) specifically comprises:
2.1. Obtaining the bandwidth cost C_or of moving raw data, the computation cost C_com of running tasks, and the switching cost C_sw incurred by reassigning the data center that executes a task:
where D is the set of data centers, each of which contains several regions; P is the set of regions; G is the set of tasks, where each task g belongs to an analytical query and each analytical query can be expressed as a DAG; O_p is the volume of raw data in region p; l_{DC(p),d} is the link cost from the data center where region p is located to data center d; c_{k,g} is the unit computation cost of task g at data center k; b_{k,g} is the total volume of data that task g has to process at data center k; I_{i,g} is the volume of intermediate data that task g has to process at data center i; x_{p,d} is a binary variable whose value is 1 if the raw data of region p is to be transferred to data center d, and 0 otherwise; y_{d,g} is a binary variable whose value is 1 if task g is to be executed at data center d, and 0 otherwise;
2.2. Summing the costs:
2.3. Minimizing the total cost with the FHC algorithm:
subject to: for t = τ, ..., τ+ω
The two symbols X and Y involved are defined as:
where f_p is the minimum number of replicas of the data of region p;
The FHC algorithm minimizes the total cost over the time period [t, t+w] to obtain y, and then obtains x from the value of y.
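The formulas referred to in steps 2.1 to 2.3 are not reproduced in this text. The block below is only one possible form that is consistent with the symbol definitions above; it is an assumption, not the patented formulas. In particular, the time superscript t, the link cost l_{i,d} between two data centers, the use of the previous slot's placement in the switching cost, and the shape of the replica constraint are all guesses. The definitions of X and Y are not reconstructed.

```latex
\begin{align}
C_{or}^{t}  &= \sum_{p \in P} \sum_{d \in D} O_p^{t}\, l_{DC(p),d}\, x_{p,d}^{t} \\
C_{com}^{t} &= \sum_{g \in G} \sum_{k \in D} c_{k,g}\, b_{k,g}^{t}\, y_{k,g}^{t} \\
C_{sw}^{t}  &= \sum_{g \in G} \sum_{i \in D} \sum_{d \in D} I_{i,g}^{t}\, l_{i,d}\, y_{i,g}^{t-1}\, y_{d,g}^{t} \\
C^{t}       &= C_{or}^{t} + C_{com}^{t} + C_{sw}^{t} \\
\min_{x,\,y}\; & \sum_{t=\tau}^{\tau+\omega} C^{t}
\quad \text{s.t.} \quad \sum_{d \in D} x_{p,d}^{t} \ge f_p \;\; \forall p \in P,
\qquad x_{p,d}^{t},\, y_{d,g}^{t} \in \{0,1\}
\end{align}
```

Under this reading, the switching cost is zero whenever a task stays at the same data center, because l_{d,d} would be zero.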
Step 3) specifically comprises:
3.1. At time t, the MFHC algorithm combines the results of (w+1) FHC algorithms and applies a majority voting algorithm to obtain the final value of y; the value of x is then determined from the value of y;
3.2. According to the obtained x values, the data is redistributed and transferred to the specified data centers; then, according to the obtained y values, the tasks are executed at the data centers by performing SQL operations on the data.
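The voting in step 3.1 can be sketched as follows. This is a minimal illustration, not the patented procedure: it assumes that each of the (w+1) FHC instances proposes exactly one data center per task for the current time slot, and it breaks ties by picking the alphabetically smallest data center name, a rule the text does not specify. The function name `majority_vote` and the sample proposals are hypothetical.

```python
from collections import Counter


def majority_vote(proposals):
    """Combine per-task placements proposed by several FHC instances.

    proposals: list of dicts mapping task -> data center, one dict per
    FHC instance (assumed format). Returns a dict task -> data center.
    """
    final = {}
    tasks = sorted({g for prop in proposals for g in prop})
    for g in tasks:
        votes = Counter(prop[g] for prop in proposals if g in prop)
        top = max(votes.values())
        # Tie-break (assumption): alphabetically smallest data center.
        final[g] = min(d for d, n in votes.items() if n == top)
    return final


# Three FHC instances (w = 2) vote on where tasks g1 and g2 should run.
proposals = [
    {"g1": "A", "g2": "D"},
    {"g1": "A", "g2": "E"},
    {"g1": "B", "g2": "D"},
]
print(majority_vote(proposals))
```

Once the final y is fixed this way, x would be derived from it as described in step 3.1.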
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention provides a data task redistribution method based on geographically distributed data query. The MFHC algorithm is used to redistribute the tasks and data of a distributed system: the data held at the different data centers is rearranged, the data centers that execute query tasks are reassigned, and a single task may be executed at different data centers. This reduces the total cost consumed by analytical query processing and addresses the problem that some data cannot be transferred to other data centers because of privacy or other restrictions. The method also improves the query speed of tasks.
Description of the drawings
Fig. 1 is a flow chart of the data task redistribution method based on geographically distributed data query of the present invention;
Fig. 2 is a control flow block diagram of the present invention;
Fig. 3 is a flow chart of the MFHC algorithm of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, the present invention provides a data task redistribution method based on geographically distributed data query:
1) obtaining multiple query tasks;
2) according to the state of the query tasks and the data at the current time, predicting future information with a statistical method, and using the fixed horizon control (FHC) algorithm to minimize the cost over a future period of time, thereby obtaining the data processing centers for future time slots;
Step 2) specifically comprises:
2.1. Obtaining the bandwidth cost C_or of moving raw data, the computation cost C_com of running tasks, and the switching cost C_sw incurred by reassigning the data center that executes a task:
where D is the set of data centers, each of which contains several regions; P is the set of regions; G is the set of tasks, where each task g belongs to an analytical query and each analytical query can be expressed as a DAG; O_p is the volume of raw data in region p; l_{DC(p),d} is the link cost from the data center where region p is located to data center d; c_{k,g} is the unit computation cost of task g at data center k; b_{k,g} is the total volume of data that task g has to process at data center k; I_{i,g} is the volume of intermediate data that task g has to process at data center i; x_{p,d} is a binary variable whose value is 1 if the raw data of region p is to be transferred to data center d, and 0 otherwise; y_{d,g} is a binary variable whose value is 1 if task g is to be executed at data center d, and 0 otherwise;
2.2. Summing the costs:
2.3. Minimizing the total cost with the FHC algorithm:
subject to: for t = τ, ..., τ+ω
The two symbols X and Y involved are defined as:
where f_p is the minimum number of replicas of the data of region p;
The FHC algorithm minimizes the total cost over the time period [t, t+w] to obtain y, and then obtains x from the value of y.
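To make the window minimization in step 2.3 concrete, here is a small sketch under stated assumptions. The cost terms are one plausible reading of the symbol definitions (the formulas themselves are not reproduced in this text), the `READS` mapping from tasks to regions is hypothetical, b_{k,g} is approximated by the raw volume a task reads, x is derived from y by copying each region to every data center that runs a task reading it, and the search is a brute-force enumeration rather than whatever solver the inventors used. The replica minimum f_p and privacy restrictions are not modeled.

```python
from itertools import product

DCS = ["A", "B"]                                  # set D
LINK = {("A", "A"): 0.0, ("A", "B"): 2.0,         # link cost per unit of data
        ("B", "A"): 2.0, ("B", "B"): 0.0}
HOME = {"p1": "A", "p2": "B"}                     # DC(p) for each region p
READS = {"g1": ["p1", "p2"]}                      # regions each task reads (hypothetical)
UNIT_COMP = {("A", "g1"): 1.0, ("B", "g1"): 3.0}  # c_{k,g}


def derive_x(y):
    """Assumed rule: x_{p,d} = 1 iff a task placed at d reads region p."""
    return {(p, d) for g, d in y.items() for p in READS[g]}


def slot_cost(y_prev, y, raw, inter):
    """Cost of one time slot: raw-data movement + computation + switching."""
    x = derive_x(y)
    c_or = sum(raw[p] * LINK[HOME[p], d] for p, d in x)
    c_com = sum(UNIT_COMP[d, g] * sum(raw[p] for p in READS[g])
                for g, d in y.items())
    c_sw = sum(inter[g] * LINK[y_prev[g], d]
               for g, d in y.items() if g in y_prev)
    return c_or + c_com + c_sw


def fhc(y_start, forecast):
    """Enumerate every placement sequence over the window, keep the cheapest.

    forecast: list of (raw, inter) predictions, one pair per slot.
    """
    tasks = sorted(READS)
    options = [dict(zip(tasks, combo))
               for combo in product(DCS, repeat=len(tasks))]
    best_cost, best_plan = float("inf"), None
    for plan in product(options, repeat=len(forecast)):
        prev, total = y_start, 0.0
        for y, (raw, inter) in zip(plan, forecast):
            total += slot_cost(prev, y, raw, inter)
            prev = y
        if total < best_cost:
            best_cost, best_plan = total, list(plan)
    return best_cost, best_plan


# Two-slot window: region p1 is large first, then region p2 grows.
forecast = [({"p1": 10, "p2": 1}, {"g1": 1}),
            ({"p1": 1, "p2": 10}, {"g1": 1})]
cost, plan = fhc({"g1": "A"}, forecast)
print(cost, plan)
```

Enumeration grows exponentially with the number of tasks and the window length, so it only serves to show what is being minimized; a realistic instance would need an integer programming solver or a heuristic.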
3) performing task allocation with the MFHC algorithm, and moving the data from its storage location to the data processing center according to the task allocation;
As shown in Fig. 3, step 3) specifically comprises:
3.1. At time t, the MFHC algorithm combines the results of (w+1) FHC algorithms and applies a majority voting algorithm to obtain the final value of y; the value of x is then determined from the value of y;
3.2. According to the obtained x values, the data is redistributed and transferred to the specified data centers; then, according to the obtained y values, the tasks are executed at the data centers by performing SQL operations on the data.
For example, as shown in Fig. 2, a node such as A represents a data center executing a task, and the outer ring of A represents the transfer of that data center's data. Initially, the algorithm decides to execute the task at data center A. At the next time slot it decides to execute tasks at D and E, so the intermediate data generated at data center A is transferred to data centers D and E. After these two data centers finish executing their tasks, the final task is to be executed at data center D, so the intermediate results of data center E are transferred to data center D, and the task is finally executed at data center D.
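The Fig. 2 walk-through can be traced with a few lines of code. This is an illustrative sketch only: it assumes that every data center that executed in one stage sends its intermediate results to every data center chosen for the next stage, which matches the example above but is not stated as a general rule. The function name `transfers` and the `fig2` list are hypothetical.

```python
def transfers(stages):
    """List the intermediate-result transfers between consecutive stages.

    stages: list of sets, each holding the data centers that execute
    tasks in that stage. Returns (source, destination) pairs in order.
    """
    moves = []
    for prev, cur in zip(stages, stages[1:]):
        for src in sorted(prev):
            for dst in sorted(cur):
                if src != dst:  # results already at dst need no transfer
                    moves.append((src, dst))
    return moves


# Execution sites from the Fig. 2 example: A, then D and E, then D.
fig2 = [{"A"}, {"D", "E"}, {"D"}]
print(transfers(fig2))
```

Each pair returned corresponds to one switching-cost contribution: the volume of intermediate data multiplied by the link cost between the two data centers.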
4) performing SQL operations on the data according to the task allocation, so that the data can be moved and processed at the next time slot;
In the present invention, data is initially distributed across different data centers, and multiple query tasks are waiting to be executed. For each task and data distribution, the system decides, with the goal of minimizing the overall system cost over a given time period, at which data center each task is executed and to which data centers the data to be processed should be copied. This process is repeated in the next time period.
The present invention uses the FHC algorithm to redistribute the tasks and data of the distributed system.
5) repeating steps 1) to 4) until all query tasks are completed.
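Steps 1) to 5) together form a control loop that can be sketched as below. The function `run` and its four callback parameters are hypothetical names introduced for illustration; the text does not define such an interface. The `max_slots` guard is also an addition, so that the sketch terminates even if a callback never completes a task.

```python
def run(pending, forecast_fn, plan_fn, move_fn, execute_fn, max_slots=100):
    """Repeat predict -> plan -> move -> execute until no task is pending.

    Returns the number of time slots used.
    """
    slot = 0
    while pending and slot < max_slots:
        forecast = forecast_fn(pending, slot)  # step 2: predict future information
        x, y = plan_fn(forecast)               # steps 2-3: choose data and task placement
        move_fn(x)                             # step 3: move data to the chosen centers
        done = execute_fn(y)                   # step 4: run the SQL operations
        pending = [g for g in pending if g not in done]
        slot += 1
    return slot


# Toy callbacks: one task is planned and completed per slot at data center "A".
log = []
slots = run(
    pending=["g1", "g2"],
    forecast_fn=lambda pending, slot: {g: 1.0 for g in pending},
    plan_fn=lambda forecast: ({"p1": "A"}, {g: "A" for g in list(forecast)[:1]}),
    move_fn=lambda x: log.append(("move", x)),
    execute_fn=lambda y: set(y),
)
print(slots, log)
```

Passing the planner as a callback keeps the loop independent of whether placement comes from a single FHC run or from the majority vote over several of them.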
It should be noted that, in the description of the present application, unless otherwise stated, "multiple" means two or more.
Any process or method description in a flow chart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
It should be understood that each part of the present application may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented with any one or a combination of the following technologies well known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
It will be apparent to those skilled in the art that the above specific embodiments are preferred solutions of the present invention. Improvements and changes that those skilled in the art may make to certain parts of the present invention still embody the principle of the present invention and still achieve its purpose, and therefore fall within the scope of protection of the present invention.

Claims (3)

1. A data task redistribution method based on geographically distributed data query, characterized by comprising:
1) obtaining multiple query tasks;
2) according to the state of the query tasks and the data at the current time, predicting future information with a statistical method, and using the fixed horizon control (FHC) algorithm to minimize the cost over a future period of time, thereby obtaining the data processing centers for future time slots;
3) performing task allocation with the MFHC algorithm, and moving the data from its storage location to the data processing center according to the task allocation;
4) performing SQL operations on the data according to the task allocation, so that the data can be moved and processed at the next time slot;
5) repeating steps 1) to 4) until all query tasks are completed.
2. The data task redistribution method based on geographically distributed data query according to claim 1, characterized in that step 2) specifically comprises:
2.1. Obtaining the bandwidth cost C_or of moving raw data, the computation cost C_com of running tasks, and the switching cost C_sw incurred by reassigning the data center that executes a task:
where D is the set of data centers, each of which contains several regions; P is the set of regions; G is the set of tasks, where each task g belongs to an analytical query and each analytical query can be expressed as a DAG; O_p is the volume of raw data in region p; l_{DC(p),d} is the link cost from the data center where region p is located to data center d; c_{k,g} is the unit computation cost of task g at data center k; b_{k,g} is the total volume of data that task g has to process at data center k; I_{i,g} is the volume of intermediate data that task g has to process at data center i; x_{p,d} is a binary variable whose value is 1 if the raw data of region p is to be transferred to data center d, and 0 otherwise; y_{d,g} is a binary variable whose value is 1 if task g is to be executed at data center d, and 0 otherwise;
2.2. Summing the costs:
2.3. Minimizing the total cost with the FHC algorithm:
subject to: for t = τ, ..., τ+ω
The two symbols X and Y involved are defined as:
where f_p is the minimum number of replicas of the data of region p;
The FHC algorithm minimizes the total cost over the time period [t, t+w] to obtain y, and then obtains x from the value of y.
3. The data task redistribution method based on geographically distributed data query according to claim 2, characterized in that step 3) specifically comprises:
3.1. At time t, the MFHC algorithm combines the results of (w+1) FHC algorithms and applies a majority voting algorithm to obtain the final value of y; the value of x is then determined from the value of y;
3.2. According to the obtained x values, the data is redistributed and transferred to the specified data centers; then, according to the obtained y values, the tasks are executed at the data centers by performing SQL operations on the data.
CN201810233064.4A 2018-03-21 2018-03-21 Data task redistribution method based on geographic distributed data query Active CN108446383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810233064.4A CN108446383B (en) 2018-03-21 2018-03-21 Data task redistribution method based on geographic distributed data query

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810233064.4A CN108446383B (en) 2018-03-21 2018-03-21 Data task redistribution method based on geographic distributed data query

Publications (2)

Publication Number Publication Date
CN108446383A 2018-08-24
CN108446383B 2021-12-10

Family

ID=63195914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810233064.4A Active CN108446383B (en) 2018-03-21 2018-03-21 Data task redistribution method based on geographic distributed data query

Country Status (1)

Country Link
CN (1) CN108446383B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256720A (en) * 2020-10-21 2021-01-22 平安科技(深圳)有限公司 Data cost calculation method, system, computer device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090083098A1 (en) * 2007-09-24 2009-03-26 Yahoo! Inc. System and method for an online auction with optimal reserve price
CN103646073A (en) * 2013-12-11 2014-03-19 浪潮电子信息产业股份有限公司 Condition query optimizing method based on HBase table
CN107273184A (en) * 2017-06-14 2017-10-20 沈阳师范大学 A kind of optimized algorithm migrated based on high in the clouds big data with processing cost
CN107341820A (en) * 2017-07-03 2017-11-10 郑州轻工业学院 A kind of fusion Cuckoo search and KCF mutation movement method for tracking target

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Chunkai et al.: "Research on relational query technology for distributed data streams", Chinese Journal of Computers *

Also Published As

Publication number Publication date
CN108446383B (en) 2021-12-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant