CN105701209A - Load balancing method for improving parallel connection performance on big data - Google Patents


Info

Publication number
CN105701209A
Authority
CN
China
Prior art keywords
data
data block
access
connection
temperature
Legal status
Pending
Application number
CN201610019840.1A
Other languages
Chinese (zh)
Inventor
葛微
李先贤
王利娥
Current Assignee
Guangxi Normal University
Original Assignee
Guangxi Normal University
Priority date
2016-01-13
Filing date
2016-01-13
Publication date
2016-06-22
Application filed by Guangxi Normal University
Priority to CN201610019840.1A
Publication of CN105701209A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a load balancing method for improving parallel join performance on big data. The method comprises the following steps: 1) initializing the division of mass data into data blocks according to query results, where each data block contains the data of the multiple tables participating in the join whose join-attribute values satisfy the query condition, and the table data are organized and managed as data blocks; 2) recording the access ratio of each data block, accumulating its heat, and calculating its average access ratio; 3) adaptively adjusting the division of the data blocks according to their average access ratios, triggering the merging and splitting of data blocks according to how well the division fits the query requests; 4) distributing hot data evenly across the cluster nodes and balancing the load of query tasks according to heat; and 5) executing the join query request on each cluster node, collecting the join results, and returning them to the client. The method improves the time efficiency of data queries, balances the join query load, and improves join performance.

Description

A load balancing method for improving parallel join performance on big data
Technical field
The present invention relates to parallel load balancing technology for big data, and specifically to a load balancing method for improving parallel join performance on big data.
Background art
A join operation in a relational database merges two tables horizontally, i.e., it combines the rows of the two tables that match on a common data item. In relational algebra, a join consists of a Cartesian product followed by a selection: the Cartesian product first multiplies the two data sets, and the selection is then applied to the resulting set so that only rows that come from both data sets and match on the overlapping item are combined. The essence of a join is thus to merge two data sets (usually tables) horizontally into a new result set, by combining each row of one data source with the matching rows of the other data source into new tuples.
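To make this definition concrete, the following minimal Python sketch expresses an equi-join literally as a Cartesian product followed by a selection. The tables `users` and `clicks` and the column `uid` are hypothetical illustration data, not taken from the patent. The sketch also makes the cost visible: every pair of rows is examined, i.e. |R|·|S| comparisons.

```python
from itertools import product

def join_via_product(r, s, key):
    """Equi-join written literally as relational algebra: a selection over R x S."""
    pairs = product(r, s)                                   # Cartesian product: every (row of R, row of S)
    return [(a, b) for a, b in pairs if a[key] == b[key]]  # selection keeps rows matching on the join attribute

users = [{"uid": 1, "name": "ann"}, {"uid": 2, "name": "bob"}]
clicks = [{"uid": 1, "url": "/a"}, {"uid": 1, "url": "/b"}, {"uid": 3, "url": "/c"}]

for u, c in join_via_product(users, clicks, "uid"):
    print(u["name"], c["url"])  # prints "ann /a" then "ann /b"
```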
Big data creates an urgent need for join queries: static data such as user profiles are stored in tables, while data such as click-stream logs and business logs are continuously generated and accumulated. Analysis of log data requires joining the static data with the dynamic data, and deeper analysis then proceeds from the join result. However, the join is one of the most expensive query operations in data analysis; the most naive method requires a Cartesian product over all rows of the two tables. Optimization strategies for joins have long been a research focus for relational databases, and in today's big data environments join query optimization is even more pressing: the excessive cost of joins over massive data significantly delays the results of data analysis.
The most common way to optimize joins in a parallel computing environment is parallelization: the join task is distributed across the nodes of the cluster, the join is executed in parallel, and the results from each node are then aggregated. Data partitioning and task load balancing are the key points and the main difficulty of such algorithms: the data must be distributed across the cluster nodes so that, for a join task, the data on each node can be joined locally, and the cluster's tasks execute in a balanced way across the nodes.
For natural joins (equi-joins), the typical method is the hash join. Joins have an important property: for the two data sources R and S participating in a join, only tuples whose join-attribute values are equal can produce join results. Therefore, if R and S are partitioned on the join attribute with the same hash function, only data mapped to the same node can produce join results, and data mapped to different nodes cannot. On this basis, the hash join uses data partitioning to greatly reduce the cost of joins in a distributed environment. However, the hash join does not consider load balance: the computing tasks on different nodes can be unbalanced, which can severely degrade parallel join performance, because in a distributed environment the completion time of a join task depends on the completion time of the slowest node.
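The hash-partitioned parallel join described above can be sketched as follows. This is a minimal illustration, not the patent's method: it assumes integer join keys, simulates each cluster node as a dictionary entry, and all names are hypothetical. The example data deliberately skew the key `uid = 0` to show the imbalance the paragraph describes: the node that receives the hot key produces most of the join output.

```python
from collections import defaultdict

def partition(rows, key, num_nodes):
    """Route every row to node hash(join key) % num_nodes; R and S use the same function."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % num_nodes].append(row)
    return parts

def parallel_hash_join(r, s, key, num_nodes):
    """Join R and S node by node; only rows that land on the same node can match."""
    r_parts = partition(r, key, num_nodes)
    s_parts = partition(s, key, num_nodes)
    results = {}
    for node in range(num_nodes):
        table = defaultdict(list)            # local build side: R fragment keyed by join value
        for a in r_parts[node]:
            table[a[key]].append(a)
        # local probe side: each S row looks up its matches on the same node
        results[node] = [(a, b) for b in s_parts[node] for a in table[b[key]]]
    return results

profiles = [{"uid": i} for i in range(6)]
# Skewed access: user 0 generates six events, every other user one.
events = [{"uid": 0} for _ in range(6)] + [{"uid": i} for i in range(1, 6)]

load = {node: len(rows) for node, rows in parallel_hash_join(profiles, events, "uid", 3).items()}
print(load)  # {0: 7, 1: 2, 2: 2}
```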
In 2004, Google published the MapReduce paper, which proposed a framework for uniformly scheduled parallel execution. MapReduce can detect nodes in the cluster that are executing tasks slowly; when a node finishes its own work, it can be assigned to execute the slow node's task as well, in parallel with it. This scheduling strategy improves the performance of parallel computing tasks, but it is a remedial measure taken only after the task manager detects a delay. In big data application scenarios, skewed data access is common: a small portion of the data is accessed very frequently, while most of the data is rarely accessed. The MapReduce parallel computing framework does not yet schedule tasks based on data access frequency (which reflects the task load of data queries), so load balancing only begins to compensate after execution skew has already appeared on the nodes.
Summary of the invention
The present invention addresses the shortcomings of the prior art by providing a load balancing method for improving parallel join performance on big data. The method reduces the overhead of data management and maintenance and shortens data query response time. At the same time, it accurately captures the distribution pattern of data accesses and uses that pattern to measure data access load, balancing both the query load of join queries and the data distribution so that tasks are evenly loaded across nodes, thereby improving join performance.
The technical solution for implementing the present invention is as follows:
A load balancing method for improving parallel join performance on big data, comprising the following steps:
1) Initialize the division of mass data into data blocks according to query results: data are first divided by gathering the result of each query into blocks, and the metadata managing each data block records the block's span on the join attribute, i.e., the start and end values of a contiguous range. Each resulting data block contains the data of the multiple tables participating in the join whose join-attribute values satisfy the query condition; data are divided into blocks only when they are first hit by a query, and are then organized and managed as data blocks;
2) Record the access ratio of each data block, calculate its average access ratio, and accumulate each node's total heat and total data volume: when a query hits the whole data block, the block's heat is incremented by 1 (i.e., 100%); when a query hits only part of the block, the heat is incremented by a value between 0 and 1, namely the percentage of the block that was accessed;
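Access ratio $r$ = (number of records of the data block accessed by this query) / (total number of records in the data block);
Heat of a data block: $H = \sum_{i=1}^{n} r_i$, where $r_i$ is the access ratio of the $i$-th access to the block;
Average access ratio: $\bar{r} = H / n$, where $n$ is the number of times the data block has been accessed;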
3) Adaptively adjust the division of data blocks according to their average access ratios over successive queries, using this ratio to measure how well the data division fits the query requests, and trigger the merging or splitting of data blocks according to that fit;
4) Distribute the data evenly across the cluster nodes so that the data on each node stay roughly balanced in both storage overhead and heat. The heat of data represents the query load the data carry: a block with high heat is one that is accessed frequently. After the adaptive splitting and merging, data heat accurately measures how hot or cold the data are, and on this basis the cluster load balance can be adjusted dynamically, in real time, accurately and efficiently. Join computation is shared evenly by the cluster nodes and executed in parallel, allowing the cluster to achieve optimal parallel performance;
5) Distribute tasks evenly to the cluster nodes, execute the join query request on each node, and finally collect the join results and return them to the client.
This method optimizes join query performance according to the access frequency and computation cost of the data. It captures and fits the distribution pattern of data accesses by recording access frequency, and divides the data into blocks of different heat according to that pattern. Based on the storage overhead and heat of the data, we distribute the data across different nodes so that the data access load is spread evenly over the cluster; this access-frequency-oriented load balancing strategy makes full use of the storage and computing resources of every cluster node.
This method divides data into blocks and organizes and manages the blocks with a skip list, which reduces the overhead of data management and maintenance and shortens data query response time. At the same time, the adaptive adjustment of the block division accurately captures the distribution pattern of data accesses, from which the data access load is measured; this balances the query load of join queries and the data distribution, evenly loads tasks across nodes, and thereby improves join performance.
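The patent says blocks are organized with a skip list but gives no details. As a hedged stand-in, the sketch below keeps block metadata sorted by range start and uses Python's standard `bisect` module, which provides the same ordered lookups a skip list would (find the block containing a join key, list the blocks overlapping a query range). The `Block` and `BlockIndex` names are hypothetical; the ranges are the five blocks of the embodiment's Fig. 2.

```python
import bisect
from dataclasses import dataclass

@dataclass
class Block:
    start: int             # first join-attribute value covered by the block
    end: int               # last join-attribute value covered (inclusive)
    heat: float = 0.0      # accumulated access ratio
    accesses: int = 0      # number of accesses so far

class BlockIndex:
    """Block metadata ordered by range start; sorted lists + bisect stand in for a skip list."""

    def __init__(self):
        self.starts = []   # range starts, kept sorted
        self.blocks = []   # blocks, in the same order as self.starts

    def add(self, block):
        i = bisect.bisect_left(self.starts, block.start)
        self.starts.insert(i, block.start)
        self.blocks.insert(i, block)

    def find(self, value):
        """Return the block whose [start, end] range contains value, or None."""
        i = bisect.bisect_right(self.starts, value) - 1
        if i >= 0 and self.blocks[i].end >= value:
            return self.blocks[i]
        return None

    def overlapping(self, lo, hi):
        """Return every block that intersects the query range [lo, hi], in key order."""
        i = max(bisect.bisect_right(self.starts, lo) - 1, 0)
        found = []
        while i < len(self.blocks) and self.blocks[i].start <= hi:
            if self.blocks[i].end >= lo:
                found.append(self.blocks[i])
            i += 1
        return found

index = BlockIndex()
for start, end in [(7, 13), (21, 31), (32, 36), (37, 70), (85, 99)]:
    index.add(Block(start, end))

print(index.find(40))                                 # Block(start=37, end=70, heat=0.0, accesses=0)
print(index.find(16))                                 # None
print([b.start for b in index.overlapping(30, 40)])   # [21, 32, 37]
```

Lookups are O(log n), but inserting into a Python list is O(n); at scale a real skip list or balanced tree would avoid that cost.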
Brief description of the drawings
Fig. 1 is a schematic flowchart of the method of the embodiment;
Fig. 2 is a schematic diagram of the data of the embodiment divided into contiguous blocks;
Fig. 3 is a schematic diagram of the splitting of data block 37 in the embodiment;
Fig. 4 is a schematic diagram of the merging of data blocks 32 and 37 in the embodiment;
Fig. 5 is a schematic diagram of the data division of the embodiment fitting the data access pattern.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and an embodiment, which do not limit the invention.
Embodiment:
Referring to Fig. 1, a load balancing method for improving parallel join performance on big data comprises the following steps:
1) Initialize the division of mass data into data blocks according to query results: data are first divided by gathering the result of each query into blocks, and the metadata managing each data block records the block's span on the join attribute, i.e., the start and end values of a contiguous range. Each resulting data block contains the data of the multiple tables participating in the join whose join-attribute values satisfy the query condition; data are divided into blocks only when they are first hit by a query, and are then organized and managed as data blocks;
Referring to Fig. 2, suppose the query attribute values of the data range from 7 to 99. In this embodiment, data are first divided by gathering the result of each query into blocks: the five ranges 7-13, 21-31, 32-36, 37-70 and 85-99 are divided into data blocks when they are first hit by a query. Each resulting data block contains the data of the multiple tables participating in the join whose join-attribute values satisfy the query condition; for example, if the two tables participating in the join are R and S, the rows of both R and S whose query attribute values fall in 7-13 are placed in the same data block. The metadata managing each data block records its span on the join attribute, i.e., the start and end values of a contiguous range, such as 37-70. Data are divided into blocks when first hit by a query and are then managed as blocks; data that have never been hit by a query, such as the two ranges 14-20 and 71-84, are not managed as blocks;
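The block-initialization rule of step 1 can be sketched as follows. This is a hedged illustration under two assumptions: the join attribute is an integer, and when a query's hit range partly overlaps existing blocks, only the not-yet-covered parts become new blocks (the patent does not spell out this case). Blocks are inclusive `(start, end)` pairs and the function names are hypothetical. Replaying five queries reproduces the Fig. 2 division, and a later query over 10-20 would create a block for 14-20 only.

```python
def uncovered_ranges(blocks, lo, hi):
    """Parts of the query range [lo, hi] not covered by any existing block.

    blocks: sorted, non-overlapping, inclusive (start, end) integer ranges.
    """
    gaps, cursor = [], lo
    for start, end in blocks:
        if end < cursor:
            continue                       # block lies left of the still-uncovered part
        if start > hi:
            break                          # this and all later blocks lie right of the query
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= hi:
        gaps.append((cursor, hi))
    return gaps

def register_query(blocks, lo, hi):
    """Turn the first-time-hit part of a query's range into new blocks."""
    new_blocks = uncovered_ranges(blocks, lo, hi)
    return sorted(blocks + new_blocks), new_blocks

blocks = []
for query in [(21, 31), (7, 13), (32, 36), (37, 70), (85, 99)]:
    blocks, _ = register_query(blocks, *query)
print(blocks)                              # [(7, 13), (21, 31), (32, 36), (37, 70), (85, 99)]
print(register_query(blocks, 10, 20)[1])   # [(14, 20)]
```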
2) Record the access ratio of each data block, calculate its average access ratio, and accumulate each node's total heat and total data volume: when a query hits the whole data block, the block's heat is incremented by 1 (i.e., 100%); when a query hits only part of the block, the heat is incremented by a value between 0 and 1, namely the percentage of the block that was accessed;
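Access ratio $r$ = (number of records of the data block accessed by this query) / (total number of records in the data block);
Heat of a data block: $H = \sum_{i=1}^{n} r_i$, where $r_i$ is the access ratio of the $i$-th access to the block;
Average access ratio: $\bar{r} = H / n$, where $n$ is the number of times the data block has been accessed;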
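The access-ratio, heat and average-access-ratio definitions of step 2 translate directly into per-block bookkeeping. In this minimal sketch, the `BlockStats` class and the block size of 340 records are hypothetical; one full hit followed by a query touching a quarter of the block's records gives a heat of 1.25 and an average access ratio of 0.625.

```python
from dataclasses import dataclass

@dataclass
class BlockStats:
    start: int             # first join-attribute value of the block
    end: int               # last join-attribute value (inclusive)
    size: int              # total number of records in the block
    heat: float = 0.0      # H: sum of the access ratios of all accesses
    accesses: int = 0      # n: number of times the block was accessed

    def record_access(self, records_hit):
        """Add one access with ratio r = records_hit / size (1.0 on a full hit)."""
        ratio = records_hit / self.size
        self.heat += ratio
        self.accesses += 1
        return ratio

    @property
    def avg_access_ratio(self):
        """Average access ratio H / n; 0.0 for a block that was never accessed."""
        return self.heat / self.accesses if self.accesses else 0.0

block = BlockStats(start=37, end=70, size=340)
block.record_access(340)        # full hit: heat += 1
block.record_access(85)         # partial hit on 25% of the records: heat += 0.25
print(block.heat, block.avg_access_ratio)  # 1.25 0.625
```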
3) Adaptively adjust the division of data blocks according to their average access ratios: every data access records the access ratio of each accessed data block, which measures how well the data division fits the query requests, and merging or splitting of data blocks is triggered according to that fit. Through splitting and merging, the division is adjusted until it matches the distribution pattern of data accesses;
The adaptive adjustment works as follows. If the average access ratio of a data block is very low, as for data block 37 in Fig. 2, the records in the block are only weakly correlated with respect to the query request sequence: queries often hit only a small part of the block, so its records need not be kept together, and a split of the block is triggered. Referring to Fig. 3, data block 37 is split into three data blocks of roughly equal length, 37-47, 48-58 and 59-70, each inheriting the heat and access count of the original block 37;
Conversely, if two contiguous data blocks have very close heat and access counts, they are often hit together, i.e., the two blocks are strongly correlated, and a merge is triggered. For example, after 2000 query accesses, data blocks 32 and 37 in Fig. 3 have very close access counts and heat, which triggers the merging of the two blocks; see Fig. 4;
Splitting and merging are the means of adaptively adjusting the data blocks; through this adjustment, the block division gradually fits the distribution pattern of the query request sequence (see Fig. 5). Because data are organized and managed as blocks, the time and space costs of management are significantly reduced;
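The split and merge rules can be sketched as below. The patent only says "very low" and "very close", so `SPLIT_BELOW = 0.3` and `MERGE_TOL = 0.1` are hypothetical thresholds, and giving a merged block the mean heat and access count of its two parts is likewise an assumption. The three-way split with inherited statistics follows the embodiment: block 37-70 becomes 37-47, 48-58 and 59-70.

```python
from dataclasses import dataclass, replace

SPLIT_BELOW = 0.3   # hypothetical: "very low" average access ratio
MERGE_TOL = 0.1     # hypothetical: relative difference counted as "very close"

@dataclass
class Block:
    start: int
    end: int
    heat: float
    accesses: int

    @property
    def avg_access_ratio(self):
        return self.heat / self.accesses if self.accesses else 0.0

def should_split(block):
    """Queries keep touching only a small fraction of this block."""
    return block.accesses > 0 and block.avg_access_ratio < SPLIT_BELOW

def split(block, parts=3):
    """Cut into `parts` contiguous ranges of near-equal width (the last absorbs the remainder).

    Each child inherits the parent's heat and access count, as in the embodiment.
    """
    width = (block.end - block.start + 1) // parts
    if width == 0:
        return [block]                     # too narrow to split
    children = []
    for i in range(parts):
        lo = block.start + i * width
        hi = block.end if i == parts - 1 else lo + width - 1
        children.append(replace(block, start=lo, end=hi))
    return children

def _close(x, y, tol=MERGE_TOL):
    return abs(x - y) <= tol * max(abs(x), abs(y))

def should_merge(a, b):
    """Adjacent blocks with near-identical heat and access counts are usually hit together."""
    return a.end + 1 == b.start and _close(a.heat, b.heat) and _close(a.accesses, b.accesses)

def merge(a, b):
    # Assumption: the merged block keeps the mean heat and access count of its parts.
    return Block(a.start, b.end, (a.heat + b.heat) / 2, round((a.accesses + b.accesses) / 2))

parent = Block(37, 70, heat=12.0, accesses=60)        # average access ratio 0.2
if should_split(parent):
    print([(c.start, c.end) for c in split(parent)])  # [(37, 47), (48, 58), (59, 70)]

left, right = Block(32, 36, 8.0, 40), Block(37, 47, 8.5, 42)
if should_merge(left, right):
    print(merge(left, right))  # Block(start=32, end=47, heat=8.25, accesses=41)
```

Merging should be evaluated on statistics gathered after a split: the children of a fresh split carry identical inherited statistics, so checking them for a merge immediately would simply undo the split.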
4) Distribute the data evenly across the cluster nodes so that the total data heat and the total storage space on each node stay roughly balanced. After several rounds of splitting and merging in the initial phase, the division generally stabilizes after 2000 to 5000 accesses: the data division has been trained to match the data access pattern and splits and merges become rare, at which point the data can be distributed across the cluster nodes. Referring to Fig. 4, data blocks 7, 21, 32, 48, 59 and 85 are hot data blocks and data blocks 14 and 71 are cold data blocks; their heat values are:
Data block 7: heat = 27.5;
Data block 14: heat = 0.8;
Data block 21: heat = 10.3;
Data block 32: heat = 10.875;
Data block 48: heat = 3.5;
Data block 59: heat = 3.5;
Data block 71: heat = 0.3;
Data block 85: heat = 25.8.
The blocks are sorted by heat and distributed across the cluster nodes as follows:
Node 1 holds data blocks 7 and 71, with total heat 27.8;
Node 2 holds data blocks 85, 14 and 48, with total heat 30.1;
Node 3 holds data blocks 21, 32 and 59, with total heat 24.675;
Big data storage and management must pay particular attention to hot data: because hot data are accessed frequently, the cost of accessing them largely determines system performance. When joins on hot data are load-balanced evenly across the nodes, the system achieves its best performance;
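The embodiment sorts blocks by heat and spreads them over three nodes but does not state the exact placement rule. One common choice, shown here only as a hedged sketch, is the greedy longest-processing-time (LPT) heuristic: take blocks hottest first and give each to the node with the lowest total heat so far. The sketch uses the embodiment's heat values, balances heat only (the patent also balances storage space, which is omitted here), and yields a different placement from the one listed above.

```python
import heapq

# Heat values of the Fig. 4 blocks, keyed by the block's start value.
HEATS = {7: 27.5, 14: 0.8, 21: 10.3, 32: 10.875, 48: 3.5, 59: 3.5, 71: 0.3, 85: 25.8}

def assign_by_heat(heats, num_nodes):
    """Greedy LPT: hottest block first, always onto the node with the least total heat."""
    loads = [(0.0, node) for node in range(num_nodes)]   # min-heap of (total heat, node id)
    heapq.heapify(loads)
    placement = {node: [] for node in range(num_nodes)}
    for block, heat in sorted(heats.items(), key=lambda kv: kv[1], reverse=True):
        total, node = heapq.heappop(loads)
        placement[node].append(block)
        heapq.heappush(loads, (total + heat, node))
    return placement

placement = assign_by_heat(HEATS, 3)
for node, blocks in placement.items():
    print(node + 1, blocks, round(sum(HEATS[b] for b in blocks), 3))
# 1 [7] 27.5
# 2 [85, 14, 71] 26.9
# 3 [32, 21, 48, 59] 28.175
```

With these inputs the node totals differ by less than 1.3; a deployment following the patent would also weigh each block's storage size when choosing a node.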
5) Tasks are shared evenly by the cluster nodes: the join query request is executed on each cluster node, and finally the join results are collected and returned to the client.
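Step 5 amounts to a scatter-gather: each node joins the R and S rows it stores, and the coordinator concatenates the partial results. Because every block holds the rows of both tables for the same join-key range, matching rows are always on the same node, so no rows need to be shuffled between nodes. In this hedged sketch, threads from `concurrent.futures` stand in for cluster nodes, and `node_fragments` (node id mapped to a pair of R rows and S rows) is a hypothetical structure.

```python
from concurrent.futures import ThreadPoolExecutor

def local_join(r_rows, s_rows, key):
    """Hash join over the R and S rows stored on one node."""
    table = {}
    for a in r_rows:
        table.setdefault(a[key], []).append(a)
    return [(a, b) for b in s_rows for a in table.get(b[key], [])]

def scatter_gather(node_fragments, key):
    """Run the local join on every node in parallel, then concatenate results for the client."""
    with ThreadPoolExecutor(max_workers=max(1, len(node_fragments))) as pool:
        futures = [pool.submit(local_join, r, s, key) for r, s in node_fragments.values()]
        return [pair for future in futures for pair in future.result()]

node_fragments = {
    0: ([{"k": 7, "r": "x"}], [{"k": 7, "s": "p"}, {"k": 8, "s": "q"}]),
    1: ([{"k": 85, "r": "y"}], [{"k": 85, "s": "z"}]),
}
for a, b in scatter_gather(node_fragments, "k"):
    print(a["r"], b["s"])  # prints "x p" then "y z"
```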

Claims (1)

1. A load balancing method for improving parallel join performance on big data, comprising the following steps:
initializing the division of mass data into data blocks according to query results: data are first divided by gathering the result of each query into blocks, and the metadata managing each data block records the block's span on the join attribute, i.e., the start and end values of a contiguous range; each resulting data block contains the data of the multiple tables participating in the join whose join-attribute values satisfy the query condition, and data are divided into blocks only when first hit by a query and are then organized and managed as data blocks;
recording the access ratio of each data block, accumulating the heat of the data block, and calculating its average access ratio: when a query hits the whole data block, the heat of the data block is incremented by 1; when a query hits part of the data block, the heat is incremented by a value between 0 and 1, namely the percentage of the data block that was accessed;
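Access ratio $r$ = (number of records of the data block accessed by this query) / (total number of records in the data block);
Heat of a data block: $H = \sum_{i=1}^{n} r_i$, where $r_i$ is the access ratio of the $i$-th access to the block;
Average access ratio: $\bar{r} = H / n$, where $n$ is the number of times the data block has been accessed;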
adaptively adjusting the division of data blocks according to their average access ratios: every data access records the access ratio of each accessed data block, which measures how well the data division fits the query requests, and merging or splitting of data blocks is triggered according to that fit;
distributing the data evenly across the cluster nodes so that the total data heat and the total storage space on each node stay roughly balanced, and balancing the load of query tasks according to heat;
distributing tasks evenly to the cluster nodes, executing the join query request on each cluster node, and finally collecting the join results and returning them to the client.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610019840.1A CN105701209A (en) 2016-01-13 2016-01-13 Load balancing method for improving parallel connection performance on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610019840.1A CN105701209A (en) 2016-01-13 2016-01-13 Load balancing method for improving parallel connection performance on big data

Publications (1)

Publication Number Publication Date
CN105701209A (en) 2016-06-22

Family

ID=56226131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610019840.1A Pending CN105701209A (en) 2016-01-13 2016-01-13 Load balancing method for improving parallel connection performance on big data

Country Status (1)

Country Link
CN (1) CN105701209A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106506665A (en) * 2016-11-18 2017-03-15 郑州云海信息技术有限公司 A kind of load-balancing method of distributed video monitoring system and platform
CN106656675A (en) * 2017-01-03 2017-05-10 北京奇虎科技有限公司 Method and device for detecting transmission node cluster
CN107315547A (en) * 2017-07-18 2017-11-03 郑州云海信息技术有限公司 A kind of method and device for reading distributed meta data file
CN107357659A (en) * 2017-07-04 2017-11-17 东北大学 Towards the group technology and querying method of Storm successive ranges inquiry GSLB
CN107357871A (en) * 2017-07-04 2017-11-17 东北大学 A kind of successive range query load equalization methods based on feedback towards Storm
CN107515899A (en) * 2017-07-24 2017-12-26 北京国电通网络技术有限公司 Database federation sharding method, device and storage medium
CN107888697A (en) * 2017-11-24 2018-04-06 北京航天自动控制研究所 A kind of node locking means in load-balancing algorithm
CN108920282A (en) * 2018-08-03 2018-11-30 北京科技大学 A kind of copy of content generation, placement and the update method of holding load equilibrium
CN109086133A (en) * 2018-07-06 2018-12-25 第四范式(北京)技术有限公司 Managing internal memory data and the method and system for safeguarding data in memory
CN110674086A (en) * 2019-09-29 2020-01-10 广州华多网络科技有限公司 Data merging method and device, electronic equipment and storage medium
CN111858657A (en) * 2020-07-21 2020-10-30 威讯柏睿数据科技(北京)有限公司 Method and equipment for accelerating data parallel query based on high-frequency data processing
CN116633870A (en) * 2023-05-25 2023-08-22 圣麦克思智能科技(江苏)有限公司 Operation and maintenance data processing system and method based on cloud end-added mode
CN116662013A (en) * 2023-06-28 2023-08-29 劳弗尔视觉科技有限公司 Load balancing processing method based on stripe division

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101504663A (en) * 2009-03-17 2009-08-12 北京大学 Swarm intelligence based spatial data copy self-adapting distribution method
CN101996250A (en) * 2010-11-15 2011-03-30 中国科学院计算技术研究所 Hadoop-based mass stream data storage and query method and system
US20140025638A1 (en) * 2011-03-22 2014-01-23 Zte Corporation Method, system and serving node for data backup and restoration

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
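张常淳 (Zhang Changchun): "Design and Optimization of MapReduce-Based Join Algorithms for Big Data", China Doctoral Dissertations Full-text Database (Electronic Journals), Information Science and Technology Series *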

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106506665B (en) * 2016-11-18 2019-09-24 郑州云海信息技术有限公司 A kind of load-balancing method and platform of distributed video monitoring system
CN106506665A (en) * 2016-11-18 2017-03-15 郑州云海信息技术有限公司 A kind of load-balancing method of distributed video monitoring system and platform
CN106656675A (en) * 2017-01-03 2017-05-10 北京奇虎科技有限公司 Method and device for detecting transmission node cluster
CN106656675B (en) * 2017-01-03 2020-01-21 北京奇虎科技有限公司 Detection method and device for transmission node cluster
CN107357871B (en) * 2017-07-04 2020-08-11 东北大学 Storm-oriented continuous range query load balancing method based on feedback
CN107357659B (en) * 2017-07-04 2020-09-29 东北大学 Grouping method and query method for query of global load balance in Storm continuous range
CN107357871A (en) * 2017-07-04 2017-11-17 东北大学 A kind of successive range query load equalization methods based on feedback towards Storm
CN107357659A (en) * 2017-07-04 2017-11-17 东北大学 Towards the group technology and querying method of Storm successive ranges inquiry GSLB
CN107315547A (en) * 2017-07-18 2017-11-03 郑州云海信息技术有限公司 A kind of method and device for reading distributed meta data file
CN107515899B (en) * 2017-07-24 2020-05-22 北京中电普华信息技术有限公司 Database joint fragmentation method and device and storage medium
CN107515899A (en) * 2017-07-24 2017-12-26 北京国电通网络技术有限公司 Database federation sharding method, device and storage medium
CN107888697A (en) * 2017-11-24 2018-04-06 北京航天自动控制研究所 A kind of node locking means in load-balancing algorithm
CN107888697B (en) * 2017-11-24 2020-07-14 北京航天自动控制研究所 Node locking method in load balancing algorithm
CN109086133B (en) * 2018-07-06 2019-08-30 第四范式(北京)技术有限公司 The method and system of data is safeguarded in memory
CN109086133A (en) * 2018-07-06 2018-12-25 第四范式(北京)技术有限公司 Managing internal memory data and the method and system for safeguarding data in memory
CN108920282A (en) * 2018-08-03 2018-11-30 北京科技大学 A kind of copy of content generation, placement and the update method of holding load equilibrium
CN110674086A (en) * 2019-09-29 2020-01-10 广州华多网络科技有限公司 Data merging method and device, electronic equipment and storage medium
CN111858657A (en) * 2020-07-21 2020-10-30 威讯柏睿数据科技(北京)有限公司 Method and equipment for accelerating data parallel query based on high-frequency data processing
CN116633870A (en) * 2023-05-25 2023-08-22 圣麦克思智能科技(江苏)有限公司 Operation and maintenance data processing system and method based on cloud end-added mode
CN116633870B (en) * 2023-05-25 2023-11-14 圣麦克思智能科技(江苏)有限公司 Operation and maintenance data processing system and method based on cloud end-added mode
CN116662013A (en) * 2023-06-28 2023-08-29 劳弗尔视觉科技有限公司 Load balancing processing method based on stripe division

Similar Documents

Publication Publication Date Title
CN105701209A (en) Load balancing method for improving parallel connection performance on big data
US7457835B2 (en) Movement of data in a distributed database system to a storage location closest to a center of activity for the data
US9152669B2 (en) System and method for distributed SQL join processing in shared-nothing relational database clusters using stationary tables
CN103106249B (en) A kind of parallel data processing system based on Cassandra
CN103345514A (en) Streamed data processing method in big data environment
CN103218404A (en) Multi-dimensional metadata management method and system based on association characteristics
US20220171792A1 (en) Ingestion partition auto-scaling in a time-series database
US20110055219A1 (en) Database management device and method
Zeng et al. Cost minimization for big data processing in geo-distributed data centers
US20220300323A1 (en) Job Scheduling Method and Job Scheduling Apparatus
CN111324429B (en) Micro-service combination scheduling method based on multi-generation ancestry reference distance
CN108304253A (en) Map method for scheduling task based on cache perception and data locality
WO2021190024A1 (en) Data stream equivalent connection optimization method and system, and electronic device
WO2021000694A1 (en) Method for deploying services and scheduling apparatus
CN111629216B (en) VOD service cache replacement method based on random forest algorithm under edge network environment
CN109165718A (en) Network reconstruction method and system based on paralleling ant cluster algorithm
WO2016092604A1 (en) Data processing system and data access method
CN106055674B (en) A kind of top-k under distributed environment based on metric space dominates querying method
Pham et al. Dilos: A dynamic integrated load manager and scheduler for continuous queries
CN107689876A (en) The distribution management method of metadata in distributed objects storage system
Sharaf et al. Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web.
Li et al. Mux-Kmeans: multiplex Kmeans for clustering large-scale data set
Kim et al. A machine cell formation algorithm for simultaneously minimising machine workload imbalances and inter-cell part movements
Kang et al. Estimating and enhancing real-time data service delays: Control-theoretic approaches
Chai et al. Profit-oriented task scheduling algorithm in Hadoop cluster

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160622
