CN104618406A - Load balancing algorithm based on naive Bayesian classification - Google Patents


Info

Publication number
CN104618406A
Authority
CN
China
Prior art keywords
node
load
classification
load balancing
idle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310536971.3A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZHENJIANG HUAYANG INFORMATION TECHNOLOGY CO LTD
Original Assignee
ZHENJIANG HUAYANG INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-11-05
Filing date
2013-11-05
Publication date
2015-05-13
Application filed by ZHENJIANG HUAYANG INFORMATION TECHNOLOGY CO LTD
Priority to CN201310536971.3A
Publication of CN104618406A
Current legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1029 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers using data related to the state of servers by a load balancer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1025 Dynamic adaptation of the criteria on which the server selection is based
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H04L67/5682 Policies or rules for updating, deleting or replacing the stored data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Multi Processors (AREA)

Abstract

A load balancing algorithm based on naive Bayes classification is provided. Feature parameters for classifying nodes and a partitioning method are given, and corresponding task allocation and balancing strategies are applied to the nodes in each resulting class. This effectively improves the efficiency of cloud task execution and achieves load balancing across the cluster.

Description

A load balancing algorithm based on naive Bayes classification
Technical field
This invention relates to the field of computer algorithms.
Background art
Cloud computing is a new computing paradigm that followed distributed processing, parallel processing, and grid computing. Its core is to virtualize the computing resources of large-scale data centers and provide them to users in the form of services. As the volume of data on the Internet grows day by day, a cloud computing environment (cloud environment for short) must be able to serve a large number of concurrent accesses. How to distribute the overall processing load of a cloud environment "reasonably" across its nodes, so that neither the processing capacity nor the I/O capacity of any node becomes a bottleneck for the cloud's services, is one of the hot topics in cloud computing research.
At present, load balancing is the main technique used: by adjusting how processing load is distributed across the nodes, it balances load between nodes, makes the fullest use of system resources, and maximizes user service and scalability. Existing load balancing techniques can be divided, according to how they achieve balance, into static load balancing and dynamic load balancing. Static load balancing uses known task information and parameters such as the system hardware to select suitable nodes, and then assigns and executes tasks through a scheduling algorithm. Dynamic load balancing instead decides how to assign tasks to each node of a distributed system according to the system's current or recent state; if a node is overloaded, the excess tasks are transferred to other nodes for execution. Examples include load balancing algorithms based on ant colony optimization and on greedy algorithms [5]. In addition, reference [6] proposes an adaptive load balancing algorithm based on live virtual machine migration, which uses current and historical data to predict the impact of a virtual machine (VM) migration on the system and thereby selects an effective migration strategy. Reference [7] applies mobile-agent-based load balancing from distributed systems to the cloud environment, using an agent polling mechanism to collect node information and migrating load between nodes.
Because the resources each task will occupy are hard to predict and nodes differ in processing capacity, dynamic load balancing follows changes in system performance better than static load balancing, dynamically adjusting the load distribution across nodes and achieving a better balancing effect. However, when dynamic scheduling is implemented without an effective assessment of the load on each node in the cloud environment, task allocation and scheduling become unreasonable, tasks easily "thrash" on a large scale between cluster nodes, and the system incurs substantial overhead.
To address these problems, and in view of the distributed, parallel nature of cloud computing, this document proposes a load balancing technique based on naive Bayes classification. First, heartbeat packet information in the cloud environment is collected periodically, and the naive Bayes algorithm is used to classify the load state of each node. Then, based on the classification result, the root node schedules tasks and allocates resources accordingly, effectively balancing the load of every node in the cloud environment and improving cluster performance.
Summary of the invention
Definition 1 (load characteristic value): the static and dynamic load characteristics of node s in the cloud environment, which reflect the node's overall load while it runs, denoted WV(s).
When computing each node's load characteristic value, we take the node's load characteristic attributes to cover four aspects: CPU, memory, disk, and network. The CPU and memory attributes reflect the load while the current node processes tasks; the disk attribute reflects the current node's I/O load; and the network attribute reflects how the node receives and sends tasks. Each load characteristic attribute is defined below:
Definition 2 (CPU load characteristic value): let the CPU run queue length of node s in the cloud environment be C1, the CPU context-switch rate C2, and the percentage of idle CPU time C3. Its CPU load characteristic value WV_C(s) is then obtained by weighting C1, C2, and C3 with a set of weight coefficients, which are adjusted dynamically according to the application and its requirements.
Definition 3 (memory load characteristic value): let the physical memory size of node s in the cloud environment be R1, its virtual memory size R2, and its free memory size R3. Its memory load characteristic value WV_R(s) is then obtained by weighting R1, R2, and R3 with a set of weight coefficients, which are adjusted dynamically according to the application and its requirements.
Definition 4 (disk load characteristic value): let the disk utilization of node s in the cloud environment be D1, the disk access speed D2, and the disk queue length D3. Its disk load characteristic value WV_D(s) is then obtained by weighting D1, D2, and D3 with a set of weight coefficients, which are adjusted dynamically according to the application and its requirements.
Definition 5 (network load characteristic value): let the network round-trip delay of node s in the cloud environment be N1 and its network bandwidth N2. Its network load characteristic value WV_N(s) is then obtained by weighting N1 and N2 with a set of weight coefficients, which can be adjusted dynamically according to the application and its requirements.
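Definitions 2 to 5 combine raw metrics through dynamically adjustable weights, but the exact formulas are not reproduced in this text. The following Python sketch assumes the simplest reading: a weighted sum over metrics already normalized to [0, 1], with non-negative weights that sum to 1. The function names, the default weights, and the inversion of the idle-time percentage (so that more idle time means less load) are hypothetical choices for illustration, not the patent's formulas.

```python
from typing import Sequence


def weighted_load(metrics: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted-sum load characteristic value (assumed form, not the patent's formula).

    metrics: load-oriented values normalized to [0, 1], where 1 means heaviest load.
    weights: non-negative coefficients assumed to sum to 1.
    """
    if len(metrics) != len(weights):
        raise ValueError("metrics and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(m * w for m, w in zip(metrics, weights))


def cpu_load(run_queue: float, context_switch: float, idle_pct: float,
             weights: Sequence[float] = (0.4, 0.3, 0.3)) -> float:
    """Hypothetical WV_C(s): C1 and C2 are assumed pre-normalized; C3 (idle share) is inverted."""
    return weighted_load([run_queue, context_switch, 1.0 - idle_pct], weights)


# WV_R(s), WV_D(s) and WV_N(s) would follow the same pattern with their own metrics and weights.
```

Re-weighting for a different application then only means passing a different `weights` tuple, which mirrors the "dynamically adjustable" weights described in the definitions.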
In addition, to perform Bayesian classification of the load state of node s, the training samples are formally defined from the above definitions as follows:
Definition 6 (training sample): in the cloud environment, a training sample used for Bayesian classification of the load state of node s is a five-tuple <WV_C(s), WV_R(s), WV_D(s), WV_N(s), T(s)>, where WV_C(s), WV_R(s), WV_D(s), and WV_N(s) are the CPU, memory, disk, and network load characteristic values respectively, and T(s) ∈ T. T is the set of training sample classes, T = {T1, T2, T3}, where T1 denotes the idle state, T2 the normal load state, and T3 the overloaded state.
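Definition 6 maps directly onto a record type. In this sketch the class names, the field names, and the use of an enumeration for T are illustrative choices; the numeric member values simply mirror the indices of T1, T2, and T3.

```python
from enum import Enum
from typing import NamedTuple, Tuple


class LoadState(Enum):
    IDLE = 1        # T1
    NORMAL = 2      # T2
    OVERLOADED = 3  # T3


class TrainingSample(NamedTuple):
    """Five-tuple <WV_C(s), WV_R(s), WV_D(s), WV_N(s), T(s)> from Definition 6."""
    wv_c: float       # CPU load characteristic value
    wv_r: float       # memory load characteristic value
    wv_d: float       # disk load characteristic value
    wv_n: float       # network load characteristic value
    state: LoadState  # class label T(s)

    def features(self) -> Tuple[float, float, float, float]:
        """Return the feature vector X = (WV_C, WV_R, WV_D, WV_N)."""
        return (self.wv_c, self.wv_r, self.wv_d, self.wv_n)


sample = TrainingSample(0.2, 0.1, 0.3, 0.15, LoadState.IDLE)
```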
2.2 Classification algorithm based on naive Bayes
The naive Bayes classification method takes Bayes' theorem as its theoretical basis and is a pattern recognition method applicable when the prior and conditional probabilities are known. Compared with other classification algorithms (such as artificial neural networks and decision trees), the naive Bayes classification algorithm is simpler and more effective, and better suited to the parallel processing mechanism of the cloud environment.
Let the sample space be U, the feature attribute set X = {WV_C, WV_R, WV_D, WV_N}, and the training sample state class set T = {T1, T2, T3}, with feature values divided into intervals according to the normal-load threshold α and the overload threshold β. The naive Bayes classification proceeds as follows:
Step 1: compute the conditional probability density parameters and the prior probabilities;
(1) From the training samples, compute the class-conditional probability density parameters.
(2) In the sample set, count the number of samples in each state, ∑T_j, and, for each state, the number of samples whose feature falls in a given interval, ∑(X_i[t] | T_j).
Compute the prior probabilities of the feature attributes for each class;
where X_i ∈ {C, R, D, N}; t denotes the interval into which the feature value falls; T_j ∈ {L, H, M}; and X_i[t] denotes the samples whose load feature X_i lies in interval t.
Then Bayes' formula,
$$P(T_j \mid X) = \frac{P(X \mid T_j)\,P(T_j)}{P(X)},$$
is used to convert the prior probabilities into posterior probabilities.
(3) This document assumes that the feature attributes are mutually independent, so the posterior probability is computed from Bayes' theorem as
$$P(T_j \mid X) = \frac{P(T_j)\prod_{i} P(X_i[t] \mid T_j)}{P(X)}.$$
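Finally, the posterior probabilities are used to make the classification decision.
(4) Decision classification: the node is assigned to the class with the largest posterior probability,
$$T(s) = \arg\max_{T_j} P(T_j)\prod_{i} P(X_i[t] \mid T_j).$$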
In summary, the classification function of the whole system can be expressed as formula (9); from formula (9), the state classification function of the current node is then obtained.
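The counting, prior and posterior, and decision steps above can be sketched as an interval-based (categorical) naive Bayes classifier. This is a minimal illustration under stated assumptions: each feature is split into three intervals at the thresholds α and β (the patent's exact partitioning is not reproduced here), Laplace smoothing is added so that an unseen interval does not zero out a posterior (the source does not mention smoothing), and scores are compared in log space, which leaves the argmax unchanged because the evidence term P(X) is the same for every class. The class and function names and the toy training data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def interval(value: float, alpha: float, beta: float) -> int:
    """Assumed partition into interval t: 0 below alpha, 1 up to beta, 2 above beta."""
    if value < alpha:
        return 0
    if value <= beta:
        return 1
    return 2


class IntervalNaiveBayes:
    N_INTERVALS = 3

    def __init__(self, alpha: float, beta: float, smoothing: float = 1.0):
        self.alpha, self.beta, self.smoothing = alpha, beta, smoothing
        self.class_counts: Counter = Counter()    # sum T_j
        self.feature_counts: Counter = Counter()  # sum (X_i[t] | T_j), keyed by (i, t, label)

    def fit(self, samples: Sequence[Tuple[Sequence[float], str]]) -> "IntervalNaiveBayes":
        for features, label in samples:
            self.class_counts[label] += 1
            for i, value in enumerate(features):
                t = interval(value, self.alpha, self.beta)
                self.feature_counts[(i, t, label)] += 1
        return self

    def classify(self, features: Sequence[float]) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / total)  # log prior P(T_j)
            for i, value in enumerate(features):
                t = interval(value, self.alpha, self.beta)
                count = self.feature_counts[(i, t, label)]
                # Laplace-smoothed log P(X_i[t] | T_j); features treated as independent
                score += math.log((count + self.smoothing)
                                  / (n_label + self.N_INTERVALS * self.smoothing))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


training: List[Tuple[Tuple[float, ...], str]] = [
    ((0.10, 0.10, 0.10, 0.10), "idle"),
    ((0.20, 0.10, 0.20, 0.10), "idle"),
    ((0.50, 0.50, 0.50, 0.50), "normal"),
    ((0.60, 0.50, 0.40, 0.50), "normal"),
    ((0.90, 0.90, 0.80, 0.90), "overloaded"),
    ((0.95, 0.90, 0.90, 0.85), "overloaded"),
]
clf = IntervalNaiveBayes(alpha=0.3, beta=0.7).fit(training)
```

Calling `clf.classify(...)` with a node's four load characteristic values returns the class with the largest smoothed posterior. Because fitting is only counting, the counts could be accumulated in parallel per node and merged, which fits the parallel setting the section describes.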
2.4 Load balancing
Before each heartbeat packet is sent, every child node in the cloud environment uses its idle time to compute its load characteristic values WV_C(s), WV_R(s), WV_D(s), and WV_N(s), and then sends them to the root node with the heartbeat packet. The root node responds to each child node's task requests according to the load balancing strategy, optimizing task allocation and achieving cluster load balancing. To improve response speed, after receiving a heartbeat packet the root node replies according to the previous classification result; the information collected this time is classified during idle time and used for the next reply. The specific balancing strategy is as follows:
After receiving a heartbeat packet from a node, the root node determines the node's state class and handles it with one of the following three strategies:
(1) If the node is idle, respond to its task request and at the same time send a load migration instruction (receiver-initiated strategy);
(2) If the node is in the normal state, respond to its task request with a certain probability. To improve cluster balance, if idle nodes currently form the majority, raise the response probability appropriately to speed up task execution; if overloaded nodes form the majority, lower the response probability to control the cluster's overall load and keep it from becoming too heavy;
(3) If the node is overloaded, refuse to allocate tasks to it.
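The three response rules can be sketched as a small dispatch function on the root node. The source only says that the response probability is raised when idle nodes are the majority and lowered when overloaded nodes are; the base probability of 0.5, the step of 0.25, the reading of "majority" as more than half of all nodes, and all function names are hypothetical parameters chosen for illustration.

```python
import random
from enum import Enum
from typing import Mapping, Optional, Tuple


class State(Enum):
    IDLE = "idle"
    NORMAL = "normal"
    OVERLOADED = "overloaded"


def response_probability(counts: Mapping[State, int],
                         base: float = 0.5, step: float = 0.25) -> float:
    """Probability of accepting a task request from a normal node (hypothetical rule)."""
    total = sum(counts.values())
    if total and 2 * counts.get(State.IDLE, 0) > total:
        return min(1.0, base + step)   # idle majority: respond more often
    if total and 2 * counts.get(State.OVERLOADED, 0) > total:
        return max(0.0, base - step)   # overloaded majority: respond less often
    return base


def handle_heartbeat(state: State, counts: Mapping[State, int],
                     rng: Optional[random.Random] = None) -> Tuple[bool, bool]:
    """Return (accept_task_request, send_migration_instruction) for the sending node."""
    if state is State.IDLE:
        return True, True    # (1) accept and trigger the receiver-initiated strategy
    if state is State.OVERLOADED:
        return False, False  # (3) refuse to allocate tasks
    rng = rng or random.Random()
    return rng.random() < response_probability(counts), False  # (2) probabilistic response
```

Passing a seeded `random.Random` makes the probabilistic branch reproducible in tests; `counts` would be the per-state tally from the root node's most recent classification round.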
2.5 Load migration strategy
According to the classification result of the naive Bayes classifier (formula (8)), the root node has the corresponding child nodes migrate load away from overloaded nodes using a combination of sender-initiated and receiver-initiated strategies. Both strategies used here are local, which avoids large-scale node migration and improves migration efficiency. The specific strategies are as follows:
(1) Receiver-initiated strategy for idle nodes:
A. The idle node looks for an overloaded node within distance μ;
B. If an overloaded node is found, part of its load is migrated to the idle node according to the Hadoop migration strategy;
C. When the migration finishes, the root node is notified; it recomputes the load of the idle node and the overloaded node according to formula (8) and reclassifies them;
D. If the overloaded node is still overloaded and the idle node is still idle, repeat step B; otherwise, start the sender-initiated strategy for the overloaded node;
E. If the idle node is still idle and not all nodes within range μ have been traversed, return to step A; otherwise, go to step F;
F. Stop traversing and terminate the receiver-initiated strategy.
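Steps A to F can be simulated on a toy model. Everything about the model is an assumption made for illustration: node load is a single scalar, distance is measured along one hypothetical coordinate, a threshold rule on that scalar stands in for reclassification by the naive Bayes classifier (formula (8)), and moving a fixed chunk of load stands in for one Hadoop migration. Step D's "otherwise" branch is read here as handing a still-overloaded node over to the sender-initiated strategy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    position: float  # hypothetical 1-D coordinate for the distance check
    load: float      # hypothetical scalar load


def state(load: float, alpha: float, beta: float) -> str:
    """Threshold rule standing in for reclassification by formula (8)."""
    if load < alpha:
        return "idle"
    if load > beta:
        return "overloaded"
    return "normal"


def receiver_initiated(idle: Node, nodes: List[Node], mu: float,
                       alpha: float, beta: float, chunk: float) -> List[Node]:
    """Run steps A-F for one idle node; return overloaded nodes handed to the sender side."""
    if not 0 < chunk <= beta - alpha:
        raise ValueError("chunk must be in (0, beta - alpha]")
    handoff: List[Node] = []
    for other in nodes:  # A/E: traverse the nodes within distance mu
        if other is idle or abs(other.position - idle.position) > mu:
            continue
        if state(other.load, alpha, beta) != "overloaded":
            continue
        # B/D: keep migrating while other is overloaded and idle is still idle
        while (state(other.load, alpha, beta) == "overloaded"
               and state(idle.load, alpha, beta) == "idle"):
            moved = min(chunk, other.load)
            other.load -= moved
            idle.load += moved
            # C: the root node would be notified and both nodes reclassified here
        if state(other.load, alpha, beta) == "overloaded":
            handoff.append(other)  # D "otherwise": start the sender-initiated strategy
        if state(idle.load, alpha, beta) != "idle":
            break  # F: the idle node is no longer idle, so stop traversing
    return handoff
```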
(2) Sender-initiated strategy for overloaded nodes:
A. The overloaded node looks for an idle node within distance ρ;
B. If an idle node is found, part of the overloaded node's own load is transferred to it according to the Hadoop migration strategy;
C. When the migration finishes, the root node is notified; it recomputes the load of this overloaded node and the idle node according to formula (8) and reclassifies them;
D. If the overloaded node is still overloaded and the idle node is still idle, repeat step B; otherwise, return to step A to find a new idle node;
E. If the overloaded node has returned to normal, or it has traversed all idle nodes within distance ρ, stop the sender-initiated strategy.
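Under the same toy assumptions as before (scalar load, a hypothetical one-dimensional position, a threshold rule in place of formula (8), and fixed-size chunks in place of Hadoop migrations), steps A to E of the sender-initiated strategy can be sketched as follows. Capping each chunk at β − α is one way to honour the stated assumption that an idle node cannot become overloaded directly: an idle node is below α before each transfer, so a single chunk cannot push it above β.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    position: float  # hypothetical 1-D coordinate for the distance check
    load: float      # hypothetical scalar load


def state(load: float, alpha: float, beta: float) -> str:
    """Threshold rule standing in for reclassification by formula (8)."""
    if load < alpha:
        return "idle"
    if load > beta:
        return "overloaded"
    return "normal"


def sender_initiated(over: Node, nodes: List[Node], rho: float,
                     alpha: float, beta: float, chunk: float) -> bool:
    """Run steps A-E for one overloaded node; return True if it is no longer overloaded."""
    if not 0 < chunk <= beta - alpha:
        raise ValueError("chunk must be in (0, beta - alpha]")
    for other in nodes:  # A: look for idle nodes within distance rho
        if other is over or abs(other.position - over.position) > rho:
            continue
        # B/D: transfer part of own load while over is overloaded and other is idle
        while (state(over.load, alpha, beta) == "overloaded"
               and state(other.load, alpha, beta) == "idle"):
            moved = min(chunk, over.load)
            over.load -= moved
            other.load += moved
            # C: the root node would be notified and both nodes reclassified here
        if state(over.load, alpha, beta) != "overloaded":
            return True  # E: the overloaded node is back to normal
    return False  # E: every idle node within rho has been tried
```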
The distances μ and ρ are tuned to the specific cluster to avoid large-scale load migration. The migration strategy also assumes that an idle node cannot be turned directly into an overloaded node. During migration, distcp is used for parallel copying to improve migration efficiency.

Claims (3)

1. A load balancing algorithm based on naive Bayes classification, which takes Bayes' theorem as its theoretical basis and is a pattern recognition method applicable when the prior and conditional probabilities are known.
2. Compared with other classification algorithms (such as artificial neural networks and decision trees), the naive Bayes classification algorithm is simpler and more effective, and better suited to the parallel processing mechanism of the cloud environment.
3. The algorithm according to claim 1, wherein the sample space is U, the feature attribute set is X = {WV_C, WV_R, WV_D, WV_N}, the training sample state class set is T = {T1, T2, T3}, and feature values are partitioned according to the normal-load threshold α and the overload threshold.
CN201310536971.3A 2013-11-05 2013-11-05 Load balancing algorithm based on naive Bayesian classification Pending CN104618406A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310536971.3A CN104618406A (en) 2013-11-05 2013-11-05 Load balancing algorithm based on naive Bayesian classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310536971.3A CN104618406A (en) 2013-11-05 2013-11-05 Load balancing algorithm based on naive Bayesian classification

Publications (1)

Publication Number Publication Date
CN104618406A true CN104618406A (en) 2015-05-13

Family

ID=53152681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310536971.3A Pending CN104618406A (en) 2013-11-05 2013-11-05 Load balancing algorithm based on naive Bayesian classification

Country Status (1)

Country Link
CN (1) CN104618406A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106198481A (en) * 2016-09-29 2016-12-07 安徽理工大学 Fake wine identification device and method based on LIF technology and Naive Bayes Classification
CN106484496A (en) * 2016-10-28 2017-03-08 云南大学 Virtual machine BOTTOM LAYER ENVIRONMENT feature analysiss based on Bayesian network and performance metric method
CN106888237A (en) * 2015-12-15 2017-06-23 中国移动通信集团公司 A kind of data dispatching method and system
CN108664580A (en) * 2018-05-04 2018-10-16 西安邮电大学 Fine-grained load-balancing method and system in a kind of MongoDB databases
CN109711526A (en) * 2018-12-20 2019-05-03 广东工业大学 Server cluster dispatching method based on SVM and ant group algorithm
CN110390345A (en) * 2018-04-20 2019-10-29 复旦大学 A kind of big data cluster adaptive resource dispatching method based on cloud platform
CN110519347A (en) * 2019-08-15 2019-11-29 南京南瑞信息通信科技有限公司 A kind of load-balancing method and system of the more application server systems of isomery
CN113342510A (en) * 2021-08-05 2021-09-03 国能大渡河大数据服务有限公司 Water and power basin emergency command cloud-side computing resource cooperative processing method
CN113867960A (en) * 2021-09-30 2021-12-31 丝路信息港云计算科技有限公司 Cloud load balancing hybrid model based on file types

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102298646A (en) * 2011-09-21 2011-12-28 苏州大学 Method and device for classifying subjective text and objective text
CN102523158A (en) * 2011-12-15 2012-06-27 杭州电子科技大学 Metadata server cluster load balancing method based on weight

Non-Patent Citations (2)

Title
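Li Jingmei (李静梅) et al.: "A naive Bayes classifier for text processing", Journal of Harbin Engineering University *
Cheng Chunling (程春玲) et al.: "A state-partitioned adaptive load balancing strategy for cloud computing", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition) *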

Cited By (15)

Publication number Priority date Publication date Assignee Title
CN106888237A (en) * 2015-12-15 2017-06-23 中国移动通信集团公司 A kind of data dispatching method and system
CN106888237B (en) * 2015-12-15 2020-01-07 中国移动通信集团公司 Data scheduling method and system
CN106198481A (en) * 2016-09-29 2016-12-07 安徽理工大学 Fake wine identification device and method based on LIF technology and Naive Bayes Classification
CN106198481B (en) * 2016-09-29 2020-01-24 安徽理工大学 Fake wine identification device and method based on LIF technology and naive Bayes classification
CN106484496A (en) * 2016-10-28 2017-03-08 云南大学 Virtual machine BOTTOM LAYER ENVIRONMENT feature analysiss based on Bayesian network and performance metric method
CN106484496B (en) * 2016-10-28 2019-08-20 云南大学 Virtual machine BOTTOM LAYER ENVIRONMENT signature analysis and performance metric method based on Bayesian network
CN110390345A (en) * 2018-04-20 2019-10-29 复旦大学 A kind of big data cluster adaptive resource dispatching method based on cloud platform
CN110390345B (en) * 2018-04-20 2023-08-22 复旦大学 Cloud platform-based big data cluster self-adaptive resource scheduling method
CN108664580A (en) * 2018-05-04 2018-10-16 西安邮电大学 Fine-grained load-balancing method and system in a kind of MongoDB databases
CN109711526A (en) * 2018-12-20 2019-05-03 广东工业大学 Server cluster dispatching method based on SVM and ant group algorithm
CN110519347A (en) * 2019-08-15 2019-11-29 南京南瑞信息通信科技有限公司 A kind of load-balancing method and system of the more application server systems of isomery
CN113342510A (en) * 2021-08-05 2021-09-03 国能大渡河大数据服务有限公司 Water and power basin emergency command cloud-side computing resource cooperative processing method
CN113342510B (en) * 2021-08-05 2021-11-02 国能大渡河大数据服务有限公司 Water and power basin emergency command cloud-side computing resource cooperative processing method
CN113867960A (en) * 2021-09-30 2021-12-31 丝路信息港云计算科技有限公司 Cloud load balancing hybrid model based on file types
CN113867960B (en) * 2021-09-30 2023-08-11 丝路信息港云计算科技有限公司 Cloud load balancing hybrid model based on file types

Similar Documents

Publication Publication Date Title
CN104618406A (en) Load balancing algorithm based on naive Bayesian classification
CN111694636B (en) Electric power Internet of things container migration method oriented to edge network load balancing
CN103605567B (en) Cloud computing task scheduling method facing real-time demand change
Fu et al. Task scheduling of cloud computing based on hybrid particle swarm algorithm and genetic algorithm
US8144590B2 (en) Distributed resource allocation in stream processing systems
CN107404523A (en) Cloud platform adaptive resource dispatches system and method
CN104657221A (en) Multi-queue peak-alternation scheduling model and multi-queue peak-alteration scheduling method based on task classification in cloud computing
CN107992353B (en) Container dynamic migration method and system based on minimum migration volume
CN104902001B (en) Web request load-balancing method based on operating system virtualization
CN103699433B (en) One kind dynamically adjusts number of tasks purpose method and system in Hadoop platform
CN102508714A (en) Green-computer-based virtual machine scheduling method for cloud computing
CN102932422A (en) Cloud environment task scheduling method based on improved ant colony algorithm
CN109617826A (en) A kind of storm dynamic load balancing method based on cuckoo search
CN115103404A (en) Node task scheduling method in computational power network
CN104537682A (en) Medical image segmenting and dispatching method
CN115629865B (en) Deep learning inference task scheduling method based on edge calculation
Chekired et al. Multi-tier fog architecture: A new delay-tolerant network for IoT data processing
CN112954012B (en) Cloud task scheduling method based on improved simulated annealing algorithm of load
CN114745666A (en) Unmanned aerial vehicle auxiliary edge calculation method used in crowded venue
CN114938372A (en) Federal learning-based micro-grid group request dynamic migration scheduling method and device
CN112148474B (en) Loongson big data all-in-one self-adaptive task segmentation method and system for load balancing
Guo Ant colony optimization computing resource allocation algorithm based on cloud computing environment
Patel et al. An improved approach for load balancing among heterogeneous resources in computational grids
CN110865871A (en) Resource rationalization application-based virtualized cluster resource scheduling method
Chunlin et al. Elastic resource provisioning in hybrid mobile cloud for computationally intensive mobile applications

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150513
