CN109379298A - A load-balancing method for a big data system - Google Patents

A load-balancing method for a big data system

Info

Publication number
CN109379298A
Authority
CN
China
Prior art keywords
data
data node
load
node
data unit
Legal status
Pending
Application number
CN201811489449.3A
Other languages
Chinese (zh)
Inventor
徐静
刘劲松
饶江
王友柱
Current Assignee
Jiangsu Huasheng Gene Data Technology Co Ltd
Original Assignee
Jiangsu Huasheng Gene Data Technology Co Ltd
Priority date
2018-12-06
Filing date
2018-12-06
Publication date
2019-02-22
Application filed by Jiangsu Huasheng Gene Data Technology Co Ltd
Priority to CN201811489449.3A
Publication of CN109379298A
Current legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/29 Flow control; Congestion control using a combination of thresholds
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a load-balancing method for a big data system. The method comprises: each data node records, for each data unit it stores, the data volume read or updated in each data operation; the data node computes each data unit's total read volume and total update volume over one day; a load index is computed for each data unit from its total read volume and total update volume; the data node computes the sum of the load indices of all data units it stores as its node load index; and the management server, based on the node load index of each data node, directs the data nodes to perform load balancing. The method balances the load of the data nodes in the system and improves resource utilization.

Description

A load-balancing method for a big data system
[Technical Field]
The invention belongs to the field of computers and the Internet, more particularly to the big data field, and specifically relates to a load-balancing method for a big data system.
[Background Art]
With the rapid development of computer and Internet technology, we are in an era of information explosion, and the concept of big data has emerged to handle massive amounts of information. Big data refers to data collections that cannot be captured, managed, and processed with conventional software tools within an acceptable time frame; they are massive, fast-growing, and diverse information assets that require new processing models to deliver stronger decision-making power, insight and discovery, and process-optimization capability.
Because of the sheer volume of the data, it is difficult for people to analyze it on their own. However, driven by technological innovation led by cloud computing, data that was once hard to collect and use is becoming easy to exploit, and through continual innovation across industries, big data is gradually creating more value for humankind.
Because a big data system holds massive amounts of data, it usually stores the data on multiple data nodes. In actual operation, each node stores different data, so the loads of the nodes differ. An overloaded node inevitably processes data more slowly, causing problems such as excessive response times, while other nodes are lightly loaded and their resources sit idle. With such an imbalance, the system performs poorly even though its overall resources are sufficient, so load balancing is needed.
[Summary of the Invention]
To solve the above problem, the invention proposes a load-balancing method for a big data system.
The technical solution adopted by the invention is as follows:
A load-balancing method for a big data system, comprising the following steps:
(1) Each data node records, for each data unit it stores, the data volume read or updated in each data operation;
(2) The data node computes each data unit's total read volume and total update volume over one day;
(3) A load index is computed for each data unit from its total read volume and total update volume;
(4) The data node computes the sum of the load indices of all data units it stores, as the node load index of that data node;
(5) The data node sends the computed node load index to the management server;
(6) Based on the node load index of each data node, the management server directs the data nodes to perform load balancing;
Wherein, in step (3), the load index $F$ of a data unit is computed as follows:
(3.1) When the data unit is stored on a data node, set its initial load index $F = 0$;
(3.2) After obtaining the data unit's new total read volume $R_1$ and total update volume $R_2$ for one day, compute the new load index $F_{new}$:
$$F_{new} = F \cdot S + W_1 R_1 + W_2 R_2$$
where $W_1$ and $W_2$ are predefined weights and $S$ is a predefined damping factor, $0 < S < 1$;
(3.3) Update the load index $F$ of the data unit to $F_{new}$.
Further, step (6) specifically comprises:
(6.1) Suppose there are $n$ data nodes with node load indices $F_1, F_2, \ldots, F_n$; the management server computes the average $F_{ave}$ of these $n$ node load indices;
(6.2) The management server computes the difference between each node load index and the average, i.e. $D_i = F_i - F_{ave}$ $(1 \le i \le n)$;
(6.3) Among the $n$ computed values $D_i$, if some $D_i$ exceeds a predefined threshold, the corresponding data node is added to the set of nodes to be balanced;
(6.4) The management server sends a prepare-balancing command message to each data node in the set of nodes to be balanced; the command message contains the average $F_{ave}$;
(6.5) Each data node that receives the command message computes the difference between its node load index and the average, selects from its stored data units the one whose load index is closest to that difference, and reports the load index of the selected data unit to the management server;
(6.6) The management server sorts the data nodes in the set of nodes to be balanced in descending order of the reported load indices; denote the sorted data nodes $A_1, A_2, \ldots, A_m$;
(6.7) The management server sorts the data nodes in ascending order of node load index and takes the first $m$ data nodes, denoted $B_1, B_2, \ldots, B_m$;
(6.8) The management server sends data node $A_j$ a load-balancing message containing the address of data node $B_j$ $(1 \le j \le m)$;
(6.9) Data node $A_j$ migrates its selected data unit to data node $B_j$, after which $A_j$ deletes its own stored copy of that data unit.
Further, in step (2), each data node performs the statistics at a given time.
Further, the given time is midnight (00:00) each day.
Further, throughout the whole life cycle of a data unit, the data unit remains associated with its load index; once the data unit is deleted, the load index is also deleted.
Further, in step (6.3), if the set of nodes to be balanced is empty, the method terminates.
Further, after $A_j$ obtains the address of $B_j$, it establishes a connection with $B_j$, sends the selected data unit to $B_j$ for storage, and then deletes its own stored copy of the data unit.
The beneficial effect of the invention is that it balances the load of the data nodes in the system and improves resource utilization.
[Description of the Drawings]
The drawings described here are provided for a further understanding of the invention and form part of this application, but do not unduly limit the invention. In the drawings:
Fig. 1 is a schematic diagram of the big data system to which the method of the invention is applied.
[Detailed Description of Embodiments]
The present invention is described in detail below with reference to the drawings and specific embodiments; the illustrative embodiments and their descriptions serve only to explain the invention and do not limit it.
Referring to Fig. 1, which shows the basic architecture of the big data system to which the method of the invention is applied, the system includes one management server and multiple data nodes, and the management server is connected to each data node over a network. The management server manages the entire big data system; the data nodes store data and perform the corresponding data operations according to commands from the management server.
Data on a data node is stored in units called data units. In one embodiment, a data unit is a data file; in another embodiment, a data unit may be a data record in a database. The invention does not specifically restrict this.
Based on the above system architecture, the invention provides a load-balancing method among data nodes that keeps the loads of the nodes essentially equal, described as follows:
(1) Each data node records, for each data unit it stores, the data volume read or updated in each data operation.
There are three types of data operations on a data unit: read, update, and delete. A delete operation removes the entire data unit, which then no longer imposes any load on the data node, so the load balancing of the invention does not consider delete operations. A read operation has a read data volume: for example, if one operation reads 1 MB from the data unit, the read data volume is 1 MB. Likewise, for an update operation, if 1 MB of the data unit is updated, the update data volume is 1 MB.
Each time a data node receives a data operation instruction, it records the data volume read from or updated in each data unit; both reads and updates impose load on the data node.
(2) The data node computes each data unit's total read volume and total update volume over one day.
Because step (1) records the data volume of every read and every update of a data unit, the data node can compute the data unit's total read volume and total update volume for a day: the total read volume is the sum of the data volumes read from the data unit during that day, and the total update volume is the sum of the data volumes updated during that day.
Each data node can perform these statistics at a given time; for example, shortly after midnight each day it can compute the total read volume and total update volume of each data unit for the preceding day.
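The bookkeeping in steps (1) and (2) can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the class names `DailyStats` and `OperationRecorder`, the string operation codes, byte counts as the unit of volume, and discarding a deleted unit's counters are all hypothetical choices.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DailyStats:
    """Hypothetical per-data-unit counters for the current day."""
    read_total: int = 0    # R1: bytes read from the unit today
    update_total: int = 0  # R2: bytes updated in the unit today


class OperationRecorder:
    """Hypothetical recorder for steps (1) and (2) on one data node."""

    def __init__(self) -> None:
        self._stats = defaultdict(DailyStats)

    def record(self, unit_id: str, op: str, nbytes: int) -> None:
        """Step (1): record the volume touched by one data operation."""
        if op == "read":
            self._stats[unit_id].read_total += nbytes
        elif op == "update":
            self._stats[unit_id].update_total += nbytes
        elif op == "delete":
            # Deletes are not counted as load; assumption: the deleted
            # unit's counters are simply discarded.
            self._stats.pop(unit_id, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")

    def roll_over(self) -> dict:
        """Step (2): at the given time (e.g. just after midnight), return
        {unit_id: (R1, R2)} for the finished day and reset the counters."""
        totals = {u: (s.read_total, s.update_total) for u, s in self._stats.items()}
        self._stats = defaultdict(DailyStats)
        return totals


rec = OperationRecorder()
rec.record("file-1", "read", 1_048_576)    # 1 MiB read
rec.record("file-1", "update", 1_048_576)  # 1 MiB updated
rec.record("file-1", "read", 512)
print(rec.roll_over())  # {'file-1': (1049088, 1048576)}
```

Resetting the counters inside `roll_over` keeps each day's totals independent, which matches the per-day statistics the text describes.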
(3) A load index is computed for each data unit from its total read volume and total update volume.
Specifically, the load index $F$ of a data unit is computed as follows:
(3.1) When the data unit is stored on a data node, set its initial load index $F = 0$.
When a data node creates a new data unit, it creates a load index for it. Throughout the data unit's life cycle, this load index remains associated with the data unit; once the data unit is deleted, the load index is deleted as well.
(3.2) After obtaining the data unit's new total read volume $R_1$ and total update volume $R_2$ for one day, compute the new load index $F_{new}$:
$$F_{new} = F \cdot S + W_1 R_1 + W_2 R_2$$
Here $W_1$ and $W_2$ are predefined weights expressing how heavily reads and updates, respectively, load the data node. $F$ is the data unit's current load index, and $S$ is a predefined damping factor satisfying $0 < S < 1$, which expresses how much the influence of earlier load decays over time.
(3.3) Update the load index $F$ of the data unit to $F_{new}$.
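The update rule in step (3.2) is small enough to show directly. The sketch below assumes the damping term multiplies the previous index by $S$; the function name `update_load_index` and the example values $W_1 = 1$, $W_2 = 2$, $S = 0.5$ are hypothetical.

```python
def update_load_index(f: float, r1: float, r2: float,
                      w1: float, w2: float, s: float) -> float:
    """Step (3.2): return F_new = F*S + W1*R1 + W2*R2.

    f: current load index F of the data unit
    r1: total read volume R1 for the day
    r2: total update volume R2 for the day
    w1, w2: predefined weights for reads and updates
    s: predefined damping factor, 0 < S < 1 (assumed multiplicative)
    """
    if not 0 < s < 1:
        raise ValueError("damping factor S must satisfy 0 < S < 1")
    return f * s + w1 * r1 + w2 * r2


# Hypothetical weights: updates count twice as much as reads.
W1, W2, S = 1.0, 2.0, 0.5
f = 0.0                                      # (3.1) new unit starts at F = 0
f = update_load_index(f, 10, 5, W1, W2, S)   # busy day:  F = 20.0
f = update_load_index(f, 0, 0, W1, W2, S)    # idle day:  F = 10.0
f = update_load_index(f, 4, 1, W1, W2, S)    # light day: F = 11.0
print(f)  # 11.0
```

With $S = 0.5$, a day without traffic halves the index, so old activity fades geometrically while recent reads and updates dominate.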
Once the load index of every data unit has been updated, the data node can compute its own load index.
(4) The data node computes the sum of the load indices of all data units it stores, as the node load index of that data node.
Through this step, each data node computes its own node load index. Because every data node performs the above steps at a predetermined time (for example, shortly after midnight each day), all data nodes compute their node load indices at essentially the same moment.
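Steps (3) and (4) together amount to a daily pass over a node's data units. A hedged sketch, assuming the node keeps a dict of per-unit load indices (every stored unit present, new units starting at 0) and a dict of the day's `(R1, R2)` totals; the name `daily_refresh` is hypothetical, the damping term is assumed to multiply the old index by $S$, and units with no recorded traffic are treated as having zero read and update volume.

```python
def daily_refresh(indices: dict, day_totals: dict,
                  w1: float, w2: float, s: float) -> tuple:
    """Steps (3)-(4) for one data node.

    indices: {unit_id: F} for every unit the node stores (new units at 0)
    day_totals: {unit_id: (R1, R2)} for the finished day
    Returns ({unit_id: F_new}, node_load_index).
    """
    new_indices = {}
    for unit_id, f in indices.items():
        # Units with no recorded traffic today contribute R1 = R2 = 0.
        r1, r2 = day_totals.get(unit_id, (0, 0))
        new_indices[unit_id] = f * s + w1 * r1 + w2 * r2
    # Step (4): the node load index is the sum over all stored units.
    return new_indices, sum(new_indices.values())


units, node_f = daily_refresh({"a": 10.0, "b": 0.0}, {"b": (1, 1)}, 1.0, 2.0, 0.5)
print(units, node_f)  # {'a': 5.0, 'b': 3.0} 8.0
```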
(5) The data node sends the computed node load index to the management server.
Each data node sends its node load index to the management server immediately after computing it, so the management server likewise receives the node load indices of all data nodes at essentially the same time.
(6) Based on the node load index of each data node, the management server directs the data nodes to perform load balancing.
A data node's node load index reflects the load on its data-processing capacity; in essence it is the sum of the loads of its data units. Based on the node load index, those skilled in the art can balance the data held by each data node and thereby balance their loads.
According to a specific embodiment of the invention, step (6) specifically comprises:
(6.1) Suppose there are $n$ data nodes with node load indices $F_1, F_2, \ldots, F_n$; the management server computes the average $F_{ave}$ of these $n$ node load indices.
(6.2) The management server computes the difference between each node load index and the average, i.e. $D_i = F_i - F_{ave}$ $(1 \le i \le n)$.
(6.3) Among the $n$ computed values $D_i$, if some $D_i$ exceeds a predefined threshold, the corresponding data node is added to the set of nodes to be balanced.
The set of nodes to be balanced holds the data nodes whose load is excessive and needs balancing; it is initially empty. Let it contain $m$ nodes. If $m = 0$ after step (6.3), no load balancing is needed and the method terminates.
The predefined threshold sets a balancing line: a node whose load exceeds the average only slightly can be left alone, and only a load significantly above the average triggers load balancing.
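Steps (6.1) to (6.3) on the management server reduce to an average, a per-node difference, and a threshold test. In this sketch the node identifiers, the function name `nodes_to_balance`, and the threshold values are hypothetical; an empty result corresponds to the early termination described above.

```python
def nodes_to_balance(node_loads: dict, threshold: float) -> tuple:
    """Steps (6.1)-(6.3) on the management server.

    node_loads: {node_id: F_i} as reported in step (5)
    Returns (F_ave, set of node_ids whose D_i exceeds the threshold).
    """
    if not node_loads:
        raise ValueError("no node load indices reported")
    f_ave = sum(node_loads.values()) / len(node_loads)   # (6.1)
    to_balance = {node for node, f in node_loads.items()
                  if f - f_ave > threshold}                # (6.2)-(6.3)
    return f_ave, to_balance


loads = {"n1": 100.0, "n2": 40.0, "n3": 10.0, "n4": 50.0}
print(nodes_to_balance(loads, 20.0))  # (50.0, {'n1'})
print(nodes_to_balance(loads, 60.0))  # (50.0, set()): the method terminates
```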
(6.4) The management server sends a prepare-balancing command message to each data node in the set of nodes to be balanced; the command message contains the average $F_{ave}$.
(6.5) Each data node that receives the command message computes the difference between its node load index and the average, selects from its stored data units the one whose load index is closest to that difference, and reports the load index of the selected data unit to the management server.
(6.6) The management server sorts the data nodes in the set of nodes to be balanced in descending order of the reported load indices; denote the sorted data nodes $A_1, A_2, \ldots, A_m$.
(6.7) The management server sorts the data nodes in ascending order of node load index and takes the first $m$ data nodes, denoted $B_1, B_2, \ldots, B_m$.
What this step actually yields is the $m$ least-loaded data nodes in the system, which serve as the destination nodes for load balancing.
(6.8) The management server sends data node $A_j$ a load-balancing message containing the address of data node $B_j$ $(1 \le j \le m)$.
Through this step, the management server informs $A_1$ of $B_1$ as its load-balancing destination, informs $A_2$ of $B_2$, ..., and informs $A_m$ of $B_m$, forming $m$ source/destination pairs for load balancing.
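Steps (6.6) to (6.8) pair the source reporting the heaviest data unit with the least-loaded node. The sketch below assumes `reported` maps each node in the set to the load index it reported in step (6.5) and `node_loads` holds every node's node load index; the function name `pair_nodes` is hypothetical, and, like the text, it does not exclude a source node from the first $m$ destinations.

```python
def pair_nodes(reported: dict, node_loads: dict) -> list:
    """Steps (6.6)-(6.8) on the management server.

    reported: {node_id: load index of the unit chosen in step (6.5)}
    node_loads: {node_id: node load index} for all n data nodes
    Returns [(A_j, B_j), ...] for j = 1..m.
    """
    # (6.6) A_1..A_m: nodes in the set, by reported index, largest first.
    sources = sorted(reported, key=lambda a: reported[a], reverse=True)
    m = len(sources)
    # (6.7) B_1..B_m: the m nodes with the smallest node load index.
    targets = sorted(node_loads, key=lambda b: node_loads[b])[:m]
    # (6.8) A_j is told the address of B_j.
    return list(zip(sources, targets))


loads = {"n1": 100.0, "n2": 40.0, "n3": 10.0, "n4": 95.0, "n5": 5.0}
print(pair_nodes({"n1": 45.0, "n4": 40.0}, loads))
# [('n1', 'n5'), ('n4', 'n3')]
```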
(6.9) Data node $A_j$ migrates its selected data unit to data node $B_j$, after which $A_j$ deletes its own stored copy of that data unit.
After $A_j$ obtains the address of $B_j$, it can establish a connection with $B_j$, send the selected data unit to $B_j$ for storage, and then delete its own stored copy. In this way the load of $A_j$ decreases and the load of $B_j$ increases, achieving the goal of load balancing.
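Step (6.9) can be simulated in memory to check that the bookkeeping stays consistent. This is a stand-in for a real network transfer: the nested dicts, the name `migrate_unit`, and carrying the unit's load index to $B_j$ (rather than restarting it at $F = 0$) are assumptions the text does not settle.

```python
def migrate_unit(stores: dict, indices: dict,
                 src: str, dst: str, unit_id: str) -> None:
    """In-memory stand-in for step (6.9): move unit_id from A_j (src) to B_j (dst).

    stores: {node_id: {unit_id: payload}}
    indices: {node_id: {unit_id: load index F}}
    """
    # Send the selected unit to B_j, which stores it.
    stores[dst][unit_id] = stores[src][unit_id]
    # Assumption: the load index travels with the unit rather than
    # restarting at F = 0 on B_j.
    indices[dst][unit_id] = indices[src][unit_id]
    # Only then does A_j delete its copy, together with its load index.
    del stores[src][unit_id]
    del indices[src][unit_id]


stores = {"A": {"u1": b"...", "u2": b"..."}, "B": {"u9": b"..."}}
indices = {"A": {"u1": 30.0, "u2": 45.0}, "B": {"u9": 5.0}}
migrate_unit(stores, indices, "A", "B", "u2")
print(sum(indices["A"].values()), sum(indices["B"].values()))  # 30.0 50.0
```

Copying before deleting mirrors the order in the text, so the unit is never absent from both nodes at once.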
The above description is only a preferred embodiment of the present invention; therefore, all equivalent changes or modifications made according to the structures, features, and principles described in the scope of the patent application of the present invention are included within the scope of the patent application of the present invention.

Claims (7)

1. A load-balancing method for a big data system, characterized by comprising the following steps:
(1) each data node records, for each data unit it stores, the data volume read or updated in each data operation;
(2) the data node computes each data unit's total read volume and total update volume over one day;
(3) a load index is computed for each data unit from its total read volume and total update volume;
(4) the data node computes the sum of the load indices of all data units it stores, as the node load index of that data node;
(5) the data node sends the computed node load index to the management server;
(6) based on the node load index of each data node, the management server directs the data nodes to perform load balancing;
wherein, in step (3), the load index $F$ of a data unit is computed as follows:
(3.1) when the data unit is stored on a data node, set its initial load index $F = 0$;
(3.2) after obtaining the data unit's new total read volume $R_1$ and total update volume $R_2$ for one day, compute the new load index $F_{new}$:
$$F_{new} = F \cdot S + W_1 R_1 + W_2 R_2$$
where $W_1$ and $W_2$ are predefined weights and $S$ is a predefined damping factor, $0 < S < 1$;
(3.3) update the load index $F$ of the data unit to $F_{new}$.
2. The method according to claim 1, wherein step (6) specifically comprises:
(6.1) suppose there are $n$ data nodes with node load indices $F_1, F_2, \ldots, F_n$; the management server computes the average $F_{ave}$ of these $n$ node load indices;
(6.2) the management server computes the difference between each node load index and the average, i.e. $D_i = F_i - F_{ave}$ $(1 \le i \le n)$;
(6.3) among the $n$ computed values $D_i$, if some $D_i$ exceeds a predefined threshold, the corresponding data node is added to the set of nodes to be balanced;
(6.4) the management server sends a prepare-balancing command message to each data node in the set of nodes to be balanced, the command message containing the average $F_{ave}$;
(6.5) each data node that receives the command message computes the difference between its node load index and the average, selects from its stored data units the one whose load index is closest to that difference, and reports the load index of the selected data unit to the management server;
(6.6) the management server sorts the data nodes in the set of nodes to be balanced in descending order of the reported load indices, the sorted data nodes being denoted $A_1, A_2, \ldots, A_m$;
(6.7) the management server sorts the data nodes in ascending order of node load index and takes the first $m$ data nodes, denoted $B_1, B_2, \ldots, B_m$;
(6.8) the management server sends data node $A_j$ a load-balancing message containing the address of data node $B_j$ $(1 \le j \le m)$;
(6.9) data node $A_j$ migrates its selected data unit to data node $B_j$, after which $A_j$ deletes its own stored copy of that data unit.
3. The method according to either of claims 1-2, wherein in step (2), each data node performs the statistics at a given time.
4. The method according to claim 3, wherein the given time is midnight (00:00) each day.
5. The method according to any one of claims 1-4, wherein throughout the whole life cycle of a data unit, the data unit remains associated with its corresponding load index, and once the data unit is deleted, the load index is also deleted.
6. The method according to claim 2, wherein in step (6.3), if the set of nodes to be balanced is empty, the method terminates.
7. The method according to claim 2, wherein after $A_j$ obtains the address of $B_j$, it establishes a connection with $B_j$, sends the selected data unit to $B_j$ for storage, and then deletes its own stored copy of the data unit.
CN201811489449.3A 2018-12-06 2018-12-06 A load-balancing method for a big data system Pending CN109379298A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811489449.3A CN109379298A (en) 2018-12-06 2018-12-06 A load-balancing method for a big data system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811489449.3A CN109379298A (en) 2018-12-06 2018-12-06 A load-balancing method for a big data system

Publications (1)

Publication Number Publication Date
CN109379298A (en) 2019-02-22

Family

ID=65376037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811489449.3A Pending CN109379298A (en) 2018-12-06 2018-12-06 A load-balancing method for a big data system

Country Status (1)

Country Link
CN (1) CN109379298A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080046895A1 (en) * 2006-08-15 2008-02-21 International Business Machines Corporation Affinity dispatching load balancer with precise CPU consumption data
CN104836819A (en) * 2014-02-10 2015-08-12 阿里巴巴集团控股有限公司 Dynamic load balancing method and system, and monitoring and dispatching device
CN105187512A (en) * 2015-08-13 2015-12-23 航天恒星科技有限公司 Method and system for load balancing of virtual machine clusters
CN105306525A (en) * 2015-09-11 2016-02-03 浪潮集团有限公司 Data layout method, device and system
CN108255427A (en) * 2017-12-29 2018-07-06 广东南华工商职业学院 A kind of data storage and dynamic migration method and device

Non-Patent Citations (2)

Title
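柳旭日 et al.: "Dynamic weighted load-balancing algorithm for heterogeneous cluster servers" (异构集群服务器的动态加权负载均衡算法), Microcomputer Information (《微计算机信息》) *
郝昱文 et al.: "Research on storage load-balancing algorithms in a distributed environment" (基于分布式环境的存储负载均衡算法研究), Information Technology (《信息技术》) *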

Similar Documents

Publication Publication Date Title
AU2015229200B2 (en) Coordinated admission control for network-accessible block storage
US10250673B1 (en) Storage workload management using redirected messages
CN108509275B (en) A kind of catalogue moving method and metadata load-balancing method
CN106648456B (en) Dynamic copies file access method based on user's amount of access and forecasting mechanism
CN109408590B (en) Method, device and equipment for expanding distributed database and storage medium
CN108183947A (en) Distributed caching method and system
US11792275B2 (en) Dynamic connection capacity management
CN108255427B (en) Data storage and dynamic migration method and device
CN109510852B (en) Method and device for gray scale publishing
CN107766159A (en) A kind of metadata management method, device and computer-readable recording medium
CN110347651A (en) Method of data synchronization, device, equipment and storage medium based on cloud storage
CN103647656A (en) Billing node load control method, data access control method and node
CN108900626A (en) Date storage method, apparatus and system under a kind of cloud environment
CN106201561B (en) The upgrade method and equipment of distributed caching cluster
EP4170491A1 (en) Resource scheduling method and apparatus, electronic device, and computer-readable storage medium
CN109271106A (en) Message storage, read method and device, server, storage medium
CN102480502B (en) I/O load equilibrium method and I/O server
CN107391039A (en) A kind of data object storage method and device
US20220103500A1 (en) Method and device for managing group member, and method for processing group message
CN107453948A (en) The storage method and system of a kind of network measurement data
CN110321225A (en) Load-balancing method, meta data server and computer readable storage medium
CN109379298A (en) A kind of load-balancing method of big data system
CN108259583B (en) Data dynamic migration method and device
CN107273527A (en) A kind of Hadoop clusters and distributed system
EP3090361B1 (en) Providing consistent tenant experiences for multi-tenant databases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190222