CN109525662A - Method for setting up replicas of hot content - Google Patents


Info

Publication number
CN109525662A
Authority
CN
China
Prior art keywords
server
copy
load
name
hash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811355782.5A
Other languages
Chinese (zh)
Inventor
程桂平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201811355782.5A
Publication of CN109525662A
Legal status: Withdrawn

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H04L67/1029 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers using data related to the state of servers by a load balancer

Abstract

A method for setting up replicas of hot content, comprising: Step 1: a big data system has n servers; using a consistent hashing algorithm, the n servers are placed on a hash ring whose hash function value space is 0 to $2^K - 1$, where K = 32. Specifically, a hash algorithm is applied to the service name of each server, and the output hash value has length K.

Description

Method for setting up replicas of hot content
Technical field
The invention belongs to the field of load balancing for big data.
Background
The consistent hashing algorithm (Consistent Hashing Algorithm) is a distributed algorithm commonly used for load balancing. Memcached clients also use it to distribute key-value pairs evenly across many Memcached servers. It can replace the traditional modulo operation, which cannot cope with adding or removing Memcached servers: after a server is added or removed, a get operation for the same key is no longer routed to the server that actually stores the data, and the hit rate drops sharply.
In the prior art, however, combining consistent hashing with virtual nodes distributes data roughly evenly across the nodes but does not take into account how hot or cold the data is. In real applications the difference between hot and cold data is very large, so the number of accesses to each node is uneven. When a node has too many connections, it can no longer provide service. The result is load imbalance.
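The remapping problem of the modulo operation mentioned in the background can be illustrated with a minimal sketch. The hash function (MD5 truncated to 32 bits), the key names and the pool sizes of 4 and 5 servers are assumptions chosen for illustration; none of them comes from the patent.

```python
import hashlib


def h32(key: str) -> int:
    """Return a 32-bit hash of key (MD5 truncated to 4 bytes; an illustrative choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def modulo_server(key: str, n_servers: int) -> int:
    """Traditional placement: server index = hash(key) mod number of servers."""
    return h32(key) % n_servers


keys = [f"key-{i}" for i in range(10_000)]

# Count keys whose server index changes when the pool grows from 4 to 5 servers.
moved = sum(modulo_server(k, 4) != modulo_server(k, 5) for k in keys)
moved_fraction = moved / len(keys)
print(f"{moved_fraction:.0%} of keys map to a different server")
```

For uniformly distributed hashes, roughly four in five keys move in this scenario, because h mod 4 equals h mod 5 only when h mod 20 is 0, 1, 2 or 3. Those moved keys are the cache misses the background describes.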
Summary of the invention
A method for setting up replicas of hot content, comprising:
Step 1: A big data system has n servers. Using a consistent hashing algorithm, the n servers are placed on a hash ring whose hash function value space is 0 to $2^K - 1$, where K = 32. Specifically, a hash algorithm is applied to the service name of each server, and the output hash value has length K.
Step 2: After the big data system starts running, a monitor periodically monitors the load parameter of each server. Specifically: at time t, the load of server i is [formula missing in source], where [symbol missing in source] is the sampling time interval. Once per sampling interval, the monitor sends a request to each server, and each server returns its node's current [symbol missing in source] value to the monitor; the monitor keeps the value returned at the previous load calculation, [symbol missing in source].
Step 3: Compute the average load of the server cluster at time t: [formula missing in source].
Step 4: Compute the load variance of the server cluster at time t: [formula missing in source].
Step 5: If the load variance is less than a predetermined value, return to Step 2; otherwise go to Step 6.
Step 6: Find the most heavily loaded server among all servers.
Step 7: On that most heavily loaded server, find the m content files accessed most often in a recent period (e.g. 1 day); m can be preset, for example to 10.
Step 8: Set replica names for these m content files. Preferably, a replica number is appended to the original file name. For example, if the original file name is "file name", the replica name can be set to "file name #1". If one replica named "file name #1" already exists and another replica is needed later, its name can be set to "file name #2".
Step 9: Compute the hash value of the replica name and, according to the consistent hashing algorithm, store the replica on the corresponding server. All replica information is stored on a load server. When a user needs to access some file content, the user first sends a request to the load server. If the load server finds by lookup that the content has no replica, it handles the request with the existing process and accesses the original content directly. If replicas exist, it randomly selects one item from the original content and its replicas to access. This provides multiple replicas for hot content and reduces the access pressure on any single server.
Step 10: Return to Step 2.
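The formula images for Steps 2 to 5 are not preserved in this text. As a hedged reading only, the cluster statistics of Steps 3 and 4 match the usual mean and population variance. The symbols below ($L_i(t)$ for the load of server $i$ at time $t$, $\bar{L}(t)$ for the average load, $\sigma^2(t)$ for the load variance) are assumptions for illustration, not the patent's own notation:

```latex
\bar{L}(t) = \frac{1}{n}\sum_{i=1}^{n} L_i(t),
\qquad
\sigma^2(t) = \frac{1}{n}\sum_{i=1}^{n}\bigl(L_i(t) - \bar{L}(t)\bigr)^2
```

As a worked example under these assumed definitions, three servers with loads 10, 20 and 90 give $\bar{L} = 40$ and $\sigma^2 = (900 + 400 + 2500)/3 \approx 1266.7$. With a hypothetical predetermined value of 500, the variance is not below the threshold, so the method would continue to Step 6.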
A method for setting up replicas of hot content, comprising:
Step 1: There are n servers in the big data system. Using a consistent hashing algorithm, the n servers are placed on a hash ring whose hash function value space is 0 to $2^K - 1$, where K = 31 + [term missing in source]. Specifically, a hash algorithm is applied to the service name of each server, and the output hash value has length K.
Step 2: After the big data system starts running, a monitor periodically monitors the load parameter of each server. Specifically: at time t, the load of server i is [formula missing in source], where [symbol missing in source] is the sampling time interval. Once per sampling interval, the monitor sends a request to each server, and each server returns its node's current [symbol missing in source] value to the monitor; the monitor keeps the value returned at the previous load calculation, [symbol missing in source].
Step 3: Compute the average load of the server cluster at time t: [formula missing in source].
Step 4: Compute the load variance of the server cluster at time t: [formula missing in source].
Step 5: If the load variance is less than a predetermined value, return to Step 2; otherwise go to Step 6.
Step 6: Find the most heavily loaded server among all servers.
Step 7: On that most heavily loaded server, find the m content files accessed most often in a recent period (e.g. 1 day); m can be preset, for example to 10.
Step 8: Set replica names for these m content files. Preferably, a replica number is appended to the original file name. For example, if the original file name is "file name", the replica name can be set to "file name #1". If one replica named "file name #1" already exists and another replica is needed later, its name can be set to "file name #2".
Step 9: Compute the hash value of the replica name and, according to the consistent hashing algorithm, store the replica on the corresponding server. All replica information is stored on a load server. When a user needs to access some file content, the user first sends a request to the load server. If the load server finds by lookup that the content has no replica, it handles the request with the existing process and accesses the original content directly. If replicas exist, it randomly selects one item from the original content and its replicas to access. This provides multiple replicas for hot content and reduces the access pressure on any single server.
Step 10: Return to Step 2.
An advantage of the invention is that the mean square deviation of the server loads is used to decide whether to set up replicas of the hot content on the most heavily loaded server, thereby sharing the load and relieving the load pressure on a single server.
Brief description of the drawings
Fig. 1 is schematic diagram 1 according to an embodiment of the present invention;
Fig. 2 is schematic diagram 2 according to an embodiment of the present invention;
Fig. 3 is schematic diagram 3 according to an embodiment of the present invention;
Fig. 4 is schematic diagram 4 according to an embodiment of the present invention.
Detailed description of the embodiments
Principle of consistent hashing
In simple terms, consistent hashing organizes the entire hash value space into a virtual ring. Assume that the value space of some hash function H is 0 to $2^K - 1$ (i.e. the hash value is a K-bit unsigned integer). K = 32 is used for illustration below.
The entire hash ring is shown in Fig. 1. The space is organized clockwise, and 0 and $2^K - 1$ coincide at the zero-point direction.
Next, each server is hashed once with H. Specifically, the IP address or host name of the server can be used as the key to hash, which determines the position of each machine on the hash ring. Assume here that the three servers above, after hashing their IP addresses, take the positions on the ring shown in Fig. 2.
The following algorithm then locates the server on which a piece of data resides: compute the hash value h of the data key with the same function H, use h to determine the position of the data on the ring, and walk clockwise along the ring from that position. The first server encountered is the server the data should be located on.
For example, suppose there are four data objects A, B, C and D whose positions on the ring after hashing are as shown in Fig. 3. According to the consistent hashing algorithm, data A is located on Server 1, D on Server 3, and B and C both on Server 2.
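The clockwise lookup described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the hash function H is assumed to be MD5 truncated to 32 bits (the text does not name one), and the class name `HashRing` and the server addresses are hypothetical.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Assumed hash H: first 4 bytes of MD5, a value in [0, 2**32 - 1]."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Servers placed on a ring by hashing their IP address or host name."""

    def __init__(self, servers):
        points = sorted((ring_hash(s), s) for s in servers)
        self._hashes = [h for h, _ in points]
        self._servers = [s for _, s in points]

    def locate(self, data_key: str) -> str:
        """Walk clockwise from the key's position to the first server."""
        idx = bisect.bisect_left(self._hashes, ring_hash(data_key))
        # Past the last server, wrap around to the first one on the ring.
        return self._servers[idx % len(self._servers)]


servers = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]
ring = HashRing(servers)
owner = ring.locate("A")
print(owner)
```

Keeping the server hashes in a sorted list lets `bisect_left` find the clockwise successor in logarithmic time. Removing one server only relocates the keys that were on that server, which is the property that distinguishes this scheme from modulo placement.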
Principle of virtual nodes
When there are few service nodes, they can end up unevenly spaced on the ring, so that most data falls on one server (data skew). To address data skew, the consistent hashing algorithm introduces a virtual node mechanism: multiple hashes are computed for each service node, and the service node is placed at each resulting position; these positions are called virtual nodes. In practice this can be done by appending a number to the server IP address or host name. For the situation above, we decide to compute three virtual nodes for each server, so the hash values of "Memcached Server 1#1", "Memcached Server 1#2", "Memcached Server 1#3", "Memcached Server 2#1", "Memcached Server 2#2" and "Memcached Server 2#3" are computed separately, forming six virtual nodes, as shown in Fig. 4.
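The virtual node mechanism can be sketched by hashing each server name with a numeric suffix and mapping every resulting position back to the physical server. The hash function (MD5 truncated to 32 bits) and the class name `VirtualNodeRing` are assumptions for illustration; the "#1", "#2", "#3" suffix convention follows the example in the text.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Assumed hash: first 4 bytes of MD5, a value in [0, 2**32 - 1]."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class VirtualNodeRing:
    """Hash ring on which every server appears at several virtual positions."""

    def __init__(self, servers, vnodes_per_server=3):
        points = []
        for server in servers:
            for i in range(1, vnodes_per_server + 1):
                # Virtual node name, e.g. "Memcached Server 1#2".
                points.append((ring_hash(f"{server}#{i}"), server))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [s for _, s in points]

    def __len__(self) -> int:
        return len(self._hashes)

    def locate(self, data_key: str) -> str:
        """Return the physical server owning the first virtual node clockwise."""
        idx = bisect.bisect_left(self._hashes, ring_hash(data_key))
        return self._owners[idx % len(self._owners)]


ring = VirtualNodeRing(["Memcached Server 1", "Memcached Server 2"], vnodes_per_server=3)
print(len(ring), ring.locate("A"))
```

With two servers and three virtual nodes each, the ring holds six positions, matching the example in the text. Larger values of `vnodes_per_server` generally spread each server's share of the ring more evenly.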
Embodiment 1
A method for setting up replicas of hot content, comprising:
Step 1: A big data system has n servers. Using a consistent hashing algorithm, the n servers are placed on a hash ring whose hash function value space is 0 to $2^K - 1$, where K = 32. Specifically, a hash algorithm is applied to the service name of each server, and the output hash value has length K.
Step 2: After the big data system starts running, a monitor periodically monitors the load parameter of each server. Specifically: at time t, the load of server i is [formula missing in source], where [symbol missing in source] is the sampling time interval. Once per sampling interval, the monitor sends a request to each server, and each server returns its node's current [symbol missing in source] value to the monitor; the monitor keeps the value returned at the previous load calculation, [symbol missing in source].
Step 3: Compute the average load of the server cluster at time t: [formula missing in source].
Step 4: Compute the load variance of the server cluster at time t: [formula missing in source].
Step 5: If the load variance is less than a predetermined value, return to Step 2; otherwise go to Step 6.
Step 6: Find the most heavily loaded server among all servers.
Step 7: On that most heavily loaded server, find the m content files accessed most often in a recent period (e.g. 1 day); m can be preset, for example to 10.
Step 8: Set replica names for these m content files. Preferably, a replica number is appended to the original file name. For example, if the original file name is "file name", the replica name can be set to "file name #1". If one replica named "file name #1" already exists and another replica is needed later, its name can be set to "file name #2".
Step 9: Compute the hash value of the replica name and, according to the consistent hashing algorithm, store the replica on the corresponding server. All replica information is stored on a load server. When a user needs to access some file content, the user first sends a request to the load server. If the load server finds by lookup that the content has no replica, it handles the request with the existing process and accesses the original content directly. If replicas exist, it randomly selects one item from the original content and its replicas to access. This provides multiple replicas for hot content and reduces the access pressure on any single server.
Step 10: Return to Step 2.
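Steps 3 to 7 of the method above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-server load values are taken as given (the load formula of Step 2 is not preserved in this text), the variance is the population variance, and all function names, server names and file names are hypothetical.

```python
from collections import Counter
from statistics import fmean, pvariance


def imbalance_detected(loads: dict[str, float], threshold: float) -> bool:
    """Steps 3-5: compare the load variance with a predetermined value."""
    values = list(loads.values())
    mean = fmean(values)                       # Step 3: average load
    variance = pvariance(values, mu=mean)      # Step 4: load variance
    return variance >= threshold               # Step 5: below threshold means no action


def most_loaded_server(loads: dict[str, float]) -> str:
    """Step 6: the server with the largest load."""
    return max(loads, key=loads.get)


def hottest_files(access_counts: Counter, m: int = 10) -> list[str]:
    """Step 7: the m most frequently accessed files in the recent period."""
    return [name for name, _ in access_counts.most_common(m)]


loads = {"server-1": 10.0, "server-2": 20.0, "server-3": 90.0}
counts = Counter({"a.mp4": 5, "b.mp4": 9, "c.mp4": 1})

if imbalance_detected(loads, threshold=500.0):
    target = most_loaded_server(loads)
    print(target, hottest_files(counts, m=2))
```

The threshold of 500 is arbitrary here. In practice the predetermined value would have to be tuned to the unit in which load is measured.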
A method for setting up replicas of hot content, comprising:
Step 1: There are n servers in the big data system. Using a consistent hashing algorithm, the n servers are placed on a hash ring whose hash function value space is 0 to $2^K - 1$, where K = 31 + [term missing in source]. Specifically, a hash algorithm is applied to the service name of each server, and the output hash value has length K.
Step 2: After the big data system starts running, a monitor periodically monitors the load parameter of each server. Specifically: at time t, the load of server i is [formula missing in source], where [symbol missing in source] is the sampling time interval. Once per sampling interval, the monitor sends a request to each server, and each server returns its node's current [symbol missing in source] value to the monitor; the monitor keeps the value returned at the previous load calculation, [symbol missing in source].
Step 3: Compute the average load of the server cluster at time t: [formula missing in source].
Step 4: Compute the load variance of the server cluster at time t: [formula missing in source].
Step 5: If the load variance is less than a predetermined value, return to Step 2; otherwise go to Step 6.
Step 6: Find the most heavily loaded server among all servers.
Step 7: On that most heavily loaded server, find the m content files accessed most often in a recent period (e.g. 1 day); m can be preset, for example to 10.
Step 8: Set replica names for these m content files. Preferably, a replica number is appended to the original file name. For example, if the original file name is "file name", the replica name can be set to "file name #1". If one replica named "file name #1" already exists and another replica is needed later, its name can be set to "file name #2".
Step 9: Compute the hash value of the replica name and, according to the consistent hashing algorithm, store the replica on the corresponding server. All replica information is stored on a load server. When a user needs to access some file content, the user first sends a request to the load server. If the load server finds by lookup that the content has no replica, it handles the request with the existing process and accesses the original content directly. If replicas exist, it randomly selects one item from the original content and its replicas to access. This provides multiple replicas for hot content and reduces the access pressure on any single server.
Step 10: Return to Step 2.
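The replica naming of Step 8 and the load server's random choice in Step 9 can be sketched as below. The class name `LoadServer`, its methods and the file name are hypothetical, and the replica number is appended as "#1", "#2" without a space, which is an assumption about the intended format.

```python
import random
from typing import Optional


class LoadServer:
    """Keeps replica information and picks which copy a request should read."""

    def __init__(self, rng: Optional[random.Random] = None):
        self._replicas: dict[str, list[str]] = {}  # original name -> replica names
        self._rng = rng or random.Random()

    def add_replica(self, file_name: str) -> str:
        """Step 8: name the next replica by appending the next replica number."""
        names = self._replicas.setdefault(file_name, [])
        replica_name = f"{file_name}#{len(names) + 1}"
        names.append(replica_name)
        return replica_name

    def choose(self, file_name: str) -> str:
        """Step 9: pick the original or one of its replicas at random."""
        candidates = [file_name] + self._replicas.get(file_name, [])
        return self._rng.choice(candidates)


load_server = LoadServer()
first = load_server.add_replica("video.mp4")
second = load_server.add_replica("video.mp4")
print(first, second, load_server.choose("video.mp4"))
```

The name returned by `choose` would then be hashed onto the ring to find the server holding that copy. Because a replica name hashes to a different ring position than the original name, the copies tend to land on different servers.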

Claims (4)

1. A method for setting up replicas of hot content, comprising:
Step 1: a big data system has n servers; using a consistent hashing algorithm, the n servers are placed on a hash ring whose hash function value space is 0 to $2^K - 1$, where K = 32; specifically, a hash algorithm is applied to the service name of each server, and the output hash value has length K;
Step 2: after the big data system starts running, a monitor periodically monitors the load parameter of each server, specifically: at time t, the load of server i is [formula missing in source], where [symbol missing in source] is the sampling time interval; once per sampling interval, the monitor sends a request to each server, and each server returns its node's current [symbol missing in source] value to the monitor; the monitor keeps the value returned at the previous load calculation, [symbol missing in source];
Step 3: compute the average load of the server cluster at time t: [formula missing in source];
Step 4: compute the load variance of the server cluster at time t: [formula missing in source];
Step 5: if the load variance does not exceed a predetermined value, return to Step 2; otherwise go to Step 6;
Step 6: find the most heavily loaded server among all servers;
Step 7: on that most heavily loaded server, find the m content files accessed most often in a recent period (e.g. 1 day), where m can be preset, for example to 10;
Step 8: set replica names for these m content files, preferably by appending a replica number to the original file name; for example, if the original file name is "file name", the replica name can be set to "file name #1"; if one replica named "file name #1" already exists and another replica is needed later, its name can be set to "file name #2";
Step 9: compute the hash value of the replica name and, according to the consistent hashing algorithm, store the replica on the corresponding server;
Step 10: return to Step 2.
2. A method for setting up replicas of hot content, comprising:
Step 1: there are n servers in the big data system; using a consistent hashing algorithm, the n servers are placed on a hash ring whose hash function value space is 0 to $2^K - 1$, where K = 31 + [term missing in source]; specifically, a hash algorithm is applied to the service name of each server, and the output hash value has length K;
Step 2: after the big data system starts running, a monitor periodically monitors the load parameter of each server, specifically: at time t, the load of server i is [formula missing in source], where [symbol missing in source] is the sampling time interval; once per sampling interval, the monitor sends a request to each server, and each server returns its node's current [symbol missing in source] value to the monitor; the monitor keeps the value returned at the previous load calculation, [symbol missing in source];
Step 3: compute the average load of the server cluster at time t: [formula missing in source];
Step 4: compute the load variance of the server cluster at time t: [formula missing in source];
Step 5: if the load variance does not exceed a predetermined value, return to Step 2; otherwise go to Step 6;
Step 6: find the most heavily loaded server among all servers;
Step 7: on that most heavily loaded server, find the m content files accessed most often in a recent period (e.g. 1 day), where m can be preset, for example to 10;
Step 8: set replica names for these m content files, preferably by appending a replica number to the original file name; for example, if the original file name is "file name", the replica name can be set to "file name #1"; if one replica named "file name #1" already exists and another replica is needed later, its name can be set to "file name #2";
Step 9: compute the hash value of the replica name and, according to the consistent hashing algorithm, store the replica on the corresponding server;
Step 10: return to Step 2.
3. A computer program for executing the method of any one of claims 1-2.
4. A system for setting up replicas of hot content, comprising: a central processing unit and a memory, the memory containing a computer program for executing the method of any one of claims 1-2.
CN201811355782.5A 2018-11-14 2018-11-14 The method of copy is set for Hot Contents Withdrawn CN109525662A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811355782.5A CN109525662A (en) 2018-11-14 2018-11-14 The method of copy is set for Hot Contents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811355782.5A CN109525662A (en) 2018-11-14 2018-11-14 The method of copy is set for Hot Contents

Publications (1)

Publication Number Publication Date
CN109525662A true CN109525662A (en) 2019-03-26

Family

ID=65777683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811355782.5A Withdrawn CN109525662A (en) 2018-11-14 2018-11-14 The method of copy is set for Hot Contents

Country Status (1)

Country Link
CN (1) CN109525662A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102244685A (en) * 2011-08-11 2011-11-16 中国科学院软件研究所 Distributed type dynamic cache expanding method and system supporting load balancing
CN102624922A (en) * 2012-04-11 2012-08-01 武汉大学 Method for balancing load of network GIS heterogeneous cluster server
CN106572181A (en) * 2016-11-08 2017-04-19 深圳市中博睿存科技有限公司 Object storage interface load balancing method and system based on cluster file system
CN107302561A (en) * 2017-05-23 2017-10-27 南京邮电大学 A kind of hot spot data Replica placement method in cloud storage system
CN107463342A (en) * 2017-08-28 2017-12-12 北京奇艺世纪科技有限公司 A kind of storage method and device of CDN fringe nodes file
CN107483519A (en) * 2016-06-08 2017-12-15 Tcl集团股份有限公司 A kind of Memcache load-balancing methods and its system
CN107508758A (en) * 2017-08-16 2017-12-22 北京云端智度科技有限公司 A kind of method that focus file spreads automatically

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王诚等: "基于贪心算法的一致性哈希负载均衡优化", 《南京邮电大学学报(自然科学版)》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190326