CN105554069B

CN105554069B - A kind of big data processing distributed cache system and its method

Info

Publication number: CN105554069B
Application number: CN201510891553.5A
Authority: CN
Inventors: 马艳; 陈玉峰; 朱文兵; 杜修明; 郑建; 袁海燕; 任敬国; 邹立达; 苏东亮
Original assignee: State Grid Corp of China SGCC; Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd; Shandong Zhongshi Yitong Group Co Ltd
Current assignee: State Grid Corp of China SGCC; Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd; Shandong Zhongshi Yitong Group Co Ltd
Priority date: 2015-12-04
Filing date: 2015-12-04
Publication date: 2018-09-11
Anticipated expiration: 2035-12-04
Also published as: CN105554069A

Abstract

The invention discloses a kind of big data processing distributed cache systems and its method, wherein this method includes：Big data processing server is divided into several buffer units, into row storage data in the form of key-value pair in each buffer unit；It according to the accessed frequency of buffer unit, calculates the value of buffer unit and is ranked up, extract all buffer units in default value threshold range；All buffer units in the default value threshold range of extraction are clustered, and the buffer unit of default number of clusters is distributed into any cloud computing cache node and is stored.The invention enables data in accessed or processing, reduces the network data transmission between node, shortens processing time, effectively improves the efficiency of big data processing.

Description

A kind of big data processing distributed cache system and its method

Technical field

The invention belongs to big data application field more particularly to a kind of big data processing distributed cache system and its sides Method.

Background technology

The development of internet science and technology makes data volume sharply increase, and under the greatly developing of data science and technology, people can store up The data deposit, handled have reached unprecedented magnitude, and are rapidly increased with the speed more than Moore's Law.Big data Core value is exactly to be to store mass data and analyzed.In commercial environments, data processing service provider will Big data processing is packaged into service, is sold to user.

For some real-time data analysis requirements, user has the performance of processing and the time of return required.Therefore The performance handled big data is needed to optimize, to improve data-handling efficiency.Caching is to improve big data processing speed Important means.

It stores data in cache, data I/O efficiency can be greatly improved, and then accelerate data-handling efficiency.So And it is a kind of article costly to cache relative to External memory equipments such as disks, and big data is the magnanimity number of bulk sample sheet According to it is uneconomic, infeasible in the buffer to store all data.The data access of user is often to a part of data Frequently, in real time, therefore we can will access frequent, important data and be positioned among caching.

It is cached relative to traditional data, big data is cached with its exclusive feature：

Data are stored with key-value pair (Key-Value) structure organization.The granularity of caching, form and replace algorithm need into One step discussion is to adapt to the storage organization of big data.

Big data handles the cloud computing platform that needs to rely on.The data that big data accesses often have certain relevance, Related data are placed into similar position, the cost of data transmission can be reduced.For example a data processing needs A, B two Partial data, and A is stored in from B in two different nodes, this needs will be in one of data transmission to another node It could be handled；If A, B is centrally stored in a node, network transmission will be avoided, to improve treatment effeciency.It is obtaining Need it is data cached after, need to design a kind of method and these data be placed on to suitable node.

Invention content

In order to solve the disadvantage that the prior art, the present invention provide a kind of big data processing distributed caching method.This method Using the method clustered to buffer unit, memory buffers cell type is corresponded in each cloud computing cache node, is used for Accelerate the processing speed of big data.

To achieve the above object, the present invention uses following technical scheme：

A kind of big data processing distributed cache system, including：The big data memory and distributed cloud meter being in communication with each other Calculate server；

The big data memory is divided into several buffer units, and each buffer unit is used for the shape of key-value pair Formula is into row storage data；

It is equipped with several cloud computing cache nodes, big data extraction module and cloud meter in the distribution cloud computing server Calculate cache node distribution module；

The big data extraction module is used for the accessed frequency according to buffer unit, calculates the value of buffer unit And be ranked up, extract all buffer units in default value threshold range；

The cloud computing cache node distribution module, all cachings being used in the default value threshold range to extraction Unit is clustered, and the buffer unit of default number of clusters is distributed into any cloud computing cache node and is stored.

The big data memory includes RAM memory and FLASH memory.

The data being updated according to predetermined period in big data memory in buffer unit.

A kind of caching method of big data processing distributed cache system, including：

Big data processing server is divided into several buffer units, in each buffer unit in the form of key-value pair Into row storage data；

It according to the accessed frequency of buffer unit, calculates the value of buffer unit and is ranked up, extract and preset value threshold All buffer units being worth in range；

All buffer units in the default value threshold range of extraction are clustered, and delaying default number of clusters Memory cell is distributed into any cloud computing cache node and is stored.

Before the value for calculating buffer unit, the data that are updated according to predetermined period in buffer unit.

The computational methods of the value of buffer unit are：

Wherein,Indicate i-th of buffer unit j-th of period value；Indicate i-th of buffer unit in jth- The value in 1 period；α is the cycle influences factor, is constant；β is the data value factor in i-th of buffer unit, is constant；Access times of i-th of buffer unit within j-th of period；I and j is the positive integer more than or equal to 1,To be more than or equal to 0 positive integer.

In cloud computing cache node, caching big data is carried out using Memcache mechanism.

All buffer units in the default value threshold range of extraction are clustered using k-means algorithms.

Big data memory includes RAM memory and FLASH memory.

Beneficial effects of the present invention are：

(1) distributed cloud computing server of the invention is equipped with several cloud computing cache nodes, using each cloud meter Calculation cache node corresponds to the buffer unit type of storage preset quantity so that data are in accessed or processing, between reduction node Network data transmission, shorten processing time, effectively improve big data processing efficiency；

(2) the cloud computing cache node of distributed cloud computing server of the invention, may be used a variety of memory mechanisms into Row storage big data, wherein including Memcache mechanism；And it is arranged in the big data processing distributed cache system of the present invention Multiple cloud computing cache nodes, can ensure that big data obtains distributed caching and processing.

Description of the drawings

Fig. 1 is the big data processing distributed caching method flow chart of the present invention.

Specific implementation mode

The present invention will be further described with embodiment below in conjunction with the accompanying drawings：

The big data of the present invention handles distributed cache system：Big data memory and distributed cloud computing service Device, the two are in communication with each other.

It describes in detail in turn below to big data memory and with distributed cloud computing server：

(1) big data memory：

The division of big data memory has several buffer units, each buffer unit to be used to carry out in the form of key-value pair Store data.Wherein, big data memory includes RAM memory and FLASH memory.

(2) distributed cloud computing server：

It is slow that several cloud computing cache nodes, big data extraction module and cloud computing are equipped in distributed cloud computing server Deposit node distribution module；

Wherein, big data extraction module calculates the value of buffer unit simultaneously for the accessed frequency according to buffer unit It is ranked up, extracts all buffer units in default value threshold range；

Cloud computing cache node distribution module, all buffer units being used in the default value threshold range to extraction It is clustered, and the buffer unit of default number of clusters is distributed into any cloud computing cache node and is stored.

Fig. 1 is the caching method of the big data processing distributed cache system of the present invention, this is described in detail with reference to Fig. 1 The caching method of the big data processing distributed cache system of invention.

Specifically, which includes：

Step 1：Big data processing server is divided into several buffer units, with key-value pair in each buffer unit Form into row storage data；

Step 2：It according to the accessed frequency of buffer unit, calculates the value of buffer unit and is ranked up, extraction is default All buffer units being worth in threshold range；

Step 3：All buffer units in the default value threshold range of extraction are clustered, and by default cluster numbers The buffer unit of amount is distributed into any cloud computing cache node and is stored.

Wherein, before the value for calculating buffer unit, the data that are updated according to predetermined period in buffer unit.

In step 2, the computational methods of the value of buffer unit are：

Wherein,Indicate i-th of buffer unit j-th of period value；Indicate i-th of buffer unit in jth- The value in 1 period；α is the cycle influences factor, is constant；β is the data value factor in i-th of buffer unit, is constant；Access times of i-th of buffer unit within j-th of period；I and j is the positive integer more than or equal to 1,To be more than In 0 positive integer.

When i-th of buffer unit is accessed, time of return is more urgent, and β value is higher.It, can basis in data access The requirement to time of return is accessed, urgency level is classified：In real time, generally, loosely.These three similar corresponding different β Value.The high access of urgency level has higher β value.It can be recorded and be counted in a cycle according to data access, any one The access frequency and access urgency level of buffer unit.

All buffer units in the default value threshold range of extraction are clustered using k-means algorithms, and will The buffer unit of default number of clusters, which is distributed into any one cloud computing cache node, to be stored.

If a cluster uses k-means algorithms into line splitting more than the capacity of a node, then by the cluster, it is used in combination Node as few as possible stores it.

Before being clustered to all buffer units in the default value threshold range of extraction, in default value threshold value model All buffer units in enclosing build a connected graph：

Each buffer unit in default value threshold range is set as a point, if two buffer units are by a number It according to processing while accessing, increases the side that a weight is 1 in the two points, the weight on side can be superimposed, and preset value threshold value model All buffer units in enclosing form a connected graph.

The connected graph built is carried out to judge whether it is empty, if not empty, then to the default value threshold value model of extraction All buffer units in enclosing are clustered；Otherwise, terminate without cluster, in the default value threshold range extracted at this time The number of all buffer units is one.This buffer unit correspondence is stored into a cloud computing cache node.

Above-mentioned, although the foregoing specific embodiments of the present invention is described with reference to the accompanying drawings, not protects model to the present invention The limitation enclosed, those skilled in the art should understand that, based on the technical solutions of the present invention, those skilled in the art are not Need to make the creative labor the various modifications or changes that can be made still within protection scope of the present invention.

Claims

1. a kind of big data handles distributed cache system, which is characterized in that including：The big data memory that is in communication with each other and point Cloth cloud computing server；

The big data memory is divided into several buffer units, each buffer unit be used in the form of key-value pair into Row storage data；

It is slow it to be equipped with several cloud computing cache nodes, big data extraction module and cloud computing in the distribution cloud computing server Deposit node distribution module；

The big data extraction module is used to be gone forward side by side according to the accessed frequency of buffer unit, the value for calculating buffer unit All buffer units in default value threshold range are extracted in row sequence；

The computational methods of the value of buffer unit are：

Wherein,Indicate i-th of buffer unit j-th of period value；Indicate i-th of buffer unit in -1 week of jth The value of phase；α is the cycle influences factor, is constant；β is the data value factor in i-th of buffer unit, is constant；I-th Access times of a buffer unit within j-th of period；I and j is the positive integer more than or equal to 1,For just more than or equal to 0 Integer；

The cloud computing cache node distribution module, all buffer units being used in the default value threshold range to extraction It is clustered, and the buffer unit of default number of clusters is distributed into any cloud computing cache node and is stored.

2. a kind of big data as described in claim 1 handles distributed cache system, which is characterized in that the big data storage Device includes RAM memory and FLASH memory.

3. a kind of big data as described in claim 1 handles distributed cache system, which is characterized in that in big data memory The middle data being updated according to predetermined period in buffer unit.

4. a kind of caching method of big data processing distributed cache system as described in claim 1, which is characterized in that packet It includes：

Big data processing server is divided into several buffer units, is carried out in the form of key-value pair in each buffer unit Store data；

It according to the accessed frequency of buffer unit, calculates the value of buffer unit and is ranked up, extract default value threshold value model Enclose interior all buffer units；

The computational methods of the value of buffer unit are：

All buffer units in the default value threshold range of extraction are clustered, and by the caching list of default number of clusters It is stored in member distribution to a cloud computing cache node.

5. caching method as claimed in claim 4, which is characterized in that before the value for calculating buffer unit, according to default week Phase is updated the data in buffer unit.

6. caching method as claimed in claim 4, which is characterized in that in cloud computing cache node, using Memcache machines System carries out caching big data.

7. caching method as claimed in claim 4, which is characterized in that using k-means algorithms to the default value threshold of extraction All buffer units within the scope of value are clustered.

8. caching method as claimed in claim 4, which is characterized in that the big data memory include RAM memory and FLASH memory.