CN105554069B - Distributed cache system and method for big data processing - Google Patents

Distributed cache system and method for big data processing

Info

Publication number
CN105554069B
CN105554069B (application number CN201510891553.5A)
Authority
CN
China
Prior art keywords
buffer unit
big data
value
buffer
cloud computing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510891553.5A
Other languages
Chinese (zh)
Other versions
CN105554069A (en)
Inventor
马艳
陈玉峰
朱文兵
杜修明
郑建
袁海燕
任敬国
邹立达
苏东亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Shandong Zhongshi Yitong Group Co Ltd
Original Assignee
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Shandong Zhongshi Yitong Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd, Shandong Zhongshi Yitong Group Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201510891553.5A priority Critical patent/CN105554069B/en
Publication of CN105554069A publication Critical patent/CN105554069A/en
Application granted granted Critical
Publication of CN105554069B publication Critical patent/CN105554069B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a distributed cache system and method for big data processing. The method includes: dividing a big data processing server into several buffer units, each buffer unit storing data in the form of key-value pairs; calculating the value of each buffer unit according to its access frequency, ranking the units, and extracting all buffer units within a preset value threshold range; clustering the extracted buffer units within the preset value threshold range, and distributing each cluster of a preset number of buffer units to a cloud computing cache node for storage. The invention reduces network data transmission between nodes when data is accessed or processed, shortens processing time, and effectively improves the efficiency of big data processing.

Description

Distributed cache system and method for big data processing
Technical field
The invention belongs to the field of big data applications, and more particularly relates to a distributed cache system and method for big data processing.
Background technology
The development of Internet technology has caused data volumes to increase sharply. With the rapid advance of data science, the amount of data that people can store and process has reached an unprecedented magnitude and keeps growing at a rate exceeding Moore's Law. The core value of big data lies in storing massive data and analyzing it. In commercial environments, data processing service providers package big data processing as a service and sell it to users.
For some real-time data analysis requirements, users impose demands on processing performance and return time, so the performance of big data processing needs to be optimized to improve data-processing efficiency. Caching is an important means of improving big data processing speed.
Storing data in a cache can greatly improve data I/O efficiency and thus accelerate data processing. However, compared with external storage devices such as disks, cache is a relatively expensive resource, and big data consists of massive, full-sample data sets, so storing all of the data in the cache is uneconomical and infeasible. User access, however, tends to fall frequently and in real time on a subset of the data, so frequently accessed, important data can be placed in the cache.
Compared with traditional data caching, big data caching has its own characteristics:
Data is organized and stored in key-value (Key-Value) structures. The granularity, form, and replacement algorithm of the cache need further study to suit the storage organization of big data.
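The key-value organization of a buffer unit, together with the per-period access counting used later for the value calculation, can be sketched as follows. This is a minimal illustration only; the names `BufferUnit`, `start_new_period`, and the sample keys are assumptions, not part of the patent.

```python
from dataclasses import dataclass, field

@dataclass
class BufferUnit:
    """A cache buffer unit that stores data as key-value pairs and
    counts how often it is accessed within the current period."""
    unit_id: int
    data: dict = field(default_factory=dict)   # Key-Value storage
    accesses: int = 0                          # access count for the current period

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        self.accesses += 1                     # every read counts toward the period total
        return self.data.get(key)

    def start_new_period(self):
        """Reset the access counter at a period boundary."""
        self.accesses = 0

unit = BufferUnit(unit_id=0)
unit.put("sensor:42", "7.3kV")
print(unit.get("sensor:42"), unit.accesses)  # -> 7.3kV 1
```

The per-period counter is what the value calculation described later consumes; resetting it at each period boundary keeps the count local to one period.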
Big data processing relies on a cloud computing platform. The data accessed by big data jobs often has certain correlations, and placing related data in nearby locations can reduce the cost of data transmission. For example, suppose a data processing job needs two parts of data, A and B, stored on two different nodes; one part must be transmitted to the other node before the job can be processed. If A and B are stored together on one node, network transmission is avoided and processing efficiency improves. After the data to be cached has been identified, a method is therefore needed to place this data on suitable nodes.
Summary of the invention
To overcome the shortcomings of the prior art, the present invention provides a distributed caching method for big data processing. The method clusters buffer units so that each cloud computing cache node stores a corresponding set of buffer unit types, thereby accelerating big data processing.
To achieve the above object, the present invention adopts the following technical scheme:
A distributed cache system for big data processing, including: a big data memory and a distributed cloud computing server that communicate with each other;
the big data memory is divided into several buffer units, and each buffer unit is used to store data in the form of key-value pairs;
the distributed cloud computing server is equipped with several cloud computing cache nodes, a big data extraction module, and a cloud computing cache node distribution module;
the big data extraction module is used to calculate the value of each buffer unit according to its access frequency, rank the buffer units, and extract all buffer units within a preset value threshold range;
the cloud computing cache node distribution module is used to cluster all extracted buffer units within the preset value threshold range, and to distribute each cluster of a preset number of buffer units to a cloud computing cache node for storage.
The big data memory includes RAM memory and FLASH memory.
The data in the buffer units of the big data memory is updated according to a predetermined period.
A caching method of the big data processing distributed cache system, including:
dividing the big data processing server into several buffer units, each buffer unit storing data in the form of key-value pairs;
calculating the value of each buffer unit according to its access frequency, ranking the units, and extracting all buffer units within a preset value threshold range;
clustering all extracted buffer units within the preset value threshold range, and distributing each cluster of a preset number of buffer units to a cloud computing cache node for storage.
Before the value of a buffer unit is calculated, the data in the buffer unit is updated according to the predetermined period.
The value of a buffer unit is computed as:
V_i^j = α · V_i^(j-1) + β · n_i^j
where V_i^j denotes the value of the i-th buffer unit in the j-th period; V_i^(j-1) denotes the value of the i-th buffer unit in the (j-1)-th period; α is the period influence factor, a constant; β is the data value factor of the i-th buffer unit, a constant; n_i^j is the number of times the i-th buffer unit is accessed within the j-th period; i and j are positive integers greater than or equal to 1, and n_i^j is an integer greater than or equal to 0.
In the cloud computing cache nodes, big data is cached using the Memcache mechanism.
All extracted buffer units within the preset value threshold range are clustered using the k-means algorithm.
Big data memory includes RAM memory and FLASH memory.
The beneficial effects of the present invention are:
(1) The distributed cloud computing server of the invention is equipped with several cloud computing cache nodes, and each cloud computing cache node stores a preset set of buffer unit types, so that when data is accessed or processed, network data transmission between nodes is reduced, processing time is shortened, and the efficiency of big data processing is effectively improved;
(2) The cloud computing cache nodes of the distributed cloud computing server may use a variety of storage mechanisms, including the Memcache mechanism, to store big data; moreover, the multiple cloud computing cache nodes provided in the big data processing distributed cache system of the invention ensure that big data is cached and processed in a distributed manner.
Description of the drawings
Fig. 1 is a flow chart of the big data processing distributed caching method of the present invention.
Detailed description of embodiments
The present invention is further described below with reference to the accompanying drawings and embodiments:
The big data processing distributed cache system of the present invention comprises a big data memory and a distributed cloud computing server, which communicate with each other.
The big data memory and the distributed cloud computing server are described in detail in turn below:
(1) Big data memory:
The big data memory is divided into several buffer units, each of which is used to store data in the form of key-value pairs. The big data memory includes RAM memory and FLASH memory.
(2) Distributed cloud computing server:
The distributed cloud computing server is equipped with several cloud computing cache nodes, a big data extraction module, and a cloud computing cache node distribution module;
the big data extraction module calculates the value of each buffer unit according to its access frequency, ranks the units, and extracts all buffer units within the preset value threshold range;
the cloud computing cache node distribution module clusters all extracted buffer units within the preset value threshold range and distributes each cluster of a preset number of buffer units to a cloud computing cache node for storage.
The data in the buffer units of the big data memory is updated according to the predetermined period.
Fig. 1 shows the caching method of the big data processing distributed cache system of the present invention; the method is described in detail below with reference to Fig. 1.
Specifically, which includes:
Step 1: Divide the big data processing server into several buffer units, each buffer unit storing data in the form of key-value pairs;
Step 2: Calculate the value of each buffer unit according to its access frequency, rank the units, and extract all buffer units within the preset value threshold range;
Step 3: Cluster all extracted buffer units within the preset value threshold range, and distribute each cluster of a preset number of buffer units to a cloud computing cache node for storage.
Before the value of a buffer unit is calculated, the data in the buffer unit is updated according to the predetermined period.
In Step 2, the value of a buffer unit is computed as:
V_i^j = α · V_i^(j-1) + β · n_i^j
where V_i^j denotes the value of the i-th buffer unit in the j-th period; V_i^(j-1) denotes the value of the i-th buffer unit in the (j-1)-th period; α is the period influence factor, a constant; β is the data value factor of the i-th buffer unit, a constant; n_i^j is the number of times the i-th buffer unit is accessed within the j-th period; i and j are positive integers greater than or equal to 1, and n_i^j is an integer greater than or equal to 0.
The more urgent the required return time when the i-th buffer unit is accessed, the higher its β value. Data accesses can be classified by the urgency of their return-time requirement into three classes: real-time, general, and relaxed, corresponding to different β values; more urgent accesses have higher β values. By recording and counting data accesses within a period, the access frequency and access urgency of any buffer unit can be obtained.
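The periodic value update and the urgency-based choice of β can be sketched as follows, assuming the recurrence implied by the variable definitions, V_i^j = α · V_i^(j-1) + β · n_i^j. The three β constants are illustrative assumptions; the patent only states that more urgent access classes get higher β values.

```python
# Illustrative beta values per urgency class (assumed, not from the patent).
BETA = {"real-time": 3.0, "general": 2.0, "relaxed": 1.0}

def update_value(prev_value, accesses, alpha, beta):
    """One period of the value recurrence: new value is the previous value
    decayed by the period influence factor alpha, plus beta times the
    number of accesses observed in the current period."""
    return alpha * prev_value + beta * accesses

# A unit accessed 10 times this period by "real-time" jobs,
# with previous value 5.0 and period influence factor alpha = 0.5:
v = update_value(prev_value=5.0, accesses=10, alpha=0.5, beta=BETA["real-time"])
print(v)  # 0.5 * 5.0 + 3.0 * 10 = 32.5
```

Units whose running value falls inside the preset threshold range would then be extracted for clustering, as the method describes.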
In the cloud computing cache nodes, big data is cached using the Memcache mechanism.
All extracted buffer units within the preset value threshold range are clustered using the k-means algorithm, and each cluster of a preset number of buffer units is distributed to a cloud computing cache node for storage.
If a cluster exceeds the capacity of a node, the cluster is split using the k-means algorithm and stored using as few nodes as possible.
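The clustering and capacity-split steps can be sketched as below. The patent does not specify the feature vectors used for clustering or the capacity bookkeeping, so `points`, `node_capacity`, and the single round of re-splitting are assumptions for illustration.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means over feature vectors (lists of floats)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center (squared Euclidean)
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        for j, cl in enumerate(clusters):
            if cl:  # recompute each non-empty cluster's center as the mean
                centers[j] = [sum(xs) / len(cl) for xs in zip(*cl)]
    return clusters

def place(points, k, node_capacity):
    """Cluster, then re-split any cluster larger than one node's capacity.
    A fuller implementation would repeat the split until every group fits."""
    placed = []
    for cl in kmeans(points, k):
        if len(cl) > node_capacity:
            # split an oversized cluster with k-means again, aiming for as
            # few nodes as possible: ceil(len/capacity) sub-clusters
            sub_k = -(-len(cl) // node_capacity)
            placed.extend(kmeans(cl, sub_k))
        elif cl:
            placed.append(cl)
    return placed
```

Each resulting group of buffer units would then be assigned to one cloud computing cache node, keeping co-accessed units together.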
Before the buffer units within the preset value threshold range are clustered, a graph is built over them:
each buffer unit within the preset value threshold range is taken as a vertex. If two buffer units are accessed by the same data processing job, an edge of weight 1 is added between the two vertices; edge weights are accumulated, so all buffer units within the preset value threshold range form a weighted graph.
The constructed graph is then checked for emptiness. If it is not empty, all extracted buffer units within the preset value threshold range are clustered; otherwise clustering is skipped, because the preset value threshold range then contains only one buffer unit, which is stored in a single cloud computing cache node.
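The co-access graph construction described above can be sketched with a plain adjacency map; the class and method names (`CoAccessGraph`, `record_job`) are hypothetical, not from the patent.

```python
from collections import defaultdict
from itertools import combinations

class CoAccessGraph:
    """Vertices are buffer-unit ids; the weight of edge (u, v) counts how
    many data processing jobs accessed both units together."""
    def __init__(self):
        self.weights = defaultdict(int)

    def record_job(self, accessed_units):
        # Every pair of units touched by the same job gets +1 on its edge;
        # repeated co-access accumulates the weight.
        for u, v in combinations(sorted(set(accessed_units)), 2):
            self.weights[(u, v)] += 1

    def is_empty(self):
        return not self.weights

g = CoAccessGraph()
g.record_job([1, 2])       # a job reading units 1 and 2
g.record_job([1, 2, 3])    # another job reading units 1, 2 and 3
print(g.weights[(1, 2)])   # 2: edge weight accumulated across jobs
print(g.is_empty())        # False
```

An empty graph here corresponds to the degenerate case in the text: only one buffer unit was extracted, so it is stored directly on a single node without clustering.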
Although the specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the present invention. Those skilled in the art should understand that, on the basis of the technical solutions of the present invention, various modifications or variations that can be made without creative effort still fall within the scope of protection of the present invention.

Claims (8)

1. A distributed cache system for big data processing, characterized by comprising: a big data memory and a distributed cloud computing server that communicate with each other;
the big data memory is divided into several buffer units, and each buffer unit is used to store data in the form of key-value pairs;
the distributed cloud computing server is equipped with several cloud computing cache nodes, a big data extraction module, and a cloud computing cache node distribution module;
the big data extraction module is used to calculate the value of each buffer unit according to its access frequency, rank the buffer units, and extract all buffer units within a preset value threshold range;
the value of a buffer unit is computed as:
V_i^j = α · V_i^(j-1) + β · n_i^j
where V_i^j denotes the value of the i-th buffer unit in the j-th period; V_i^(j-1) denotes the value of the i-th buffer unit in the (j-1)-th period; α is the period influence factor, a constant; β is the data value factor of the i-th buffer unit, a constant; n_i^j is the number of times the i-th buffer unit is accessed within the j-th period; i and j are positive integers greater than or equal to 1, and n_i^j is an integer greater than or equal to 0;
the cloud computing cache node distribution module is used to cluster all extracted buffer units within the preset value threshold range and to distribute each cluster of a preset number of buffer units to a cloud computing cache node for storage.
2. The distributed cache system for big data processing according to claim 1, characterized in that the big data memory includes RAM memory and FLASH memory.
3. The distributed cache system for big data processing according to claim 1, characterized in that the data in the buffer units of the big data memory is updated according to a predetermined period.
4. A caching method of the big data processing distributed cache system according to claim 1, characterized by comprising:
dividing the big data processing server into several buffer units, each buffer unit storing data in the form of key-value pairs;
calculating the value of each buffer unit according to its access frequency, ranking the units, and extracting all buffer units within a preset value threshold range;
the value of a buffer unit is computed as:
V_i^j = α · V_i^(j-1) + β · n_i^j
where V_i^j denotes the value of the i-th buffer unit in the j-th period; V_i^(j-1) denotes the value of the i-th buffer unit in the (j-1)-th period; α is the period influence factor, a constant; β is the data value factor of the i-th buffer unit, a constant; n_i^j is the number of times the i-th buffer unit is accessed within the j-th period; i and j are positive integers greater than or equal to 1, and n_i^j is an integer greater than or equal to 0;
clustering all extracted buffer units within the preset value threshold range, and distributing each cluster of a preset number of buffer units to a cloud computing cache node for storage.
5. The caching method according to claim 4, characterized in that before the value of a buffer unit is calculated, the data in the buffer unit is updated according to a predetermined period.
6. The caching method according to claim 4, characterized in that in the cloud computing cache nodes, big data is cached using the Memcache mechanism.
7. The caching method according to claim 4, characterized in that all extracted buffer units within the preset value threshold range are clustered using the k-means algorithm.
8. The caching method according to claim 4, characterized in that the big data memory includes RAM memory and FLASH memory.
CN201510891553.5A 2015-12-04 2015-12-04 Distributed cache system and method for big data processing Active CN105554069B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510891553.5A CN105554069B (en) 2015-12-04 2015-12-04 Distributed cache system and method for big data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510891553.5A CN105554069B (en) 2015-12-04 2015-12-04 Distributed cache system and method for big data processing

Publications (2)

Publication Number Publication Date
CN105554069A CN105554069A (en) 2016-05-04
CN105554069B true CN105554069B (en) 2018-09-11

Family

ID=55833001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510891553.5A Active CN105554069B (en) 2015-12-04 2015-12-04 Distributed cache system and method for big data processing

Country Status (1)

Country Link
CN (1) CN105554069B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106528833A (en) * 2016-11-14 2017-03-22 天津南大通用数据技术股份有限公司 Method and device for dynamic redistribution of data of MPP (Massively Parallel Processing) database
CN107645541B (en) * 2017-08-24 2021-03-02 创新先进技术有限公司 Data storage method and device and server
CN107704591A (en) * 2017-10-12 2018-02-16 西南财经大学 A kind of data processing method of the intelligent wearable device based on cloud computing non-database framework
CN107995020B (en) * 2017-10-23 2021-05-07 北京兰云科技有限公司 Asset value assessment method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102984203A (en) * 2012-10-31 2013-03-20 深圳市深信服电子科技有限公司 Method and device and system for improving use ratio of high-cache device based on cloud computing
CN103051701A (en) * 2012-12-17 2013-04-17 北京网康科技有限公司 Cache admission method and system
CN103475690A (en) * 2013-06-17 2013-12-25 携程计算机技术(上海)有限公司 Memcached instance configuration method and Memcached instance configuration system
CN104050043A (en) * 2014-06-17 2014-09-17 华为技术有限公司 Share cache perception-based virtual machine scheduling method and device
CN104219327A (en) * 2014-09-27 2014-12-17 上海瀚之友信息技术服务有限公司 Distributed cache system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150106884A1 (en) * 2013-10-11 2015-04-16 Broadcom Corporation Memcached multi-tenancy offload


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"大数据负载的体系结构特征分析" [Architectural characterization of big data workloads]; Luo Jianping et al.; Computer Science (计算机科学); 15 Nov 2015; vol. 42, no. 11; pp. 48-52 *

Also Published As

Publication number Publication date
CN105554069A (en) 2016-05-04

Similar Documents

Publication Publication Date Title
CN105554069B (en) Distributed cache system and method for big data processing
CN103856567B (en) Small file storage method based on Hadoop distributed file system
CN103678172B (en) Local data cache management method and device
US9652374B2 (en) Sparsity-driven matrix representation to optimize operational and storage efficiency
CN104407879B (en) A kind of power network sequential big data loaded in parallel method
CN105718364B (en) Resource capability dynamic assessment method is calculated in a kind of cloud computing platform
US20160132541A1 (en) Efficient implementations for mapreduce systems
CN104484234B (en) A kind of more wavefront tidal current computing methods and system based on GPU
CN104902001A (en) Method for load balancing of Web requests based on operating system virtualization
Canny et al. Machine learning at the limit
CN104199942B (en) A kind of Hadoop platform time series data incremental calculation method and system
CN106648456A (en) Dynamic save file access method based on use page view and prediction mechanism
CN108519919A (en) A method of realizing server resource dynamic dispatching under virtual cluster environment
CN104572505A (en) System and method for ensuring eventual consistency of mass data caches
CN108416054A (en) Dynamic HDFS copy number calculating methods based on file access temperature
CN106201839A (en) The information loading method of a kind of business object and device
CN105005585A (en) Log data processing method and device
CN109587072A (en) Distributed system overall situation speed limiting system and method
CN103577161A (en) Big data frequency parallel-processing method
WO2023278077A1 (en) Memory reduction in a system by oversubscribing physical memory shared by compute entities supported by the system
CN202093513U (en) Bulk data processing system
CN111629216B (en) VOD service cache replacement method based on random forest algorithm under edge network environment
CN110162272B (en) Memory computing cache management method and device
CN104050189B (en) The page shares processing method and processing device
CN110413540A (en) A kind of method, system, equipment and the storage medium of FPGA data caching

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant