CN105554069B - A kind of big data processing distributed cache system and its method - Google Patents
- Publication number: CN105554069B (application CN201510891553.5A)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention discloses a big data processing distributed cache system and method. The method includes: dividing the big data processing server into several buffer units, each of which stores data in the form of key-value pairs; calculating the value of each buffer unit according to its access frequency, sorting the buffer units, and extracting all buffer units within a preset value threshold range; clustering the extracted buffer units within the preset value threshold range, and distributing the buffer units of each of the preset number of clusters to a cloud computing cache node for storage. When data is accessed or processed, the invention reduces network data transmission between nodes, shortens processing time, and effectively improves the efficiency of big data processing.
Description
Technical field
The invention belongs to the field of big data applications, and in particular relates to a big data processing distributed cache system and method.
Background technology
The development of Internet technology has caused data volumes to increase sharply. With the rapid advance of data science, the amount of data that people can store and process has reached an unprecedented magnitude and continues to grow at a rate exceeding Moore's Law. The core value of big data lies in storing massive amounts of data and analyzing it. In commercial environments, data processing service providers package big data processing as a service and sell it to users.

For some real-time data analysis requirements, users impose demands on both processing performance and response time. The performance of big data processing therefore needs to be optimized to improve data processing efficiency. Caching is an important means of improving big data processing speed.

Storing data in a cache can greatly improve data I/O efficiency and thereby accelerate data processing. However, compared with external storage devices such as disks, cache is expensive, and big data consists of massive full-sample data; storing all of it in cache is uneconomical and infeasible. User access is often frequent and real-time for only a portion of the data, so frequently accessed, important data can be placed in the cache.
Compared with traditional data caching, big data caching has its own characteristics:

Data is organized and stored in key-value (Key-Value) structures. The granularity, form, and replacement algorithms of the cache need further study to adapt to the storage structure of big data.

Big data processing relies on a cloud computing platform. The data accessed by big data jobs often has some degree of correlation, and placing related data in nearby locations can reduce the cost of data transmission. For example, if a data processing job needs two parts of data, A and B, and A and B are stored on two different nodes, one part must be transmitted to the other node before processing can proceed; if A and B are stored together on one node, network transmission is avoided and processing efficiency improves. After determining which data needs to be cached, a method is needed to place this data on suitable nodes.
Invention content
To overcome the shortcomings of the prior art, the present invention provides a big data processing distributed caching method. The method clusters buffer units so that each cloud computing cache node stores a corresponding type of buffer unit, thereby accelerating big data processing.

To achieve the above object, the present invention adopts the following technical scheme:
A big data processing distributed cache system includes: a big data memory and a distributed cloud computing server that communicate with each other;

the big data memory is divided into several buffer units, and each buffer unit stores data in the form of key-value pairs;

the distributed cloud computing server is equipped with several cloud computing cache nodes, a big data extraction module, and a cloud computing cache node distribution module;

the big data extraction module calculates the value of each buffer unit according to its access frequency, sorts the buffer units, and extracts all buffer units within a preset value threshold range;

the cloud computing cache node distribution module clusters all extracted buffer units within the preset value threshold range and distributes the buffer units of each of the preset number of clusters to a cloud computing cache node for storage.

The big data memory includes RAM memory and FLASH memory.

The data in the buffer units of the big data memory is updated according to a predetermined period.
A caching method for the big data processing distributed cache system includes:

dividing the big data processing server into several buffer units, each buffer unit storing data in the form of key-value pairs;

calculating the value of each buffer unit according to its access frequency, sorting the buffer units, and extracting all buffer units within a preset value threshold range;

clustering all extracted buffer units within the preset value threshold range, and distributing the buffer units of each of the preset number of clusters to a cloud computing cache node for storage.

Before calculating the value of a buffer unit, the data in the buffer unit is updated according to a predetermined period.

The value of a buffer unit is computed as:

V_i^j = α·V_i^{j-1} + β·N_i^j

where V_i^j denotes the value of the i-th buffer unit in the j-th period; V_i^{j-1} denotes its value in the (j-1)-th period; α is the period influence factor, a constant; β is the data value factor of the i-th buffer unit, a constant; N_i^j is the number of accesses to the i-th buffer unit within the j-th period; i and j are positive integers greater than or equal to 1, and N_i^j is an integer greater than or equal to 0.

In the cloud computing cache nodes, big data is cached using the Memcache mechanism.

All extracted buffer units within the preset value threshold range are clustered using the k-means algorithm.

The big data memory includes RAM memory and FLASH memory.
The beneficial effects of the present invention are:

(1) The distributed cloud computing server of the invention is equipped with several cloud computing cache nodes, each of which stores a preset number of buffer unit types, so that when data is accessed or processed, network data transmission between nodes is reduced, processing time is shortened, and the efficiency of big data processing is effectively improved;

(2) The cloud computing cache nodes of the distributed cloud computing server may use a variety of storage mechanisms, including the Memcache mechanism, to store big data; and the multiple cloud computing cache nodes provided in the big data processing distributed cache system of the invention ensure that big data is cached and processed in a distributed manner.
Description of the drawings
Fig. 1 is a flow chart of the big data processing distributed caching method of the present invention.
Specific implementation mode
The present invention will be further described below with reference to the accompanying drawings and embodiments:

The big data processing distributed cache system of the invention comprises a big data memory and a distributed cloud computing server that communicate with each other.

The big data memory and the distributed cloud computing server are described in detail in turn below:

(1) Big data memory:

The big data memory is divided into several buffer units, each of which stores data in the form of key-value pairs. The big data memory includes RAM memory and FLASH memory.
(2) Distributed cloud computing server:

The distributed cloud computing server is equipped with several cloud computing cache nodes, a big data extraction module, and a cloud computing cache node distribution module.

The big data extraction module calculates the value of each buffer unit according to its access frequency, sorts the buffer units, and extracts all buffer units within a preset value threshold range.

The cloud computing cache node distribution module clusters all extracted buffer units within the preset value threshold range and distributes the buffer units of each of the preset number of clusters to a cloud computing cache node for storage.

The data in the buffer units of the big data memory is updated according to a predetermined period.
Fig. 1 shows the caching method of the big data processing distributed cache system of the present invention, which is described in detail below with reference to Fig. 1.

Specifically, the caching method includes:

Step 1: dividing the big data processing server into several buffer units, each buffer unit storing data in the form of key-value pairs;

Step 2: calculating the value of each buffer unit according to its access frequency, sorting the buffer units, and extracting all buffer units within a preset value threshold range;

Step 3: clustering all extracted buffer units within the preset value threshold range, and distributing the buffer units of each of the preset number of clusters to a cloud computing cache node for storage.
Before calculating the value of a buffer unit, the data in the buffer unit is updated according to a predetermined period.
In step 2, the value of a buffer unit is computed as:

V_i^j = α·V_i^{j-1} + β·N_i^j

where V_i^j denotes the value of the i-th buffer unit in the j-th period; V_i^{j-1} denotes its value in the (j-1)-th period; α is the period influence factor, a constant; β is the data value factor of the i-th buffer unit, a constant; N_i^j is the number of accesses to the i-th buffer unit within the j-th period; i and j are positive integers greater than or equal to 1, and N_i^j is an integer greater than or equal to 0.
When the i-th buffer unit is accessed, the more urgent the required return time, the higher the β value. Data accesses can be classified by the urgency of their return-time requirements: real-time, general, and loose. These three classes correspond to different β values, with more urgent accesses assigned higher β values. The access frequency and access urgency of any buffer unit within a period can be recorded and counted from the data access records.
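The value recurrence and the urgency-based choice of β can be sketched as below. The concrete α and β numbers are assumptions for illustration; the patent only requires that they be constants and that more urgent access classes receive higher β.

```python
# Sketch of the buffer-unit value recurrence V_i^j = alpha * V_i^{j-1} + beta * N_i^j,
# with beta chosen by access-urgency class. The numeric values are assumed.

BETA = {"real-time": 3.0, "general": 2.0, "loose": 1.0}  # higher beta = more urgent
ALPHA = 0.5  # period influence factor (assumed value)

def update_value(prev_value, accesses, urgency):
    """Value of a buffer unit in the current period, from its previous value,
    its access count N in this period, and its access-urgency class."""
    return ALPHA * prev_value + BETA[urgency] * accesses

# Value of one buffer unit over three successive periods:
v = 0.0
for accesses, urgency in [(10, "real-time"), (4, "general"), (0, "loose")]:
    v = update_value(v, accesses, urgency)
```

With these assumed constants, the older a period's accesses are, the more α discounts their contribution, while β boosts periods with urgent accesses.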
In the cloud computing cache nodes, big data is cached using the Memcache mechanism.
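The patent names Memcache but does not detail its use. The sketch below is an in-process stand-in that only mimics memcache-style set/get semantics with expiry, so the cache-node role can be illustrated without a running Memcached server; a real deployment would use an actual Memcached client instead.

```python
import time

# In-process stand-in for memcache-style set/get with expiry, illustrating
# how a cloud computing cache node might hold serialized buffer-unit data.
# Not the real Memcached protocol or client.

class CacheNode:
    def __init__(self):
        self._data = {}

    def set(self, key, value, expire=0):
        # expire=0 means "never expires", mirroring memcache conventions
        deadline = time.time() + expire if expire else None
        self._data[key] = (value, deadline)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, deadline = item
        if deadline is not None and time.time() > deadline:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

node = CacheNode()
node.set("unit:7", b"serialized buffer unit", expire=60)
```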
All extracted buffer units within the preset value threshold range are clustered using the k-means algorithm, and the buffer units of each of the preset number of clusters are distributed to a cloud computing cache node for storage.

If a cluster exceeds the capacity of a single node, the cluster is split again using the k-means algorithm, and as few nodes as possible are used to store it.
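The k-means step can be sketched as follows. This is a minimal pure-Python k-means over an assumed (access frequency, value) feature vector per buffer unit; the patent does not specify the features, distance metric, or iteration count used.

```python
import random

# Minimal k-means sketch for grouping buffer units by a feature vector
# (here: access frequency and value). Features and parameters are assumed.

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize centers from the data
    for _ in range(iters):
        # assign each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # recompute each center as the mean of its cluster
        centers = [
            tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

# Two well-separated groups of (access_frequency, value) pairs:
units = [(1.0, 1.0), (1.2, 0.9), (9.0, 9.1), (8.8, 9.3)]
clusters = kmeans(units, k=2)
```

Each resulting cluster would then be assigned to one cloud computing cache node, splitting oversized clusters again with the same routine as the text describes.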
Before clustering all extracted buffer units within the preset value threshold range, a connected graph is built over them:

Each buffer unit within the preset value threshold range is taken as a vertex. If two buffer units are accessed by the same data processing job, an edge of weight 1 is added between the two vertices; edge weights accumulate. All buffer units within the preset value threshold range thus form a connected graph.

The constructed graph is checked for emptiness. If it is not empty, all extracted buffer units within the preset value threshold range are clustered; otherwise the procedure ends without clustering, since the number of buffer units within the preset value threshold range is then one, and this single buffer unit is stored in one cloud computing cache node.
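The edge-weight accumulation described above can be sketched as follows. The job representation (a set of buffer-unit ids per processing job) is an assumption; the patent only states that co-access by one job adds weight 1 to the edge between two units.

```python
from collections import defaultdict
from itertools import combinations

# Sketch of the connected-graph construction: each buffer unit is a vertex,
# and each data processing job that touches a pair of units adds weight 1
# to the edge between them, with weights accumulating across jobs.

def build_graph(jobs):
    """jobs: list of sets of buffer-unit ids touched by one processing job."""
    weights = defaultdict(int)
    for units in jobs:
        for a, b in combinations(sorted(units), 2):
            weights[(a, b)] += 1  # accumulate co-access weight
    return weights

jobs = [{1, 2}, {1, 2, 3}, {3, 4}]
graph = build_graph(jobs)
```

An empty `graph` here corresponds to the "connected graph is empty" case in the text, where only one buffer unit was extracted and no clustering is needed.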
Although the specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the protection scope of the present invention. Those skilled in the art should understand that, on the basis of the technical solutions of the present invention, various modifications or variations that can be made without creative labor still fall within the protection scope of the present invention.
Claims (8)
1. A big data processing distributed cache system, characterized by including: a big data memory and a distributed cloud computing server that communicate with each other;

the big data memory is divided into several buffer units, and each buffer unit stores data in the form of key-value pairs;

the distributed cloud computing server is equipped with several cloud computing cache nodes, a big data extraction module, and a cloud computing cache node distribution module;

the big data extraction module calculates the value of each buffer unit according to its access frequency, sorts the buffer units, and extracts all buffer units within a preset value threshold range;

the value of a buffer unit is computed as:

V_i^j = α·V_i^{j-1} + β·N_i^j

where V_i^j denotes the value of the i-th buffer unit in the j-th period; V_i^{j-1} denotes its value in the (j-1)-th period; α is the period influence factor, a constant; β is the data value factor of the i-th buffer unit, a constant; N_i^j is the number of accesses to the i-th buffer unit within the j-th period; i and j are positive integers greater than or equal to 1, and N_i^j is an integer greater than or equal to 0;

the cloud computing cache node distribution module clusters all extracted buffer units within the preset value threshold range and distributes the buffer units of each of the preset number of clusters to a cloud computing cache node for storage.
2. The big data processing distributed cache system of claim 1, characterized in that the big data memory includes RAM memory and FLASH memory.

3. The big data processing distributed cache system of claim 1, characterized in that the data in the buffer units of the big data memory is updated according to a predetermined period.
4. A caching method of the big data processing distributed cache system of claim 1, characterized by including:

dividing the big data processing server into several buffer units, each buffer unit storing data in the form of key-value pairs;

calculating the value of each buffer unit according to its access frequency, sorting the buffer units, and extracting all buffer units within a preset value threshold range;

the value of a buffer unit being computed as:

V_i^j = α·V_i^{j-1} + β·N_i^j

where V_i^j denotes the value of the i-th buffer unit in the j-th period; V_i^{j-1} denotes its value in the (j-1)-th period; α is the period influence factor, a constant; β is the data value factor of the i-th buffer unit, a constant; N_i^j is the number of accesses to the i-th buffer unit within the j-th period; i and j are positive integers greater than or equal to 1, and N_i^j is an integer greater than or equal to 0;

clustering all extracted buffer units within the preset value threshold range, and distributing the buffer units of each of the preset number of clusters to a cloud computing cache node for storage.
5. The caching method of claim 4, characterized in that before calculating the value of a buffer unit, the data in the buffer unit is updated according to a predetermined period.

6. The caching method of claim 4, characterized in that in the cloud computing cache nodes, big data is cached using the Memcache mechanism.

7. The caching method of claim 4, characterized in that all extracted buffer units within the preset value threshold range are clustered using the k-means algorithm.

8. The caching method of claim 4, characterized in that the big data memory includes RAM memory and FLASH memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510891553.5A CN105554069B (en) | 2015-12-04 | 2015-12-04 | A kind of big data processing distributed cache system and its method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510891553.5A CN105554069B (en) | 2015-12-04 | 2015-12-04 | A kind of big data processing distributed cache system and its method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105554069A CN105554069A (en) | 2016-05-04 |
CN105554069B true CN105554069B (en) | 2018-09-11 |
Family
ID=55833001
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510891553.5A Active CN105554069B (en) | 2015-12-04 | 2015-12-04 | A kind of big data processing distributed cache system and its method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105554069B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106528833A (en) * | 2016-11-14 | 2017-03-22 | 天津南大通用数据技术股份有限公司 | Method and device for dynamic redistribution of data of MPP (Massively Parallel Processing) database |
CN107645541B (en) * | 2017-08-24 | 2021-03-02 | 创新先进技术有限公司 | Data storage method and device and server |
CN107704591A (en) * | 2017-10-12 | 2018-02-16 | 西南财经大学 | A kind of data processing method of the intelligent wearable device based on cloud computing non-database framework |
CN107995020B (en) * | 2017-10-23 | 2021-05-07 | 北京兰云科技有限公司 | Asset value assessment method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102984203A (en) * | 2012-10-31 | 2013-03-20 | 深圳市深信服电子科技有限公司 | Method and device and system for improving use ratio of high-cache device based on cloud computing |
CN103051701A (en) * | 2012-12-17 | 2013-04-17 | 北京网康科技有限公司 | Cache admission method and system |
CN103475690A (en) * | 2013-06-17 | 2013-12-25 | 携程计算机技术(上海)有限公司 | Memcached instance configuration method and Memcached instance configuration system |
CN104050043A (en) * | 2014-06-17 | 2014-09-17 | 华为技术有限公司 | Share cache perception-based virtual machine scheduling method and device |
CN104219327A (en) * | 2014-09-27 | 2014-12-17 | 上海瀚之友信息技术服务有限公司 | Distributed cache system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150106884A1 (en) * | 2013-10-11 | 2015-04-16 | Broadcom Corporation | Memcached multi-tenancy offload |
- 2015
- 2015-12-04 CN CN201510891553.5A patent/CN105554069B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102984203A (en) * | 2012-10-31 | 2013-03-20 | 深圳市深信服电子科技有限公司 | Method and device and system for improving use ratio of high-cache device based on cloud computing |
CN103051701A (en) * | 2012-12-17 | 2013-04-17 | 北京网康科技有限公司 | Cache admission method and system |
CN103475690A (en) * | 2013-06-17 | 2013-12-25 | 携程计算机技术(上海)有限公司 | Memcached instance configuration method and Memcached instance configuration system |
CN104050043A (en) * | 2014-06-17 | 2014-09-17 | 华为技术有限公司 | Share cache perception-based virtual machine scheduling method and device |
CN104219327A (en) * | 2014-09-27 | 2014-12-17 | 上海瀚之友信息技术服务有限公司 | Distributed cache system |
Non-Patent Citations (1)
Title |
---|
"大数据负载的体系结构特征分析";罗建平 等;《计算机科学》;20151115;第42卷(第11期);第48-52页 * |
Also Published As
Publication number | Publication date |
---|---|
CN105554069A (en) | 2016-05-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105554069B (en) | A kind of big data processing distributed cache system and its method | |
CN103856567B (en) | Small file storage method based on Hadoop distributed file system | |
CN103678172B (en) | Local data cache management method and device | |
US9652374B2 (en) | Sparsity-driven matrix representation to optimize operational and storage efficiency | |
CN104407879B (en) | A kind of power network sequential big data loaded in parallel method | |
CN105718364B (en) | Resource capability dynamic assessment method is calculated in a kind of cloud computing platform | |
US20160132541A1 (en) | Efficient implementations for mapreduce systems | |
CN104484234B (en) | A kind of more wavefront tidal current computing methods and system based on GPU | |
CN104902001A (en) | Method for load balancing of Web requests based on operating system virtualization | |
Canny et al. | Machine learning at the limit | |
CN104199942B (en) | A kind of Hadoop platform time series data incremental calculation method and system | |
CN106648456A (en) | Dynamic save file access method based on use page view and prediction mechanism | |
CN108519919A (en) | A method of realizing server resource dynamic dispatching under virtual cluster environment | |
CN104572505A (en) | System and method for ensuring eventual consistency of mass data caches | |
CN108416054A (en) | Dynamic HDFS copy number calculating methods based on file access temperature | |
CN106201839A (en) | The information loading method of a kind of business object and device | |
CN105005585A (en) | Log data processing method and device | |
CN109587072A (en) | Distributed system overall situation speed limiting system and method | |
CN103577161A (en) | Big data frequency parallel-processing method | |
WO2023278077A1 (en) | Memory reduction in a system by oversubscribing physical memory shared by compute entities supported by the system | |
CN202093513U (en) | Bulk data processing system | |
CN111629216B (en) | VOD service cache replacement method based on random forest algorithm under edge network environment | |
CN110162272B (en) | Memory computing cache management method and device | |
CN104050189B (en) | The page shares processing method and processing device | |
CN110413540A (en) | A kind of method, system, equipment and the storage medium of FPGA data caching |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |