CN105930096B - A kind of data block pre-cache method based on PageRank - Google Patents
- Publication number
- CN105930096B (application CN201610227750.1A)
- Authority
- CN
- China
- Prior art keywords
- data block
- data
- model
- block
- pagerank
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0659—Command handling arrangements, e.g. command buffers, queues, command scheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0679—Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
A data block pre-caching method based on PageRank, comprising: recording data block scheduling statistics; building the model; updating the model; saving the model; loading the model; setting the model retention period H; ranking data blocks with the PageRank algorithm; handling missing-block interrupts; and pre-loading data blocks into the cache. The invention addresses the degraded service performance and low data-block cache hit rate caused by frequent disk I/O on data blocks during big-data processing, and can be widely applied to big-data workloads. By recording data block scheduling in real time, and then exploiting spatial locality, temporal locality, and the inter-block affinity computed by the PageRank algorithm, data blocks are proactively pushed into the cache ahead of time, raising the data-block cache hit rate and greatly improving service performance.
Description
Technical field
The invention belongs to the field of server data caching, and in particular relates to a data block pre-caching method based on PageRank.
Background technique
With the development of the Internet and the mobile Internet, a large number of Internet applications rely on processing massive data to provide their services, and emerging applications exhibit data storage and access patterns different from traditional ones. At present, large Internet companies mostly store data by merging many files into fixed-size data blocks on disk, a storage scheme that eases enterprise data management. When facing the processing of massive numbers of data blocks, however, the data block caching policy often yields a low cache hit rate and therefore repeated disk I/O, which seriously degrades service performance. Most current remedies, such as continually adding physical memory, using multi-level caches, or adding solid-state disks, improve performance by purely physical means; memory, caches, and solid-state disks are expensive, so these approaches incur a large hardware cost. No method based on the PageRank algorithm for pre-caching or actively caching data blocks has yet been found.
Summary of the invention
To solve the above problems, the object of the present invention is to provide a data block pre-caching method based on PageRank.
A data block pre-caching method based on PageRank, characterized by comprising the following steps:
Step 1: check whether a data model exists in the system. If one exists under the specified directory, load it, initialize the PageRank value of each data block from the model, and go to step 6; otherwise go to step 2.
Step 2: initialize the parameters required by the data model.
Step 3: record how data blocks are used during the time window Δt and generate the relational matrix A between data blocks.
Step 4: from the relational matrix A, generate the probability transition matrix M and compute the PageRank value of each data block according to the PageRank formula V'; model construction is then complete. Using the PageRank values, run a simulation as follows: mark the N data blocks with the highest PageRank values and load them into the cache, then count, over the window Δt1, the number of accesses n1 to the marked blocks and the total number of data block accesses n made by the computer; the hit rate over Δt1 is p = n1/n.
Step 5: check whether the simulated hit rate p exceeds the preset threshold P. If it does, put the model into actual production; otherwise return to step 3.
Step 6: load the N data blocks with the highest PageRank values into the cache.
Step 7: keep recording data block usage and update the relational matrix A.
Step 8: during data processing, check on every read whether the data is in the cache. If it is, the computer reads the data from the cached block; otherwise a missing-block interrupt occurs and step 11 is executed.
Step 9: check whether it is time to save the model. If the time has arrived, save or update the model; otherwise return to step 7.
Step 10: save or update the current data model to disk, so that the computer can load the model directly after a restart.
Step 11: check whether the missing-block interrupt count MBIs exceeds the value C set at initialization. If it does, first load CulBlock (the data block currently being accessed) into the cache and update the model at the same time; otherwise load CulBlock into the cache, record the blocks loaded into the cache, and increment MBIs by 1.
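The simulation of steps 4 and 5 can be sketched as a short function. This is an illustrative sketch, not the patent's implementation; the function name `simulated_hit_rate` and the dictionary/list representations are assumptions:

```python
def simulated_hit_rate(pagerank, trace, n_top):
    """Mark the n_top blocks with the highest PageRank value as cached,
    replay the access trace recorded over the window Δt1, and return the
    hit rate p = n1 / n as defined in step 4."""
    # the N highest-ranked blocks are treated as already loaded into the cache
    cached = set(sorted(pagerank, key=pagerank.get, reverse=True)[:n_top])
    n1 = sum(1 for block in trace if block in cached)  # accesses to marked blocks
    n = len(trace)                                     # total accesses in Δt1
    return n1 / n if n else 0.0
```

Per step 5, the model would be put into production only when `simulated_hit_rate(...)` exceeds the preset threshold P; otherwise statistics collection (step 3) continues.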
Further, in step 3, the relational matrix A is the n×n matrix
A = (a_{k,q}), 1 ≤ k, q ≤ n,
where a_{k,q} is the number of times data block q is accessed at the moment immediately after data block k is accessed, with a_{k,q} = 0 when k = q; a_{k,q} is the degree of association between data blocks k and q, and n is the total number of data blocks used by the computer.
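The construction of A from an access trace can be sketched as follows. This is a minimal illustration under the assumption that blocks are identified by integer indices 0..n-1; the function name is not from the patent:

```python
def build_relation_matrix(trace, n):
    """Build the n x n relational matrix A from the access trace recorded
    during the window Δt: a[k][q] counts how often block q is accessed
    at the moment immediately after block k (a[k][k] stays 0 by definition)."""
    a = [[0] * n for _ in range(n)]
    for k, q in zip(trace, trace[1:]):  # consecutive access pairs (k -> q)
        if k != q:                      # a_{k,q} = 0 when k = q
            a[k][q] += 1
    return a
```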
Further, in step 4, the PageRank formula is
V' = αMV + (1 - α)e,
where α is the probability that the computer uses a data block directly from the cache, 1 - α is the probability that a missing-block interrupt occurs and the data block is read from disk, M is the probability transition matrix derived from the relational matrix A, V is the PageRank vector of the current iteration, and e = (1/n, ..., 1/n)^T, where n is the total number of data blocks used by the computer.
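The iteration V' = αMV + (1 - α)e can be sketched in a few lines of plain Python. This is a sketch, not the patent's code; the default α = 0.85 is the value commonly used for PageRank and is an assumption here (the patent defines α as the cache-use probability), as are the function name and the fixed iteration count:

```python
def pagerank_values(A, alpha=0.85, iters=50):
    """Compute PageRank values for data blocks from the relational matrix A
    by iterating V' = alpha*M*V + (1 - alpha)*e, with e = (1/n, ..., 1/n)."""
    n = len(A)
    row_sums = [sum(A[k]) for k in range(n)]
    # M[q][k]: probability of moving from block k to block q; blocks with no
    # recorded successor distribute uniformly (a common PageRank convention)
    M = [[(A[k][q] / row_sums[k]) if row_sums[k] else 1.0 / n
          for k in range(n)] for q in range(n)]
    V = [1.0 / n] * n
    for _ in range(iters):
        V = [alpha * sum(M[q][k] * V[k] for k in range(n)) + (1 - alpha) / n
             for q in range(n)]
    return V
```

Because M is column-stochastic, the entries of V stay positive and sum to 1 across iterations, which is what makes the top-N selection of step 6 well defined.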
The data block pre-caching method based on PageRank provided by the invention has the following features:
1. Through an active, software-only strategy, data blocks are placed into the data block cache in advance, avoiding disk I/O during data block processing and improving the cache hit rate.
2. The method targets workloads in which most data files are merged into fixed-size data blocks; it is proposed for fast enterprise processing of massive block data and is not applicable to conventional operating-system-level caches.
3. The method records data block scheduling statistics under a real-time environment and, based on spatial locality and temporal locality together with an optimized PageRank ranking, loads data blocks that are pending or not yet processed into the cache in advance, continuously updating the data model at run time; it therefore has good dynamic behavior.
The present invention has the following advantages and beneficial effects:
The data block pre-caching method based on PageRank addresses the degraded service performance and low data-block cache hit rate caused by frequent disk I/O on data blocks during big-data processing, and can be widely applied to big-data workloads. By recording data block scheduling in real time, and then exploiting spatial locality, temporal locality, and the inter-block affinity computed by the PageRank algorithm, data blocks are proactively pushed into the cache ahead of time, raising the data-block cache hit rate and greatly improving service performance.
Brief description of the drawings
Fig. 1 is a schematic diagram of the actual production environment of the preferred embodiment of the present invention;
Fig. 2 is a schematic diagram of data block eviction in the preferred embodiment of the present invention;
Fig. 3 is the flow chart of the preferred embodiment of the present invention.
Specific embodiment
The data block pre-caching method based on PageRank provided by the invention is described in detail below with reference to the drawings and a specific embodiment.
As shown in Fig. 1, Fig. 2, and Fig. 3, the method is implemented in an actual production environment which comprises: a data processing center, Memory, Disk, and a model factory.
Data processing center: the processing center for massive data; it also performs the model computation and the calculation of each data block's PageRank value.
Memory: caches the data blocks with high PageRank values, providing cache support for the data processing center, improving its hit rate, and reducing disk I/O.
Disk: stores massive data in the form of data blocks and serves as the persistent storage device for the data model.
Model factory: comprises four modules: a logger, a counter, a metadata manager, and a scheduler. The logger records data block usage at the data processing center and builds the relational matrix between data blocks. The scheduler manages, according to the data model, the loading and eviction of data blocks in the Cache, the construction, loading, updating, and saving of the data model, and the initialization and updating of the various parameters. The counter counts missing-block interrupts, the model retention period, and so on. The metadata manager manages the metadata of the data blocks in memory.
As shown in Fig. 2, the data block eviction scheme of the data block pre-caching method based on PageRank comprises two modes, (a) and (b):
Mode (a) in Fig. 2 shows the eviction used when a missing-block interrupt occurs in stages S12 and S13 of Fig. 3: the data block fetched from disk replaces the data block in the Cache with the smallest PageRank value.
Mode (b) in Fig. 2 shows load-and-evict driven by the model computed with the PageRank algorithm: the N data blocks with the highest PageRank values (N being the number of data blocks the memory can cache) are loaded into the cache. If a block to be loaded is already in memory, it is not loaded again but only marked; only blocks not already in memory are loaded into memory.
Notes on Fig. 2: each grid cell represents one data block; CulBlock is the data block currently being accessed; StandByBlock is a data block likely to be accessed at the next moment (a cache block).
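Eviction mode (a) above amounts to a min-PageRank replacement policy. A minimal sketch follows; the function name `admit_block` and the set-based cache representation are illustrative assumptions, not the patent's implementation:

```python
def admit_block(cache, pagerank, block, capacity):
    """Eviction mode (a): on a missing-block interrupt, the block fetched
    from disk replaces the cached block with the smallest PageRank value.
    Returns the evicted block, or None if no eviction was needed."""
    if block in cache:
        return None                        # already cached, nothing to do
    victim = None
    if len(cache) >= capacity:
        # choose the cached block with the smallest PageRank value
        victim = min(cache, key=lambda b: pagerank.get(b, 0.0))
        cache.discard(victim)
    cache.add(block)
    return victim
```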
As shown in Fig. 3, the data block pre-caching method based on PageRank provided by the invention comprises the following steps, executed in order:
Step 1 (stage S1, check whether a data model exists): check whether a data model exists in the system; if one exists under the specified directory, load it; otherwise proceed to the following step to initialize the data model.
Step 2 (stage S2, initialize data parameters): initialize the parameters required by the data model.
Step 3 (stage S3, record cache statistics over Δt): record how data blocks are used during the window Δt and generate the relational matrix A between data blocks.
Step 4 (stage S4, build the data model, simulate cache loading, and compute the hit rate): from the relational matrix A generated in stage S3, produce the corresponding probability transition matrix M and compute each data block's PageRank value according to the PageRank formula V'; model construction is complete. Then run a simulation using the PageRank values (mark the N highest-ranked data blocks, assume they are loaded into memory, and count, over Δt1, the accesses n1 to the marked blocks and the total data block accesses n2 made by the computer); the hit rate over Δt1 is p = n1/n2.
Step 5 (stage S5, check whether the hit rate exceeds P): compare the hit rate p computed in stage S4 with the preset threshold P; if p is greater, put the model into actual production; otherwise keep refining the model of the initialization phase.
Step 6 (stage S6, load the N most relevant data blocks according to the data model): load the N data blocks with the highest PageRank values (N being determined by the available memory) into the cache, loading only those blocks not already cached.
Step 7 (stage S7, data processing): while the computer processes data, keep recording data block usage and update the relational matrix A.
Step 8 (stage S8, check whether the requested data is in the cache): during data processing, check on every read whether the data is in the cache; if it is, the computer reads the data from the cached block; otherwise a missing-block interrupt occurs and stage S11 is executed.
Step 9 (stage S9, check whether it is time to save the model): if the save time has arrived, save or update the model; otherwise continue data processing.
Step 10 (stage S10, save or update the data model): save or update the current data model to disk so that the computer can load it directly after a restart.
Step 11 (stage S11, check whether the missing-block interrupt count exceeds the initialized value): check whether the missing-block interrupt count MBIs exceeds the value C set at initialization; if it does, first load the CulBlock data block into memory and update the model at the same time; otherwise simply load the CulBlock data block into memory, record the blocks loaded into the cache, and increment MBIs by 1.
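Stages S7 through S11 above can be sketched as a single serving loop. This is an illustrative sketch only: the function name, the set-based cache, and in particular the resetting of the MBIs counter after a model update are assumptions not stated in the patent:

```python
def process_reads(reads, cache, pagerank, C, capacity):
    """Serve a sequence of reads (stages S7-S11): cache hits are read
    directly; each miss is a missing-block interrupt that loads the current
    block (CulBlock), evicting the lowest-PageRank block when the cache is
    full; once the interrupt count MBIs exceeds C, a model update fires."""
    mbis = 0
    updates = 0
    for cul_block in reads:
        if cul_block in cache:
            continue                       # S8: hit, read from the cache
        # S11: missing-block interrupt
        if mbis > C:
            updates += 1                   # trigger a model update
            mbis = 0                       # assumption: counter resets here
        else:
            mbis += 1                      # record the interrupt
        if len(cache) >= capacity:         # mode (a): evict min-PageRank block
            cache.discard(min(cache, key=lambda b: pagerank.get(b, 0.0)))
        cache.add(cul_block)               # load CulBlock into the cache
    return updates
```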
The embodiments of the present invention have been described in detail above, but they are only preferred embodiments and should not be regarded as limiting the scope of the invention. Any changes and modifications within the scope of the present application shall still fall within the scope of the patent.
Claims (1)
1. A data block pre-caching method based on PageRank, characterized by comprising the following steps:
Step 1: check whether a data model exists in the system; if one exists under the specified directory, load it, initialize the PageRank value of each data block from the model, and go to step 6; otherwise go to step 2;
Step 2: initialize the parameters required by the data model;
Step 3: record how data blocks are used during the window Δt and, from the order in which data blocks are accessed, generate the relational matrix A between data blocks, namely the n×n matrix
A = (a_{k,q}), 1 ≤ k, q ≤ n,
where a_{k,q} is the number of times data block q is accessed at the moment immediately after data block k is accessed, with a_{k,q} = 0 when k = q; a_{k,q} is the degree of association between data blocks k and q, and n is the total number of data blocks used by the computer;
Step 4: from the relational matrix A, generate the probability transition matrix M and compute the PageRank value of each data block according to the PageRank formula V'; model construction is then complete. Using the PageRank values, run a simulation as follows: mark the N data blocks with the highest PageRank values and load them into the cache, then count, over Δt1, the accesses n1 to the marked blocks and the total data block accesses n made by the computer; the hit rate over Δt1 is p = n1/n;
V' is expressed as:
V' = αMV + (1 - α)e
where α is the probability that the computer uses a data block directly from the cache, 1 - α is the probability that a missing-block interrupt occurs and the data block is read from disk, M is the probability transition matrix of the relational matrix A, V is the PageRank vector of the current iteration, and e = (1/n, ..., 1/n)^T, where n is the total number of data blocks used by the computer;
Step 5: check whether the hit rate p exceeds the preset threshold P; if it does, put the model into actual production; otherwise execute step 3;
Step 6: load the N data blocks with the highest PageRank values into the cache;
Step 7: keep recording data block usage and update the relational matrix A;
Step 8: during data processing, check on every read whether the data is in the cache; if it is, the computer reads the data from the cached block; otherwise a missing-block interrupt occurs and step 11 is executed;
Step 9: check whether it is time to save the model; if the time has arrived, save or update the model; otherwise execute step 7;
Step 10: save or update the current data model to disk, so that the computer can load the model directly after a restart;
Step 11: check whether the missing-block interrupt count MBIs exceeds the value C set at initialization; if it does, first load the data block currently being accessed into the cache and update the model at the same time; otherwise load the currently accessed data block into the cache, record the blocks loaded into the cache, and increment MBIs by 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610227750.1A CN105930096B (en) | 2016-04-12 | 2016-04-12 | A kind of data block pre-cache method based on PageRank |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105930096A CN105930096A (en) | 2016-09-07 |
CN105930096B true CN105930096B (en) | 2019-01-11 |
Family
ID=56839005
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610227750.1A Expired - Fee Related CN105930096B (en) | 2016-04-12 | 2016-04-12 | A kind of data block pre-cache method based on PageRank |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105930096B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115952110B (en) * | 2023-03-09 | 2023-06-06 | 浪潮电子信息产业股份有限公司 | Data caching method, device, equipment and computer readable storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1932817A (en) * | 2006-09-15 | 2007-03-21 | 陈远 | Common interconnection network content keyword interactive system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9734063B2 (en) * | 2014-02-27 | 2017-08-15 | École Polytechnique Fédérale De Lausanne (Epfl) | Scale-out non-uniform memory access |
- 2016-04-12: application CN201610227750.1A filed (CN); granted as CN105930096B; current status: Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
CN105930096A (en) | 2016-09-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103617131B (en) | Data caching achieving method | |
CN104834675B (en) | Query performance optimization method based on user behavior analysis | |
US8825959B1 (en) | Method and apparatus for using data access time prediction for improving data buffering policies | |
CN107197053A (en) | A kind of load-balancing method and device | |
Qu et al. | A dynamic replica strategy based on Markov model for hadoop distributed file system (HDFS) | |
CN108595254B (en) | Query scheduling method | |
CA2776127A1 (en) | Data security for a database in a multi-nodal environment | |
CN108509723A (en) | LRU Cache based on artificial neural network prefetch mechanism performance income evaluation method | |
CN103425564B (en) | A kind of smartphone software uses Forecasting Methodology | |
CN101989236A (en) | Method for realizing instruction buffer lock | |
CN109471872A (en) | Handle the method and device of high concurrent inquiry request | |
CN112463189A (en) | Distributed deep learning multi-step delay updating method based on communication operation sparsification | |
CN108415766B (en) | Rendering task dynamic scheduling method | |
CN106201839A (en) | The information loading method of a kind of business object and device | |
CN105930096B (en) | A kind of data block pre-cache method based on PageRank | |
CN107180118A (en) | A kind of file system cache data managing method and device | |
CN105654120B (en) | A kind of software load feature extracting method based on SOM and K-means two-phase analyzing method | |
Zhang et al. | Joint optimization of multi-user computing offloading and service caching in mobile edge computing | |
Kvet | Dangling predicates and function call optimization in the oracle database | |
US11030194B2 (en) | Demand-driven dynamic aggregate | |
CN100422996C (en) | Method for updating low-loading picture based on database | |
CN103544302A (en) | Partition maintenance method and device of database | |
CN101996246A (en) | Method and system for instant indexing | |
CN113360576A (en) | Power grid mass data real-time processing method and device based on Flink Streaming | |
WO2020114155A1 (en) | Subgrade compaction construction data efficient processing system |
Legal Events
Date | Code | Title | Description
---|---|---|---
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190111 |