CN105930096B - Data block pre-caching method based on PageRank - Google Patents

Data block pre-caching method based on PageRank Download PDF

Info

Publication number
CN105930096B
CN105930096B (application CN201610227750.1A)
Authority
CN
China
Prior art keywords
data block
data
model
block
pagerank
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610227750.1A
Other languages
Chinese (zh)
Other versions
CN105930096A (en)
Inventor
肖殷洪
刘震
王晨光
王天凯
王斌
王强富
郑峰弓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Travelsky Technology Co Ltd
Original Assignee
China Travelsky Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Travelsky Technology Co Ltd filed Critical China Travelsky Technology Co Ltd
Priority to CN201610227750.1A
Publication of CN105930096A
Application granted
Publication of CN105930096B
Expired - Fee Related
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A data block pre-caching method based on PageRank, comprising: collecting statistics on data block scheduling; building the model; updating the model; saving the model; loading the model; setting the model retention cycle H; ranking based on the PageRank algorithm; handling missing-block interrupts; and pre-loading data blocks. The invention is a solution to the degraded service performance caused by frequent disk I/O on data blocks during big data processing and the resulting low data block cache hit rate. It can be widely applied in big data processing: data block scheduling is recorded in real time, and, based on spatial locality, temporal locality, and the closeness between data blocks computed by the PageRank algorithm, data blocks are proactively pushed into the cache ahead of time, which raises the data block cache hit rate and greatly improves service performance.

Description

Data block pre-caching method based on PageRank
Technical field
The invention belongs to the field of server data caching technology, and in particular relates to a data block pre-caching method based on PageRank.
Background technique
With the development of the Internet and the mobile Internet, a large number of Internet applications rely on processing massive amounts of data to provide services. Emerging Internet applications exhibit data storage and access characteristics that differ from traditional applications. At present, large Internet companies mostly store data by merging many files into data blocks of fixed size on disk. This storage mode facilitates the management of data by the enterprise; however, when massive numbers of data blocks are processed, the data block cache policy often leads to a low cache hit rate and to repeated disk I/O, which seriously reduces service performance. Most current approaches to this problem keep adding physical memory, use multi-level caches, or add solid-state disks. These approaches essentially improve performance through hardware, and memory, cache, and solid-state disks are expensive, which incurs a large hardware cost. No method for data block pre-caching or active caching based on the PageRank algorithm has been found so far.
Summary of the invention
To solve the above problems, the purpose of the present invention is to provide a data block pre-caching method based on PageRank.
A data block pre-caching method based on PageRank, characterized by comprising the following steps:
Step 1: determine whether a data model exists in the system; if a data model exists, load the data model from the specified directory, initialize the PageRank value of each data block according to the data model, and then go to step 6; otherwise go to step 2;
Step 2: initialize the parameters required by the data model;
Step 3: collect statistics on data block usage within the time window Δt and generate the relational matrix A between data blocks;
Step 4: from the relational matrix A, generate the probability transfer matrix M and compute the PageRank value of each data block according to the PageRank formula V', completing model construction; then simulate according to the PageRank value of each data block, the specific simulation process being: mark the N data blocks with the highest PageRank values and load these N data blocks into the cache; within Δt1, count the number of accesses n1 to the marked N data blocks and the total number of data block accesses n made by the computer; the hit rate in Δt1 is then p = n1/n (a sketch of this simulation is given after step 11);
Step 5: determine whether the hit rate p exceeds the preset hit rate P; if so, put the model into actual production; otherwise execute step 3;
Step 6: load the N data blocks with the highest PageRank values into the cache;
Step 7: collect statistics on data block usage and update the relational matrix A;
Step 8: during data processing, when reading data, determine whether the data is in the cache; if so, the computer reads the data from the cached data block; otherwise a missing-block interrupt occurs and step 11 is executed;
Step 9: determine whether it is time to save the model; if the time has come, save or update the model; otherwise execute step 7;
Step 10: save or update the current data model to disk, so that the computer can load the model directly when it restarts;
Step 11: determine whether the number of missing-block interrupts MBIs exceeds the value C set at initialization; if so, first load CulBlock (the data block currently being accessed) into the cache, and update the model at the same time; otherwise load CulBlock into the cache, record the data block loaded into the cache, and increment the missing-block interrupt count MBIs by 1.
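The simulation in step 4 reduces to counting how many accesses in Δt1 hit the N marked blocks. A minimal sketch in Python, assuming the access trace for Δt1 is available as a list of block identifiers and the PageRank values as a dictionary (these names and data structures are illustrative, not taken from the patent):

```python
def simulate_hit_rate(pagerank, trace_dt1, N):
    """Estimate the hit rate p = n1 / n for the N highest-PageRank blocks.

    pagerank : dict mapping block id -> PageRank value
    trace_dt1: list of block ids accessed during the window dt1
    N        : number of data blocks the cache is allowed to hold
    """
    # Mark the N blocks with the highest PageRank values (the simulated cache).
    cached = set(sorted(pagerank, key=pagerank.get, reverse=True)[:N])
    n = len(trace_dt1)                               # total block accesses in dt1
    n1 = sum(1 for b in trace_dt1 if b in cached)    # accesses that hit marked blocks
    return n1 / n if n else 0.0
```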
Further, in step 3, the relational matrix A is the n × n matrix A = [a_{k,q}] (1 ≤ k, q ≤ n), where a_{k,q} is the number of times data block q is accessed at the next moment after data block k is accessed, a_{k,q} = 0 when k = q, a_{k,q} represents the degree of relationship between data block k and data block q, and n is the total number of data blocks used by the computer.
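As an illustration of how the relational matrix A of step 3 might be accumulated from an observed access sequence, the following sketch assumes that blocks are indexed 0..n−1 and that the trace for the window Δt is a Python list; these conventions are assumptions made for illustration, not specified by the patent:

```python
import numpy as np

def build_relational_matrix(trace, n):
    """Build A where A[k, q] counts how often block q follows block k in the trace.

    trace: sequence of block indices (0..n-1) observed during the window dt
    n    : total number of data blocks used by the computer
    """
    A = np.zeros((n, n))
    for k, q in zip(trace, trace[1:]):   # consecutive accesses: k, then q
        if k != q:                       # a_{k,q} = 0 when k == q
            A[k, q] += 1
    return A
```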
Further, in step 4, the PageRank formula V' is:
V' = αMV + (1 − α)e,
where α is the probability that the computer uses a data block directly from the cache, 1 − α is the probability that a missing-block interrupt occurs and the data block is used from disk, M is the probability transfer matrix derived from the relational matrix A, V is the vector of PageRank values in each iteration, and e is the vector whose elements are all 1/n, with n the total number of data blocks used by the computer.
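A minimal power-iteration sketch of V' = αMV + (1 − α)e. The construction of M by row-normalizing A and transposing, the default damping value 0.85, and the convergence tolerance are illustrative assumptions; the patent only states that M is generated from A:

```python
import numpy as np

def pagerank_values(A, alpha=0.85, tol=1e-8, max_iter=100):
    """Iterate V' = alpha * M @ V + (1 - alpha) * e until V stabilizes.

    A     : n x n relational matrix, A[k, q] = times block q followed block k
    alpha : probability of serving a data block directly from the cache
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    row_sums = A.sum(axis=1, keepdims=True)
    # P[k, q] = probability that q is accessed right after k (row-normalized A);
    # rows with no observed successor fall back to a uniform distribution.
    P = np.divide(A, row_sums, out=np.full_like(A, 1.0 / n), where=row_sums > 0)
    M = P.T                                  # probability transfer matrix
    e = np.full(n, 1.0 / n)                  # vector with all elements 1/n
    V = e.copy()
    for _ in range(max_iter):
        V_next = alpha * (M @ V) + (1 - alpha) * e
        if np.abs(V_next - V).sum() < tol:
            return V_next
        V = V_next
    return V
```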
The features of the data block pre-caching method based on PageRank provided by the invention are as follows:
1. The invention places data blocks into the data block cache in advance through an active, software-level strategy, avoiding disk I/O during data block processing and improving the cache hit rate.
2. The invention is proposed mainly for the case where most data files are merged into data blocks of fixed size, in order to solve the problem of enterprises processing massive numbers of large data blocks quickly; it is not applicable to traditional operating-system-level caches.
3. The invention mainly collects statistics on data block scheduling in a real-time environment and, based on spatial locality and temporal locality together with an optimized PageRank ranking, loads data blocks that are to be processed or have not yet been processed into the cache in advance, while continuously updating the data model in the real-time environment, giving the method good dynamic behavior.
The present invention has the following advantages and beneficial effects:
The data block pre-caching method based on PageRank is a solution to the degraded service performance caused by frequent disk I/O on data blocks during big data processing and the resulting low data block cache hit rate. It can be widely applied in big data processing: by recording data block scheduling in real time and then using spatial locality, temporal locality, and the closeness between data blocks computed by the PageRank algorithm, data blocks are proactively pushed into the cache in advance, which raises the data block cache hit rate and greatly improves service performance.
Detailed description of the invention
Fig. 1 is the schematic diagram of the actual production environment of the preferred embodiment of the present invention;
Fig. 2 is a schematic diagram of data block eviction in the preferred embodiment of the present invention;
Fig. 3 is the flow chart of the preferred embodiment of the present invention.
Specific embodiment
The data block pre-caching method based on PageRank provided by the invention is described in detail below with reference to the drawings and specific embodiments.
Fig. 1, Fig. 2 and Fig. 3 together illustrate the core idea of the data block pre-caching method based on PageRank provided by the invention.
The data block pre-caching method based on PageRank provided by the invention is implemented in an actual production environment. As shown in Fig. 1, the actual production environment comprises: a data processing centre, memory, a disk, and a model factory.
Data processing centre: the centre that processes massive data, including the computation of the model and of the PageRank value of each data block;
Memory: caches the data blocks with high PageRank values, providing cache support for the data processing centre, improving the hit rate of the data processing centre, and reducing disk I/O;
Disk: stores massive data in the form of data blocks, and serves as the persistent storage device for the data model;
Model factory: mainly comprises four modules: a logger, a counter, a metadata manager, and a scheduler. The logger is mainly used to record data block usage in the data processing centre and to build the relational matrix between data blocks; the scheduler mainly manages the loading and eviction of data blocks in the cache according to the data model, the building, loading, updating and saving of the data model, and the initialization and updating of the various parameters; the counter is mainly used to count missing-block interrupts, the model retention cycle, and so on; the metadata manager is mainly used to manage the metadata of the data blocks in memory.
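A structural sketch of the four model factory modules; the class and method names below are illustrative assumptions rather than an interface defined by the patent:

```python
from collections import defaultdict

class Logger:
    """Records data block usage and accumulates the relational matrix A."""
    def __init__(self):
        self.A = defaultdict(lambda: defaultdict(int))
        self.prev = None
    def record_access(self, block_id):
        if self.prev is not None and self.prev != block_id:
            self.A[self.prev][block_id] += 1   # block_id was accessed right after self.prev
        self.prev = block_id

class Counter:
    """Counts missing-block interrupts and tracks the model retention cycle H."""
    def __init__(self, retention_cycle_h):
        self.missing_block_interrupts = 0
        self.retention_cycle_h = retention_cycle_h

class MetadataManager:
    """Manages the metadata of the data blocks currently held in memory."""
    def __init__(self):
        self.cached_blocks = {}

class Scheduler:
    """Loads/evicts cache blocks according to the data model and maintains the model."""
    def __init__(self, logger, counter, metadata):
        self.logger, self.counter, self.metadata = logger, counter, metadata
```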
As shown in Fig. 2, the data block eviction modes of the data block pre-caching method based on PageRank provided by the invention include two modes, (a) and (b):
(a) in Fig. 2 shows the eviction mode used when a missing-block interrupt occurs in the S12 and S13 stages of Fig. 3: for a data block loaded from disk, the data block with the smallest PageRank value in the Cache is evicted;
(b) in Fig. 2 shows the load-and-evict mode based on the model computed with the PageRank algorithm: the N data blocks with the highest PageRank values (N being the number of data blocks that memory allows to be cached) are loaded into the cache; if a data block to be loaded is already in memory it is not loaded again but only marked; only data blocks that are not in memory are loaded into memory;
Explanation of Fig. 2: each grid represents a data block; CulBlock denotes the data block currently being accessed; StandByBlock denotes a data block (cache block) that is likely to be accessed at the next moment.
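A minimal sketch of eviction mode (a): when a missing-block interrupt occurs, the block fetched from disk replaces the cached block with the smallest PageRank value. Representing the cache as a set of block ids is an assumption made for illustration:

```python
def handle_missing_block(cache, pagerank, cul_block, capacity):
    """cache: set of cached block ids; cul_block: block just read from disk."""
    if len(cache) >= capacity:
        victim = min(cache, key=lambda b: pagerank.get(b, 0.0))  # smallest PageRank value
        cache.remove(victim)
    cache.add(cul_block)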
As shown in Fig. 3, the data block pre-caching method based on PageRank provided by the invention comprises the following steps, executed in order:
Step 1, stage S1, determine whether the system has a data model: if a data model exists, load the data model from the specified directory; otherwise go to the following step to initialize the data model;
Step 2, stage S2, initialize data parameters: initialize the parameters required by the data model;
Step 3, stage S3, record cache statistics within the time window Δt: collect statistics on data block usage within Δt and generate the relational matrix A between data blocks;
Step 4, stage S4, construct the data model, simulate loading into memory, and compute the hit rate: from the relational matrix A generated in stage S3, generate the corresponding probability transfer matrix M and compute the PageRank value of each data block according to the PageRank formula V', completing model construction; then simulate according to the PageRank value of each data block (mark the N data blocks with the highest PageRank values, assume they are loaded into memory, and count, within Δt1, the number of accesses n1 to the marked N data blocks and the total number of data block accesses n2 made by the computer); the hit rate in Δt1 is then p = n1/n2;
Step 5, stage S5, determine whether the hit rate exceeds P: check whether the hit rate p computed in stage S4 exceeds the preset hit rate P; if so, put the model into actual production; otherwise continue to refine the model of the initialization phase;
Step 6, stage S6, load the N most relevant data blocks into the cache according to the data model: load the N data blocks with the highest PageRank values (N being the number of data blocks memory allows) into the cache, loading only the data blocks that are not already in the current cache;
Step 7, stage S7, data processing: in this stage the computer processes data while continuing to collect statistics on data block usage and updating the relational matrix A;
Step 8, stage S8, determine whether the data used by the computer is in the cache: during data processing, when reading data the computer checks whether the data is in the cache; if so, the computer reads the data from the cached data block; otherwise a missing-block interrupt occurs and stage S11 is executed;
Step 9, stage S9, determine whether it is time to save the model: if the time has come, save or update the model; otherwise continue data processing;
Step 10, stage S10, save or update the data model: save or update the current data model to disk, so that the computer can load the model directly when it restarts;
Step 11, stage S11, determine whether the number of missing-block interrupts exceeds the value set at initialization: check whether the number of missing-block interrupts MBIs exceeds the value C set at initialization; if so, first load the CulBlock data block into memory and update the model at the same time; otherwise simply load the CulBlock data block into memory, record the data block loaded into the cache, and increment the missing-block interrupt count MBIs by 1.
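The S6–S11 stages can be condensed into a single service loop. The sketch below reuses the helpers from the earlier sketches (Logger, Counter, handle_missing_block) and injects the model-update and persistence routines as callables; all of these names, and the exact ordering of the checks, are illustrative assumptions rather than the patent's definitive flow:

```python
def runtime_loop(requests, cache, pagerank, logger, counter, N, C, capacity,
                 update_model, save_model, time_to_save):
    """Serve a stream of block requests while maintaining the model (stages S6-S11)."""
    # S6: pre-load the N highest-PageRank blocks into the cache.
    for block in sorted(pagerank, key=pagerank.get, reverse=True)[:N]:
        cache.add(block)
    for block in requests:                      # S7: data processing continues
        logger.record_access(block)             # keep updating the relational matrix A
        if block not in cache:                  # S8: cache miss -> missing-block interrupt
            # S11 / Fig. 2(a): load CulBlock, evicting the lowest-PageRank block if full.
            handle_missing_block(cache, pagerank, block, capacity)
            counter.missing_block_interrupts += 1
            if counter.missing_block_interrupts > C:
                pagerank = update_model(logger)   # refresh PageRank values from new statistics
        if time_to_save():                      # S9/S10: periodically persist the model
            save_model(pagerank)
```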
The embodiments of the present invention have been described in detail above, but the content is only a preferred embodiment of the invention and should not be regarded as limiting the scope of the invention. Any equivalent changes and improvements made within the scope of the present application shall still fall within the scope of this patent.

Claims (1)

1. A data block pre-caching method based on PageRank, characterized by comprising the following steps:
Step 1: determine whether a data model exists in the system; if a data model exists, load the data model from the specified directory, initialize the PageRank value of each data block according to the data model, and then go to step 6; otherwise go to step 2;
Step 2: initialize the parameters required by the data model;
Step 3: collect statistics on data block usage within the time window Δt, and generate the relational matrix A between data blocks according to the order in which the data blocks are accessed, where A is the n × n matrix A = [a_{k,q}] (1 ≤ k, q ≤ n);
parameters in the formula: a_{k,q} is the number of times data block q is accessed at the next moment after data block k is accessed, a_{k,q} = 0 when k = q, a_{k,q} represents the degree of relationship between data block k and data block q, and n is the total number of data blocks used by the computer;
Step 4: from the relational matrix A, generate the probability transfer matrix M and compute the PageRank value of each data block according to the PageRank formula V', completing model construction; then simulate according to the PageRank value of each data block, the specific simulation process being: mark the N data blocks with the highest PageRank values and load these N data blocks into the cache; within Δt1, count the number of accesses n1 to the marked N data blocks and the total number of data block accesses n made by the computer; the hit rate in Δt1 is then p = n1/n;
V' is expressed as:
V' = αMV + (1 − α)e
parameters in the formula: α is the probability that the computer uses a data block directly from the cache, 1 − α is the probability that a missing-block interrupt occurs and the computer uses the data block from disk, M is the probability transfer matrix of the relational matrix A, V is the vector of PageRank values in each iteration, and e is the vector whose elements are all 1/n, where n is the total number of data blocks used by the computer;
Step 5: determine whether the above hit rate p exceeds the preset hit rate P; if so, put the model into actual production; otherwise execute step 3;
Step 6: load the N data blocks with the highest PageRank values into the cache;
Step 7: collect statistics on data block usage and update the relational matrix A;
Step 8: during data processing, when reading data, determine whether the data is in the cache; if so, the computer reads the data from the cached data block; otherwise a missing-block interrupt occurs and step 11 is executed;
Step 9: determine whether it is time to save the model; if the time has come, save or update the model; otherwise execute step 7;
Step 10: save or update the current data model to disk, so that the computer can load the model directly when it restarts;
Step 11: determine whether the number of missing-block interrupts MBIs exceeds the value C set at initialization; if so, first load the data block currently being accessed into the cache, and update the model at the same time; otherwise load the data block currently being accessed into the cache, record the data block loaded into the cache, and increment the missing-block interrupt count MBIs by 1.
CN201610227750.1A 2016-04-12 2016-04-12 Data block pre-caching method based on PageRank Expired - Fee Related CN105930096B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610227750.1A CN105930096B (en) 2016-04-12 2016-04-12 Data block pre-caching method based on PageRank

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610227750.1A CN105930096B (en) 2016-04-12 2016-04-12 Data block pre-caching method based on PageRank

Publications (2)

Publication Number Publication Date
CN105930096A CN105930096A (en) 2016-09-07
CN105930096B (en) 2019-01-11

Family

ID=56839005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610227750.1A Expired - Fee Related CN105930096B (en) 2016-04-12 2016-04-12 Data block pre-caching method based on PageRank

Country Status (1)

Country Link
CN (1) CN105930096B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115952110B (en) * 2023-03-09 2023-06-06 浪潮电子信息产业股份有限公司 Data caching method, device, equipment and computer readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1932817A (en) * 2006-09-15 2007-03-21 陈远 Common interconnection network content keyword interactive system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9734063B2 (en) * 2014-02-27 2017-08-15 École Polytechnique Fédérale De Lausanne (Epfl) Scale-out non-uniform memory access

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1932817A (en) * 2006-09-15 2007-03-21 陈远 Common interconnection network content keyword interactive system

Also Published As

Publication number Publication date
CN105930096A (en) 2016-09-07


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190111

CF01 Termination of patent right due to non-payment of annual fee