CN105373347B - Hot spot data identification and scheduling method and system for a storage system - Google Patents
- Publication number
- CN105373347B (application number CN201510696498.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0647—Migration mechanisms
- G06F3/0649—Lifecycle management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
Abstract
The invention discloses a hot spot data identification and scheduling method and system for a storage system, comprising: a collector collects access information of data objects in the storage system and sends the access information of the data objects to a monitoring database; a scheduler obtains the access information of the data objects from the monitoring database, identifies hot spot data according to that access information, and writes the hot spot data into the high-speed memory of the storage system; the high-speed memory sets an expiration time for each item of hot spot data, and the scheduler schedules the hot spot data according to the expiration time. The invention effectively solves the bottleneck problem of hot spot data access.
Description
Technical field
The present invention relates to the field of storage system technology, and in particular to a hot spot data identification and scheduling method and system for a storage system.
Background technology
With the continuous development of Internet technology, unstructured data of all kinds, such as pictures, audio, video, and text, is growing explosively. At the same time, data differs in access frequency, so data can be divided into "cold" and "hot". If multiple "hot" data items reference the same data segment, the "heat" of that segment is the sum of the "heat" of each of those items. Such hot spot data can clearly become the access bottleneck of a storage system.
Identifying "cold" and "hot" data blocks and migrating data according to policy, so that infrequently accessed data is migrated automatically to low-speed, low-cost storage tiers and frequently accessed data is migrated to high-performance storage tiers, is a problem that current distributed storage systems urgently need to solve.
Summary of the invention
To solve the above technical problem, the present invention provides a hot spot data identification and scheduling method and system for a storage system, which can effectively solve the bottleneck problem of hot spot data access.
To achieve the object of the invention, the present invention provides a hot spot data identification and scheduling method for a storage system, characterized by comprising: a collector collects access information of data objects in the storage system and sends the access information of the data objects to a monitoring database; a scheduler obtains the access information of the data objects from the monitoring database, identifies hot spot data according to that access information, and writes the hot spot data into the high-speed memory of the storage system; the high-speed memory sets an expiration time for each item of hot spot data, and the scheduler schedules the hot spot data according to the expiration time.
Further, the collector collecting access information of data objects in the storage system specifically comprises:
The collector obtains historical access information of data objects from the monitoring database and obtains real-time access information of data objects through a message queue, where the real-time access information in the message queue comes from the real-time system monitoring data subscribed to by the monitor.
Further, identifying hot spot data according to the access information of the data objects specifically comprises: identifying hot spot data from the access information of the data objects based on a hot spot data discriminant function, the discriminant function being:
get_topk(db, K, time);
where db is the monitoring database, K is the access-count rank, and time is the start time from which access counts are calculated. The discriminant function obtains from the monitoring database db the hot spot data ranked in the top K by access count between the time point time and the current time; the return value of the function is the set of hot spot data ranked in the top K by access count.
Further, the high-speed memory setting an expiration time for each item of hot spot data specifically comprises: dividing time into time slots TIMESLOT, one TIMESLOT being one scheduling period; setting a default expiration time for each item of hot spot data entering the high-speed memory, the default expiration time being the current time plus T TIMESLOTs; if the repetition rate of data in each TIMESLOT is λ (0 ≤ λ ≤ 1), then after T TIMESLOTs the number of hot spot data items held in the high-speed memory is (1 - λ) × T × K; for a data object with n accesses, the expiration time may be postponed by l TIMESLOTs, where l = f(n) and l ∈ N.
Further, the scheduler scheduling the hot spot data according to the expiration time specifically comprises: the scheduler schedules the hot spot data according to the expiration time based on a scheduling function, the scheduling function being:
get_swapout(schedule, count);
where schedule is the hot spot data table in the high-speed memory and count is the number of hot spot data items that need to be scheduled out of the high-speed memory. The scheduling function obtains from the hot spot data table schedule of the high-speed memory the count hot spot data items to be scheduled out of the high-speed memory; the return value of the function is the set of those count hot spot data items.
Further, scheduling the hot spot data based on the scheduling function specifically comprises: initializing the high-speed memory, emptying the schedule set in the high-speed memory, and calculating the maximum number of hot spot data items the storage system can hold, denoted volume; recording the current time, time; obtaining from the monitoring database, through the discriminant function, the set of hot spot data ranked in the top K by access count in the previous TIMESLOT, denoted volunteer, where the expiration time of each hot spot data item in this set should be time + [T + f(n)] × TIMESLOT, T is the default delay, and n is the access count of the data; taking the part of volunteer that is not in schedule, denoted the insert set, which is to be merged into the schedule set; taking the intersection of volunteer and schedule, denoted the refresh set, whose hot spot data has its expiration time postponed; refreshing the hot spot data in schedule whose expiration time needs updating, postponing its expiration time according to its access count; collecting the expired data in schedule into the delete set; calculating whether, after the new data is added and the expired data is deleted, the length of the schedule table would exceed the capacity volume, and if so scheduling hot spot data out, merging the hot spot data to be swapped out into the swapout set; deleting the hot spot data in the delete set from the high-speed memory and then from schedule; writing the hot spot data in the insert set into the high-speed memory and then merging it into schedule; and performing one round of hot spot data scheduling every TIMESLOT.
The present invention also provides a hot spot data identification and scheduling system for a storage system, comprising: a collector, configured to collect access information of data objects in the storage system and send the access information of the data objects to a monitoring database; a scheduler, configured to obtain the access information of the data objects from the monitoring database, identify hot spot data according to that access information, and write the hot spot data into the high-speed memory of the storage system; and the high-speed memory, configured to set an expiration time for each item of hot spot data, the scheduler scheduling the hot spot data according to the expiration time.
Further, the system also comprises a monitor and a message queue. The collector obtains historical access information of data objects from the monitoring database and obtains real-time access information of data objects through the message queue, where the real-time access information in the message queue comes from the real-time system monitoring data subscribed to by the monitor.
Further, the scheduler identifies hot spot data based on a hot spot data discriminant function, the discriminant function being:
get_topk(db, K, time);
where db is the monitoring database, K is the access-count rank, and time is the start time from which access counts are calculated. The discriminant function obtains from the monitoring database db the hot spot data ranked in the top K by access count between the time point time and the current time; the return value of the function is the set of hot spot data ranked in the top K by access count.
Further, the high-speed memory divides time into time slots TIMESLOT, one TIMESLOT being one scheduling period; sets a default expiration time for each item of hot spot data entering the high-speed memory, the default expiration time being the current time plus T TIMESLOTs; if the repetition rate of data in each TIMESLOT is λ (0 ≤ λ ≤ 1), then after T TIMESLOTs the number of hot spot data items held in the high-speed memory is (1 - λ) × T × K; for a data object with n accesses, the expiration time may be postponed by l TIMESLOTs, where l = f(n) and l ∈ N.
Further, the scheduler schedules the hot spot data based on a scheduling function, the scheduling function being:
get_swapout(schedule, count);
where schedule is the hot spot data table in the high-speed memory and count is the number of hot spot data items that need to be scheduled out of the high-speed memory. The scheduling function obtains from the hot spot data table schedule of the high-speed memory the count hot spot data items to be scheduled out of the high-speed memory; the return value of the function is the set of those count hot spot data items.
Further, the high-speed memory scheduling the hot spot data based on the scheduling function specifically comprises: initializing the high-speed memory, emptying the schedule set in the high-speed memory, and calculating the maximum number of hot spot data items the storage system can hold, denoted volume; recording the current time, time; obtaining from the monitoring database, through the discriminant function, the set of hot spot data ranked in the top K by access count in the previous TIMESLOT, denoted volunteer, where the expiration time of each hot spot data item in this set should be time + [T + f(n)] × TIMESLOT, T is the default delay, and n is the access count of the data; taking the part of volunteer that is not in schedule, denoted the insert set, which is to be merged into the schedule set; taking the intersection of volunteer and schedule, denoted the refresh set, whose hot spot data has its expiration time postponed; refreshing the hot spot data in schedule whose expiration time needs updating, postponing its expiration time according to its access count; collecting the expired data in schedule into the delete set; calculating whether, after the new data is added and the expired data is deleted, the length of the schedule table would exceed the capacity volume, and if so scheduling hot spot data out, merging the hot spot data to be swapped out into the swapout set; deleting the hot spot data in the delete set from the high-speed memory and then from schedule; writing the hot spot data in the insert set into the high-speed memory and then merging it into schedule; and performing one round of hot spot data scheduling every TIMESLOT.
Further, the monitor is implemented with the Python Tornado framework.
Further, the message queue and the monitoring database are implemented with Redis, and the schedule table is implemented with the sorted set of the Redis storage system.
Further, the collector collects the access information of data objects in the storage system through the monitoring interfaces of the storage system.
Compared with the prior art, the present invention sends hot spot data into the high-speed memory based on access frequency statistics and maintains its expiration time, and it comprehensively optimizes three aspects: hot spot data identification, hot spot data maintenance, and data replacement. This effectively solves the bottleneck problem of hot spot data access. In addition, the data segments ranked in the top K by access count within a time period are obtained from the monitoring database, and a corresponding expiration time can be set for this hot spot data according to its access count; time is organized into time slots, and within each scheduling period the expiration time of repeatedly accessed data is extended; the set of data with the earliest expiration times is obtained from the hot spot data schedule table and swapped out, which reclaims storage space, optimizes overall storage performance, and thereby advances the development of mass data storage system architecture.
Other features and advantages of the present invention will be set forth in the following description, and in part will become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the structure particularly pointed out in the description, the claims, and the drawings.
Description of the drawings
The drawings are provided for a further understanding of the technical solution of the invention and constitute a part of the specification. Together with the embodiments of this application they serve to explain the technical solution of the invention and do not limit it.
Fig. 1 is a schematic diagram of the system architecture for hot spot data identification and scheduling of a storage system in an embodiment of the invention.
Fig. 2 is a flow diagram of the method for hot spot data identification and scheduling of a storage system in an embodiment of the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, embodiments of the invention are described in detail below with reference to the drawings. Note that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another arbitrarily.
The steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Moreover, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
Fig. 1 is a schematic diagram of the system architecture for hot spot data identification and scheduling of a storage system in an embodiment of the invention. As shown in Fig. 1, the system comprises:
Monitor: essentially a Web server that mainly handles front-end user requests for monitoring data. The monitor's data comes partly from the monitoring database, mainly to serve queries for historical information, and partly from the message queue, mainly to serve queries for real-time monitoring data. The monitor can be implemented with the Python Tornado framework.
Message queue and monitoring database: implemented with Redis. Redis is an open-source, log-structured, high-performance key-value store written in ANSI C that supports networking, can be memory-based or persistent, and provides a high-performance message queue service. It is mainly used to store historical monitoring data and forward real-time monitoring data.
Collector: uses the monitoring interfaces of the storage system, such as the Recon component and StatsD, to collect the system's performance data and write it to the monitoring database, and publishes to the message queue the real-time system monitoring data subscribed to by the monitor.
Scheduler: reads the system's historical performance and access information from the monitoring database, identifies hot data segments, determines the scheduling policy, and sends control commands to the storage system to schedule data.
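The data flow among these components can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the classes MonitoringDB, MessageQueue, and Collector and the method on_access are hypothetical names, and plain in-memory structures stand in for Redis.

```python
import time


class MonitoringDB:
    """In-memory stand-in for the monitoring database (Redis in the text)."""

    def __init__(self):
        self.events = []  # list of (timestamp, object_id) access events

    def record(self, object_id, timestamp):
        self.events.append((timestamp, object_id))


class MessageQueue:
    """In-memory stand-in for the publish/subscribe message queue."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for callback in self.subscribers:
            callback(message)


class Collector:
    """Writes each access to the history store and publishes it in real time."""

    def __init__(self, db, queue):
        self.db = db
        self.queue = queue

    def on_access(self, object_id, timestamp=None):
        timestamp = time.time() if timestamp is None else timestamp
        self.db.record(object_id, timestamp)                        # historical data
        self.queue.publish({"object": object_id, "ts": timestamp})  # real-time data
```

In this sketch a monitor would call subscribe on the queue to receive real-time events, while a scheduler would read db.events to rank objects by access count.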
Based on the architecture shown in Fig. 1, the present invention also provides a hot spot data identification and scheduling method for a storage system which, as shown in Fig. 2, comprises:
Step 201: the collector collects access information of data objects in the storage system and sends the access information of the data objects to the monitoring database.
In this step, the collector obtains historical access information of data objects from the monitoring database and obtains real-time access information of data objects through the message queue, where the real-time access information in the message queue comes from the real-time system monitoring data subscribed to by the monitor.
Step 202: the scheduler obtains the access information of data objects from the monitoring database, identifies hot spot data according to that access information, and places the hot spot data into the high-speed memory of the storage system.
In this step, the discriminant function for hot spot data is as follows:
get_topk(db, K, time)
where db is the monitoring database, K is the access-count rank, and time is the start time from which access counts are calculated. The discriminant function obtains from the monitoring database db the hot spot data ranked in the top K by access count in the period from the time point time up to now; the return value of the function is the set of these hot spot data.
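A minimal sketch of such a discriminant function is given below. It assumes, purely for illustration, that db is an iterable of (timestamp, object_id) access events, and it returns a mapping from object id to access count rather than a bare set so that the count n is available later when expiration times are computed.

```python
from collections import Counter


def get_topk(db, k, time):
    """Return the k most accessed object ids since `time`, with their counts.

    Assumption: `db` is an iterable of (timestamp, object_id) access events.
    """
    counts = Counter(obj for ts, obj in db if ts >= time)
    # most_common(k) orders entries by descending access count
    return dict(counts.most_common(k))
```

For example, with events [(1, "a"), (2, "b"), (3, "a")], calling get_topk(events, 1, 0) yields {"a": 2}.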
Step 203: the high-speed memory sets an expiration time for each item of hot spot data, and the scheduler schedules the hot spot data according to the expiration time.
In this step, hot spot data is scheduled by expiration time. Time is organized into time slices, also called time slots and written TIMESLOT. One TIMESLOT is one scheduling period. A default expiration time can be set for hot spot data entering the high-speed memory; the default expiration time is the current time plus T TIMESLOTs. The choice of T and K affects the utilization of the high-speed storage. Assuming for simplicity that hot spot expiration times are never updated, and that the repetition rate of data in each TIMESLOT is λ (0 ≤ λ ≤ 1), the number of data segments held in the system after T TIMESLOTs is (1 - λ) × T × K. For an object with n accesses, the expiration time may be postponed by l TIMESLOTs, where l = f(n) and l ∈ N.
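The expiration rule and the occupancy estimate can be illustrated with a short sketch. The text does not define f, so the function below (one extra slot per ten accesses) and the 60-second TIMESLOT are hypothetical choices made only for the example.

```python
TIMESLOT = 60  # seconds per time slot; illustrative value, not from the text


def f(n):
    """Hypothetical delay function: one extra TIMESLOT per 10 accesses."""
    return n // 10


def expire_time(now, T, n, timeslot=TIMESLOT):
    """Expiration time: now + [T + f(n)] * TIMESLOT."""
    return now + (T + f(n)) * timeslot


def cached_items_estimate(lam, T, K):
    """Estimate from the text: (1 - lambda) * T * K items after T slots."""
    return (1 - lam) * T * K
```

With these assumptions, an object accessed 25 times at time 1000 with T = 5 expires at 1000 + (5 + 2) × 60 = 1420, and with λ = 0.5, T = 4, and K = 100 the estimate is 200 cached items.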
Hot spot data is scheduled based on the scheduling function:
get_swapout(schedule, count)
where schedule is the hot spot data table in the high-speed memory and count is the number of hot spot data items that need to be scheduled out of the high-speed memory. The scheduling function obtains from the hot spot data table schedule the count hot spot data items to be scheduled out of the high-speed memory; the return value of the function is the set of hot spot data to be scheduled out. This set is obtained by taking from the schedule table the count hot spot data items with the earliest expiration times.
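A minimal sketch of this selection follows, assuming for illustration that schedule is a dictionary mapping object id to expiration timestamp; the source only states that the count items with the earliest expiration times are returned.

```python
import heapq


def get_swapout(schedule, count):
    """Return the `count` ids in `schedule` with the earliest expiration times.

    Assumption: `schedule` is a dict mapping object id -> expiration timestamp.
    """
    # nsmallest iterates over the dict keys and ranks them by expiration time
    return set(heapq.nsmallest(count, schedule, key=schedule.get))
```

Using heapq.nsmallest avoids sorting the whole table when count is small relative to the table size.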
The specific algorithm of the scheduling function is as follows:
Lines 1-3 are preparation: the high-speed memory is initialized, the schedule set is emptied, and finally the maximum number of data segments the system can hold is calculated, denoted volume.
Lines 4-25 are the main loop of the scheduler, which runs as a service process; Wait(TIMESLOT) on line 25 means the scheduling procedure executes once every TIMESLOT. Specifically:
Line 5 records the current time, time.
Line 6 obtains from the database, through the get_topk function, the set of data ranked in the top K by access count in the previous TIMESLOT, denoted volunteer. Note that the expiration time of the data in this set should be time + [T + f(n)] × TIMESLOT, where T is the default delay and n is the access count of the data.
Line 7 takes the part of the volunteer set that is not in the schedule set, denoted the insert set; this set needs to be merged into the schedule set.
Line 8 takes the intersection of the volunteer and schedule sets, denoted the refresh set; the data in this set needs its expiration time postponed.
Lines 9-10 refresh the part of schedule whose expiration time needs updating; the expiration time of this data is postponed by a number of TIMESLOTs according to its access count.
Lines 11-14 collect the expired data in the schedule table into the delete set.
Lines 15-18 calculate whether, after the new data is added and the expired data is deleted, the length of the schedule table would exceed the capacity volume; if it would, data must be swapped out. The data to be swapped out is merged into the swapout set.
Lines 19-21 delete the data in the delete set from the high-speed memory and then delete it from the schedule set.
Lines 22-24 push the data in the insert set into the high-speed memory and then merge it into the schedule set.
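One iteration of the loop described above can be sketched as follows. This is a hedged illustration, not the patent's own listing: the function name schedule_round, the parameters cache and f, and the dictionary and set representations are assumptions. The sketch also assumes that items in the swapout set are evicted together with the delete set, and that only entries already in schedule are candidates for swapping out; the text does not spell out either point.

```python
def schedule_round(schedule, cache, volunteer, now, T, timeslot, volume, f):
    """Run one scheduling round (the body of the main loop, without the wait).

    schedule:  dict id -> expiration time (the schedule table), mutated
    cache:     set of ids held in high-speed memory, mutated
    volunteer: dict id -> access count, the previous TIMESLOT's top K
    f:         function mapping an access count to extra TIMESLOTs
    """
    def expire(n):
        return now + (T + f(n)) * timeslot

    # New hot items, and items that are already cached and still hot
    insert = {o for o in volunteer if o not in schedule}
    refresh = {o for o in volunteer if o in schedule}

    # Postpone the expiration time of items that are still hot
    for o in refresh:
        schedule[o] = expire(volunteer[o])

    # Collect expired entries
    delete = {o for o, t in schedule.items() if t <= now}

    # Would the table exceed capacity after the inserts and deletes?
    overflow = len(schedule) - len(delete) + len(insert) - volume
    swapout = set()
    if overflow > 0:
        live = [o for o in schedule if o not in delete]
        # Earliest-expiring live entries are swapped out first
        swapout = set(sorted(live, key=schedule.get)[:overflow])

    # Evict from high-speed memory first, then from the schedule table
    for o in delete | swapout:
        cache.discard(o)
        del schedule[o]

    # Load into high-speed memory first, then record in the schedule table
    for o in insert:
        cache.add(o)
        schedule[o] = expire(volunteer[o])

    return insert, refresh, delete, swapout
```

A service process would call this function once per TIMESLOT, passing the result of the top-K query for the previous slot as volunteer.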
Management of the high-speed storage device is carried out jointly by the collector, the scheduler, and the monitoring database, whose relationship is shown in Fig. 1. The collector collects the access information of objects, which is provided by the monitors on the proxy nodes, and writes this information to the monitoring database. After obtaining data from the monitoring database, the scheduler identifies hot spot data, makes the scheduling decision, and then sends scheduling commands to the high-speed storage layer.
When the scheduling algorithm is implemented, the data structure and operations of the schedule table must be considered. The schedule table is implemented with the "sorted set" of the Redis storage system. Each member of a sorted set is associated with a score, and the score is used to order the members of the set from lowest to highest. In this algorithm the score is the expiration time of the data. The key operations on the schedule table, the corresponding Redis operations, and their time complexity are listed in Table 1.
Table 1
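As an illustration of how schedule-table operations might map onto Redis sorted-set commands, consider the sketch below. The helper names are hypothetical, and the mapping is an assumption rather than a reproduction of Table 1. The parameter r is expected to behave like a redis-py client (for example redis.Redis()), of which only the zadd, zrangebyscore, zrange, zrem, and zcard methods are used; the complexities in the comments are those documented for the corresponding Redis commands, with N the set size and M the number of members returned or removed.

```python
def add_or_refresh(r, key, object_id, expire_at):
    # ZADD inserts the member or updates its score: O(log N)
    r.zadd(key, {object_id: expire_at})


def expired(r, key, now):
    # ZRANGEBYSCORE returns members whose score is <= now: O(log N + M)
    return r.zrangebyscore(key, "-inf", now)


def earliest(r, key, count):
    # ZRANGE by rank returns the `count` lowest-scored members: O(log N + M)
    return r.zrange(key, 0, count - 1) if count > 0 else []


def remove(r, key, object_ids):
    # ZREM removes the given members: O(M log N)
    if object_ids:
        r.zrem(key, *object_ids)


def size(r, key):
    # ZCARD returns the number of members: O(1)
    return r.zcard(key)
```

Because the score is the expiration time, both the expired entries and the swap-out candidates are found by reading from the low end of the sorted set.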
The present invention provides a hot spot data identification and scheduling method and system for a storage system. Based on access frequency statistics, hot spot data is sent into the high-speed memory and its expiration time is maintained, and three aspects are optimized together: hot spot data identification, hot spot data maintenance, and data replacement. This effectively solves the bottleneck problem of hot spot data access. In addition, the data segments ranked in the top K by access count within a time period are obtained from the monitoring database, and a corresponding expiration time can be set for this hot spot data according to its access count; time is organized into time slots, and within each scheduling period the expiration time of repeatedly accessed data is extended; the set of data with the earliest expiration times is obtained from the hot spot data schedule table and swapped out, which reclaims storage space, optimizes overall storage performance, and thereby advances the development of mass data storage system architecture.
It is worth noting that the hot spot data identification and scheduling algorithm proposed in the present invention is also generally applicable to distributed storage systems of all kinds. The invention therefore has high technical and practical value in large-scale distributed object storage systems.
Although the embodiments disclosed herein are as described above, the content is only an embodiment adopted to facilitate understanding of the invention and is not intended to limit it. Any person skilled in the art to which the invention pertains may make any modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed herein, but the scope of patent protection of the invention shall still be as defined by the appended claims.
Claims (13)
1. A hot spot data identification and scheduling method for a storage system, characterized by comprising:
a collector collects access information of data objects in the storage system and sends the access information of the data objects to a monitoring database;
a scheduler obtains the access information of the data objects from the monitoring database, identifies hot spot data according to the access information of the data objects, and writes the hot spot data into the high-speed memory of the storage system;
the high-speed memory sets an expiration time for each item of hot spot data, and the scheduler schedules the hot spot data according to the expiration time;
wherein the collector collecting access information of data objects in the storage system specifically comprises:
the collector obtains historical access information of data objects from the monitoring database and obtains real-time access information of data objects through a message queue, where the real-time access information in the message queue comes from the real-time system monitoring data subscribed to by the monitor.
2. The hot spot data identification and scheduling method for a storage system according to claim 1, characterized in that identifying hot spot data according to the access information of the data objects specifically comprises:
identifying hot spot data from the access information of the data objects based on a hot spot data discriminant function, the discriminant function being:
get_topk(db, K, time);
where db is the monitoring database, K is the access-count rank, and time is the start time from which access counts are calculated;
the discriminant function obtains from the monitoring database db the hot spot data ranked in the top K by access count between the time point time and the current time, and the return value of the function is the set of hot spot data ranked in the top K by access count.
3. The hot spot data identification and scheduling method for a storage system according to claim 2, characterized in that the high-speed memory setting an expiration time for each item of hot spot data is specifically:
Time is divided into time slots TIMESLOT, one TIMESLOT being the period of one scheduling run;
A default expiration time is set for each item of hot spot data entering the high-speed memory, the default expiration time being the current time plus T TIMESLOTs;
The repetition rate of data in each TIMESLOT is λ, where 1 ≥ λ ≥ 0, so that after T TIMESLOTs the quantity of hot spot data held in the high-speed memory is (1 - λ) × T × K;
For a data object carrying n accesses, the expiration time can be delayed by l TIMESLOTs, where l = f(n), l ∈ N.
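The expiration rule above can be illustrated with a short sketch. The claim does not define f, so the function below (one extra slot per 10 accesses, capped at 5) is purely hypothetical, as are the values chosen for `TIMESLOT` and `T`.

```python
TIMESLOT = 60.0  # length of one scheduling period in seconds (assumed value)
T = 3            # default lifetime in time slots (assumed value)


def f(n):
    """Hypothetical delay function: one extra slot per 10 accesses, at most 5."""
    return min(n // 10, 5)


def expiry_time(now, n):
    """Expiration time for an object with n accesses: now + [T + f(n)] x TIMESLOT."""
    return now + (T + f(n)) * TIMESLOT


def steady_state_size(k, t, lam):
    """Quantity of hot spot data held after t slots: (1 - lam) x t x k."""
    return (1 - lam) * t * k


expires = expiry_time(1000.0, 25)        # 1000 + (3 + 2) * 60
held = steady_state_size(100, 3, 0.5)    # (1 - 0.5) * 3 * 100
```

Under these assumptions an object accessed 25 times at time 1000 expires at 1300, and with K = 100, T = 3, and λ = 0.5 the high-speed memory holds about 150 items.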
4. The hot spot data identification and scheduling method for a storage system according to claim 3, characterized in that the scheduler scheduling the hot spot data according to the expiration time is specifically:
The scheduler schedules the hot spot data according to the expiration time based on a scheduling function, the scheduling function being:
Get_swapout (schedule, count);
Wherein schedule is the hot spot data table in the high-speed memory, and count is the number of hot spot data items that need to be scheduled out of the high-speed memory;
The scheduling function selects count hot spot data items from the hot spot data table schedule of the high-speed memory to be scheduled out of the high-speed memory; the return value of the function is the set of the count hot spot data items scheduled out.
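A minimal sketch of the scheduling function, assuming the schedule table maps each object ID to its expiration time. The claim does not state which entries are selected, so choosing the entries that expire soonest is an assumption of this sketch.

```python
import heapq


def get_swapout(schedule, count):
    """Return the set of `count` object IDs to swap out of high-speed memory.

    `schedule` is assumed to map object ID -> expiration time. This sketch
    picks the entries that expire soonest, breaking ties by object ID.
    """
    if count <= 0:
        return set()
    soonest = heapq.nsmallest(
        count, schedule.items(), key=lambda kv: (kv[1], kv[0])
    )
    return {obj for obj, _ in soonest}


schedule = {"a": 300.0, "b": 120.0, "c": 180.0, "d": 240.0}
victims = get_swapout(schedule, 2)
```

With this table, `victims` contains `b` and `c`, the two entries with the earliest expiration times.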
5. The hot spot data identification and scheduling method for a storage system according to claim 4, characterized in that scheduling the hot spot data based on the scheduling function is specifically:
Initializing the high-speed memory, emptying the schedule set in the high-speed memory, and calculating the maximum number of hot spot data items the storage system can hold, denoted volume;
Recording the current time, time;
Obtaining from the monitoring database, by means of the discriminant function, the set of hot spot data ranked in the top K by access count in the previous TIMESLOT, denoted volunteer, wherein the expiration time of each hot spot data item in the set is time + [T + f(n)] × TIMESLOT, where n is the access count of the data;
Taking the part of volunteer that is not in schedule, denoted the insert set, which is to be merged into the schedule set;
Taking the intersection of volunteer and schedule, denoted the refresh set, and postponing the expiration time of the hot spot data in the refresh set;
Refreshing the hot spot data in schedule whose expiration time needs updating, postponing the expiration time of that hot spot data according to its access count;
Counting the expired data in schedule and placing it in the delete set;
Calculating whether, after the new data is added and the expired data is deleted, the length of the schedule table exceeds the disk capacity volume, and if so, scheduling the hot spot data, the hot spot data to be swapped out being merged into the swapout set;
After the hot spot data in the delete set is deleted from the high-speed memory, deleting that hot spot data from schedule;
After the hot spot data in the insert set is stored in the high-speed memory, merging that hot spot data into schedule;
Performing the scheduling of hot spot data once every TIMESLOT.
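The steps of one scheduling round can be sketched as below. This is an illustration under assumptions rather than the patented implementation: the high-speed memory is modelled as a plain set, the schedule table as a dict from object ID to expiration time, f is the same hypothetical delay function (one extra slot per 10 accesses, capped at 5), and swap-out victims are taken from the soonest-expiring surviving entries.

```python
TIMESLOT = 60.0  # length of one scheduling period in seconds (assumed)
T = 3            # default lifetime in time slots (assumed)


def f(n):
    """Hypothetical delay function: one extra slot per 10 accesses, at most 5."""
    return min(n // 10, 5)


def run_round(schedule, cache, volunteer, now, volume):
    """Run one scheduling round and return (insert, refresh, delete, swapout).

    schedule : dict, object ID -> expiration time (the schedule table)
    cache    : set of object IDs held in high-speed memory (stand-in)
    volunteer: dict, object ID -> access count n in the last TIMESLOT
    volume   : maximum number of hot spot data items the cache may hold
    """
    insert = {obj for obj in volunteer if obj not in schedule}
    refresh = {obj for obj in volunteer if obj in schedule}

    # Postpone the expiration time of entries that are still hot.
    for obj in refresh:
        schedule[obj] = now + (T + f(volunteer[obj])) * TIMESLOT

    # Collect entries whose expiration time has passed.
    delete = {obj for obj, expires in schedule.items() if expires <= now}

    # If the table would exceed capacity, swap out the soonest-expiring survivors.
    overflow = len(schedule) - len(delete) + len(insert) - volume
    survivors = sorted(
        (obj for obj in schedule if obj not in delete),
        key=lambda obj: (schedule[obj], obj),
    )
    swapout = set(survivors[:max(overflow, 0)])

    # Remove from high-speed memory first, then from the schedule table.
    for obj in delete | swapout:
        cache.discard(obj)
        del schedule[obj]

    # Store new data in high-speed memory first, then record it in the table.
    for obj in insert:
        cache.add(obj)
        schedule[obj] = now + (T + f(volunteer[obj])) * TIMESLOT

    return insert, refresh, delete, swapout


schedule = {"a": 100.0, "b": 500.0, "c": 400.0}
cache = {"a", "b", "c"}
volunteer = {"b": 12, "d": 3, "e": 25}
result = run_round(schedule, cache, volunteer, now=200.0, volume=3)
```

In this example `a` has expired and is deleted, `b` is refreshed, `d` and `e` are inserted, and `c` is swapped out to keep the table at three entries. A caller would invoke `run_round` once per TIMESLOT, passing the top-K result of the discriminant function as `volunteer`.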
6. A hot spot data identification and scheduling system for a storage system, characterized by comprising:
A collector, for collecting the access information of data objects in the storage system and sending the access information of the data objects to a monitoring database;
A scheduler, for obtaining the access information of the data objects from the monitoring database, identifying hot spot data according to the access information of the data objects, and writing the hot spot data into the high-speed memory of the storage system;
The high-speed memory, for setting an expiration time for each item of hot spot data, the scheduler scheduling the hot spot data according to the expiration time;
Wherein the system further comprises a monitor and a message queue;
The collector obtains historical access information of data objects from the monitoring database and obtains real-time access information of data objects from the message queue, wherein the real-time access information in the message queue comes from the real-time system monitoring data subscribed from the monitor.
7. The hot spot data identification and scheduling system for a storage system according to claim 6, characterized in that the scheduler identifies hot spot data using a hot spot data discriminant function, the discriminant function being:
Get_topk (db, K, time);
Wherein db is the monitoring database, K is the access-count rank cutoff, and time is the start time from which access counts are calculated;
The discriminant function obtains, from the monitoring database db, the hot spot data ranked in the top K by access count between time point time and the current time; the return value of the function is the set of hot spot data ranked in the top K by access count.
8. The hot spot data identification and scheduling system for a storage system according to claim 7, characterized in that the high-speed memory divides time into time slots TIMESLOT, one TIMESLOT being the period of one scheduling run; a default expiration time is set for each item of hot spot data entering the high-speed memory, the default expiration time being the current time plus T TIMESLOTs; the repetition rate of data in each TIMESLOT is λ, where 1 ≥ λ ≥ 0, so that after T TIMESLOTs the quantity of hot spot data held in the high-speed memory is (1 - λ) × T × K; for a data object carrying n accesses, the expiration time can be delayed by l TIMESLOTs, where l = f(n), l ∈ N.
9. The hot spot data identification and scheduling system for a storage system according to claim 8, characterized in that the scheduler schedules the hot spot data based on a scheduling function, the scheduling function being:
Get_swapout (schedule, count);
Wherein schedule is the hot spot data table in the high-speed memory, and count is the number of hot spot data items that need to be scheduled out of the high-speed memory;
The scheduling function selects count hot spot data items from the hot spot data table schedule of the high-speed memory to be scheduled out of the high-speed memory; the return value of the function is the set of the count hot spot data items scheduled out.
10. The hot spot data identification and scheduling system for a storage system according to claim 9, characterized in that the high-speed memory scheduling the hot spot data based on the scheduling function is specifically:
Initializing the high-speed memory, emptying the schedule set in the high-speed memory, and calculating the maximum number of hot spot data items the storage system can hold, denoted volume;
Recording the current time, time;
Obtaining from the monitoring database, by means of the discriminant function, the set of hot spot data ranked in the top K by access count in the previous TIMESLOT, denoted volunteer, wherein the expiration time of each hot spot data item in the set is time + [T + f(n)] × TIMESLOT, where n is the access count of the data;
Taking the part of volunteer that is not in schedule, denoted the insert set, which is to be merged into the schedule set;
Taking the intersection of volunteer and schedule, denoted the refresh set, and postponing the expiration time of the hot spot data in the refresh set;
Refreshing the hot spot data in schedule whose expiration time needs updating, postponing the expiration time of that hot spot data according to its access count;
Counting the expired data in schedule and placing it in the delete set;
Calculating whether, after the new data is added and the expired data is deleted, the length of the schedule table exceeds the disk capacity volume, and if so, scheduling the hot spot data, the hot spot data to be swapped out being merged into the swapout set;
After the hot spot data in the delete set is deleted from the high-speed memory, deleting that hot spot data from schedule;
After the hot spot data in the insert set is stored in the high-speed memory, merging that hot spot data into schedule;
Performing the scheduling of hot spot data once every TIMESLOT.
11. The hot spot data identification and scheduling system for a storage system according to claim 6, characterized in that the monitor is implemented using the Python Tornado framework.
12. The hot spot data identification and scheduling system for a storage system according to claim 9, characterized in that the message queue and the monitoring database are implemented using Redis, and the schedule table is implemented using the sorted set of the Redis storage system.
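One way a Redis sorted set could hold the schedule table is to store each object ID as a member with its expiration time as the score. The sketch below assumes a client object exposing the redis-py sorted-set methods `zadd`, `zrangebyscore`, `zrange`, and `zrem`; the key name `schedule` and the helper function names are hypothetical.

```python
SCHEDULE_KEY = "schedule"  # hypothetical key name for the schedule table


def record_expiry(client, object_id, expires_at):
    """Insert or update an entry; the score is the expiration time."""
    client.zadd(SCHEDULE_KEY, {object_id: expires_at})


def expired_entries(client, now):
    """Return members whose expiration time is at or before `now`."""
    return client.zrangebyscore(SCHEDULE_KEY, "-inf", now)


def soonest_to_expire(client, count):
    """Return the `count` members with the lowest scores."""
    if count <= 0:
        # Guard: a range of 0..-1 would select the whole sorted set.
        return []
    return client.zrange(SCHEDULE_KEY, 0, count - 1)


def remove_entries(client, object_ids):
    """Delete members from the schedule table; no-op for an empty list."""
    if object_ids:
        client.zrem(SCHEDULE_KEY, *object_ids)
```

Because the set is ordered by score, both the expired entries and the swap-out candidates can be read as range queries instead of full scans. With redis-py, a client created with `decode_responses=True` returns member names as `str` rather than `bytes`.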
13. The hot spot data identification and scheduling system for a storage system according to claim 6, characterized in that the collector collects the access information of data objects in the storage system using the monitoring interface in the storage system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510696498.4A CN105373347B (en) | 2015-10-23 | 2015-10-23 | A kind of hot spot data identification of storage system and dispatching method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105373347A (en) | 2016-03-02 |
CN105373347B (en) | 2018-06-29 |
Family
ID=55375581
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510696498.4A Active CN105373347B (en) | 2015-10-23 | 2015-10-23 | A kind of hot spot data identification of storage system and dispatching method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105373347B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105912703A (en) * | 2016-04-26 | 2016-08-31 | 北京百度网讯科技有限公司 | Data storage method and data query method and device |
CN108156193B (en) * | 2016-12-02 | 2022-08-19 | 阿里巴巴集团控股有限公司 | Hotspot determination method and system |
CN106682202B (en) * | 2016-12-29 | 2020-01-10 | 北京奇艺世纪科技有限公司 | Search cache updating method and device |
CN107463514B (en) * | 2017-08-16 | 2021-06-29 | 郑州云海信息技术有限公司 | Data storage method and device |
CN109933575B (en) * | 2019-02-28 | 2021-04-27 | 鲁东大学 | Monitoring data storage method and device |
CN110704466B (en) * | 2019-09-27 | 2021-12-17 | 武汉极意网络科技有限公司 | Black product data storage method and device |
CN111309270B (en) * | 2020-03-13 | 2021-04-27 | 清华大学 | Persistent memory key value storage system |
CN113225396B (en) * | 2021-04-30 | 2022-08-12 | 深圳市腾讯网域计算机网络有限公司 | Hot spot data packet distribution method and device, electronic equipment and medium |
CN113849131B (en) * | 2021-09-28 | 2024-09-06 | 咪咕文化科技有限公司 | Data storage method, device, computing equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103713861A (en) * | 2014-01-09 | 2014-04-09 | 浪潮(北京)电子信息产业有限公司 | File processing method and system based on hierarchical division |
US8868839B1 (en) * | 2011-04-07 | 2014-10-21 | Symantec Corporation | Systems and methods for caching data blocks associated with frequently accessed files |
CN104133909A (en) * | 2014-08-08 | 2014-11-05 | 浪潮电子信息产业股份有限公司 | Multi-layer file system |
CN104834609A (en) * | 2015-05-31 | 2015-08-12 | 上海交通大学 | Multi-level cache method based on historical upgrading and downgrading frequency |
Also Published As
Publication number | Publication date |
---|---|
CN105373347A (en) | 2016-03-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105373347B (en) | A kind of hot spot data identification of storage system and dispatching method and system | |
US20200089624A1 (en) | Apparatus and method for managing storage of data blocks | |
CN103795781B (en) | A kind of distributed caching method based on file prediction | |
CN110134514A (en) | Expansible memory object storage system based on isomery memory | |
US9996404B2 (en) | Message cache management for message queues | |
CN111737168B (en) | Cache system, cache processing method, device, equipment and medium | |
CN110555001B (en) | Data processing method, device, terminal and medium | |
CN109344092A (en) | A kind of method and system improving cold storing data reading speed | |
CN107103068A (en) | The update method and device of service buffer | |
CN101753439A (en) | Method for distributing and transmitting streaming media | |
CN108932150A (en) | Caching method, device and medium based on SSD and disk mixing storage | |
CN109918450A (en) | Based on the distributed parallel database and storage method under analysis classes scene | |
CN107426318A (en) | It is synchronous to store affined shared content item | |
CN104717247A (en) | Method and system for dynamically scheduling storage resources in cloud storage system | |
JPH10124396A (en) | Buffer exchanging method | |
Otoo et al. | Disk cache replacement algorithm for storage resource managers in data grids | |
CN116185308B (en) | Data set processing method, device, equipment, medium and model training system | |
CN109165096A (en) | The caching of web cluster utilizes system and method | |
Zhou et al. | Improving big data storage performance in hybrid environment | |
CN115408149A (en) | Time sequence storage engine memory design and distribution method and device | |
CN104144194A (en) | Data processing method and device for cloud storage system | |
Lau et al. | Scheduling and data layout policies for a near-line multimedia storage architecture | |
Otoo et al. | Accurate modeling of cache replacement policies in a data grid | |
JP2006085208A (en) | Information life cycle management system and data arrangement determination method therefor | |
CN108243228B (en) | Method for data scheduling and intelligent servo cluster |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||