CN105373347B - Hot-spot data identification and scheduling method and system for a storage system - Google Patents


Info

Publication number
CN105373347B
Authority
CN
China
Prior art keywords
hot spot
data
spot data
time
scheduling
Prior art date
Legal status
Active
Application number
CN201510696498.4A
Other languages
Chinese (zh)
Other versions
CN105373347A (en)
Inventor
赵祯龙
Current Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201510696498.4A
Publication of CN105373347A
Application granted
Publication of CN105373347B
Active legal status
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G06F3/0649 Lifecycle management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a hot-spot data identification and scheduling method and system for a storage system. The method comprises the following steps. A collector collects access information of data objects in the storage system and sends it to a monitoring database. A scheduler obtains the access information of the data objects from the monitoring database, identifies hot-spot data according to that information, and writes the hot-spot data into the high-speed memory of the storage system. The high-speed memory sets an expiration time for each item of hot-spot data, and the scheduler schedules the hot-spot data according to these expiration times. The invention effectively resolves the bottleneck in accessing hot-spot data.

Description

Hot-spot data identification and scheduling method and system for a storage system
Technical field
The present invention relates to the field of storage system technology, and in particular to a hot-spot data identification and scheduling method and system for a storage system.
Background art
With the continuous development of Internet technology, unstructured data of all kinds, such as pictures, audio, video and text, is growing explosively. Data also differ in how often they are accessed, which divides them into "hot" and "cold" data. If several "hot" data items reference the same data fragment, the "temperature" of that fragment is the sum of the "temperatures" of those hot items. Such hot data can therefore become the access bottleneck of a storage system.
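The fragment-temperature rule above can be illustrated with a minimal Python sketch. It assumes that an object's "temperature" is its access count and that the object-to-fragment references are available as a plain mapping. Both assumptions are hypothetical, and the function name `fragment_temperature` is illustrative rather than taken from the patent.

```python
from collections import defaultdict


def fragment_temperature(object_heat, references):
    """Sum the heat of every hot object that references each fragment.

    object_heat: {object_id: heat}; heat is assumed to be an access count.
    references:  {object_id: [fragment_id, ...]}; hypothetical layout.
    """
    temp = defaultdict(int)
    for obj, heat in object_heat.items():
        for frag in references.get(obj, []):
            temp[frag] += heat  # a shared fragment accumulates each referrer's heat
    return dict(temp)


heat = {"img1": 40, "img2": 25, "doc7": 3}
refs = {"img1": ["f1", "f2"], "img2": ["f2"], "doc7": ["f3"]}
print(fragment_temperature(heat, refs))  # {'f1': 40, 'f2': 65, 'f3': 3}
```

Fragment `f2` is shared by `img1` and `img2`, so it is hotter than either object alone. This is why shared fragments can become the bottleneck.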
Current distributed storage systems face a pressing problem: how to identify "hot" and "cold" data blocks and migrate data according to a policy. Under such a policy, infrequently accessed data is moved automatically onto low-speed, low-cost storage tiers, and frequently accessed data is moved onto high-performance storage tiers.
Summary of the invention
To solve the above technical problem, the present invention provides a hot-spot data identification and scheduling method and system for a storage system that effectively resolves the bottleneck in accessing hot-spot data.
To achieve this object, the present invention provides a hot-spot data identification and scheduling method for a storage system, characterized by comprising the following steps. A collector collects access information of data objects in the storage system and sends that information to a monitoring database. A scheduler obtains the access information of the data objects from the monitoring database, identifies hot-spot data according to it, and writes the hot-spot data into the high-speed memory of the storage system. The high-speed memory sets an expiration time for each item of hot-spot data, and the scheduler schedules the hot-spot data according to these expiration times.
Further, the step in which the collector collects access information of data objects in the storage system specifically comprises:
The collector obtains historical access information of data objects from the monitoring database and real-time access information from a message queue. The real-time access information in the message queue comes from the real-time system monitoring data subscribed to by the monitor.
Further, identifying hot-spot data according to the access information of the data objects specifically comprises identifying hot-spot data with a hot-spot discriminant function applied to that access information. The discriminant function is:
get_topk(db, K, time);
Here db is the monitoring database, K is the access-count rank cutoff, and time is the start time from which access counts are computed. The discriminant function retrieves from the monitoring database db the top-K hot-spot data ranked by access count over the period from time to the current time. Its return value is the set of those top-K hot-spot data.
Further, the step in which the high-speed memory sets an expiration time for each item of hot-spot data specifically comprises the following. Time is divided into time slots (TIMESLOT); one TIMESLOT is the period at which scheduling is performed. Each item of hot-spot data entering the high-speed memory receives a default expiration time equal to the current time plus T TIMESLOTs. If the repetition rate of data in each TIMESLOT is λ (0 ≤ λ ≤ 1), then after T TIMESLOTs the high-speed memory holds (1 - λ) × T × K hot-spot data items. For a data object accessed n times, the expiration time may be postponed by l TIMESLOTs, where l = f(n) and l ∈ N.
Further, the step in which the scheduler schedules the hot-spot data according to the expiration times specifically comprises the scheduler applying a scheduling function based on those expiration times. The scheduling function is:
get_swapout(schedule, count);
Here schedule is the hot-spot data table in the high-speed memory, and count is the number of hot-spot data items that must be scheduled out of the high-speed memory. The scheduling function selects count hot-spot data items from the table schedule to be scheduled out of the high-speed memory. Its return value is the set of those count items.
Further, scheduling hot-spot data with the scheduling function specifically comprises the following steps:
- Initialize the high-speed memory, empty the schedule set in the high-speed memory, and compute the maximum number of hot-spot data items the storage system can hold, denoted volume.
- Record the current time, time.
- Use the discriminant function to obtain from the monitoring database the set of the top-K most accessed hot-spot data in the previous TIMESLOT, denoted volunteer. The expiration time of each item in this set is time + [T + f(n)] × TIMESLOT, where T is the default delay and n is the number of accesses to the item.
- Take the items that are in volunteer but not in schedule as the insert set, to be merged into the schedule set.
- Take the intersection of volunteer and schedule as the refresh set, whose items have their expiration times postponed.
- Refresh the items in schedule that need an expiration-time update, postponing each according to its access count.
- Collect the expired items in schedule into the delete set.
- Check whether the length of the schedule table would exceed the disk capacity volume once the new items are added and the expired items removed. If it would, schedule hot-spot data out and place the items to be swapped out into the swapout set.
- Delete the items in the delete set from the high-speed memory, then from schedule.
- Push the items in the insert set into the high-speed memory, then merge them into schedule.
- Perform one round of hot-spot data scheduling every TIMESLOT.
The present invention also provides a hot-spot data identification and scheduling system for a storage system, comprising:
- A collector, configured to collect access information of data objects in the storage system and send that information to a monitoring database.
- A scheduler, configured to obtain the access information of data objects from the monitoring database, identify hot-spot data according to it, and write the hot-spot data into the high-speed memory of the storage system.
- The high-speed memory, configured to set an expiration time for each item of hot-spot data. The scheduler schedules the hot-spot data according to these expiration times.
Further, the system also comprises a monitor and a message queue. The collector obtains historical access information of data objects from the monitoring database and real-time access information from the message queue. The real-time access information in the message queue comes from the real-time system monitoring data subscribed to by the monitor.
Further, the scheduler identifies hot-spot data with a hot-spot discriminant function, which is:
get_topk(db, K, time);
Here db is the monitoring database, K is the access-count rank cutoff, and time is the start time from which access counts are computed. The discriminant function retrieves from the monitoring database db the top-K hot-spot data ranked by access count over the period from time to the current time. Its return value is the set of those top-K hot-spot data.
Further, the high-speed memory divides time into time slots (TIMESLOT); one TIMESLOT is the period at which scheduling is performed. Each item of hot-spot data entering the high-speed memory receives a default expiration time equal to the current time plus T TIMESLOTs. If the repetition rate of data in each TIMESLOT is λ (0 ≤ λ ≤ 1), then after T TIMESLOTs the high-speed memory holds (1 - λ) × T × K hot-spot data items. For a data object accessed n times, the expiration time may be postponed by l TIMESLOTs, where l = f(n) and l ∈ N.
Further, the scheduler schedules hot-spot data using a scheduling function, which is:
get_swapout(schedule, count);
Here schedule is the hot-spot data table in the high-speed memory, and count is the number of hot-spot data items that must be scheduled out of the high-speed memory. The scheduling function selects count hot-spot data items from the table schedule to be scheduled out of the high-speed memory. Its return value is the set of those count items.
Further, the high-speed memory schedules hot-spot data using the scheduling function, which specifically comprises the following steps:
- Initialize the high-speed memory, empty the schedule set in the high-speed memory, and compute the maximum number of hot-spot data items the storage system can hold, denoted volume.
- Record the current time, time.
- Use the discriminant function to obtain from the monitoring database the set of the top-K most accessed hot-spot data in the previous TIMESLOT, denoted volunteer. The expiration time of each item in this set is time + [T + f(n)] × TIMESLOT, where T is the default delay and n is the number of accesses to the item.
- Take the items that are in volunteer but not in schedule as the insert set, to be merged into the schedule set.
- Take the intersection of volunteer and schedule as the refresh set, whose items have their expiration times postponed.
- Refresh the items in schedule that need an expiration-time update, postponing each according to its access count.
- Collect the expired items in schedule into the delete set.
- Check whether the length of the schedule table would exceed the disk capacity volume once the new items are added and the expired items removed. If it would, schedule hot-spot data out and place the items to be swapped out into the swapout set.
- Delete the items in the delete set from the high-speed memory, then from schedule.
- Push the items in the insert set into the high-speed memory, then merge them into schedule.
- Perform one round of hot-spot data scheduling every TIMESLOT.
Further, the monitor is implemented with the Python Tornado framework.
Further, the message queue and the monitoring database are implemented with Redis, and the schedule table is implemented with the sorted set of the Redis storage system.
Further, the collector collects the access information of data objects through the monitoring interfaces of the storage system.
Compared with the prior art, the present invention uses access-frequency statistics to send hot-spot data into the high-speed memory and maintains each item's expiration time. It optimizes jointly across three aspects: hot-spot identification, hot-spot maintenance, and data replacement. This effectively resolves the bottleneck in accessing hot-spot data. In addition:
- The data fragments ranked in the top K by access count within a period are retrieved from the monitoring database, and each of these hot-spot items receives an expiration time according to its access count.
- Time is organized into time slots, and within each scheduling cycle the expiration time of repeatedly accessed data is extended.
- The set of data with the earliest expiration times in the hot-spot scheduling table is swapped out, which reclaims storage space.
Together these measures optimize overall storage performance and thereby advance the architecture of mass data storage systems.
Other features and advantages of the present invention are set forth in the following description. Some of them become apparent from the description or are understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims, and the drawings.
Brief description of the drawings
The drawings provide a further understanding of the technical solution of the present invention and form part of the specification. Together with the embodiments of this application, they serve to explain the technical solution of the invention and do not limit it.
Fig. 1 is a schematic diagram of the system architecture for hot-spot data identification and scheduling in a storage system according to an embodiment of the present invention.
Fig. 2 is a schematic flowchart of the hot-spot data identification and scheduling method for a storage system according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, embodiments of the invention are described in detail below with reference to the drawings. Where they do not conflict, the embodiments of this application and the features within them may be combined with one another arbitrarily.
The steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although the flowcharts show a logical order, in some cases the steps shown or described may be performed in a different order.
Fig. 1 is a schematic diagram of the system architecture for hot-spot data identification and scheduling in a storage system according to an embodiment of the present invention. As shown in Fig. 1, the system comprises:
Monitor: essentially a web server that mainly handles front-end user requests for monitoring data. Its data come from two sources. The monitoring database serves mainly queries about historical information. The message queue serves mainly queries about real-time monitoring data. The monitor can be implemented with the Python Tornado framework.
Message queue and monitoring database: implemented with Redis. Redis is an open-source, network-enabled, high-performance key-value store written in ANSI C that runs in memory with optional log-based persistence. It also provides a high-performance message queue service. Here it is used mainly to store historical monitoring data and to forward real-time monitoring data.
Collector: collects system performance data through the monitoring interfaces of the storage system, such as the Recon component and StatsD. It writes that data into the monitoring database and publishes to the message queue the real-time system monitoring data subscribed to by the monitor.
Scheduler: reads the system's historical performance and access information from the monitoring database, identifies hot data fragments, and determines the scheduling policy. It then sends control commands to the storage system to schedule the data.
Based on the architecture shown in Fig. 1, the present invention also provides a hot-spot data identification and scheduling method for a storage system. As shown in Fig. 2, the method comprises:
Step 201: the collector collects access information of data objects in the storage system and sends it to the monitoring database.
In this step, the collector obtains historical access information of data objects from the monitoring database and real-time access information from the message queue. The real-time access information in the message queue comes from the real-time system monitoring data subscribed to by the monitor.
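Step 201 combines two input paths. A minimal sketch of them follows, with two stand-ins. A plain list of `(timestamp, object_id)` records plays the monitoring database. Python's `queue.Queue` plays the Redis-backed message queue the monitor subscribes to. The function name `collect` and the record layout are hypothetical.

```python
import queue
from collections import Counter


def collect(monitor_db, realtime_q):
    """Append pending real-time records to the monitoring database and
    return per-object access counts over historical plus real-time data.

    monitor_db: list of (timestamp, object_id) records (historical data)
    realtime_q: queue.Queue of (timestamp, object_id) records (real-time data)
    """
    while True:
        try:
            record = realtime_q.get_nowait()  # non-blocking drain of the queue
        except queue.Empty:
            break
        monitor_db.append(record)  # persist so later queries see it as history
    return Counter(obj for _, obj in monitor_db)


db = [(1, "a"), (2, "b"), (3, "a")]
q = queue.Queue()
q.put((4, "b"))
q.put((5, "c"))
print(collect(db, q))  # Counter({'a': 2, 'b': 2, 'c': 1})
```

Draining with `get_nowait` keeps the collector from blocking, so it can run once per scheduling period without waiting on an idle queue.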
Step 202: the scheduler obtains the access information of data objects from the monitoring database, identifies hot-spot data according to it, and places the hot-spot data into the high-speed memory of the storage system.
In this step, the hot-spot discriminant function is:
get_topk(db, K, time)
Here db is the monitoring database, K is the access-count rank cutoff, and time is the start time from which access counts are computed. The discriminant function retrieves from the monitoring database db the top-K hot-spot data by access count over the period from time to the present. Its return value is the set of these hot-spot data.
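Assume the monitoring database can be read as `(timestamp, object_id)` records. Under that assumption, the discriminant function reduces to counting accesses since `time` and keeping the K largest counts. The sketch below returns a dict mapping each object to its access count rather than a bare set. The reason is that the later expiration rule time + [T + f(n)] × TIMESLOT needs each item's count n. That return type is an assumption, not the patent's stated interface.

```python
from collections import Counter


def get_topk(db, K, time):
    """Top-K objects by access count among records with timestamp >= time.

    db is assumed to be an iterable of (timestamp, object_id) records.
    Returns {object_id: access_count} for the K most accessed objects;
    ties keep the order in which objects were first encountered.
    """
    counts = Counter(obj for ts, obj in db if ts >= time)
    return dict(counts.most_common(K))


log = [(0, "x"), (5, "a"), (6, "b"), (7, "a"), (8, "c"), (9, "a"), (9, "b")]
print(get_topk(log, 2, 5))  # {'a': 3, 'b': 2}
```

The record for `x` is excluded because it predates `time = 5`. `Counter.most_common` performs the ranking.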
Step 203: the high-speed memory sets an expiration time for each item of hot-spot data, and the scheduler schedules the hot-spot data according to these expiration times.
In this step, hot-spot data is scheduled by expiration time. Time is organized into time slices, also called time slots and denoted TIMESLOT; one TIMESLOT is the period at which scheduling is performed. Hot-spot data entering the high-speed memory can receive a default expiration time equal to the current time plus T TIMESLOTs. The choice of T and K affects the utilization of the high-speed storage. Assume for simplicity that the expiration times of hot-spot data are never updated, and that the repetition rate of data in each TIMESLOT is λ (0 ≤ λ ≤ 1). Then after T TIMESLOTs the system holds (1 - λ) × T × K data fragments. For an object accessed n times, the expiration time may be postponed by l TIMESLOTs, where l = f(n) and l ∈ N.
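The expiration rule and the occupancy estimate above can be written out as follows. Reading (1 - λ) × T × K as "new items per TIMESLOT times T TIMESLOTs" is one interpretation of the patent's estimate. The numbers in the worked example are illustrative, not from the patent.

```latex
\begin{aligned}
t_{\text{exp}} &= \textit{time} + T \cdot \mathrm{TIMESLOT}
  && \text{(default)} \\
t_{\text{exp}}(n) &= \textit{time} + \bigl(T + f(n)\bigr) \cdot \mathrm{TIMESLOT},
  \quad l = f(n) \in \mathbb{N}
  && \text{(object with } n \text{ accesses)} \\
N_{\text{cached}} &\approx \underbrace{(1-\lambda)K}_{\text{new items per TIMESLOT}} \cdot T
  = (1-\lambda)\,T\,K, \qquad 0 \le \lambda \le 1 \\
K = 1000,\ T = 6,\ \lambda = 0.2:\quad N_{\text{cached}} &\approx 0.8 \times 6 \times 1000 = 4800
\end{aligned}
```

Under this estimate, the high-speed tier needs room for about 4800 items before expiration alone keeps it in balance. Anything beyond volume must be handled by swap-out.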
Hot-spot data is scheduled with the scheduling function:
get_swapout(schedule, count)
Here schedule is the hot-spot data table in the high-speed memory, and count is the number of hot-spot data items to be scheduled out of the high-speed memory. The scheduling function selects count hot-spot data items from the table schedule to be scheduled out of the high-speed memory. Its return value is the set of items scheduled out. The set consists of the count items with the earliest expiration times in the schedule table.
The specific algorithm of the scheduling function is as follows:
Lines 1-3 perform preparation. They first initialize the high-speed memory, then empty the schedule set, and finally compute the maximum number of data fragments the system can hold, denoted volume.
Lines 4-25 form the main loop of the scheduler, which runs as a service process. Wait(TIMESLOT) on line 25 makes the scheduler execute once every TIMESLOT. Specifically:
Line 5 records the current time, time.
Line 6 uses the get_topk function to obtain from the database the set of the top-K most accessed data in the previous TIMESLOT, denoted volunteer. The expiration time of the data in this set should be time + [T + f(n)] × TIMESLOT, where T is the default delay and n is the number of accesses to the item.
Line 7 takes the part of volunteer that is not in the schedule set, denoted the insert set. This set must be merged into the schedule set.
Line 8 takes the intersection of volunteer and schedule, denoted the refresh set. The expiration times of the data in this set must be postponed.
Lines 9-10 refresh the entries in schedule whose expiration times must be updated. The expiration time of each such item may be postponed by several TIMESLOTs according to its access count.
Lines 11-14 collect the expired data in the schedule table into the delete set.
Lines 15-18 check whether the length of the schedule table would exceed the disk capacity volume once the new data is added and the expired data removed. If it would, data must be swapped out, and the data to be swapped out is placed in the swapout set.
Lines 19-21 delete the data in the delete set from the high-speed memory, then from the schedule set.
Lines 22-24 push the data in the insert set into the high-speed memory, then merge it into the schedule set.
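The algorithm listing that these line numbers refer to does not appear on this page. The sketch below follows the line-by-line description for one pass of the loop (lines 5-24) and rests on several assumptions, all hypothetical:
- `schedule` is a dict of `object_id -> expiration_time` instead of a Redis sorted set.
- `cache` is a set standing in for the high-speed memory.
- `f(n) = floor(log2 n)` is an invented extension policy.
- Swapped-out objects are evicted together with expired ones, because the description does not say where swapout is applied.
- volume is at least K.

Lines 1-3 correspond to creating an empty `schedule` and `cache` and choosing `volume`. Lines 4 and 25 correspond to calling `schedule_once` once per TIMESLOT.

```python
import math
from collections import Counter


def f(n):
    # Hypothetical extension policy: one extra TIMESLOT per doubling of accesses.
    return int(math.log2(n)) if n > 0 else 0


def schedule_once(schedule, cache, access_log, now, K, T, timeslot, volume):
    """One pass of the scheduler loop (lines 5-24 of the described listing).

    schedule:   dict {object_id: expiration_time}, the `schedule` table
    cache:      set of object_ids held in the high-speed memory
    access_log: list of (timestamp, object_id) records (monitoring database)
    Returns the (insert, refresh, delete) sets computed in this pass.
    """
    # Line 6: top-K objects accessed in the previous TIMESLOT (get_topk).
    counts = Counter(o for ts, o in access_log if now - timeslot <= ts < now)
    volunteer = dict(counts.most_common(K))
    expiry = {o: now + (T + f(n)) * timeslot for o, n in volunteer.items()}

    # Lines 7-8: newcomers versus objects already in the schedule table.
    insert = {o for o in volunteer if o not in schedule}
    refresh = {o for o in volunteer if o in schedule}

    # Lines 9-10: postpone expiration of refreshed objects (never pull it earlier).
    for o in refresh:
        schedule[o] = max(schedule[o], expiry[o])

    # Lines 11-14: objects whose expiration time has passed.
    delete = {o for o, exp in schedule.items() if exp <= now}

    # Lines 15-18: if the table would exceed `volume`, swap out the
    # earliest-expiring survivors (the get_swapout rule).
    overflow = len(schedule) - len(delete) + len(insert) - volume
    swapout = set()
    if overflow > 0:
        survivors = sorted((exp, o) for o, exp in schedule.items() if o not in delete)
        swapout = {o for _, o in survivors[:overflow]}
    delete |= swapout  # assumption: swapped-out objects are evicted like expired ones

    # Lines 19-21: evict from the high-speed memory, then from `schedule`.
    for o in delete:
        cache.discard(o)
        schedule.pop(o, None)

    # Lines 22-24: push newcomers into the high-speed memory, then into `schedule`.
    for o in insert:
        cache.add(o)
        schedule[o] = expiry[o]
    return insert, refresh, delete
```

In a service process, this would run inside a loop such as `while True: schedule_once(...)`, followed by sleeping one TIMESLOT. That loop mirrors Wait(TIMESLOT) on line 25.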
The collector, the scheduler and the monitoring database jointly manage the high-speed storage; their relationships are shown in Fig. 1. The collector gathers object access information provided by the monitors on the proxy nodes and writes it into the monitoring database. The scheduler then reads this data from the monitoring database, identifies the hot-spot data, makes scheduling decisions, and sends scheduling commands to the high-speed storage tier.
Implementing the scheduling algorithm requires choosing a data structure and operations for the schedule table. The schedule table is implemented with the sorted set of the Redis storage system. Each member of a sorted set is associated with a score, and members are ordered from the lowest score to the highest. In this algorithm the score is the data's expiration time. Table 1 lists the key operations of the schedule table, the corresponding Redis operations, and their time complexities.
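Table 1 itself is not reproduced on this page. The Redis documentation lists these complexities for the sorted-set commands a schedule table keyed by expiration time would plausibly use: ZADD O(log N), ZREM O(M log N), ZCARD O(1), and ZRANGE and ZRANGEBYSCORE O(log N + M). Whether these match Table 1 is an assumption. The hypothetical in-memory stand-in below is not the redis-py client API. It mirrors those command semantics to show how the table serves the algorithm:
- ZRANGEBYSCORE from -inf to now finds expired items.
- ZRANGE from 0 finds the earliest-expiring items to swap out.
- ZADD with a new score postpones an expiration.

Python list insertion and deletion are O(N), so this sketch does not reproduce Redis's skip-list complexities.

```python
import bisect


class SortedSetSketch:
    """In-memory stand-in for a Redis sorted set (score = expiration time).

    Method names mirror the Redis commands ZADD, ZREM, ZCARD, ZRANGE and
    ZRANGEBYSCORE; this is NOT the redis-py client API.
    """

    def __init__(self):
        self._scores = {}    # member -> score
        self._ordered = []   # sorted list of (score, member)

    def zadd(self, member, score):
        self.zrem(member)    # re-adding a member updates (e.g. postpones) its score
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))

    def zrem(self, member):
        score = self._scores.pop(member, None)
        if score is not None:
            del self._ordered[bisect.bisect_left(self._ordered, (score, member))]

    def zcard(self):
        return len(self._scores)

    def zrange(self, start, stop):
        """Members ranked start..stop inclusive, lowest score first; stop=-1 means the end."""
        end = None if stop == -1 else stop + 1
        return [m for _, m in self._ordered[start:end]]

    def zrangebyscore(self, lo, hi):
        """Members with lo <= score <= hi, lowest score first."""
        i = bisect.bisect_left(self._ordered, (lo,))
        out = []
        while i < len(self._ordered) and self._ordered[i][0] <= hi:
            out.append(self._ordered[i][1])
            i += 1
        return out


s = SortedSetSketch()
s.zadd("a", 50)
s.zadd("b", 40)
s.zadd("c", 60)
print(s.zrangebyscore(float("-inf"), 50))  # expired at time 50: ['b', 'a']
print(s.zrange(0, 0))                      # earliest to swap out: ['b']
```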
Table 1
The present invention provides a hot-spot data identification and scheduling method and system for a storage system. It uses access-frequency statistics to send hot-spot data into high-speed memory and maintains each item's expiration time. It optimizes jointly across hot-spot identification, hot-spot maintenance and data replacement, which effectively resolves the bottleneck in accessing hot-spot data. In addition:
- The data fragments ranked in the top K by access count within a period are retrieved from the monitoring database, and each receives an expiration time according to its access count.
- Time is organized into time slots, and within each scheduling cycle the expiration time of repeatedly accessed data is extended.
- The set of data with the earliest expiration times in the hot-spot scheduling table is swapped out, which reclaims storage space.
Together these measures optimize overall storage performance and thereby advance the architecture of mass data storage systems.
Notably, the hot-spot data identification and scheduling algorithm proposed in the present invention is generally applicable to all kinds of distributed storage systems. The invention therefore has high technical and practical value for large-scale distributed object storage systems.
The embodiments disclosed above serve only to aid understanding of the present invention and are not intended to limit it. A person skilled in the art may modify the form and details of implementation without departing from the spirit and scope disclosed herein. The scope of patent protection of the invention is nevertheless defined by the appended claims.

Claims (13)

1. A hot-spot data identification and scheduling method for a storage system, characterized by comprising:
A collector collects access information of data objects in the storage system and sends the access information of the data objects to a monitoring database;
A scheduler obtains the access information of the data objects from the monitoring database, identifies hot-spot data according to that access information, and writes the hot-spot data into the high-speed memory of the storage system;
The high-speed memory sets an expiration time for each item of hot-spot data, and the scheduler schedules the hot-spot data according to the expiration times;
Wherein the step in which the collector collects access information of data objects in the storage system specifically comprises:
The collector obtains historical access information of data objects from the monitoring database and real-time access information from a message queue, where the real-time access information in the message queue comes from the real-time system monitoring data subscribed to by the monitor.
2. The hot-spot data identification and scheduling method for a storage system according to claim 1, characterized in that identifying hot-spot data according to the access information of the data objects specifically comprises:
Identifying hot-spot data with a hot-spot discriminant function applied to the access information of the data objects, the discriminant function being:
get_topk(db, K, time);
Wherein db is the monitoring database, K is the access-count rank cutoff, and time is the start time from which access counts are computed;
The discriminant function retrieves from the monitoring database db the top-K hot-spot data by access count over the period from time to the current time, and its return value is the set of the top-K hot-spot data by access count.
3. The hot-spot data identification and scheduling method for a storage system according to claim 2, characterized in that the step in which the high-speed memory sets an expiration time for each item of hot-spot data specifically comprises:
Dividing time into time slots (TIMESLOT), one TIMESLOT being the period at which scheduling is performed;
Setting, for each item of hot-spot data entering the high-speed memory, a default expiration time equal to the current time plus T TIMESLOTs;
The repetition rate of data in each TIMESLOT is λ, where 0 ≤ λ ≤ 1, so that after T TIMESLOTs the high-speed memory holds (1 - λ) × T × K hot-spot data items;
For a data object accessed n times, the expiration time may be postponed by l TIMESLOTs, where l = f(n) and l ∈ N.
4. The hot spot data identification and scheduling method for a storage system according to claim 3, wherein the scheduler scheduling hot spot data according to the expiration time specifically comprises:
The scheduler schedules hot spot data according to the expiration time using a scheduling function, the scheduling function being:
get_swapout(schedule, count);
Where schedule is the hot spot data table in the high-speed memory, and count is the number of hot spot data items that need to be scheduled out of the high-speed memory;
The scheduling function selects count hot spot data items from the hot spot data table schedule of the high-speed memory to be scheduled out of the high-speed memory; its return value is the set of the count hot spot data items scheduled out.
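One possible `get_swapout` is sketched below, assuming `schedule` is a mapping from object ID to expiration time. The claim does not say which items are chosen; selecting the entries that expire soonest is an assumption made for illustration.

```python
import heapq
from typing import Dict, List


def get_swapout(schedule: Dict[str, float], count: int) -> List[str]:
    """Pick `count` hot spot items to schedule out of the high-speed memory.

    Assumptions: `schedule` maps object ID -> expiration time, and the items
    chosen are those that expire soonest (returned soonest first).
    """
    if count <= 0:
        return []
    soonest = heapq.nsmallest(count, schedule.items(), key=lambda kv: kv[1])
    return [obj for obj, _ in soonest]
```

Ordering entries by expiration time is also what a sorted set keyed on that time provides directly, since its members stay ordered by score.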
5. The hot spot data identification and scheduling method for a storage system according to claim 4, wherein scheduling hot spot data based on the scheduling function specifically comprises:
Initializing the high-speed memory by emptying its schedule set, and calculating the maximum number of hot spot data items the storage system can hold, denoted volume;
Recording the current time as time;
Obtaining from the monitoring database, via the discriminant function, the set of the top-K hot spot data by access count in the previous TIMESLOT, denoted volunteer, where the expiration time of each hot spot data item in this set is time + [T + f(n)] × TIMESLOT and n is that item's access count;
Taking the items that are in volunteer but not in schedule, denoted the insert set, and merging the insert set into the schedule set;
Taking the intersection of volunteer and schedule, denoted the refresh set, and postponing the expiration times of the hot spot data in the refresh set;
Refreshing the hot spot data in schedule whose expiration times need updating, postponing each item's expiration time according to its access count;
Collecting the expired data in schedule into a delete set;
Calculating whether, after the new data are added and the stale data deleted, the length of the schedule table exceeds the capacity volume; if it does, scheduling hot spot data and placing the hot spot data to be swapped out into a swapout set;
Deleting the hot spot data in the delete set from the high-speed memory, then removing them from schedule;
Writing the hot spot data in the insert set into the high-speed memory, then merging them into schedule;
Performing one round of hot spot data scheduling every TIMESLOT.
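The per-TIMESLOT round in claim 5 can be modeled in memory as below. This is a sketch, not the patent's implementation: the `HotSpotCache` class, the (timestamp, object_id) record format, the values of T and TIMESLOT, f(n) = ⌊log2 n⌋, keeping the later of the old and new expiration on refresh, and evicting the soonest-expiring entries on overflow are all assumptions. A Python set stands in for the high-speed memory.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical parameters; the claims do not fix them.
TIMESLOT = 60.0  # seconds per scheduling period
T = 3            # default lifetime, in TIMESLOTs


def f(n: int) -> int:
    # Hypothetical postponement rule: floor(log2 n) extra TIMESLOTs.
    return max(n, 1).bit_length() - 1


class HotSpotCache:
    """In-memory model of one scheduling round per TIMESLOT."""

    def __init__(self, volume: int) -> None:
        self.volume = volume                  # max hot spot items the tier can hold
        self.schedule: Dict[str, float] = {}  # object ID -> expiration time, starts empty
        self.store: Set[str] = set()          # stands in for the high-speed memory

    def run_round(self, db: Iterable[Tuple[float, str]], k: int,
                  now: float) -> Dict[str, Set[str]]:
        # volunteer: top-k objects by access count in the previous TIMESLOT,
        # each with expiration now + (T + f(n)) * TIMESLOT.
        counts = Counter(obj for ts, obj in db if now - TIMESLOT <= ts < now)
        volunteer = {obj: now + (T + f(n)) * TIMESLOT
                     for obj, n in counts.most_common(k)}

        insert = set(volunteer) - set(self.schedule)   # newly hot
        refresh = set(volunteer) & set(self.schedule)  # still hot
        for obj in refresh:  # postpone, never shorten, the expiration time
            self.schedule[obj] = max(self.schedule[obj], volunteer[obj])

        delete = {obj for obj, exp in self.schedule.items() if exp <= now}

        # Size after adding new data and removing stale data.
        projected = {o: e for o, e in self.schedule.items() if o not in delete}
        projected.update({o: volunteer[o] for o in insert})
        overflow = len(projected) - self.volume
        # Assumed policy: swap out the entries that expire soonest.
        ranked = sorted(projected, key=projected.__getitem__)
        swapout = set(ranked[:overflow]) if overflow > 0 else set()

        for obj in delete | swapout:
            self.store.discard(obj)        # evict from the high-speed memory,
            self.schedule.pop(obj, None)   # then drop it from schedule
        for obj in insert - swapout:
            self.store.add(obj)                  # write into the high-speed memory,
            self.schedule[obj] = volunteer[obj]  # then record it in schedule
        return {"insert": insert, "refresh": refresh,
                "delete": delete, "swapout": swapout}
```

In this sketch a driver would call `run_round` once per TIMESLOT, for example from a timer; an item swapped out in the same round it would have been inserted is simply never written to the high-speed memory.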
6. A hot spot data identification and scheduling system for a storage system, comprising:
A collector, configured to collect access information of data objects in the storage system and send the access information to a monitoring database;
A scheduler, configured to obtain the access information of the data objects from the monitoring database, identify hot spot data according to that access information, and write the hot spot data into a high-speed memory of the storage system;
The high-speed memory, configured to set an expiration time for each hot spot data item, the scheduler scheduling hot spot data according to the expiration times;
Wherein the system further comprises a monitor and a message queue;
The collector obtains historical access information of the data objects from the monitoring database and real-time access information of the data objects from the message queue, the real-time access information in the message queue being real-time system monitoring data subscribed from the monitor.
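The collector's two input paths can be sketched with standard-library stand-ins: a plain list plays the monitoring database and a `queue.Queue` plays the message queue that the monitor publishes to. The in-process stand-ins, the `Collector` class, and the (timestamp, object_id) record format are assumptions for illustration only.

```python
import queue
from typing import List, Tuple

AccessRecord = Tuple[float, str]  # hypothetical (timestamp, object_id) format


class Collector:
    """Sketch of the collector: historical records come from the monitoring
    database, real-time records from a message queue fed by the monitor."""

    def __init__(self, monitoring_db: List[AccessRecord],
                 mq: "queue.Queue[AccessRecord]") -> None:
        self.monitoring_db = monitoring_db
        self.mq = mq

    def history(self, since: float) -> List[AccessRecord]:
        # Historical access information, read from the monitoring database.
        return [rec for rec in self.monitoring_db if rec[0] >= since]

    def drain_realtime(self) -> List[AccessRecord]:
        # Real-time access information the monitor has published to the queue.
        batch: List[AccessRecord] = []
        while True:
            try:
                batch.append(self.mq.get_nowait())
            except queue.Empty:
                break
        # Forward the real-time records to the monitoring database.
        self.monitoring_db.extend(batch)
        return batch
```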
7. The hot spot data identification and scheduling system for a storage system according to claim 6, wherein the scheduler identifies hot spot data using a hot spot discriminant function, the discriminant function being:
get_topk(db, K, time);
Where db is the monitoring database, K is the access-count rank cutoff, and time is the start time from which access counts are calculated;
The discriminant function retrieves from the monitoring database db the hot spot data ranked in the top K by access count over the interval from time to the current time; its return value is the set of the top-K hot spot data by access count.
8. The hot spot data identification and scheduling system for a storage system according to claim 7, wherein the high-speed memory divides time into time slots (TIMESLOT), each TIMESLOT being one scheduling period; sets a default expiration time for each hot spot data item that enters the high-speed memory, the default expiration time being the current time plus T TIMESLOTs; if the repetition rate of data in each TIMESLOT is λ, where 0 ≤ λ ≤ 1, then after T TIMESLOTs the number of hot spot data items held in the high-speed memory is (1 - λ) × T × K; and for a data object with n accesses, postpones its expiration time by l TIMESLOTs, where l = f(n) and l ∈ N.
9. The hot spot data identification and scheduling system for a storage system according to claim 8, wherein the scheduler schedules hot spot data using a scheduling function, the scheduling function being:
get_swapout(schedule, count);
Where schedule is the hot spot data table in the high-speed memory, and count is the number of hot spot data items that need to be scheduled out of the high-speed memory;
The scheduling function selects count hot spot data items from the hot spot data table schedule of the high-speed memory to be scheduled out of the high-speed memory; its return value is the set of the count hot spot data items scheduled out.
10. The hot spot data identification and scheduling system for a storage system according to claim 9, wherein the high-speed memory scheduling hot spot data based on the scheduling function specifically comprises:
Initializing the high-speed memory by emptying its schedule set, and calculating the maximum number of hot spot data items the storage system can hold, denoted volume;
Recording the current time as time;
Obtaining from the monitoring database, via the discriminant function, the set of the top-K hot spot data by access count in the previous TIMESLOT, denoted volunteer, where the expiration time of each hot spot data item in this set is time + [T + f(n)] × TIMESLOT and n is that item's access count;
Taking the items that are in volunteer but not in schedule, denoted the insert set, and merging the insert set into the schedule set;
Taking the intersection of volunteer and schedule, denoted the refresh set, and postponing the expiration times of the hot spot data in the refresh set;
Refreshing the hot spot data in schedule whose expiration times need updating, postponing each item's expiration time according to its access count;
Collecting the expired data in schedule into a delete set;
Calculating whether, after the new data are added and the stale data deleted, the length of the schedule table exceeds the capacity volume; if it does, scheduling hot spot data and placing the hot spot data to be swapped out into a swapout set;
Deleting the hot spot data in the delete set from the high-speed memory, then removing them from schedule;
Writing the hot spot data in the insert set into the high-speed memory, then merging them into schedule;
Performing one round of hot spot data scheduling every TIMESLOT.
11. The hot spot data identification and scheduling system for a storage system according to claim 6, wherein the monitor is implemented using the Python Tornado framework.
12. The hot spot data identification and scheduling system for a storage system according to claim 9, wherein the message queue and the monitoring database are implemented with Redis, and the schedule table is implemented using a Redis sorted set.
13. The hot spot data identification and scheduling system for a storage system according to claim 6, wherein the collector collects the access information of data objects in the storage system through the monitoring interface of the storage system.
CN201510696498.4A 2015-10-23 2015-10-23 A kind of hot spot data identification of storage system and dispatching method and system Active CN105373347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510696498.4A CN105373347B (en) 2015-10-23 2015-10-23 A kind of hot spot data identification of storage system and dispatching method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510696498.4A CN105373347B (en) 2015-10-23 2015-10-23 A kind of hot spot data identification of storage system and dispatching method and system

Publications (2)

Publication Number Publication Date
CN105373347A CN105373347A (en) 2016-03-02
CN105373347B true CN105373347B (en) 2018-06-29

Family

ID=55375581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510696498.4A Active CN105373347B (en) 2015-10-23 2015-10-23 A kind of hot spot data identification of storage system and dispatching method and system

Country Status (1)

Country Link
CN (1) CN105373347B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912703A (en) * 2016-04-26 2016-08-31 北京百度网讯科技有限公司 Data storage method and data query method and device
CN108156193B (en) * 2016-12-02 2022-08-19 阿里巴巴集团控股有限公司 Hotspot determination method and system
CN106682202B (en) * 2016-12-29 2020-01-10 北京奇艺世纪科技有限公司 Search cache updating method and device
CN107463514B (en) * 2017-08-16 2021-06-29 郑州云海信息技术有限公司 Data storage method and device
CN109933575B (en) * 2019-02-28 2021-04-27 鲁东大学 Monitoring data storage method and device
CN110704466B (en) * 2019-09-27 2021-12-17 武汉极意网络科技有限公司 Black product data storage method and device
CN111309270B (en) * 2020-03-13 2021-04-27 清华大学 Persistent memory key value storage system
CN113225396B (en) * 2021-04-30 2022-08-12 深圳市腾讯网域计算机网络有限公司 Hot spot data packet distribution method and device, electronic equipment and medium
CN113849131B (en) * 2021-09-28 2024-09-06 咪咕文化科技有限公司 Data storage method, device, computing equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103713861A (en) * 2014-01-09 2014-04-09 浪潮(北京)电子信息产业有限公司 File processing method and system based on hierarchical division
US8868839B1 (en) * 2011-04-07 2014-10-21 Symantec Corporation Systems and methods for caching data blocks associated with frequently accessed files
CN104133909A (en) * 2014-08-08 2014-11-05 浪潮电子信息产业股份有限公司 Multi-layer file system
CN104834609A (en) * 2015-05-31 2015-08-12 上海交通大学 Multi-level cache method based on historical upgrading and downgrading frequency

Also Published As

Publication number Publication date
CN105373347A (en) 2016-03-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant