CN105512185A - Cache sharing method based on operation sequence - Google Patents

Cache sharing method based on operation sequence

Info

Publication number
CN105512185A
CN105512185A (application CN201510830806.8A)
Authority
CN
China
Prior art keywords: job, dfscache, cache, new, access
Prior art date
Legal status
Granted
Application number
CN201510830806.8A
Other languages
Chinese (zh)
Other versions
CN105512185B (en)
Inventor
何晓斌 (He Xiaobin)
魏巍 (Wei Wei)
王红艳 (Wang Hongyan)
Current Assignee
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute filed Critical Wuxi Jiangnan Computing Technology Institute
Priority to CN201510830806.8A (granted as CN105512185B)
Publication of CN105512185A
Application granted
Publication of CN105512185B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Abstract

The invention provides a cache sharing method based on operation sequence (job timing). The method comprises the following steps: before a JOB is submitted for execution, it declares the amount of DFS Cache resources it will need during its run; the system allocates the corresponding DFS Cache resources to each JOB and starts the JOB; each JOB runs for multiple rounds, during which it may access the DFS Cache many times, so that the system can collect the time intervals between each JOB's accesses to the DFS Cache; once the intervals at which a JOB accesses its cache resources stabilize, they are recorded and a DFS Cache sharing allocation algorithm is started; thereafter the storage management system reads and writes JOB data according to the DFS Cache access pattern of all running JOBs, and restarts the sharing allocation algorithm whenever it determines that a JOB's data accesses exceed their time window.

Description

A cache sharing method based on job timing
Technical field
The present invention relates to the field of computer technology, and in particular to a cache sharing method based on job timing.
Background art
High-performance computers are very large systems, and the volume of concurrent data accesses during task execution runs into the tens of thousands, so the demands placed on the performance of the distributed file system are very high. For this reason, the distributed file system is generally configured with dedicated acceleration (cache) resources on its servers to handle read and write requests against massive data. The capacity of these cache resources is very small compared with the storage capacity of the distributed file system itself, but their performance may be several times higher, and their cost is also very high. Therefore, although exclusively allocating cache resources to a job for the whole of its run is simple, it is unreasonable.
A JOB is an application that runs on the computational resources of a high-performance computer and generally executes some scientific computing task. At certain phases of the computation it writes data to the distributed file system. The amount of data a problem outputs at one time is very large, often tens of TB and even hundreds or thousands of TB, so the performance requirements on the storage resources of the distributed file system are very high.
When a JOB starts in high-performance computing, the system generally allocates it the necessary DFS (Distributed File System: a software-implemented shared storage space that integrates a large number of storage server resources and cache resources, runs on the storage servers, and provides high-performance, highly concurrent data read/write support for the high-performance computer) cache resources to accelerate the JOB's data I/O. During the JOB's execution, these DFS cache resources are usually allocated to it fixedly. Because a JOB's data accesses come in phases, there are gaps between accesses, generally longer than ten minutes; the cache resource therefore sits idle during these gaps, which indirectly wastes it.
More specifically, when a JOB starts running, the system allocates DFSCache resources for it. DFSCache is a distributed file system cache: the servers on which the distributed file system runs are generally equipped with resources dedicated to accelerating distributed file access, such as SSDs and memory, and the distributed file system schedules these cache resources to support accelerated data access for computational tasks.
Thus, while the JOB runs, this resource is monopolized by it. A JOB's data accesses come in phases: after one phase of reading and writing completes, the JOB can only begin the next phase of I/O after finishing a certain computation. Since DFSCache resources in HPC are typically high-performance and expensive, this exclusivity wastes resources.
Summary of the invention
The technical problem to be solved by the invention is to remedy the above defect in the prior art by providing a cache sharing method based on job timing that enables DFSCache to be shared among JOBs.
According to the invention, there is provided a cache sharing method based on job timing, comprising:
First step: before a JOB is submitted for execution, declare the amount of DFSCache resources the JOB will need during its run;
Second step: the system allocates the corresponding DFSCache resources to each JOB and starts the JOB;
Third step: each JOB runs for multiple rounds, during which it accesses the DFSCache repeatedly, so that the system collects the time intervals between each JOB's accesses to the DFSCache;
Fourth step: when the intervals at which a JOB accesses its cache resources stabilize, record them and start the DFSCache sharing allocation algorithm.
Preferably, the cache sharing method based on job timing further comprises:
Fifth step: while JOBs execute, the storage management system reads and writes JOB data according to the DFSCache access pattern of all running JOBs; if the storage management system determines that a JOB's data accesses exceed their time window, return to the fourth step to restart the DFSCache sharing allocation algorithm;
Sixth step: after a JOB completes, release the DFSCache resources occupied by that job.
Preferably, the DFSCache sharing allocation algorithm comprises: establishing a table of the access intervals of all JOBs in the system, and determining whether there is a DFSCache with an idle period during which the JOB that owns it can share it with other JOBs; and, when such a DFSCache exists, judging whether its cache space has any capacity remaining.
Preferably, the DFSCache sharing allocation algorithm further comprises: if the idle period and the remaining cache space of this DFSCache satisfy the requirements of a newly started JOB, allocating the idle period and the remaining cache space of this DFSCache directly to the newly started JOB.
Preferably, the DFSCache sharing allocation algorithm further comprises: if the idle period and the remaining cache space of the DFSCache are insufficient to satisfy the requirements of the newly started JOB, letting the newly started JOB use the remaining cache space first and then allocating new resources to it.
Preferably, the DFSCache sharing allocation algorithm further comprises: if the DFSCache has no cache space remaining, allocating a new DFSCache to the newly started JOB, which uses the newly allocated DFSCache exclusively until another JOB starts.
Preferably, the time interval is regarded as stable when it is a constant value or exceeds a particular value.
The invention resolves the drawback of fixed allocation of DFS cache resources to JOBs. By collecting, at the storage layer, information about the gaps in jobs' accesses to the DFS cache, and scheduling jobs according to those gaps, different jobs can reuse the same DFS cache resources during one another's access gaps, improving the overall utilization of the system.
Brief description of the drawings
A more complete understanding of the invention and of its attendant advantages and features will be more readily obtained by reference to the following detailed description when considered in conjunction with the accompanying drawing, in which:
Fig. 1 schematically shows a flowchart of the cache sharing method based on job timing according to a preferred embodiment of the invention.
It should be noted that the drawing serves to illustrate the invention and does not limit it; it is not necessarily drawn to scale; and identical or similar elements are denoted by identical or similar reference numerals throughout.
Detailed description of the embodiments
To make the content of the invention clear and understandable, it is described in detail below in conjunction with specific embodiments and the accompanying drawing.
High-performance computing (HPC) systems integrate large-scale computational resources and storage resources to process ultra-large-scale problems. Such a system integrates tens of thousands of central processing units computing in parallel and writes massive amounts of data into storage resources built on a distributed file system, so the requirements it places on the concurrency and performance of the storage resources are very high.
In HPC, applications use computational and storage resources to solve scientific problems by submitting jobs (JOBs). For a JOB with a large data output, the system allocates DFSCache resources when the JOB starts running; to keep JOB data management simple and to guarantee JOB data security, this resource is monopolized by the JOB for the duration of its run. A JOB's data accesses come in phases: after one phase of reading and writing completes, the JOB can only begin the next phase of I/O after finishing a certain computation. Since DFSCache resources in HPC are high-performance and expensive, this causes resource waste.
The present invention schedules jobs according to the time gaps in their DFSCache accesses, so that jobs share DFSCache resources. After jobs are submitted, each is allocated separate cache resources during the first rounds of its run while the system collects the intervals between its accesses to the cache. Once reasonably stable interval values have been collected, they are sorted to obtain the order in which the JOBs access the DFSCache; the system can then apply a data scheduling strategy that lets different jobs share access to the DFSCache during one another's gaps.
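The interval-collection and ranking procedure described above can be sketched as follows. This is a minimal illustration; the class and method names and the minute-based timeline are our own assumptions, not taken from the patent.

```python
from collections import defaultdict

class AccessIntervalCollector:
    """Collects each JOB's DFSCache access timestamps during its first
    rounds and derives the gaps between successive accesses, as in the
    interval-collection phase described above."""

    def __init__(self):
        self.timestamps = defaultdict(list)  # job_id -> [access times, in minutes]

    def record_access(self, job_id, t):
        self.timestamps[job_id].append(t)

    def intervals(self, job_id):
        # Gaps between consecutive accesses of one JOB.
        ts = sorted(self.timestamps[job_id])
        return [b - a for a, b in zip(ts, ts[1:])]

    def access_order(self):
        # Sort JOBs by first access time to obtain the order in which
        # they touch the DFSCache, approximating the ranking step above.
        return sorted(self.timestamps, key=lambda j: min(self.timestamps[j]))

# JOB A accesses at t = 0, 20, 40 minutes (a stable 20-minute cycle);
# JOB B accesses at t = 10, 30, 50, so B can reuse A's cache in A's gaps.
collector = AccessIntervalCollector()
for t in (0, 20, 40):
    collector.record_access("A", t)
for t in (10, 30, 50):
    collector.record_access("B", t)
print(collector.intervals("A"))   # [20, 20]
print(collector.access_order())   # ['A', 'B']
```

With intervals and ordering in hand, the scheduler knows that B's accesses fall entirely inside A's 20-minute gaps, which is exactly the multiplexing opportunity the method exploits.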
Fig. 1 schematically shows a flowchart of the cache sharing method based on job timing according to a preferred embodiment of the invention.
As shown in Fig. 1, the cache sharing method based on job timing according to the preferred embodiment of the invention comprises:
First step S1: before a JOB is submitted for execution, declare the amount of DFSCache resources the JOB will need during its run;
Second step S2: the system (for example, in a default manner) allocates the corresponding DFSCache resources to each JOB and starts the JOB;
Third step S3: each JOB runs for multiple rounds, during which it accesses the DFSCache repeatedly, so that the system collects the time intervals between each JOB's accesses to the DFSCache;
Fourth step S4: when the interval between a JOB's rounds of cache accesses stabilizes ("stabilizes" meaning the interval is a constant value or exceeds a particular value), record the interval and start the DFSCache sharing allocation algorithm;
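The stability test of step S4 (the interval is a constant value or exceeds a particular value) might be expressed as the following sketch. The parameter names and the tolerance refinement are our assumptions; the patent states only the two conditions.

```python
def intervals_stable(intervals, min_samples=3, tolerance=0, threshold=None):
    """Return True once a JOB's recorded cache-access intervals have
    'stabilized' in the sense of step S4: the recent intervals are
    (near-)constant, or each exceeds a given threshold."""
    if len(intervals) < min_samples:
        return False  # not enough rounds observed yet
    recent = intervals[-min_samples:]
    # Condition 1: the interval is (near-)constant.
    is_constant = max(recent) - min(recent) <= tolerance
    # Condition 2: every recent interval exceeds a particular value.
    above_threshold = threshold is not None and all(i > threshold for i in recent)
    return is_constant or above_threshold

print(intervals_stable([18, 20, 20, 20]))            # True: constant tail
print(intervals_stable([5, 9, 14]))                  # False: still drifting
print(intervals_stable([12, 15, 18], threshold=10))  # True: all above threshold
```

Once this predicate fires for every running JOB, the recorded intervals can be handed to the sharing allocation algorithm.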
Specifically, the DFSCache sharing allocation algorithm comprises:
Establishing a table of the access intervals of all JOBs in the system, and determining whether there is a DFSCache with an idle period during which the JOB that owns it can share it with other JOBs; when such a DFSCache exists, judging whether its cache space has any capacity remaining;
Further, if the idle period and the remaining cache space of this DFSCache satisfy the requirements of a newly started JOB, the idle period and the remaining cache space of this DFSCache are allocated directly to the newly started JOB;
On the other hand, if the idle period and the remaining cache space of the DFSCache are insufficient to satisfy the requirements of the newly started JOB, the newly started JOB first uses the remaining cache space and new resources are then allocated to it;
If the DFSCache has no cache space remaining, a new DFSCache is allocated to the newly started JOB, which uses the newly allocated DFSCache exclusively until another JOB starts.
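The three allocation cases just described can be sketched as a single decision routine. This is an illustrative reading, not the patent's implementation: the data structure, the units, and the return shape are all our assumptions.

```python
from dataclasses import dataclass

@dataclass
class DFSCache:
    capacity: int   # total cache space (e.g. GB)
    used: int = 0   # space already held by the owning JOB(s)
    idle: int = 0   # length of the owner's access gap (e.g. minutes)

    @property
    def free_space(self):
        return self.capacity - self.used

def allocate_for_new_job(caches, idle_needed, space_needed, new_capacity):
    """Returns (grants, new_cache): grants is a list of (cache, space)
    pairs taken from existing caches; new_cache is a freshly allocated
    DFSCache used exclusively by the new JOB until another JOB starts,
    or None when no new cache is needed."""
    for cache in caches:
        if cache.idle >= idle_needed and cache.free_space > 0:
            if cache.free_space >= space_needed:
                # Case 1: idle period and remaining space both suffice;
                # share the existing DFSCache directly.
                cache.used += space_needed
                return [(cache, space_needed)], None
            # Case 2: remaining space is insufficient; use what is left,
            # then allocate new resources for the shortfall.
            granted = cache.free_space
            cache.used = cache.capacity
            shortfall = DFSCache(capacity=new_capacity, used=space_needed - granted)
            return [(cache, granted)], shortfall
    # Case 3: no shareable space at all; allocate a new, exclusive DFSCache.
    return [], DFSCache(capacity=new_capacity, used=space_needed)

# Case 1: an existing cache with a 15-minute gap and 60 GB free.
caches = [DFSCache(capacity=100, used=40, idle=15)]
grants, extra = allocate_for_new_job(caches, idle_needed=10, space_needed=50, new_capacity=100)
print(len(grants), extra is None, caches[0].used)   # 1 True 90
```

The routine scans the interval table in order, which matches the patent's description of first looking for a cache with a usable idle period before falling back to a fresh, exclusively held DFSCache.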
Furthermore, fifth step S5: while JOBs execute, the storage management system reads and writes JOB data according to the DFSCache access pattern of all running JOBs; if the storage management system determines that a JOB's data accesses exceed their time window, return to the fourth step S4 to restart the DFSCache sharing allocation algorithm;
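The time-window check that sends control back to step S4 might look like the following sketch; the window representation (start plus length) is our assumption, since the patent does not fix one.

```python
def exceeds_time_window(window_start, window_length, access_time):
    """True when a JOB's data access falls outside the time window the
    sharing schedule assigned to it, i.e. the condition that triggers
    rerunning the DFSCache sharing allocation algorithm (step S4)."""
    return not (window_start <= access_time <= window_start + window_length)

# The storage management system checks each observed access against the
# schedule produced by the sharing allocation algorithm.
schedule = {"A": (0, 5), "B": (20, 5)}          # job -> (window start, length), in minutes
observed = [("A", 3), ("B", 21), ("B", 31)]     # (job, access time)
violators = {job for job, t in observed if exceeds_time_window(*schedule[job], t)}
print(violators)   # {'B'}: B drifted, so restart the sharing allocation algorithm
```

Here JOB B's access at t = 31 falls outside its [20, 25] window, modeling the drift that invalidates the recorded intervals and forces a fresh allocation round.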
Sixth step S6: after a JOB completes, the time slot and DFSCache resources occupied by that job are released.
The invention exploits the time intervals between different JOBs' cache accesses to share storage cache resources, and this sharing is transparent to the users' JOBs. Its advantage is that multiple JOBs in high-performance computing share the precious cache resources, overcoming the shortcoming of the traditional exclusive usage pattern and improving the utilization of the system's cache resources.
In addition, it should be noted that, unless otherwise indicated, terms such as "first", "second" and "third" in the specification are used only to distinguish components, elements, steps and the like from one another, and not to denote any logical or ordinal relationship among them.
It will be understood that although the invention has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit it. Any person of ordinary skill in the art may, without departing from the scope of the technical solution of the invention, use the content disclosed above to make many possible variations and modifications to the technical solution, or amend it into equivalent embodiments. Any simple modification, equivalent variation or refinement of the above embodiments made in accordance with the technical essence of the invention, without departing from the content of the technical solution, still falls within the scope of protection of the technical solution of the invention.

Claims (8)

1. A cache sharing method based on job timing, characterized by comprising:
First step: before a JOB is submitted for execution, declaring the amount of DFSCache resources the JOB will need during its run;
Second step: the system allocating the corresponding DFSCache resources to each JOB and starting the JOB;
Third step: each JOB running for multiple rounds, during which it accesses the DFSCache repeatedly, so that the system collects the time intervals between each JOB's accesses to the DFSCache;
Fourth step: when the intervals at which a JOB accesses its cache resources stabilize, recording them and starting the DFSCache sharing allocation algorithm.
2. The cache sharing method based on job timing according to claim 1, characterized by further comprising:
Fifth step: while JOBs execute, the storage management system reading and writing JOB data according to the DFSCache access pattern of all running JOBs, and, if the storage management system determines that a JOB's data accesses exceed their time window, returning to the fourth step to restart the DFSCache sharing allocation algorithm.
3. The cache sharing method based on job timing according to claim 1 or 2, characterized by further comprising:
Sixth step: after a JOB completes, releasing the DFSCache resources occupied by that job.
4. The cache sharing method based on job timing according to claim 1 or 2, characterized in that the DFSCache sharing allocation algorithm comprises: establishing a table of the access intervals of all JOBs in the system, and determining whether there is a DFSCache with an idle period during which the JOB that owns it can share it with other JOBs; and, when such a DFSCache exists, judging whether its cache space has any capacity remaining.
5. The cache sharing method based on job timing according to claim 4, characterized in that the DFSCache sharing allocation algorithm further comprises: if the idle period and the remaining cache space of this DFSCache satisfy the requirements of a newly started JOB, allocating the idle period and the remaining cache space of this DFSCache directly to the newly started JOB.
6. The cache sharing method based on job timing according to claim 5, characterized in that the DFSCache sharing allocation algorithm further comprises: if the idle period and the remaining cache space of the DFSCache are insufficient to satisfy the requirements of the newly started JOB, letting the newly started JOB use the remaining cache space first and then allocating new resources to it.
7. The cache sharing method based on job timing according to claim 6, characterized in that the DFSCache sharing allocation algorithm further comprises: if the DFSCache has no cache space remaining, allocating a new DFSCache to the newly started JOB, which uses the newly allocated DFSCache exclusively until another JOB starts.
8. The cache sharing method based on job timing according to claim 1 or 2, characterized in that the time interval is regarded as stable when it is a constant value or exceeds a particular value.
CN201510830806.8A 2015-11-24 2015-11-24 A cache sharing method based on job timing Active CN105512185B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510830806.8A CN105512185B (en) 2015-11-24 2015-11-24 A cache sharing method based on job timing


Publications (2)

Publication Number Publication Date
CN105512185A 2016-04-20
CN105512185B CN105512185B (en) 2019-03-26

Family

ID=55720167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510830806.8A Active CN105512185B (en) A cache sharing method based on job timing

Country Status (1)

Country Link
CN (1) CN105512185B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050165735A1 (en) * 2003-10-23 2005-07-28 Microsoft Corporation Persistent caching directory level support
CN101395586A * 2006-03-02 2009-03-25 NXP B.V. Method and apparatus for dynamic resizing of cache partitions based on the execution phase of tasks
CN102137125A * 2010-01-26 2011-07-27 Fudan University Method for processing cross task data in distributive network system
CN102207830A * 2011-05-27 2011-10-05 Hangzhou MacroSAN Technology Co., Ltd. Cache dynamic allocation management method and device
CN102231121A * 2011-07-25 2011-11-02 North China University of Technology Memory mapping-based rapid parallel extraction method for big data file
CN102546751A * 2011-12-06 2012-07-04 Huazhong University of Science and Technology Hierarchical metadata cache control method of distributed file system
CN103279429A * 2013-05-24 2013-09-04 Inspur Electronic Information Industry Co., Ltd. Application-aware distributed global shared cache partition method




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant