CN110727406A - Data storage scheduling method and device - Google Patents

Publication number
CN110727406A
Authority
CN
China
Prior art keywords
data
storage
preset
stored
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910965867.3A
Other languages
Chinese (zh)
Inventor
董维
张磊
黄如
向洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Liwei Zhilian Technology Co Ltd
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Original Assignee
Shenzhen Liwei Zhilian Technology Co Ltd
Nanjing ZNV Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Liwei Zhilian Technology Co Ltd, Nanjing ZNV Software Co Ltd
Priority to CN201910965867.3A
Publication of CN110727406A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0685 Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

Abstract

A data storage scheduling method and device. The method comprises: first, monitoring in real time the attributes of data stored in a storage medium, the attributes comprising storage time and data type; re-determining the data type of the stored data according to preset data classification logic together with the storage time and data type of the stored data; then determining a scheduling strategy for the stored data according to preset storage scheduling logic and the re-determined data type, the scheduling strategy comprising the storage medium corresponding to the stored data; and finally performing storage scheduling on the stored data according to the scheduling strategy. Because data is scheduled and stored in response to attribute changes monitored in real time, data mobility is improved: data with high real-time requirements is stored in a high-performance, high-speed storage medium, while data with low real-time requirements can be stored in a storage medium with a lower access speed. This resolves the conflict among large capacity, high speed and low cost in system storage.

Description

Data storage scheduling method and device
Technical Field
The invention relates to the technical field of communication, in particular to a data storage scheduling method and device.
Background
Data storage refers to the temporary files generated while a data stream is processed, or the information that must be looked up during processing. Data is recorded in a certain format on the storage devices of a computer. From an architectural point of view, these devices can be divided into internal memory and external memory. Internal memory (main memory) is directly connected to the computer's CPU and sits at the top layer of data storage. Its access speed must match that of the CPU; it is usually built from semiconductor memory chips, and because of its high cost its capacity is limited. Large amounts of data are usually stored in external memory, which can be divided into several layers. The first layer, connected to internal memory, is online storage, such as hard disk drives and disk arrays. The next layer is backup storage (also called near-line storage), which consists of devices such as optical disc drives, optical disc libraries and tape libraries whose access speed is slower than that of a hard disk. The bottom layer is offline storage, a repository formed by tape drives, tape libraries and the like; its access speed is relatively slow, but because the storage media can be kept offline and swapped, its capacity is almost unlimited. For ordinary personal computer users, storage media such as hard disks, floppy disks and optical discs are sufficient, but for business users and some network systems, tape drives, tape libraries and optical disc libraries are indispensable data storage and backup devices, and rapidly developing storage networks now provide more convenient data storage.
A monitoring system covers many monitored device objects, each with multiple monitoring indexes, and every index must be collected and analyzed on a schedule. Processing this monitoring data involves data storage. When the data is stored on ordinary hard disks, read and write speed under highly concurrent access is limited by the disks. Replacing traditional mechanical hard disks with solid-state disks or memory improves read and write efficiency, but at massive data scale high-performance storage media bring huge hardware costs. At present there is no scheme for transferring data between storage media, so data with low timeliness continues to occupy high-performance storage media, which affects the cost of reading and storing data and makes it difficult for system storage to combine large capacity, high speed and low cost. Taking a data center as an example, the larger the data center, the more types and numbers of objects must be monitored in real time; designing an efficient storage scheduling mechanism for massive data is therefore an important technical direction and a difficult problem in the operation and maintenance field.
Disclosure of Invention
The technical problem mainly solved by the invention is how to resolve the conflict among large capacity, high speed and low cost in system storage.
According to a first aspect, an embodiment provides a data storage scheduling method, including:
monitoring the attribute of the data stored in the storage medium in real time; wherein the attribute of the storage data comprises storage time and data type;
re-determining the data type of the stored data according to preset data classification logic and the attribute of the stored data;
determining a scheduling strategy of the stored data according to a preset storage scheduling logic and the redetermined data type; the scheduling policy comprises a storage medium corresponding to the storage data;
and performing storage scheduling on the storage data according to the scheduling strategy.
In one possible implementation manner, the re-determining the type of the storage data according to the preset data classification logic and the attribute of the storage data includes:
acquiring a data type corresponding to the storage data according to the storage equipment;
acquiring preset data classification logic corresponding to the data type;
and re-determining the type of the stored data according to the preset data classification logic and the storage time of the stored data.
In one possible implementation manner, the obtaining the data type of the storage data includes:
calculating a data heat value according to the behavior time of the stored data;
detecting that the heat value of the data is within a preset first time range, classifying the data into hot data and storing the hot data into a corresponding storage medium;
detecting that the heat value of the data is within a preset second time range, classifying the data into warm data and storing the warm data into a corresponding storage medium;
detecting that the heat value of the data is within a preset third time range, classifying the data into cold data and storing the cold data into a corresponding storage medium; and the time dimensions of the preset first time range, the preset second time range and the preset third time range are gradually increased.
In one possible implementation manner, the obtaining of the preset data classification logic corresponding to the data type includes:
when the data type is hot data, the corresponding preset data classification logic comprises a preset fourth time range;
when the data type is warm data, the corresponding preset data classification logic comprises a preset fifth time range;
when the data type is cold data, the corresponding preset data classification logic comprises a preset sixth time range; wherein the first time range, the second time range, and the third time range have different time dimensions.
In one possible implementation manner, the re-determining the data type of the stored data according to the preset data classification logic and the attribute of the stored data includes:
when the storage time of the hot data is within the fourth time range, determining that the hot data is warm data or data to be deleted;
when the storage time of the warm data is within the fifth time range, determining that the warm data is cold data or data to be deleted;
and when the storage time of the cold data is within the sixth time range, determining that the cold data is to-be-deleted data or to-be-compressed archived data.
In one possible implementation manner, the attribute of the stored data further includes a data service type;
and dividing the preset fourth time range, the preset fifth time range and the preset sixth time range according to the data service type.
In one possible implementation manner, the outputting the scheduling policy of the storage data according to a preset storage scheduling logic and the type of the storage data includes:
when the hot data is determined to be warm data again, outputting a scheduling strategy of the hot data to be transferred to a warm data storage medium according to preset storage scheduling logic;
when the hot data is determined to be the data to be deleted again, outputting a scheduling strategy of the hot data to be deleted according to preset storage scheduling logic;
when the warm data is determined to be cold data again, outputting a scheduling strategy of the warm data to be transferred to a cold data storage medium according to preset storage scheduling logic;
when the warm data is determined to be the data to be deleted again, outputting a scheduling strategy of the warm data to be deleted according to preset storage scheduling logic;
when the cold data is determined to be the data to be deleted again, outputting a scheduling strategy of the cold data to be deleted according to preset storage scheduling logic;
and when the cold data is redetermined as the to-be-compressed archived data, outputting the scheduling strategy of the cold data as the to-be-compressed archived according to preset storage scheduling logic.
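The six cases listed above can be read as a lookup from a pair of data types to a scheduling action. The following is a minimal sketch under that reading, not the claimed implementation: the action names, the `scheduling_policy` function and the "keep" result for an unchanged type are all hypothetical and do not appear in the original text.

```python
# Hypothetical lookup table: (current type, re-determined type) -> action.
# The six entries mirror the six cases in the text; the action names
# themselves are illustrative assumptions.
POLICY = {
    ("hot", "warm"): "move_to_warm_medium",
    ("hot", "to_delete"): "delete",
    ("warm", "cold"): "move_to_cold_medium",
    ("warm", "to_delete"): "delete",
    ("cold", "to_delete"): "delete",
    ("cold", "to_archive"): "compress_and_archive",
}


def scheduling_policy(current_type, new_type):
    """Return the scheduling action for a re-determined data type."""
    if current_type == new_type:
        # Assumption: data whose type did not change stays where it is.
        return "keep"
    try:
        return POLICY[(current_type, new_type)]
    except KeyError:
        raise ValueError(f"no policy for {current_type} -> {new_type}") from None
```

Keeping the mapping in a table rather than in nested conditionals makes it straightforward to add or adjust transitions when the preset storage scheduling logic changes.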
In one possible implementation manner, the performing storage scheduling on the storage data according to the scheduling policy includes:
responding to a data storage request, and receiving a transfer storage strategy in the scheduling strategies;
the data to be stored are transferred and stored according to a transfer storage strategy, so that hot data are stored in a top-layer storage medium, warm data are stored in a middle-layer storage medium and/or cold data are stored in a bottom-layer storage medium; the top storage medium, the middle storage medium and the bottom storage medium are different from each other, and the data access speed is decreased gradually.
In one possible implementation manner, the performing storage scheduling on the storage data according to the scheduling policy further includes:
and verifying the data before and/or after the data is transferred and stored to ensure the integrity of the data.
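One way the verification before and after transfer might look is sketched below. The text does not say how integrity is checked; the use of a SHA-256 digest, the `transfer` function and the dictionaries standing in for storage media are assumptions made for illustration only.

```python
import hashlib


def checksum(payload: bytes) -> str:
    """SHA-256 digest used here as the integrity check (an assumption)."""
    return hashlib.sha256(payload).hexdigest()


def transfer(key, source, target):
    """Move source[key] to target[key] and verify it arrived intact.

    `source` and `target` are plain dicts standing in for two storage
    media; a real system would read and write through its middleware.
    """
    payload = source[key]
    before = checksum(payload)          # verify before the transfer
    target[key] = payload
    after = checksum(target[key])       # verify after the transfer
    if before != after:
        del target[key]                 # roll back the bad copy
        raise IOError(f"checksum mismatch while moving {key!r}")
    del source[key]                     # remove the original only on success
```

Deleting the source copy only after the digests match means a failed transfer leaves the original data in place.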
According to a second aspect, an embodiment provides a data storage scheduling apparatus, including:
the monitoring module is used for monitoring the attribute of the data stored in the storage medium in real time; wherein the attribute of the storage data comprises storage time and data type;
the type determining module is used for re-determining the data type of the stored data according to preset data classification logic and the attribute of the stored data;
the result output module is used for determining the scheduling strategy of the stored data according to the preset storage scheduling logic and the redetermined data type; the scheduling policy comprises a storage medium corresponding to the storage data;
and the processing module is used for carrying out storage scheduling on the storage data according to the scheduling strategy.
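The relationship among the four modules can be sketched as a small composition, assuming each module is supplied as a callable. The class name `SchedulingDevice`, the field names and the dictionary shape of a stored item are hypothetical; the text only names the modules and their responsibilities.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SchedulingDevice:
    # Each field stands for one module of the device; the signatures are
    # illustrative assumptions.
    monitor: Callable[[], Iterable[dict]]        # monitoring module
    determine_type: Callable[[dict], str]        # type determining module
    output_policy: Callable[[dict, str], str]    # result output module
    process: Callable[[dict, str], None]         # processing module

    def run_once(self):
        """Run one monitoring pass over all stored items."""
        for item in self.monitor():
            new_type = self.determine_type(item)
            policy = self.output_policy(item, new_type)
            self.process(item, policy)
```

For example, a device could be wired with simple lambdas for a dry run that only records the decisions it would take:

```python
log = []
device = SchedulingDevice(
    monitor=lambda: [{"key": "a", "type": "hot", "hours": 30}],
    determine_type=lambda it: "warm" if it["hours"] > 24 else it["type"],
    output_policy=lambda it, t: "keep" if t == it["type"] else f"move_to_{t}",
    process=lambda it, p: log.append((it["key"], p)),
)
device.run_once()
```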
According to the data storage scheduling method and device of the embodiment, firstly, the attributes of the storage data in the storage medium are monitored in real time, wherein the attributes of the storage data comprise storage time and data types, the data types of the storage data are re-determined according to preset data classification logic and the storage time and the data types of the storage data, then, the scheduling strategy of the storage data is determined according to preset storage scheduling logic and the re-determined data types, wherein the scheduling strategy comprises the storage medium corresponding to the storage data, and finally, the storage scheduling is carried out on the storage data according to the scheduling strategy. By monitoring the attribute change of the data in real time and scheduling and storing the data according to the storage time of the data and the corresponding data type of the data, the mobility of the data is improved, the data with high real-time performance is stored into a high-performance high-speed storage medium, the data with low real-time performance can be stored into a storage medium with low access speed, and the problem of contradiction among large storage capacity, high speed and low cost of system storage is solved.
Drawings
Fig. 1 is a schematic flow chart of a data storage scheduling method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of a data type determining method according to an embodiment of the present invention;
Fig. 3 is a schematic flow chart of a method for determining a storage medium according to an embodiment of the present invention;
Fig. 4 is a schematic flow chart of another data storage scheduling method according to an embodiment of the present invention;
Fig. 5 is a schematic flow chart of a method for obtaining preset data classification logic according to an embodiment of the present invention;
Fig. 6 is a schematic flow chart of a method for re-determining a data type according to an embodiment of the present invention;
Fig. 7 is a schematic flow chart of a method for determining a scheduling policy of stored data according to an embodiment of the present invention;
Fig. 8 is a schematic flow chart of a storage scheduling method according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a data storage scheduling apparatus according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings. Wherein like elements in different embodiments are numbered with like associated elements. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted or replaced with other elements, materials, methods in different instances. In some instances, certain operations related to the present application have not been shown or described in detail in order to avoid obscuring the core of the present application from excessive description, and it is not necessary for those skilled in the art to describe these operations in detail, so that they may be fully understood from the description in the specification and the general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Also, the various steps or actions in the method descriptions may be reordered or otherwise adjusted, as will be apparent to one of ordinary skill in the art. Thus, the various sequences in the specification and drawings are for the purpose of describing certain embodiments only and are not intended to imply a required sequence, unless it is otherwise indicated that a particular sequence must be followed.
The numbering of the components as such, e.g., "first", "second", etc., is used herein only to distinguish the objects as described, and does not have any sequential or technical meaning. The term "connected" and "coupled" when used in this application, unless otherwise indicated, includes both direct and indirect connections (couplings).
In the embodiments of the invention, to address the conflict among large capacity, high speed and low cost in existing system storage, the inventors provide a data storage scheduling scheme. Data attributes are monitored in real time so that data can be monitored and managed, and the data is scheduled according to the monitoring result. This improves data mobility, which prevents outdated data from occupying high-performance storage media and safeguards both data access speed and storage cost. Data is stored in different storage media according to its data type, achieving a balance among large capacity, high speed and low cost in system storage.
Example one
Referring to fig. 1, a data storage scheduling method according to an embodiment of the present invention includes steps S10 to S40, which are described in detail below.
Step S10: monitoring the attribute of the data stored in the storage medium in real time; wherein the attribute of the storage data comprises storage time and data type.
In the embodiment of the present invention, in step S10, the storage data in the storage medium is monitored to obtain the status of the storage data, including the data type of the storage data of the current storage medium, the storage time of the storage data stored in the storage medium, and the service type of the storage data, and then the following steps are performed according to the obtained status of the storage data.
It should be noted that the storage medium includes a top-layer storage medium, a middle-layer storage medium and a bottom-layer storage medium, each with corresponding storage software middleware. The top-layer storage medium includes a top-layer storage device and top-layer storage software middleware. The top-layer storage device is a high-speed storage product and may include memory and the like. The top-layer storage software middleware includes Redis, Elasticsearch, InfluxDB and the like. Data stored in the top-layer storage medium is large in volume and has high real-time requirements. The middle-layer storage medium includes a middle-layer storage device and middle-layer storage software middleware; the middle-layer storage device may be memory plus SSD, a high-speed hard disk or another high-speed storage device. The middle-layer storage software middleware includes time-series databases and the like, such as InfluxDB and Elasticsearch. The bottom-layer storage medium includes a bottom-layer storage device and bottom-layer storage software middleware. The bottom-layer storage device includes an ordinary hard disk, or a traditional mechanical hard disk plus HDFS (Hadoop Distributed File System), and the like. The bottom-layer storage software middleware includes middleware based on Hadoop big data storage technology. Data access speed decreases from the top-layer to the middle-layer to the bottom-layer storage medium. Data stored in the bottom-layer storage medium, such as historical monitoring data, has low real-time requirements and is mainly used for data analysis and statistical queries over historical data. However, because the bottom-layer storage device is managed together with the bottom-layer storage software middleware, the bottom-layer storage medium offers multiple backups, high reliability, distributed storage, high storage throughput and similar properties.
In embodiments of the present invention, hot data is stored in the top-layer storage medium, warm data is stored in the middle-layer storage medium and/or the top-layer storage medium, and cold data is stored in the bottom-layer storage medium. Data stored in the top-layer storage medium is defined as hot data when it is stored there. Under the inventive concept of the present application, however, data stored in the top-layer storage medium does not remain hot data indefinitely: its data type is re-determined according to the preset data classification logic, and the re-determined type may be warm data, cold data or data to be deleted. The data types of warm data and cold data may likewise be re-determined, and the present invention is not particularly limited in this respect.
In the embodiment of the invention, the hot data in the top-layer storage medium comes from at least two sources: data taken out of a relational database and placed in an in-memory database, and data reported in real time. For data taken from the relational database, the latest data can be fetched from the relational database at regular intervals to refresh the data in the in-memory database. For data reported in real time, the most recently reported monitoring data is always stored according to the service scenario, and data older than one day is deleted directly. The warm data in the middle-layer storage medium includes at least data reported in real time, and it can also be stored in the top-layer storage medium.
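The one-day rule for real-time reported data in the top storage medium could be sketched as a periodic pruning pass. This is an illustration under stated assumptions: the in-memory store is modelled as a dict mapping a key to a `(reported_at, value)` tuple, and `prune_hot_store` is a hypothetical helper, not a function of any middleware named in the text.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=1)  # real-time data older than one day is deleted


def prune_hot_store(hot_store, now):
    """Delete entries reported more than one day ago; return removed keys."""
    expired = [
        key
        for key, (reported_at, _value) in hot_store.items()
        if now - reported_at > MAX_AGE
    ]
    for key in expired:
        del hot_store[key]
    return expired
```

Collecting the expired keys first and deleting afterwards avoids modifying the dictionary while it is being iterated.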
In the embodiment of the present invention, data stored in the bottom-layer storage medium is mainly used for data analysis, statistics and the like. As time passes, the data volume keeps growing, which requires more storage media and consumes additional manpower and material resources for maintenance. Business requirements and storage costs must therefore be weighed together, and the following operations are performed periodically. First, very old data is compressed and stored at regular intervals: for example, data more than 5 years old is periodically taken out, compressed and archived, and the original data is then deleted, which saves storage space without losing data. Second, the service data is analyzed so that only fields that are used now or may be used in the future are kept, and fields that are insignificant or rarely used are deleted, which reduces the data storage space. In the above embodiment, because small and medium-sized projects have a limited data volume, warm data and cold data may share the same storage medium; to improve system response efficiency, however, the data then needs to be split across databases and tables according to the data time and the size of the database or table. The present invention is not particularly limited in this regard.
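The two periodic operations on cold data, compressing data older than 5 years and dropping unused fields, might be sketched as follows. The record layout (dicts with a `day` field counting days since an arbitrary epoch), the gzip-compressed JSON archive format and the function name `archive_old_records` are assumptions chosen for illustration; the text does not specify a format.

```python
import gzip
import json

ARCHIVE_AFTER_DAYS = 5 * 365  # "more than 5 years", ignoring leap days


def archive_old_records(records, now_day, keep_fields):
    """Split records into (live, archive_blob).

    Records older than the limit are compressed into one gzip blob with
    all fields intact; the remaining records keep only `keep_fields`.
    """
    live, old = [], []
    for rec in records:
        if now_day - rec["day"] > ARCHIVE_AFTER_DAYS:
            old.append(rec)
        else:
            # Field pruning: keep only the fields the service still uses.
            live.append({k: v for k, v in rec.items() if k in keep_fields})
    blob = gzip.compress(json.dumps(old).encode("utf-8"))
    return live, blob
```

The archive can be restored with `json.loads(gzip.decompress(blob))`, so the original records are removed from the live store without being lost.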
Step S20: and re-determining the data type of the stored data according to preset data classification logic and the attribute of the stored data.
In the embodiment of the present invention, referring to fig. 2, step S20 includes step S21, step S22 and step S23, which are described in detail below.
Step S21: and acquiring the data type corresponding to the stored data according to the storage equipment.
In the embodiment of the present invention, referring to fig. 3, it is determined according to steps S201 to S204 that each storage data is stored in the corresponding storage medium, which is described in detail below.
Step S201: and calculating a data heat value according to the behavior time of the stored data.
It should be noted that the behavior time of the storage data includes the time of the storage data being operated, and for example, if the storage data has been subjected to operations such as adding content, deleting content, modifying content, or being accessed, the time from the time when the storage data has been operated to the current time is the corresponding behavior time. Illustratively, for the data center, it is the time from when the reported monitoring or alarm data is collected to the current time.
Step S202: and classifying the data into hot data and storing the hot data into a corresponding storage medium when the data heat value is detected to be within a preset first time range.
Step S203: and classifying the data as warm data to be stored in the corresponding storage medium when the data heat value is detected to be within a preset second time range.
Step S204: and classifying the data into cold data and storing the cold data into corresponding storage media when the data heat value is detected to be within a preset third time range, wherein the time dimensions of the preset first time range, the preset second time range and the preset third time range are gradually increased.
Referring to fig. 4, real-time data collected at the current time may be assigned to a data cluster, such as the warm data cluster or the hot data cluster, according to its behavior time. When the heat value calculated from the behavior time of the real-time data is within the first time range, the real-time data is inserted into the hot data cluster, which corresponds to the top-layer storage medium; that is, the data is classified as hot data and stored in the corresponding storage medium. When the heat value calculated from the behavior time of the real-time data is within the second time range, the real-time data is inserted into the warm data cluster, which corresponds to the middle-layer storage medium; that is, the data is classified as warm data and stored in the corresponding storage medium. When the heat value calculated from the behavior time of the real-time data is within the third time range, the real-time data is inserted into the cold data cluster, which corresponds to the bottom-layer storage medium; that is, the data is classified as cold data and stored in the corresponding storage medium. The present invention is not limited in this respect.
In this embodiment of the present invention, the attributes of the stored data further include a data service type, and the preset first, second and third time ranges may be set according to the data service type. Taking device performance data reported in real time as an example, data whose behavior time is within 24 hours is determined to be hot data, data whose behavior time is greater than 1 day and less than or equal to 14 days is determined to be warm data, and data whose behavior time is greater than 14 days is determined to be cold data. These time boundaries can be configured flexibly for different actual services. For example, when the currently acquired real-time data is AI prediction data, the preset first, second and third time ranges set for AI prediction data may differ from those set for monitoring data.
It should be noted that, the execution sequence of the above steps S202 to S204 is not limited, and when the heat value of the currently stored data is within a preset first time range, the data is classified as hot data and stored in the corresponding storage medium, when the heat value of the currently stored data is within a preset second time range, the data is classified as warm data and stored in the corresponding storage medium, and when the heat value of the currently stored data is within a preset third time range, the data is classified as cold data and stored in the corresponding storage medium.
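Steps S201 to S204 can be illustrated with the example boundaries given for device performance data (24 hours and 14 days). The sketch below assumes that the heat value is simply the behavior time, that is, the time elapsed since the data was last operated on; the function names and the medium labels are hypothetical.

```python
from datetime import datetime, timedelta

# Example boundaries for device performance data taken from the text;
# other service types may configure different values.
HOT_LIMIT = timedelta(hours=24)
WARM_LIMIT = timedelta(days=14)

# Hypothetical labels for the medium behind each data cluster.
MEDIUM_FOR = {"hot": "top", "warm": "middle", "cold": "bottom"}


def behavior_time(last_operated_at, now):
    """Step S201: elapsed time since the data was last operated on."""
    return now - last_operated_at


def classify(heat):
    """Steps S202 to S204: map a heat value to a data type."""
    if heat <= HOT_LIMIT:
        return "hot"
    if heat <= WARM_LIMIT:
        return "warm"
    return "cold"


def place(last_operated_at, now):
    """Return (data type, storage medium) for one stored item."""
    data_type = classify(behavior_time(last_operated_at, now))
    return data_type, MEDIUM_FOR[data_type]
```

Because the three ranges do not overlap, the checks can be evaluated in any order without changing the result, which is consistent with the note that the execution order of steps S202 to S204 is not limited.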
Step S22: and acquiring preset data classification logic corresponding to the data type.
In the embodiment of the present invention, referring to fig. 5, step S22 includes steps S221 to S223, which are described in detail below.
Step S221: when the data type is hot data, the corresponding preset data classification logic comprises a preset fourth time range.
Step S222: and when the data type is warm data, the corresponding preset data classification logic comprises a preset fifth time range.
Step S223: when the data type is cold data, the corresponding preset data classification logic comprises a preset sixth time range; wherein the first time range, the second time range, and the third time range have different time dimensions.
In one possible implementation manner, if the attribute of the stored data further includes a data service type, the preset fourth time range, the preset fifth time range, and the preset sixth time range are divided according to the data service type. That is, the preset fourth, fifth and/or sixth time ranges may differ between data service types. For example, monitoring data that is warm data may correspond to a fifth time range of 14 days, while AI prediction data that is also warm data may correspond to a fifth time range of 7 days; the present invention does not specifically limit this.
In the embodiment of the present invention, the attribute of the stored data includes a data service type. The stored data has dimensions such as a service attribute and a time attribute, and the data of each service may, depending on the actual service, exist in the three time dimensions of hot, warm and cold. For the monitoring service, for example, the reported telemetry data (in dynamic loop monitoring, the measurement point data of a monitored object reported in real time), viewed over time, comprises hot data, warm data and cold data. The data reported in real time is stored in the storage medium corresponding to warm data, and part of the warm data can also be stored in the storage medium corresponding to hot data for convenient processing. After being classified according to the preset data classification logic, warm data can be dumped to cold data for persistent storage, and the data in the storage medium corresponding to cold data can be compressed and archived. For the configuration management module of the monitoring system, only part of the data may be used frequently, and that part can be placed in the storage medium corresponding to hot data. For the AI prediction module, the results of model calculation can be placed simultaneously in the storage media corresponding to hot data and warm data; after a period of time, the hot data is deleted and the warm data is dumped to the storage medium corresponding to cold data.
Step S23: re-determining the type of the stored data according to the preset data classification logic and the storage time of the stored data.
In the embodiment of the present invention, referring to fig. 6, the step S23 includes steps S231 to S233, which are described in detail below.
Step S231: when the storage time of the hot data is within the fourth time range, determining that the hot data is warm data or to-be-deleted data.
Step S232: when the storage time of the warm data is within the fifth time range, determining that the warm data is cold data or to-be-deleted data.
Step S233: when the storage time of the cold data is within the sixth time range, determining that the cold data is to-be-deleted data or to-be-compressed archived data.
In the embodiment of the present invention, scheduling is performed according to how long data has been stored in each storage medium, and may also be performed according to the storage capacity of the storage medium. Referring to fig. 4, for the stored data in the warm data cluster, a timed task may be set that periodically determines stored data whose storage time is within the fifth time range to be cold data. Alternatively, the timed task periodically checks the capacity of the warm data storage medium and, when the used storage capacity is greater than a preset threshold, determines stored data in the warm data storage medium to be cold data. Capacity and storage time may also be combined: when the used storage capacity of the warm data storage medium is greater than the preset threshold, the warm data with the longest storage time is determined to be cold data. When the storage time of warm data is within the fifth time range, its new type may also be determined according to the data service type. For example, monitoring data whose data type is warm data is determined to be cold data when its storage time is within the fifth time range, whereas AI prediction data whose data type is warm data is determined to be to-be-deleted data when its storage time is within the fifth time range.
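The time-based part of steps S231 to S233 can be sketched as a rule lookup. This is only an illustration under stated assumptions: "within the time range" is read as "storage time has reached the limit", the names `RULES` and `redetermine_type` are hypothetical, and the hot-tier limits are invented because the text gives no value for the fourth time range. The 14-day, 7-day and 3-year figures follow the examples given in the description.

```python
from datetime import timedelta

# Hypothetical rule table keyed by (service type, current data type).
# Each entry is (storage-time limit, type assigned once the limit is reached).
# The hot-tier limits are assumptions; the text does not specify them.
RULES = {
    ("monitoring", "hot"): (timedelta(hours=24), "to_delete"),
    ("monitoring", "warm"): (timedelta(days=14), "cold"),
    ("monitoring", "cold"): (timedelta(days=3 * 365), "to_archive"),
    ("ai_prediction", "hot"): (timedelta(hours=24), "to_delete"),
    ("ai_prediction", "warm"): (timedelta(days=7), "to_delete"),
}


def redetermine_type(service_type, current_type, storage_age):
    """Return the new data type, or the current one if no rule fires."""
    rule = RULES.get((service_type, current_type))
    if rule is None:
        return current_type
    limit, next_type = rule
    return next_type if storage_age >= limit else current_type
```

Because the outcome is stored per service type, warm monitoring data becomes cold data while warm AI prediction data becomes to-be-deleted data, as in the example above.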
Step S30: determining a scheduling strategy of the stored data according to a preset storage scheduling logic and the redetermined data type; the scheduling policy comprises a storage medium corresponding to the storage data.
In the embodiment of the present invention, referring to fig. 7, the step S30 includes steps S31 to S36, which are described in detail below.
Step S31: when the hot data is re-determined to be warm data, outputting, according to the preset storage scheduling logic, a scheduling strategy of transferring the hot data to a warm data storage medium.
Step S32: when the hot data is re-determined to be to-be-deleted data, outputting, according to the preset storage scheduling logic, a scheduling strategy of deleting the hot data.
Step S33: when the warm data is re-determined to be cold data, outputting, according to the preset storage scheduling logic, a scheduling strategy of transferring the warm data to a cold data storage medium.
Step S34: when the warm data is re-determined to be to-be-deleted data, outputting, according to the preset storage scheduling logic, a scheduling strategy of deleting the warm data.
Step S35: when the cold data is re-determined to be to-be-deleted data, outputting, according to the preset storage scheduling logic, a scheduling strategy of deleting the cold data.
Step S36: when the cold data is re-determined to be to-be-compressed archived data, outputting, according to the preset storage scheduling logic, a scheduling strategy of compressing and archiving the cold data.
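Steps S31 to S36 amount to a mapping from a type transition to an action and a target medium. The sketch below shows one way such a mapping might look; the names `POLICIES` and `scheduling_policy`, the action strings, and the choice of `cold_storage` as the archive target are assumptions made for illustration and are not taken from the patent.

```python
# Hypothetical policy table for steps S31 to S36:
# (current type, re-determined type) -> (action, target storage medium).
POLICIES = {
    ("hot", "warm"): ("transfer", "warm_storage"),                    # S31
    ("hot", "to_delete"): ("delete", None),                           # S32
    ("warm", "cold"): ("transfer", "cold_storage"),                   # S33
    ("warm", "to_delete"): ("delete", None),                          # S34
    ("cold", "to_delete"): ("delete", None),                          # S35
    ("cold", "to_archive"): ("compress_and_archive", "cold_storage"), # S36
}


def scheduling_policy(current_type, new_type):
    """Return (action, target) for a transition; ('keep', None) if unchanged."""
    if current_type == new_type:
        return ("keep", None)
    try:
        return POLICIES[(current_type, new_type)]
    except KeyError:
        raise ValueError(f"no policy for {current_type} -> {new_type}") from None
```

Rejecting unknown transitions explicitly keeps an unexpected re-determination result from silently moving or deleting data.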
Step S40: performing storage scheduling on the stored data according to the scheduling strategy.
Referring to fig. 8, in one possible implementation, step S40 includes steps S41 to S42, which are explained in detail below.
Step S41: in response to a data storage request, receiving the transfer storage strategy in the scheduling strategy.
Step S42: transferring and storing the data to be stored according to the transfer storage strategy, so that hot data is stored in a top-layer storage medium, warm data is stored in a middle-layer storage medium and/or cold data is stored in a bottom-layer storage medium; the top-layer, middle-layer and bottom-layer storage media are different from each other, and their data access speeds decrease in that order.
In one possible implementation manner, the method further includes: verifying the data before and/or after it is transferred and stored, to ensure data integrity. To ensure the reliability of the data migration operation, the integrity and reliability of the data, among other aspects, need to be tested automatically or manually, at both the service level and the data level, before and after the operation. This is supplemented by verifying and comparing file integrity, for example by checking the MD5 of the file.
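The MD5 check mentioned above can be illustrated for the simple case where stored data is a file on a local file system. This is a sketch under that assumption only; the helper names `md5_of` and `transfer_with_check` are hypothetical, and a real cluster-to-cluster migration would use the storage system's own transfer mechanism.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_check(source, target_dir):
    """Copy source into target_dir, verify the MD5, then remove the source."""
    source = Path(source)
    target = Path(target_dir) / source.name
    before = md5_of(source)          # checksum before the transfer
    shutil.copy2(source, target)
    if md5_of(target) != before:     # checksum after the transfer
        target.unlink()              # discard the bad copy, keep the original
        raise IOError(f"checksum mismatch while moving {source}")
    source.unlink()                  # delete the original only after verification
    return target
```

Deleting the source only after the checksums match reflects the order described in the text: data is verified first, then the copy in the faster tier is removed.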
Referring to fig. 4, for the hot data cluster, the data operations include inserting the most recent data determined to be hot and deleting "stale" old data. For the warm data cluster, a timed task can periodically transfer data that has timed out and/or data that exceeds a capacity threshold: data that has been stored for 14 days is verified and then transferred to the cold data cluster, and when the used storage capacity of the middle-layer storage medium exceeds 80%, data in the middle-layer storage medium is verified and transferred to the cold data cluster. After data in the middle-layer storage medium has been transferred to cold data, the corresponding warm data can be deleted; likewise, after hot data in the top-layer storage medium has been transferred, the corresponding hot data is deleted. Alternatively, warm data stored for more than 14 days is deleted directly. Cold data in the cold data cluster is compressed and archived when its storage time exceeds 3 years.
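The capacity-triggered part of the warm-cluster timed task can be sketched as a selection of the oldest items until usage falls back to the threshold. The function name `select_for_demotion`, the tuple layout and the stopping rule are assumptions for illustration; only the 80% threshold and the oldest-first order come from the description.

```python
def select_for_demotion(items, capacity_bytes, threshold=0.80):
    """Pick the oldest warm items until usage drops to the threshold.

    items: iterable of (item_id, size_bytes, stored_at) tuples, where
    stored_at is any sortable timestamp. Returns ids to move, oldest first.
    """
    items = list(items)
    used = sum(size for _, size, _ in items)
    limit = capacity_bytes * threshold
    chosen = []
    # Walk the items from longest-stored to most recent.
    for item_id, size, _ in sorted(items, key=lambda item: item[2]):
        if used <= limit:
            break
        chosen.append(item_id)
        used -= size
    return chosen
```

A timed task could call this function on each run and pass the returned ids to the verified transfer into the cold data cluster.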
The above embodiment has the following characteristics:
First, the attribute of the stored data in the storage medium is monitored in real time, the attribute comprising storage time and data type. The data type of the stored data is then re-determined according to the preset data classification logic and the storage time and data type of the stored data. Next, a scheduling strategy for the stored data is determined according to the preset storage scheduling logic and the re-determined data type, the scheduling strategy comprising the storage medium corresponding to the stored data. Finally, storage scheduling is performed on the stored data according to the scheduling strategy. By monitoring changes in the data attributes in real time and scheduling storage according to the storage time and data type of the data, data mobility is improved: data with high real-time requirements is stored in a high-performance, high-speed storage medium, while data with low real-time requirements can be stored in a storage medium with a lower access speed, which resolves the conflict among large capacity, high speed and low cost in system storage.
Example two
Referring to fig. 9, a data storage scheduling apparatus includes:
A monitoring module 21, configured to monitor the attribute of the data stored in the storage medium in real time; wherein the attribute of the stored data comprises storage time and data type.
A type determining module 22, configured to re-determine the data type of the stored data according to preset data classification logic and the attribute of the stored data.
A result output module 23, configured to determine a scheduling strategy of the stored data according to preset storage scheduling logic and the re-determined data type; the scheduling strategy comprises a storage medium corresponding to the stored data.
A processing module 24, configured to perform storage scheduling on the stored data according to the scheduling strategy.
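The way the four modules cooperate can be sketched as one small class whose methods stand in for modules 21 to 24. Everything here is a hypothetical simplification: the `Record` fields, the day-based limits, the `next_type` table and the `_storage` naming are assumptions, and the in-memory list only stands in for the monitored storage media.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: str
    data_type: str      # "hot", "warm" or "cold"
    storage_days: int   # how long the record has been stored


class StorageScheduler:
    """Chains the four modules: monitor, determine type, output, process."""

    def __init__(self, records, limits, next_type):
        self.records = records      # stands in for the monitored media
        self.limits = limits        # data type -> storage-time limit in days
        self.next_type = next_type  # data type -> type after the limit

    def monitor(self):
        # Monitoring module 21: expose storage time and data type.
        return list(self.records)

    def determine_type(self, record):
        # Type determining module 22: re-determine the data type.
        if record.storage_days >= self.limits[record.data_type]:
            return self.next_type[record.data_type]
        return record.data_type

    def output_policy(self, record, new_type):
        # Result output module 23: the policy names the target medium.
        return None if new_type == record.data_type else new_type + "_storage"

    def process(self):
        # Processing module 24: apply each policy; returns the moves made.
        moves = []
        for record in self.monitor():
            new_type = self.determine_type(record)
            target = self.output_policy(record, new_type)
            if target is not None:
                moves.append((record.key, target))
                record.data_type = new_type
        return moves
```

Calling `process()` once per monitoring cycle yields the list of transfers for that cycle; records that have not reached their limit are left untouched.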
The embodiment of the invention has the following characteristics:
First, the attribute of the stored data in the storage medium is monitored in real time, the attribute comprising storage time and data type. The data type of the stored data is then re-determined according to the preset data classification logic and the storage time and data type of the stored data. Next, a scheduling strategy for the stored data is determined according to the preset storage scheduling logic and the re-determined data type, the scheduling strategy comprising the storage medium corresponding to the stored data. Finally, storage scheduling is performed on the stored data according to the scheduling strategy. By monitoring changes in the data attributes in real time and scheduling storage according to the storage time and data type of the data, data mobility is improved: data with high real-time requirements is stored in a high-performance, high-speed storage medium, while data with low real-time requirements can be stored in a storage medium with a lower access speed, which resolves the conflict among large capacity, high speed and low cost in system storage.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage device, and the storage device may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage device such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, so that when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
The present invention has been described in terms of specific examples, which are provided to aid understanding of the invention and are not intended to be limiting. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.
The present invention is described in connection with the accompanying drawings, but the present invention is not limited to the above embodiments, which are only illustrative and not restrictive, and those skilled in the art can make various changes without departing from the spirit and scope of the invention as defined by the appended claims, and all changes that come within the meaning and range of equivalency of the specification and drawings that are obvious from the description and the attached claims are intended to be embraced therein.

Claims (10)

1. A data storage scheduling method, comprising:
monitoring the attribute of the data stored in the storage medium in real time; wherein the attribute of the storage data comprises storage time and data type;
re-determining the data type of the stored data according to preset data classification logic and the attribute of the stored data;
determining a scheduling strategy of the stored data according to a preset storage scheduling logic and the redetermined data type; the scheduling policy comprises a storage medium corresponding to the storage data;
and performing storage scheduling on the storage data according to the scheduling strategy.
2. The method of claim 1, wherein said re-determining the type of the stored data based on preset data classification logic and attributes of the stored data comprises:
acquiring a data type corresponding to the storage data according to the storage equipment;
acquiring preset data classification logic corresponding to the data type;
and re-determining the type of the stored data according to the preset data classification logic and the storage time of the stored data.
3. The method of claim 2, further comprising:
calculating a data heat value according to the behavior time of the stored data;
detecting that the heat value of the data is within a preset first time range, classifying the data into hot data and storing the hot data into a corresponding storage medium;
detecting that the heat value of the data is within a preset second time range, classifying the data into warm data and storing the warm data into a corresponding storage medium;
detecting that the heat value of the data is within a preset third time range, classifying the data into cold data and storing the cold data into a corresponding storage medium; and the time dimensions of the preset first time range, the preset second time range and the preset third time range are gradually increased.
4. The method of claim 3, wherein the obtaining the preset data classification logic corresponding to the data type comprises:
when the data type is hot data, the corresponding preset data classification logic comprises a preset fourth time range;
when the data type is warm data, the corresponding preset data classification logic comprises a preset fifth time range;
when the data type is cold data, the corresponding preset data classification logic comprises a preset sixth time range; wherein the first time range, the second time range, and the third time range have different time dimensions.
5. The method of claim 4, wherein said re-determining the data type of the stored data according to the preset data classification logic and the attributes of the stored data comprises:
when the storage time of the hot data is within the fourth time range, determining that the hot data is warm data or data to be deleted;
when the storage time of the warm data is within the fifth time range, determining that the warm data is cold data or data to be deleted;
and when the storage time of the cold data is within the sixth time range, determining that the cold data is to-be-deleted data or to-be-compressed archived data.
6. The method of claim 4 or 5, wherein the attributes of the stored data further include a data service type;
dividing the preset first time range, the preset second time range, the preset third time range, the preset fourth time range, the preset fifth time range and the preset sixth time range according to the data service type.
7. The method of claim 5, wherein outputting the scheduling policy of the storage data according to a preset storage scheduling logic and the type of the storage data comprises:
when the hot data is determined to be warm data again, outputting a scheduling strategy of the hot data to be transferred to a warm data storage medium according to preset storage scheduling logic;
when the hot data is determined to be the data to be deleted again, outputting a scheduling strategy of the hot data to be deleted according to preset storage scheduling logic;
when the warm data is determined to be cold data again, outputting a scheduling strategy of the warm data to be transferred to a cold data storage medium according to preset storage scheduling logic;
when the warm data is determined to be the data to be deleted again, outputting a scheduling strategy of the warm data to be deleted according to preset storage scheduling logic;
when the cold data is determined to be the data to be deleted again, outputting a scheduling strategy of the cold data to be deleted according to preset storage scheduling logic;
and when the cold data is redetermined as the to-be-compressed archived data, outputting the scheduling strategy of the cold data as the to-be-compressed archived according to preset storage scheduling logic.
8. The method of claim 7, wherein said scheduling storage of said stored data according to said scheduling policy comprises:
responding to a data storage request, and receiving a transfer storage strategy in the scheduling strategies;
the data to be stored are transferred and stored according to a transfer storage strategy, so that hot data are stored in a top-layer storage medium, warm data are stored in a middle-layer storage medium and/or cold data are stored in a bottom-layer storage medium; the top storage medium, the middle storage medium and the bottom storage medium are different from each other, and the data access speed is decreased gradually.
9. The method of claim 8, wherein said scheduling storage of said stored data according to said scheduling policy further comprises:
and verifying the data before and/or after the data is transferred and stored to ensure the integrity of the data.
10. A data storage scheduling apparatus, comprising:
the monitoring module is used for monitoring the attribute of the data stored in the storage medium in real time; wherein the attribute of the storage data comprises storage time and data type;
the type determining module is used for re-determining the data type of the stored data according to preset data classification logic and the attribute of the stored data;
the result output module is used for determining the scheduling strategy of the stored data according to the preset storage scheduling logic and the redetermined data type; the scheduling policy comprises a storage medium corresponding to the storage data;
and the processing module is used for carrying out storage scheduling on the storage data according to the scheduling strategy.
CN201910965867.3A 2019-10-10 2019-10-10 Data storage scheduling method and device Pending CN110727406A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910965867.3A CN110727406A (en) 2019-10-10 2019-10-10 Data storage scheduling method and device

Publications (1)

Publication Number Publication Date
CN110727406A true CN110727406A (en) 2020-01-24

Family

ID=69220977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910965867.3A Pending CN110727406A (en) 2019-10-10 2019-10-10 Data storage scheduling method and device

Country Status (1)

Country Link
CN (1) CN110727406A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140089257A1 (en) * 2012-09-24 2014-03-27 International Business Machines Corporation Increased database performance via migration of data to faster storage
US20160224600A1 (en) * 2015-01-30 2016-08-04 Splunk Inc. Systems And Methods For Managing Allocation Of Machine Data Storage
US20160274819A1 (en) * 2015-03-16 2016-09-22 Samsung Electronics Co., Ltd. Memory system including host and a plurality of storage devices and data migration method thereof
CN106528002A (en) * 2016-12-06 2017-03-22 郑州云海信息技术有限公司 Time-based storage scheduling method
CN107193500A (en) * 2017-05-26 2017-09-22 郑州云海信息技术有限公司 A kind of distributed file system Bedding storage method and system
KR20190061426A (en) * 2017-11-28 2019-06-05 성균관대학교산학협력단 Flash memory system and control method thereof
CN108563730A (en) * 2018-04-04 2018-09-21 北京蓝杞数据科技有限公司天津分公司 A kind of cold and hot data automatic switching method, device, electronic equipment and storage medium
CN109919193A (en) * 2019-01-31 2019-06-21 中国科学院上海光学精密机械研究所 A kind of intelligent stage division, system and the terminal of big data
CN110134723A (en) * 2019-05-22 2019-08-16 网易(杭州)网络有限公司 A kind of method and database of storing data

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022088983A1 (en) * 2020-10-30 2022-05-05 华为技术有限公司 Data management method and apparatus
CN112711386A (en) * 2021-01-18 2021-04-27 深圳市龙信信息技术有限公司 Storage capacity detection method and device of storage device and readable storage medium
CN112711386B (en) * 2021-01-18 2021-07-16 深圳市龙信信息技术有限公司 Storage capacity detection method and device of storage device and readable storage medium
CN112732726A (en) * 2021-04-02 2021-04-30 云和恩墨(北京)信息技术有限公司 Data processing method and device, processor and computer storage medium
CN112732726B (en) * 2021-04-02 2022-04-29 云和恩墨(北京)信息技术有限公司 Data processing method and device, processor and computer storage medium
CN113900597A (en) * 2021-11-30 2022-01-07 深圳市安信达存储技术有限公司 Data storage method, system, equipment and storage medium
CN114201119A (en) * 2022-02-17 2022-03-18 天津市天河计算机技术有限公司 Hierarchical storage system and method for super computer operation data
CN115134239A (en) * 2022-08-31 2022-09-30 广州市千钧网络科技有限公司 Client configuration method, system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination