CN115167762A - Data hierarchical storage method and device - Google Patents

Data hierarchical storage method and device

Info

Publication number
CN115167762A
Authority
CN
China
Prior art keywords
data
cold
hot
time parameter
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210709184.3A
Other languages
Chinese (zh)
Inventor
戴志勇
杨世泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dingtalk China Information Technology Co Ltd
Original Assignee
Dingtalk China Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dingtalk China Information Technology Co Ltd
Priority to CN202210709184.3A
Publication of CN115167762A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G06F3/0649 Lifecycle management
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/068 Hybrid storage device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

One or more embodiments of the present specification provide a data hierarchical storage method and apparatus. The method includes: acquiring data written by a user, and adding a corresponding time attribute to the data according to the data acquisition time to form a time parameter of the data; storing the received data with the time parameter; and, when the amount of stored data meets a set condition, scanning the data with the time parameter, classifying the corresponding data into cold data and hot data according to the time parameter, and writing the cold data or the hot data into the corresponding medium.

Description

Data hierarchical storage method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of data storage, and in particular, to a method and an apparatus for hierarchical data storage.
Background
With the development of information technology, electronic data has recently shown explosive growth, which in turn has driven a large demand for storage. Storing large amounts of data under uniform conditions requires storage devices to provide a large amount of storage space, which increases storage cost; researchers have therefore proposed hierarchical data storage methods. In such methods, data that users access frequently or that was generated recently is generally defined as hot data and stored on a storage medium that supports faster retrieval, while data that users access infrequently or that was generated long ago is defined as cold data and stored on a low-cost storage medium. Taking communication messages as an example, historical messages generated during communication have obvious hot and cold properties, and storing them hierarchically in this way can reduce storage cost. However, existing hot/cold hierarchical storage methods generally attach a hot or cold tag to the data and store the data hierarchically according to that tag, so the data hierarchy is relatively fixed.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure provide a data hierarchical storage method and apparatus.
To achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
According to a first aspect of one or more embodiments of the present specification, there is provided a data hierarchical storage method, comprising:
acquiring data written by a user, and adding a corresponding time attribute to the data according to the data acquisition time to form a time parameter of the data;
storing the received data with the time parameter;
when the amount of stored data meets a set condition, scanning the data with the time parameter, classifying the corresponding data into cold data and hot data according to the time parameter, and writing the cold data or the hot data into the corresponding medium.
According to a second aspect of one or more embodiments of the present specification, there is provided a data hierarchical storage device, comprising:
an acquisition unit configured to acquire data written by a user and add a corresponding time attribute to the data according to the data acquisition time to form a time parameter of the data;
a storage unit configured to store the received data with the time parameter;
and a scanning unit configured to, when the amount of stored data meets a set condition, scan the data with the time parameter, classify the corresponding data into cold data and hot data according to the time parameter, and write the cold data or the hot data into the corresponding medium.
According to a third aspect of one or more embodiments of the present specification, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to the first aspect.
According to a fourth aspect of one or more embodiments of the present specification, there is provided an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to the first aspect when executing the program.
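As a rough illustration of how the three units of the second aspect could fit together, the following minimal Python sketch wires an acquisition unit, a storage unit, and a scanning unit into one flow. The class names, the `(payload, date)` record layout, the record-count storage condition, and the two-tier 7-day rule used in the example are all hypothetical choices for illustration; the specification does not prescribe them.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: (payload, time parameter = acquisition date).
Record = Tuple[bytes, date]


class AcquisitionUnit:
    """Attaches the acquisition date to user-written data."""

    def acquire(self, payload: bytes, acquired_on: date) -> Record:
        return (payload, acquired_on)


class StorageUnit:
    """Stores records that already carry a time parameter."""

    def __init__(self) -> None:
        self.stored: List[Record] = []

    def store(self, record: Record) -> None:
        self.stored.append(record)


class ScanningUnit:
    """Classifies stored records once a storage condition is met."""

    def __init__(self, condition: Callable[[int], bool],
                 classify: Callable[[date, date], str]) -> None:
        self.condition = condition  # hypothetical: a test on the record count
        self.classify = classify    # (time parameter, system date) -> tier name

    def scan(self, storage: StorageUnit, today: date) -> Dict[str, List[Record]]:
        media: Dict[str, List[Record]] = {}
        if not self.condition(len(storage.stored)):
            return media  # condition not met: skip the scan
        for record in storage.stored:
            tier = self.classify(record[1], today)
            media.setdefault(tier, []).append(record)
        storage.stored.clear()  # records now belong to their tier's medium
        return media


acq, sto = AcquisitionUnit(), StorageUnit()
scanner = ScanningUnit(
    condition=lambda n: n >= 2,
    classify=lambda t, now: "hot" if (now - t).days < 7 else "cold",
)
sto.store(acq.acquire(b"old", date(2022, 5, 1)))
sto.store(acq.acquire(b"new", date(2022, 6, 20)))
result = scanner.scan(sto, date(2022, 6, 21))
print(result)
```

Keeping the classifier as an injected function mirrors the point made below: the tiering rule, not the stored data, is what changes between scenarios.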
In the technical solution provided in this specification, the data acquisition time is recorded when data written by a user is acquired, so that a corresponding time attribute is added to the data to form a time parameter that serves as the basis for hierarchical storage; the data is then classified into cold and hot data according to the time parameter, and the cold or hot data is written into the corresponding medium. In this way, the hierarchical storage strategy is determined by the time parameter: storing data hierarchically reduces storage cost, while the strategy itself can be adapted to the usage scenario, which improves the flexibility and broadens the applicability of the data hierarchical storage method.
Drawings
FIG. 1 is a block diagram illustrating the architecture of a data hierarchical storage system according to an exemplary embodiment of the present specification;
FIG. 2 is a schematic flowchart of a data hierarchical storage method according to an exemplary embodiment of the present specification;
FIG. 3 is a schematic diagram of classified data compression according to an exemplary embodiment of the present specification;
FIG. 4 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present specification;
FIG. 5 is a schematic diagram of a data hierarchical storage device according to an exemplary embodiment of the present specification.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the methods may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may, in other embodiments, be combined into a single step.
With the development of information technology, electronic data has recently shown explosive growth, which in turn has driven a large demand for storage. Storing large amounts of data under uniform conditions requires computing devices to provide a large amount of storage space, which increases storage cost; researchers have therefore proposed storing data hierarchically. In hierarchical storage, data that users access frequently or that was generated recently is generally defined as hot data and stored on a storage medium from which it can be retrieved conveniently and quickly, while data that users access infrequently or that was generated long ago is defined as cold data and stored on a low-cost storage medium. Taking communication messages as an example, historical messages generated during communication have obvious hot and cold properties, and storing them hierarchically can reduce storage cost. However, existing hot/cold hierarchical storage methods generally attach a hot or cold tag to the data and store the data hierarchically according to that tag, so the data hierarchy is relatively fixed.
For example, when message data is stored in instant messaging software, the message volume keeps growing over time. Moving infrequently accessed messages to a lower-cost storage medium, such as cloud storage, can reduce storage cost. Studies of how often historical messages are accessed show that user access has very pronounced hot and cold characteristics: users access recent messages at high frequency, messages from several months ago at low frequency, and messages from even longer ago, e.g., a year ago, at a much lower frequency still. Placing messages on different storage media according to when they were generated is therefore a good choice. For example, newer messages may be tagged as hot data and stored on an SSD (Solid State Drive), older messages may be tagged as cold data and stored on an HDD (Hard Disk Drive), and the oldest messages may be stored in cloud storage. Storage costs differ greatly between media; for example, cloud storage costs far less than SSD storage. Storing long-term historical messages on a lower-cost medium such as cloud storage can therefore significantly reduce the storage cost of communication messages. However, although cost is reduced, storing data hierarchically by attaching hot and cold tags makes the data hierarchy relatively fixed across application scenarios: the medium associated with each tag type is also fixed, so the tiering rules cannot be adjusted promptly when the application scenario changes.
Because data is tagged as cold or hot according to a fixed, preset hierarchical storage rule and stored on a preset storage medium accordingly, such hierarchical storage is not flexible enough.
The present specification provides a data hierarchical storage method that determines a time attribute of data according to the time at which the user writes it, adds the time attribute to the data as its time parameter, and stores the data hierarchically according to that time parameter. In this method, the data hierarchy is determined from the time parameter rather than from a tag representing the hot or cold attribute of the data. The time parameter is not a fixed tag: the same time parameter may correspond to different categories at different points in time. For example, data acquired on January 1 is hot data on January 7, but may become cold data on January 20 under the data hierarchical storage rule, even though its time parameter has not changed. Determining hot/cold tiering from the time parameter makes the tiering rules more flexible: storing data hierarchically reduces storage cost, and the storage rules can be changed according to the usage scenario, which improves the flexibility and broadens the applicability of the data hierarchical storage method.
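The January example can be made concrete with a small Python sketch. It assumes a hypothetical two-tier rule (hot if the data is less than 7 days old at classification time, otherwise cold) and an arbitrary year; neither is stated in the specification. The point it shows is that the stored time parameter never changes, only the system date at which classification runs.

```python
from datetime import date

# Hypothetical two-tier rule for illustration: data younger than
# HOT_WINDOW_DAYS is hot, everything older is cold.
HOT_WINDOW_DAYS = 7


def classify(acquired_on: date, system_date: date) -> str:
    """Classify by the age of the data at classification time.

    The time parameter (acquired_on) is fixed; only the system date
    at which classification runs differs between calls.
    """
    age_days = (system_date - acquired_on).days
    return "hot" if age_days < HOT_WINDOW_DAYS else "cold"


time_parameter = date(2022, 1, 1)  # data acquired on January 1
print(classify(time_parameter, date(2022, 1, 7)))   # hot  (6 days old)
print(classify(time_parameter, date(2022, 1, 20)))  # cold (19 days old)
```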
FIG. 1 is a schematic architecture diagram of a data hierarchical storage system according to the present specification. As shown in FIG. 1, the system may include a server 11, a network 12, a terminal device 13, a terminal device 14, and a terminal device 15.
The server 11 may be a physical server comprising a separate host, or a virtual server hosted on a cluster of hosts. During operation, the server 11 may be configured with a data hierarchical storage device, implemented in software and/or hardware, which acquires data written by a user, adds a time attribute to the data, and stores it; when the stored data meets a set condition, the device scans the data, classifies it as hot or cold according to its time parameter, and writes the classified cold or hot data into the corresponding medium for hierarchical storage.
The terminal devices 13, 14, and 15 are electronic devices that a user can use to write data; for example, the terminal device 13 is a mobile phone, the terminal device 14 is a notebook computer, and the terminal device 15 is a desktop computer. The user may also use other types of electronic devices, such as tablet devices, smart watches, personal digital assistants (PDAs), and wearable devices (e.g., smart glasses, VR glasses), which is not limited by one or more embodiments of the present specification.
The network 12 through which the server 11 interacts with the terminal devices 13, 14, and 15 may include various types of wired or wireless networks, which is not limited by one or more embodiments of the present specification.
A data hierarchical storage method provided in this specification is described below with reference to fig. 2. Fig. 2 is a schematic flowchart of a data hierarchical storage method according to an exemplary embodiment. As shown in fig. 2, the method may include the steps of:
S201, acquiring data written by a user, and adding corresponding time attributes to the data according to data acquisition time to form time parameters of the data.
In an exemplary embodiment of the present specification, the time attribute is related to the data acquisition time. For example, the acquisition time may be recorded at day granularity: the date (year, month, and day) on which the data was acquired is written into the data as its time parameter. During subsequent hot/cold classification, whether the data is cold or hot is determined by the difference between the data acquisition time and the system time at the moment of classification.
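A minimal Python sketch of day-granularity stamping, assuming a hypothetical `Record` type that carries the payload and its acquisition date. Truncating the timestamp to a `date` discards the time of day, so the age used later is a whole number of calendar days.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Record:
    """Hypothetical record layout: user payload plus its time parameter."""
    payload: bytes
    time_parameter: date  # acquisition time at day granularity


def acquire(payload: bytes, now: datetime) -> Record:
    """Attach the acquisition day to user-written data (S201)."""
    return Record(payload=payload, time_parameter=now.date())


def age_in_days(record: Record, system_time: datetime) -> int:
    """Difference between acquisition day and classification day."""
    return (system_time.date() - record.time_parameter).days


r = acquire(b"hello", datetime(2022, 6, 21, 15, 30))
print(r.time_parameter)                            # 2022-06-21
print(age_in_days(r, datetime(2022, 7, 1, 9, 0)))  # 10
```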
S202, storing the received data with the time parameter.
In an exemplary embodiment of the present specification, the step S202 may specifically include the following steps:
receiving data written by a user and caching the data into a memory table (MemTable);
and writing the data in the memory table into a physical file under the condition that the data amount in the memory table reaches a preset memory threshold value.
In the above embodiment, the data written by the user, with its time attribute added, is first received and cached in the memory table, and is then written into a physical file once the amount of data in the memory table reaches the preset threshold, so that the data is not lost. Data cached in the memory table is not yet classified as hot or cold; it is persisted first to avoid data loss.
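The memory-table step can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the patented implementation: the size is counted in payload bytes only, each "physical file" is modelled as an in-memory list rather than a file on disk, and the class and field names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical entry layout: (payload, ISO date of acquisition).
Entry = Tuple[bytes, str]


class MemTable:
    """In-memory buffer flushed to an immutable physical file once its
    size reaches a threshold (S202). No hot/cold split happens here."""

    def __init__(self, threshold_bytes: int) -> None:
        self.threshold_bytes = threshold_bytes
        self.entries: List[Entry] = []
        self.size = 0
        # Stand-in for persisted physical files.
        self.physical_files: List[List[Entry]] = []

    def put(self, payload: bytes, acquired_on: str) -> None:
        self.entries.append((payload, acquired_on))
        self.size += len(payload)
        if self.size >= self.threshold_bytes:
            self.flush()

    def flush(self) -> None:
        # Persist everything as-is, then start a fresh buffer.
        if self.entries:
            self.physical_files.append(self.entries)
        self.entries = []
        self.size = 0


mt = MemTable(threshold_bytes=8)
for p in [b"aaaa", b"bbbb", b"cc"]:
    mt.put(p, "2022-06-21")
print(len(mt.physical_files), mt.size)  # 1 2
```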
S203, when the amount of stored data meets a set condition, scanning the data with the time parameter, classifying the corresponding data into cold data and hot data according to the time parameter, and writing the cold data or the hot data into the corresponding medium.
In an exemplary embodiment of the present specification, to avoid wasting resources, the data with the time parameter is scanned only when the amount of stored data meets the set condition; for example, the data in the physical files may be classified when the number of physical files reaches a preset threshold.
The physical files generated through the above steps may contain data with different hot and cold attributes. Therefore, to store the data hierarchically, the data with the time parameter in the physical files is scanned and classified into hot and cold data according to the time parameter, and the cold or hot data is written into the corresponding medium.
Taking the physical files in FIG. 3 as an example, and assuming that the preset threshold is 2, once the number of physical files generated by persisting the data in the memory table reaches 2, the data with the time parameter stored in the physical files is scanned and classified into cold data and hot data.
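The trigger condition can be sketched as a small Python function, using the threshold of 2 from the example. The function name and the choice to remove scanned files from the pending list are assumptions for illustration; the specification only states that scanning starts once the file count reaches the threshold.

```python
from typing import Callable, List

FILE_COUNT_THRESHOLD = 2  # preset threshold from the FIG. 3 example


def maybe_scan(physical_files: List[list],
               scan: Callable[[List[list]], None]) -> bool:
    """Run the hot/cold scan only when enough physical files exist.

    Returns True if a scan was triggered. Assumption: scanned files are
    removed from the pending list because their data is rewritten into
    per-tier files.
    """
    if len(physical_files) < FILE_COUNT_THRESHOLD:
        return False
    batch = physical_files[:]
    physical_files.clear()
    scan(batch)
    return True


scanned: List[List[list]] = []
pending: List[list] = [["file-1 data"]]
first = maybe_scan(pending, scanned.append)   # one file: no scan yet
pending.append(["file-2 data"])
second = maybe_scan(pending, scanned.append)  # threshold reached: scan
```

Deferring the scan this way batches work, which is the resource-saving motivation the text gives.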
In an exemplary embodiment of the present specification, classifying the corresponding data into hot and cold data according to the time parameter may be implemented as follows:
according to a preset data hierarchical storage rule, classifying the corresponding data into cold and hot data according to its time parameter, where the data hierarchical storage rule specifies one or more of: the hot and cold attributes of the data, the correspondence between the hot and cold attributes and the time parameter, and the correspondence between the hot and cold attributes and the media.
Alternatively, in another exemplary embodiment of the present specification, the data hierarchical storage rule may be user-defined. A classification rule configuration request initiated by the user is received, and the user-defined data hierarchical storage rule is obtained from it; this rule specifies one or more of: the hot and cold attributes of the data, the correspondence between the hot and cold attributes and the time parameter, and the correspondence between the hot and cold attributes and the media. The corresponding data is then classified into cold and hot data according to its time parameter under the user-defined rule.
For example, assume that the time parameter includes the data acquisition time, so that data is classified by the interval between the system time at classification and the data acquisition time. The user-defined data hierarchical storage rule specifies that data is divided into three categories by hot and cold attribute: hot data, warm data, and cold data. Hot data is data whose time parameter shows an interval within 7 days; data with an interval of 7 to 30 days is warm data; and data with an interval of 30 days or more is cold data. The user can, of course, define more hot and cold categories and adjust the time parameter range for each, so that data is stored hierarchically according to the user-defined classification rule and can be tiered flexibly. For example, the user may define four categories by further dividing the cold data above: data with an interval of 30 to 365 days is defined as cold data, and data older than 365 days is defined as super-cold data. The present specification does not limit this. The media corresponding to different hot and cold categories may be different, or partially or completely the same; for example, cold data and super-cold data may be written into the same storage medium or into different media, which is not limited in this specification.
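One way to represent such a user-defined rule is as an ordered list of age ranges, each mapped to a medium. The Python sketch below assumes lower bounds are inclusive and upper bounds exclusive (the text does not fix the boundary days), and uses placeholder medium names; the `Tier` type is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tier:
    """One hot/cold category of a hypothetical rule representation."""
    name: str
    min_age_days: int            # inclusive lower bound
    max_age_days: Optional[int]  # exclusive upper bound; None = unbounded
    medium: str


def classify(age_days: int, rule: List[Tier]) -> Tier:
    """Return the first tier whose age range contains age_days."""
    for tier in rule:
        if age_days >= tier.min_age_days and (
            tier.max_age_days is None or age_days < tier.max_age_days
        ):
            return tier
    raise ValueError(f"no tier covers age {age_days}")


three_tier = [
    Tier("hot", 0, 7, "medium 1"),
    Tier("warm", 7, 30, "medium 2"),
    Tier("cold", 30, None, "medium 3"),
]
# Four-tier variant: cold split at 365 days; both cold tiers share a medium.
four_tier = three_tier[:2] + [
    Tier("cold", 30, 365, "medium 3"),
    Tier("super-cold", 365, None, "medium 3"),
]

print(classify(45, three_tier).name)  # cold
print(classify(400, four_tier).name)  # super-cold
```

Because the rule is plain data, a classification rule configuration request could simply replace the list, leaving the stored time parameters untouched.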
For example, in an exemplary embodiment of the present specification, when data is classified into cold and hot data according to the time parameter, data with the same hot and cold attribute may be written into at least one file, so that all data in each file has the same hot and cold attribute. The files are then stored on the corresponding media according to that attribute.
In an exemplary embodiment of the present specification, the user may also modify the existing data tiered storage rules by initiating a new classification rule configuration request.
In an exemplary embodiment of the present specification, the process of scanning the physical file may further include the steps of:
splitting the physical file, and writing data belonging to the same category into at least one file, wherein the data category in each file is the same; and storing the files hierarchically according to the storage media corresponding to the categories.
In an exemplary embodiment of the present specification, physical files may be split by category and compressed into multiple compressed files according to a preset or user-defined data hierarchical storage rule, with the data in each compressed file sharing the same hot and cold attribute. As shown in FIG. 3, assume that the user's data hierarchical storage rule defines three hot and cold categories, with hot data stored on storage medium 1, warm data on storage medium 2, and cold data on storage medium 3. Under this rule, physical file 1 contains hot data and warm data, and physical file 2 contains cold data and warm data. In an exemplary embodiment of the present specification, the physical files may be split and compressed by a compression thread. For example, when the number of physical files reaches a preset threshold, the compression thread may scan the physical files and compress the data by category according to the time parameters of the data in them. Following the user-defined data hierarchical storage rule, data with different hot and cold attributes in a physical file is split into different files, which are compressed into corresponding compressed files and stored on the corresponding storage media; compression further reduces the storage space occupied by the data. For example, as shown in FIG. 3, after the physical files are split according to the user-defined rule, the hot data, warm data, and cold data are compressed into three separate compressed files, each of which is stored on its corresponding storage medium.
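The split-and-compress step of FIG. 3 can be sketched in Python with the standard `zlib` and `json` modules. The record layout, the choice of JSON serialization, and the sample dates are assumptions; the specification does not name a compression algorithm or file format. The sample data reproduces the FIG. 3 situation: physical file 1 holds hot and warm data, physical file 2 holds cold and warm data.

```python
import json
import zlib
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical record layout: (payload, ISO acquisition date).
Record = Tuple[str, str]


def tier_of(acquired: str, today: date) -> str:
    """Three-tier rule from the example: <7 days hot, 7-30 warm, >=30 cold."""
    age = (today - date.fromisoformat(acquired)).days
    return "hot" if age < 7 else "warm" if age < 30 else "cold"


def split_and_compress(physical_files: List[List[Record]],
                       today: date) -> Dict[str, bytes]:
    """Group records from all physical files by tier, then compress each
    group into one blob so that every compressed file holds one tier."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for physical_file in physical_files:
        for record in physical_file:
            groups[tier_of(record[1], today)].append(record)
    return {tier: zlib.compress(json.dumps(records).encode("utf-8"))
            for tier, records in groups.items()}


today = date(2022, 6, 21)
file1 = [("m1", "2022-06-20"), ("m2", "2022-06-01")]  # hot + warm
file2 = [("m3", "2022-04-01"), ("m4", "2022-06-10")]  # cold + warm
blobs = split_and_compress([file1, file2], today)
media = {"hot": "storage medium 1", "warm": "storage medium 2",
         "cold": "storage medium 3"}
for tier, blob in blobs.items():
    print(tier, "->", media[tier], json.loads(zlib.decompress(blob)))
```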
In an exemplary embodiment of the present specification, hot or cold data that has already been stored on its corresponding medium may be reclassified according to its time parameter. As time passes, the time parameter of data already stored on a medium may no longer satisfy the time parameter range specified for that medium. To move such data, which no longer satisfies the data hierarchical storage rule of its current medium, to the appropriate medium, the cold or hot data already written into the media can be scanned, reclassified according to the time parameter, and written into the corresponding medium.
For example, using the three categories described above, data with an interval within 7 days is hot data, data with an interval of 7 to 30 days is warm data, and data with an interval of more than 30 days is cold data, and the user specifies that hot data is stored on storage medium 1, warm data on storage medium 2, and cold data on storage medium 3. After 7 days, hot data stored on storage medium 1 no longer satisfies the correspondence between hot data and the time parameter. At this point the data is rescanned and reclassified according to its time parameter and the user-defined data hierarchical storage rule; data that no longer qualifies as hot data is reassigned to the warm category and moved to storage medium 2, which corresponds to warm data.
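The reclassification pass can be sketched in Python as a rebalance over in-memory stand-ins for the three media. The dictionaries, the record layout, and the boundary conventions (age < 7 hot, < 30 warm) are assumptions for illustration; a real system would copy data between devices and then delete the source copy.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical record layout: (record id, acquisition date).
Record = Tuple[str, date]
MEDIUM_OF = {"hot": "medium 1", "warm": "medium 2", "cold": "medium 3"}


def tier_of(acquired: date, today: date) -> str:
    age = (today - acquired).days
    return "hot" if age < 7 else "warm" if age < 30 else "cold"


def rebalance(media: Dict[str, List[Record]],
              today: date) -> List[Tuple[str, str, str]]:
    """Rescan each medium and move records whose time parameter no longer
    matches the tier of the medium they sit on.

    Returns (record id, source medium, target medium) for each move.
    """
    moves = []
    for medium, records in media.items():
        for rec in list(records):  # iterate over a copy while mutating
            target = MEDIUM_OF[tier_of(rec[1], today)]
            if target != medium:
                records.remove(rec)
                media[target].append(rec)
                moves.append((rec[0], medium, target))
    return moves


media = {"medium 1": [("msg-1", date(2022, 6, 1))],
         "medium 2": [], "medium 3": []}
first = rebalance(media, date(2022, 6, 5))    # 4 days old: stays hot
second = rebalance(media, date(2022, 6, 10))  # 9 days old: becomes warm
print(first, second)
```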
In the process, a user does not need to pay attention to which storage medium the data is stored on, only needs to initiate an access request for the data, and after receiving the access request of the user, the data to which the access request points can be directly read from the corresponding storage medium and fed back to the user. Generally, hot data can be stored in a storage medium with a higher reading speed in practical applications because of its higher probability of being frequently accessed by a user, while cold data can be stored in a storage medium with a lower storage cost because of its lower probability of being accessed, but the storage medium with a lower storage cost generally has a lower reading speed.
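Hiding the medium from the user implies some lookup from a data identifier to the medium that currently holds it. The specification does not say how this is tracked; the Python sketch below assumes a hypothetical location index, maintained by the writer and the migration pass, and uses dictionaries as stand-ins for the media.

```python
from typing import Dict


class TieredReader:
    """Serve reads without exposing which medium holds the data.

    `location` is a hypothetical index (record id -> medium name) that the
    write and migration paths would keep up to date; `stores` stands in
    for the media themselves.
    """

    def __init__(self, location: Dict[str, str],
                 stores: Dict[str, Dict[str, bytes]]) -> None:
        self.location = location
        self.stores = stores

    def read(self, record_id: str) -> bytes:
        medium = self.location[record_id]  # raises KeyError for unknown ids
        return self.stores[medium][record_id]


stores = {"SSD": {"m1": b"recent"}, "OSS": {"m9": b"old"}}
reader = TieredReader({"m1": "SSD", "m9": "OSS"}, stores)
print(reader.read("m9"))  # b'old'
```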
In another exemplary embodiment of the present specification, the time parameter further records the access frequency of the data. Whether the data is cached in the memory table, held in a physical file, or already stored hierarchically on a corresponding medium, the number of times it has been accessed and its access frequency can be recorded in its time parameter as a time attribute and used as a criterion for distinguishing cold data from hot data, where a hotter category corresponds to a higher access frequency. For example, assume that the user-defined data hierarchical storage rule specifies three categories by hot and cold attribute: hot data, warm data, and cold data, with the following correspondence between the categories and the time parameter: data accessed more than once per week and acquired within the last 7 days is hot data; data whose interval between acquisition time and the system time at classification is 7 to 30 days, and data accessed less than once per week with an interval within 7 days, are warm data; data with an interval of more than 30 days is cold data. Adding the access frequency to the time parameter as a time attribute allows hot and cold data to be distinguished more accurately and can improve data access efficiency.
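The combined age-and-frequency rule from this example can be written as a short Python function. The rule leaves some boundaries open, so the sketch makes these assumptions explicit: "within 7 days" means age < 7, data aged 30 days or more is cold, and exactly one access per week counts as warm.

```python
def classify(age_days: int, accesses_per_week: float) -> str:
    """Hypothetical rule combining age and access frequency.

    Assumptions: 'within 7 days' means age < 7, age >= 30 is cold, and
    exactly one access per week is treated as warm, not hot.
    """
    if age_days >= 30:
        return "cold"
    if age_days < 7 and accesses_per_week > 1:
        return "hot"
    # 7-30 days old, or recent but rarely accessed
    return "warm"


print(classify(3, 5))    # hot
print(classify(3, 0.5))  # warm
print(classify(45, 10))  # cold
```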
For ease of understanding, this specification provides the following specific example. Assume that the time attribute of the data comprises the data acquisition time. The user-defined data hierarchical storage rule divides data into three categories by cold/hot attribute: hot data, warm data, and cold data. Data whose time parameter shows an interval within 7 days is hot data, data from 7 to 30 days is warm data, and data older than 30 days is cold data. The user further specifies that hot data is stored on an SSD, warm data on an HDD, and cold data in OSS (Object Storage Service).
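The time-only rule in this example can be expressed as an ordered list of age bounds plus a tier-to-medium table. The sketch below is illustrative only. The `TieringRule` class and its field names are hypothetical, and treating 7 and 30 days as inclusive upper bounds is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class TieringRule:
    # Ordered (inclusive upper age bound in days, tier) pairs, hottest first.
    bounds: list = field(default_factory=lambda: [(7, "hot"), (30, "warm")])
    # Tier for anything older than the last bound.
    fallback_tier: str = "cold"
    # Tier -> storage medium, as chosen by the user in the example.
    media: dict = field(
        default_factory=lambda: {"hot": "SSD", "warm": "HDD", "cold": "OSS"}
    )

    def tier_for(self, acquired_at: datetime, now: datetime) -> str:
        # Age derived from the acquisition time stored in the time parameter.
        age_days = (now - acquired_at) / timedelta(days=1)
        for upper, tier in self.bounds:
            if age_days <= upper:
                return tier
        return self.fallback_tier

    def medium_for(self, acquired_at: datetime, now: datetime) -> str:
        return self.media[self.tier_for(acquired_at, now)]


if __name__ == "__main__":
    rule = TieringRule()
    now = datetime(2022, 6, 21)
    for days in (2, 10, 40):
        print(days, rule.medium_for(now - timedelta(days=days), now))
```

Keeping the bounds and the media table as data rather than code lets a user-defined rule replace the defaults without changing the classification logic.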
Data written by the user is acquired, and a corresponding time attribute is added to the data according to its acquisition time to form the time parameter of the data. The data with the time parameter is cached in the memory table. When the amount of data in the memory table reaches a preset memory threshold, the data in the memory table is written into a physical file. For example, if the memory threshold is 1G, the data in the memory table is written into a physical file once the memory table holds 1G of data. When the number of physical files reaches a preset threshold of 2, the compression thread traverses all data with time parameters stored in the two physical files and classifies and compresses it.
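This write path has four steps: stamp the data, buffer it in the memory table, flush it to a physical file at a size threshold, and trigger the compression thread at a file-count threshold. It resembles an LSM-tree write path. The single-threaded sketch below makes several assumptions. In-memory lists stand in for physical files, and the "compaction" step only gathers the records for the classification step. `TieredStore` and its members are hypothetical names.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    key: str
    value: bytes
    acquired_at: float  # time parameter: acquisition time in epoch seconds


@dataclass
class TieredStore:
    memory_threshold_bytes: int = 1 << 30  # 1G (read as 1 GiB), as in the example
    file_count_threshold: int = 2          # compaction trigger, as in the example
    memtable: List[Record] = field(default_factory=list)
    memtable_bytes: int = 0
    physical_files: List[List[Record]] = field(default_factory=list)
    scan_batches: List[List[Record]] = field(default_factory=list)

    def write(self, key: str, value: bytes, now: Optional[float] = None) -> None:
        # Stamp the incoming data with its acquisition time, then buffer it.
        stamp = time.time() if now is None else now
        self.memtable.append(Record(key, value, stamp))
        self.memtable_bytes += len(value)
        if self.memtable_bytes >= self.memory_threshold_bytes:
            self._flush()

    def _flush(self) -> None:
        # Persist the whole memory table as one immutable "physical file".
        self.physical_files.append(list(self.memtable))
        self.memtable.clear()
        self.memtable_bytes = 0
        if len(self.physical_files) >= self.file_count_threshold:
            self._compact()

    def _compact(self) -> None:
        # Stand-in for the compression thread: gather every time-stamped record
        # from all physical files and hand them to the classification step.
        batch = [rec for f in self.physical_files for rec in f]
        self.scan_batches.append(batch)
        self.physical_files.clear()
```

For a quick experiment, construct it with small thresholds, e.g. `TieredStore(memory_threshold_bytes=10, file_count_threshold=2)`.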
Assume the two physical files together contain data with all three cold/hot attributes: physical file 1 contains cold data and warm data, and physical file 2 contains cold data and hot data. The compression thread classifies and compresses the data stored in the physical files, generating three compressed files:
- compressed file 1 holds hot data and is stored on the SSD, the medium for hot data;
- compressed file 2 holds warm data and is stored on the HDD, the medium for warm data;
- compressed file 3 holds cold data and is uploaded to OSS.

In an exemplary embodiment of this specification, however, the number of compressed files generated need not equal the number of categories. For example, when a physical file contains a large amount of hot data, several compressed files classified as hot data may be generated. When the physical files contain no cold data, no compressed file for cold data needs to be generated. In all cases, the data stored in any one compressed file must share the same cold/hot attribute.
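The classification and compression step can be sketched as grouping records by tier and cutting each group into files of bounded size. This yields several hot files when hot data is plentiful and no file at all for an empty tier. Three assumptions apply here: each record is a `(key, age_days)` pair; a per-file record cap (`max_records_per_file`, a hypothetical parameter) stands in for whatever file-size limit an implementation would use; and zlib over JSON stands in for the actual compression format.

```python
import json
import zlib
from collections import defaultdict

# Tier -> medium mapping from the example rule.
MEDIA = {"hot": "SSD", "warm": "HDD", "cold": "OSS"}


def tier_of(age_days: float) -> str:
    if age_days <= 7:
        return "hot"
    if age_days <= 30:
        return "warm"
    return "cold"


def compact(physical_files, max_records_per_file=2):
    """Split the records of several physical files into per-tier compressed files.

    physical_files: iterable of lists of (key, age_days) pairs.
    Returns {medium: [compressed_blob, ...]}; every blob holds a single tier,
    a tier with many records yields several blobs, and an empty tier yields none.
    """
    by_tier = defaultdict(list)
    for records in physical_files:
        for key, age_days in records:
            by_tier[tier_of(age_days)].append([key, age_days])

    output = defaultdict(list)
    for tier, records in by_tier.items():
        for start in range(0, len(records), max_records_per_file):
            chunk = records[start:start + max_records_per_file]
            payload = json.dumps({"tier": tier, "records": chunk}).encode()
            output[MEDIA[tier]].append(zlib.compress(payload))
    return dict(output)


if __name__ == "__main__":
    file1 = [("a", 40), ("b", 10)]                       # cold + warm
    file2 = [("c", 45), ("d", 2), ("e", 3), ("f", 1)]    # cold + hot
    for medium, blobs in compact([file1, file2]).items():
        print(medium, len(blobs))
```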
After 7 days, the hot data stored on the SSD is scanned again and re-classified according to its time attribute under the user-defined data hierarchical storage rule. Data that no longer satisfies the hot-data condition is reclassified as warm data and moved to the HDD, the storage medium corresponding to warm data.
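The periodic rescan can be sketched as re-classifying every record already placed on a medium and moving the ones whose tier no longer matches. This is a minimal in-memory sketch. The dictionary layout of `media` and the helper names are hypothetical. Under an age-only rule, records only ever move toward colder media.

```python
from datetime import datetime, timedelta

# Tier -> medium mapping from the example rule.
MEDIA = {"hot": "SSD", "warm": "HDD", "cold": "OSS"}


def tier_for(acquired_at: datetime, now: datetime) -> str:
    age_days = (now - acquired_at) / timedelta(days=1)
    if age_days <= 7:
        return "hot"
    if age_days <= 30:
        return "warm"
    return "cold"


def rescan(media, now):
    """Re-classify records already on each medium and move misplaced ones.

    media: {"SSD": [(key, acquired_at), ...], "HDD": [...], "OSS": [...]},
        updated in place.
    Returns the list of (key, from_medium, to_medium) moves performed.
    """
    # First pass: decide every move without mutating the lists being scanned.
    planned = []
    for medium, records in media.items():
        for key, acquired_at in records:
            target = MEDIA[tier_for(acquired_at, now)]
            if target != medium:
                planned.append((key, acquired_at, medium, target))
    # Second pass: apply the moves.
    for key, acquired_at, src, dst in planned:
        media[src].remove((key, acquired_at))
        media.setdefault(dst, []).append((key, acquired_at))
    return [(key, src, dst) for key, _, src, dst in planned]
```

The sketch leaves scheduling to the caller, for example running `rescan` every 7 days as in the example.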
Fig. 4 is a schematic structural diagram of an electronic device according to an exemplary embodiment of this specification. Referring to Fig. 4, at the hardware level the device includes a processor 402, an internal bus 404, a network interface 406, a memory 408, and a non-volatile memory 410, and may also include hardware required for other functions. The processor 402 reads the corresponding computer program from the non-volatile memory 410 into the memory 408 and runs it, forming a data hierarchical storage device at the logical level. Besides a software implementation, one or more embodiments of this specification do not exclude other implementations, such as logic devices or combinations of software and hardware. That is, the entity that executes the following processing flow is not limited to logic units and may also be hardware or a logic device.
Corresponding to the method embodiments above, this specification further provides a data hierarchical storage device. As shown in Fig. 5, the data hierarchical storage device may include:
an obtaining unit 510, configured to obtain data written by a user, and add a corresponding time attribute to the data according to data obtaining time to form a time parameter of the data;
a storage unit 520 for storing the received data having the time parameter;
a scanning unit 530, configured to scan the data with the time parameter when the data storage amount satisfies a set condition, divide the data into cold data and hot data according to the time parameter, and write the cold data or hot data into the corresponding medium.
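The three units of the device can be read as a small pipeline. The sketch below wires hypothetical `ObtainingUnit`, `StorageUnit`, and `ScanningUnit` classes together. Several simplifications are assumptions made to keep it short: an injected clock, a storage condition based on record count, and a single flat list standing in for the memory table and physical files.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stamped:
    payload: bytes
    acquired_at: float  # time parameter (epoch seconds)


class ObtainingUnit:
    """Plays the role of obtaining unit 510: stamp user data with its acquisition time."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self.clock = clock

    def obtain(self, payload: bytes) -> Stamped:
        return Stamped(payload, self.clock())


class StorageUnit:
    """Plays the role of storage unit 520 (one list stands in for memtable + files)."""

    def __init__(self) -> None:
        self.items: List[Stamped] = []

    def store(self, item: Stamped) -> None:
        self.items.append(item)


class ScanningUnit:
    """Plays the role of scanning unit 530: tier stored data and write it to media."""

    def __init__(self, classify: Callable[[float], str],
                 media_for: Dict[str, str],
                 condition: Callable[[int], bool]):
        self.classify = classify    # age in seconds -> tier name
        self.media_for = media_for  # tier name -> medium name
        self.condition = condition  # number of stored items -> scan now?
        self.media: Dict[str, List[Stamped]] = {}

    def maybe_scan(self, storage: StorageUnit, now: float) -> bool:
        if not self.condition(len(storage.items)):
            return False
        for item in storage.items:
            medium = self.media_for[self.classify(now - item.acquired_at)]
            self.media.setdefault(medium, []).append(item)
        storage.items.clear()
        return True
```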
Optionally, the storage unit 520 may be specifically configured to:
receiving data written by a user and caching it in a memory table; and
writing the data in the memory table into a physical file when the amount of data in the memory table reaches a preset memory threshold.
Optionally, the scanning unit 530 may be specifically configured to:
dividing the data into cold and hot data according to its time parameter under a preset data hierarchical storage rule, where the rule specifies one or more of: the cold/hot attributes of the data, the correspondence between the cold/hot attributes and the time parameter, and the correspondence between the cold/hot attributes and the medium.
Optionally, the apparatus further comprises:
a receiving unit 540, configured to receive a classification rule configuration request initiated by a user, where the request includes a user-defined data hierarchical storage rule specifying one or more of: the cold/hot attributes of the data, the correspondence between the cold/hot attributes and the time parameter, and the correspondence between the cold/hot attributes and the medium;
the scanning unit 530 may be specifically configured to:
dividing the data into cold and hot data according to its time parameter under the user-defined data hierarchical storage rule.
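The receiving unit has to turn a user's request into a rule the scanning unit can apply. The text specifies neither the request format nor any validation. The JSON body with an ordered `tiers` list used below is purely hypothetical, and so are the checks. The sketch only illustrates one way to reject rules that would leave some records without a tier.

```python
import json
from typing import List, Optional, Tuple

Rule = List[Tuple[str, Optional[float], str]]  # (tier, max_age_days, medium)


def parse_rule_request(body: str) -> Rule:
    """Validate a hypothetical JSON rule-configuration request and return the rule.

    Expected shape (an assumption, not a format defined by the text):
    {"tiers": [{"name": "hot", "max_age_days": 7, "medium": "SSD"}, ...,
               {"name": "cold", "max_age_days": null, "medium": "OSS"}]}
    Tiers are listed hottest first; only the last may omit an age bound.
    """
    tiers = json.loads(body)["tiers"]
    rule: Rule = []
    seen = set()
    last_bound = 0.0
    for index, tier in enumerate(tiers):
        name, bound, medium = tier["name"], tier.get("max_age_days"), tier["medium"]
        if name in seen:
            raise ValueError(f"duplicate tier {name!r}")
        seen.add(name)
        if bound is None:
            if index != len(tiers) - 1:
                raise ValueError("only the last (coldest) tier may be open-ended")
        elif bound <= last_bound:
            raise ValueError("max_age_days must be positive and increase from hot to cold")
        else:
            last_bound = bound
        rule.append((name, bound, medium))
    if not rule or rule[-1][1] is not None:
        raise ValueError("the last tier must be open-ended so every record gets a tier")
    return rule


def classify(rule: Rule, age_days: float) -> Tuple[str, str]:
    """Return (tier, medium) for a record of the given age."""
    for name, bound, medium in rule:
        if bound is None or age_days <= bound:
            return name, medium
    raise AssertionError("unreachable: a validated rule ends with an open-ended tier")
```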
The scanning unit 530 may be further specifically configured to:
writing data sharing the same cold/hot attribute into at least one file, where all data in each file has the same cold/hot attribute; and
storing the data in the medium corresponding to its cold/hot attribute.
Optionally, the apparatus further comprises:
a rescanning unit 550, configured to scan the cold or hot data written into the corresponding medium, re-classify it according to the time parameter, and write the re-classified cold or hot data into the corresponding medium.
Optionally, the time parameter further records: the frequency of access to the data.
For details of how the functions and actions of each unit in the above device are implemented, refer to the implementation of the corresponding steps in the above method; they are not repeated here.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to:
- phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), and other types of random access memory (RAM);
- read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory or other memory technologies;
- compact disc read-only memory (CD-ROM), digital versatile discs (DVD), or other optical storage;
- magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media, or other magnetic storage devices;
- any other non-transmission medium that can be used to store information accessible by a computing device.

As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
In one or more embodiments of this specification, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion. A process, method, article, or apparatus that comprises a list of elements therefore includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus comprising the element.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of this specification to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments herein. Depending on the context, the word "if," as used herein, may be interpreted as "at the time ...," "when ...," or "in response to determining."
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.

Claims (10)

1. A data hierarchical storage method is characterized by comprising the following steps:
acquiring data written by a user, and adding a corresponding time attribute in the data according to data acquisition time to form a time parameter of the data;
storing the received data with the time parameter;
when the data storage quantity meets the set condition, scanning the data with the time parameter, dividing the corresponding data into cold data and hot data according to the time parameter, and writing the cold data or the hot data into the corresponding medium.
2. The method of claim 1, wherein the storing the received data with the time parameter comprises:
receiving data written by a user and caching the data in a memory table; and
writing the data in the memory table into a physical file when the amount of data in the memory table reaches a preset memory threshold.
3. The method of claim 1, wherein dividing the corresponding data into cold and hot data according to its time parameter comprises:
dividing the corresponding data into cold and hot data according to its time parameter under a preset data hierarchical storage rule, wherein the data hierarchical storage rule specifies one or more of: the cold/hot attributes of the data, the correspondence between the cold/hot attributes of the data and the time parameter, and the correspondence between the cold/hot attributes of the data and the medium.
4. The method of claim 1, further comprising:
receiving a classification rule configuration request initiated by a user, wherein the classification rule configuration request comprises a data hierarchical storage rule customized by the user, and the data hierarchical storage rule specifies one or more of a hot and cold attribute of data, a corresponding relation between the hot and cold attribute of the data and a time parameter, and a corresponding relation between the hot and cold attribute of the data and a medium;
wherein dividing the corresponding data into cold and hot data according to its time parameter comprises:
dividing the corresponding data into cold and hot data according to its time parameter under the user-defined data hierarchical storage rule.
5. The method of claim 1, wherein dividing the corresponding data into cold and hot data according to its time parameter comprises:
writing data sharing the same cold/hot attribute into at least one file, wherein all data in each file has the same cold/hot attribute; and
storing the data in the medium corresponding to its cold/hot attribute.
6. The method of claim 1, further comprising:
scanning the cold data or hot data written into the corresponding medium, re-classifying it according to the time parameter, and writing the re-classified cold data or hot data into the corresponding medium.
7. The method of claim 1, wherein the time parameter further records: the frequency of access to the data.
8. A hierarchical data storage device, comprising:
an acquisition unit, used for acquiring data written by a user and adding a corresponding time attribute to the data according to the data acquisition time to form the time parameter of the data;
a storage unit, used for storing the received data with the time parameter; and
a scanning unit, used for scanning the data with the time parameter when the data storage amount satisfies a set condition, dividing the corresponding data into cold data and hot data according to the time parameter, and writing the cold data or hot data into the corresponding medium.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-7 when executing the program.
CN202210709184.3A 2022-06-21 2022-06-21 Data hierarchical storage method and device Pending CN115167762A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210709184.3A CN115167762A (en) 2022-06-21 2022-06-21 Data hierarchical storage method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210709184.3A CN115167762A (en) 2022-06-21 2022-06-21 Data hierarchical storage method and device

Publications (1)

Publication Number Publication Date
CN115167762A true CN115167762A (en) 2022-10-11

Family

ID=83487192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210709184.3A Pending CN115167762A (en) 2022-06-21 2022-06-21 Data hierarchical storage method and device

Country Status (1)

Country Link
CN (1) CN115167762A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117472967A (en) * 2023-12-28 2024-01-30 江西铜锐信息技术有限公司 Data life cycle management method and system based on data use heat
CN117472967B (en) * 2023-12-28 2024-05-03 江西铜锐信息技术有限公司 Data life cycle management method and system based on data use heat

Similar Documents

Publication Publication Date Title
US10671290B2 (en) Control of storage of data in a hybrid storage system
US11372568B2 (en) System and method for storing and accessing blockchain data
CN107168651B (en) Small file aggregation storage processing method
CN109885577A (en) Data processing method, device, terminal and storage medium
CN106708912B (en) Junk file identification and management method, identification device, management device and terminal
CN110858210B (en) Data query method and device
CN111427885B (en) Database management method and device based on lookup table
CN115167762A (en) Data hierarchical storage method and device
CN114816240A (en) Data writing method and data reading method
CN114647658A (en) Data retrieval method, device, equipment and machine-readable storage medium
CN111475099A (en) Data storage method, device and equipment
CN112035524B (en) List data query method, device, computer equipment and readable storage medium
CN110716940B (en) Incremental data access system
CN112597151A (en) Data processing method, device, equipment and storage medium
CN115079957B (en) Request processing method, device, controller, equipment and storage medium
CN108536759B (en) Sample playback data access method and device
CN116185305A (en) Service data storage method, device, computer equipment and storage medium
CN116303278A (en) File merging method, file reading method, device, equipment and storage medium
CN114691612A (en) Data writing method and device and data reading method and device
CN110837338A (en) Storage index processing method and device
CN112307272B (en) Method, device, computing equipment and storage medium for determining relation information between objects
CN113760854A (en) Method for identifying data in HDFS memory and related equipment
CN113051105A (en) Data processing method, device, equipment and storage medium
CN118567577B (en) Data access method and device based on distributed block storage and electronic equipment
CN113296970B (en) Message processing and message queue management method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination