CN116643704A - Storage management method, storage management device, electronic equipment and storage medium - Google Patents

Storage management method, storage management device, electronic equipment and storage medium

Info

Publication number
CN116643704A
Authority
CN
China
Prior art keywords
data
storage
operation data
layer
hot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310722926.0A
Other languages
Chinese (zh)
Inventor
陈仲涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd
Priority to CN202310722926.0A
Publication of CN116643704A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062 Securing storage systems
    • G06F3/0622 Securing storage systems in relation to access
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a storage management method, a storage management device, an electronic device, and a storage medium, and relates to the technical field of storage. The method identifies the data heat type of a virtual machine's operation data according to the data attributes of that data, and matches the data heat type against the data's current storage area to determine whether hot data is stored in the hot data layer and whether cold data is stored in the cold data layer. The operation data is then managed according to the matching result, for example by migrating mismatched data, so that data of each heat type is stored in its corresponding storage area. As a result, hot data resides in the hot data layer most of the time rather than sitting in the cold data layer for long periods, which can effectively improve the overall performance of the storage system.

Description

Storage management method, storage management device, electronic equipment and storage medium
Technical Field
The present application relates to the field of storage technologies, and in particular, to a storage management method, a storage management device, an electronic device, and a storage medium.
Background
With the advent of the big data era, data volumes keep growing, which drives heavy demand for storage that a traditional single storage medium cannot meet. Hierarchical (tiered) storage was therefore proposed. It generally defines data that is frequently accessed or recently generated by a user as hot data and all other data as cold data; hot data is stored on a storage medium that can be read quickly and cold data on a storage medium that is read more slowly, which reduces the cost of storing the data.
In current approaches, data is generally first stored on the storage medium for hot data and, after a period of time, migrated to the storage medium for cold data. Data that is still hot is migrated along with the rest, so accesses to it become frequent reads from the cold-data medium, which reduces the overall performance of the storage system.
Disclosure of Invention
An embodiment of the application aims to provide a storage management method, a storage management device, electronic equipment and a storage medium, which are used for solving the problem that the overall performance of a storage system is low due to the existing storage mode.
In a first aspect, an embodiment of the present application provides a storage management method, where the method includes:
acquiring a current storage area of stored operation data of a virtual machine, wherein the storage area comprises a hot data layer and a cold data layer, and the operation data is preferentially stored in the hot data layer during initial storage;
identifying a data heat type of the operation data according to a data attribute of the operation data, wherein the data heat type comprises hot data and cold data;
matching the data heat type with the current storage area to obtain a matching result; and
performing storage management on the operation data according to the matching result.
In the above implementation, the data heat type of the operation data is identified according to the data attributes of the virtual machine's operation data, and the data heat type is matched against the current storage area of the operation data to determine whether hot data is stored in the hot data layer and whether cold data is stored in the cold data layer. The operation data can then be managed according to the matching result, for example by migrating mismatched data, so that data of each heat type is stored in its corresponding storage area. Hot data therefore resides in the hot data layer most of the time rather than sitting in the cold data layer for long periods, which can effectively improve the overall performance of the storage system.
Optionally, the data attribute includes at least one of a service to which the data belongs, a data access frequency, a data access time, and a data source;
identifying the data heat type of the operation data according to the data attribute of the operation data includes at least one of the following determinations:
if the service to which the operation data belongs is a target service, determining that the data heat type of the operation data is hot data;
if the data access frequency of the operation data is greater than a set frequency, determining that the data heat type of the operation data is hot data;
if the data access time of the operation data falls within a set time period, determining that the data heat type of the operation data is hot data;
and if the data source of the operation data is a set source, determining that the data heat type of the operation data is hot data.
In the above implementation, identifying the data heat type of the operation data based on its data attributes allows hot data and cold data to be distinguished effectively, so that hot data is identified accurately.
Optionally, if the data attribute includes a data access frequency, the data access frequency is obtained by:
monitoring the data access frequency of the operation data through a preset process running in the virtual machine.
In the above implementation, the data access frequency is monitored by a preset process in the virtual machine, so the monitoring can be done by an existing process without consuming additional resources, which makes reasonable use of resources.
Optionally, if the data attribute includes at least two of the service to which the data belongs, the data access frequency, the data access time, and the data source, the hot data determined according to each attribute is allocated a corresponding storage proportion in the hot data layer;
if the data heat type of the operation data is determined to be hot data according to a target data attribute and the matching result is that the data heat type does not match the current storage area, performing storage management on the operation data according to the matching result includes:
migrating the operation data to the hot data layer according to the storage proportion, in the hot data layer, of the hot data determined by the target data attribute. This ensures that hot data of different types is stored in the hot data layer in the corresponding storage proportions.
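The proportion-based migration described above might be sketched as follows. This is only an illustrative Python sketch under stated assumptions: the percentages in HOT_LAYER_SHARE_PERCENT, the HotDataLayer class, and its method names are hypothetical and are not given in the application, which does not say how the storage proportions are chosen or enforced.

```python
from dataclasses import dataclass, field

# Hypothetical storage proportions (percent of the hot data layer) per
# identifying attribute; the application does not give concrete values.
HOT_LAYER_SHARE_PERCENT = {"service": 40, "frequency": 30, "time": 20, "source": 10}


@dataclass
class HotDataLayer:
    capacity_bytes: int
    used_bytes: dict = field(
        default_factory=lambda: {name: 0 for name in HOT_LAYER_SHARE_PERCENT}
    )

    def quota(self, attribute: str) -> int:
        """Bytes of the hot data layer reserved for data identified by this attribute."""
        return self.capacity_bytes * HOT_LAYER_SHARE_PERCENT[attribute] // 100

    def try_migrate(self, attribute: str, size_bytes: int) -> bool:
        """Admit data into the hot data layer only while its attribute's share has room."""
        if self.used_bytes[attribute] + size_bytes > self.quota(attribute):
            return False  # share exhausted; the data stays where it is
        self.used_bytes[attribute] += size_bytes
        return True
```

In this sketch, hot data identified by its data source can occupy at most one tenth of the hot data layer, so one kind of hot data cannot crowd out the others.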
Optionally, acquiring the current storage area of the stored operation data of the virtual machine includes:
acquiring a storage position of the operation data in a cloud hard disk;
and determining the current storage area of the operation data in the storage system according to the storage position in the cloud hard disk.
Optionally, acquiring the storage position of the operation data in the cloud hard disk includes:
acquiring a mapping relation between data and cloud hard disk storage positions through a file system;
and determining the storage position of the operation data in the cloud hard disk through the file system according to the mapping relation.
In the above implementation, the mapping relation between data and cloud hard disk storage positions can be obtained quickly through the file system.
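As a rough illustration of the two lookups above (data to cloud hard disk position, then position to storage area), the following Python sketch uses in-memory tables. The file names, offsets, region boundaries, and function names are all hypothetical; a real system would obtain the first mapping from the file system and the second from the storage system's own layout metadata.

```python
from bisect import bisect_right

MIB = 1024 * 1024

# Hypothetical file-system mapping: file name -> byte offset on the cloud hard disk.
FILE_TO_DISK_OFFSET = {"/data/db.ibd": 0, "/var/log/syslog": 8 * MIB}

# Hypothetical layout: sorted start offsets of cloud hard disk regions and the
# data layer that currently backs each region.
REGION_STARTS = [0, 4 * MIB]
REGION_LAYERS = ["hot", "cold"]


def locate_on_disk(path: str) -> int:
    """Return the cloud hard disk position of the data via the file-system mapping."""
    return FILE_TO_DISK_OFFSET[path]


def current_storage_area(disk_offset: int) -> str:
    """Return the data layer backing the region that contains the given offset."""
    index = bisect_right(REGION_STARTS, disk_offset) - 1
    return REGION_LAYERS[index]
```

For example, current_storage_area(locate_on_disk("/data/db.ibd")) resolves the file to offset 0 and then to the first region of the assumed layout.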
Optionally, acquiring the storage position of the operation data in the cloud hard disk includes:
creating a new access service in a host machine corresponding to the virtual machine, wherein the new access service is used for detecting the storage position of data in the cloud hard disk;
receiving a read request for the operation data through a file system;
converting the read request into an IO request for the cloud hard disk through the file system, and sending the IO request to the new access service;
and acquiring, through the new access service, the cloud hard disk storage position corresponding to the IO request, and determining that position as the cloud hard disk storage position of the operation data.
In the above implementation, the mapping relation between data and cloud hard disk storage positions is obtained by creating a new access service, which makes the method applicable to scenarios in which the mapping relation cannot be obtained through the file system.
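The request flow above might look roughly like the following Python sketch. The IORequest, AccessService, and FileSystem classes are hypothetical stand-ins written for illustration; the application does not describe the interfaces of the access service or of the file system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IORequest:
    disk_offset: int  # position on the cloud hard disk
    length: int       # number of bytes requested


class AccessService:
    """Stand-in for the new access service created on the host machine."""

    def __init__(self) -> None:
        self.last_location: Optional[int] = None

    def handle(self, request: IORequest) -> int:
        # Record and return the cloud hard disk position the IO request targets.
        self.last_location = request.disk_offset
        return request.disk_offset


class FileSystem:
    """Toy file system that converts file reads into IO requests."""

    def __init__(self, extents: dict, service: AccessService) -> None:
        self._extents = extents  # file name -> (disk offset, length)
        self._service = service

    def read(self, path: str) -> int:
        offset, length = self._extents[path]
        # Convert the read request into an IO request and send it to the service.
        return self._service.handle(IORequest(offset, length))
```

Issuing a read for a file through FileSystem.read is then enough for the service to learn where that file's data sits on the cloud hard disk.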
Optionally, performing storage management on the operation data according to the matching result includes:
if the matching result is that the data heat type does not match the current storage area, migrating the operation data to the other data layer. Mismatched data can thus be migrated to its corresponding storage area, which realizes tiered storage of hot and cold data and improves storage performance.
In a second aspect, an embodiment of the present application provides a storage management apparatus, including:
the storage area determining module is used for obtaining a current storage area of the stored operation data of the virtual machine, wherein the storage area comprises a hot data layer and a cold data layer, and the operation data is preferentially stored in the hot data layer during initial storage;
the heat type determining module is used for identifying the data heat type of the operation data according to the data attribute of the operation data, wherein the data heat type comprises hot data and cold data;
the matching module is used for matching the data heat type with the current storage area to obtain a matching result;
and the management module is used for carrying out storage management on the operation data according to the matching result.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing computer readable instructions which, when executed by the processor, perform the steps of the method as provided in the first aspect above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method as provided in the first aspect above.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should not be regarded as limiting its scope; a person skilled in the art can obtain other related drawings from these drawings without inventive effort.
FIG. 1 is a flow chart of a storage management method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the position mapping relationship between a cloud hard disk location and a data block of a storage system according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of acquiring a mapping relationship between data and a cloud hard disk location through a new access service according to an embodiment of the present application;
FIG. 4 is a block diagram illustrating a storage management apparatus according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device for executing a storage management method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
It should be noted that the terms "system" and "network" may be used interchangeably in the embodiments of the present application. "Plurality" means two or more, and may therefore also be understood as "at least two". "And/or" describes an association between associated objects and covers three cases; for example, "A and/or B" may indicate that A exists alone, that A and B both exist, or that B exists alone. Unless otherwise specified, the character "/" generally indicates an "or" relationship between the associated objects.
The embodiment of the application provides a storage management method. The method identifies the data heat type of a virtual machine's operation data according to the data attributes of that data, and matches the data heat type against the current storage area of the operation data to determine whether hot data is stored in the hot data layer and whether cold data is stored in the cold data layer. The operation data can then be managed according to the matching result, for example by migrating mismatched data, so that data of each heat type is stored in its corresponding storage area. Hot data therefore resides in the hot data layer most of the time rather than sitting in the cold data layer for long periods, which can effectively improve the overall access performance of the storage system.
Referring to fig. 1, fig. 1 is a flowchart of a storage management method according to an embodiment of the present application, where the method includes the following steps:
Step S110: acquire the current storage area of the stored operation data of the virtual machine.
The storage area includes a hot data layer and a cold data layer. The storage media of the storage system can be divided into layers in advance: for example, storage media with high read-write performance are assigned to the hot data layer so that data can be read and written quickly, and storage media with low read-write performance are assigned to the cold data layer. The specific division rule and the number of storage media in each layer can be set flexibly according to the actual situation.
Storing data in a hot data layer and a cold data layer can be understood as a tiered storage technique, that is, a method of dividing data into several layers as required and using different storage media and technologies for the data of each layer. In a hyper-converged system, tiered storage generally has two layers: one is the SSD layer, i.e., the hot data layer, which stores data that needs to be accessed quickly; the other is the mechanical hard disk layer, i.e., the cold data layer, which stores large amounts of rarely accessed data.
It will be appreciated that the hot data layer and the cold data layer may also be divided in other ways. For example, the cache in an SSD may serve as the hot data layer and the remaining space as the cold data layer, so that data with different read requirements is stored in different data layers.
The running data of the virtual machine includes database data, data of various services, data of various application software, and the like. At initial storage, the running data can be stored preferentially in the hot data layer, so that running data that is hot from the outset stays in the hot data layer without being migrated, which gives hot data high access performance.
In some embodiments, the running data may first be roughly identified at initial storage. Some system files, such as system logs and system operation data, are generally read only once at power-on, or are written once and never read. Storing such data in the hot data layer would waste its storage space, so at initial storage this data may be stored in the cold data layer. In the subsequent data heat type identification this data is still identified as cold data, so no migration is triggered, which reduces the resources consumed by data migration.
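The initial placement rule described above could be sketched in Python as follows. The kind labels in LIKELY_COLD_KINDS and the initial_layer function are hypothetical names introduced for illustration; the fallback to the cold data layer when the hot data layer lacks space follows the behavior this description gives for a full hot data layer.

```python
# Hypothetical labels for data that is very likely cold from the start,
# such as system logs and system operation data.
LIKELY_COLD_KINDS = {"system_log", "system_operation_data"}


def initial_layer(kind: str, size_bytes: int, hot_free_bytes: int) -> str:
    """Choose the data layer for data that is being stored for the first time."""
    if kind in LIKELY_COLD_KINDS:
        return "cold"  # rough pre-identification: do not spend hot-layer space
    if size_bytes <= hot_free_bytes:
        return "hot"   # default: prefer the hot data layer
    return "cold"      # hot data layer is full, fall back to the cold data layer
```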
Step S120: identify the data heat type of the operation data according to the data attribute of the operation data.
The data heat type includes hot data and cold data; hot data can be understood as data that is accessed frequently and cold data as data that is accessed rarely. In practical applications the data heat type may be subdivided further, for example by adding warm data, which lies between hot data and cold data, such as data whose access rate is neither high nor low. Hot data itself may also be subdivided, for example into first hot data and second hot data, so that different data is stored at a finer granularity and the performance of the whole storage system is improved further.
It will be appreciated that if the data heat types are subdivided, the data layers may likewise be divided into more layers, for example a hot data layer, a warm data layer, and a cold data layer. The division of the data layers corresponds to the division of the data heat types, that is, there are as many data layers as there are data heat types, and the specific division rule may be chosen flexibly according to the actual situation.
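As one possible reading of the subdivision above, the following Python sketch maps an access frequency to one of three heat types, each of which would correspond to a data layer of the same name. The threshold values and the names TIER_THRESHOLDS and heat_type_by_frequency are assumptions made for illustration; the application does not specify how warm data is delimited.

```python
# Hypothetical thresholds in accesses per hour, ordered from hottest to coldest.
TIER_THRESHOLDS = [("hot", 100.0), ("warm", 10.0)]


def heat_type_by_frequency(accesses_per_hour: float) -> str:
    """Return the first heat type whose threshold the access frequency exceeds."""
    for name, minimum in TIER_THRESHOLDS:
        if accesses_per_hour > minimum:
            return name
    return "cold"
```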
In some embodiments, the data attributes may include at least one of a service to which the data belongs, a frequency of data access, a time of data access, and a source of data.
The service to which the data belongs can be understood as a specific service run by the virtual machine, such as virus scanning, traffic monitoring, or data screening. The data source indicates what kind of data it is, such as the running data of a certain application, the data of a certain service process, or the data of a certain database. In practical applications, the data attributes may be configured flexibly according to the actual situation. The data attributes are also not limited to those listed above; more attributes can be configured as required so that the data heat type is identified from more aspects.
The identification may include at least one of the following determinations:
if the service to which the operation data belongs is a target service, determining that the data heat type of the operation data is hot data, and otherwise determining that the operation data is cold data;
if the data access frequency of the operation data is greater than a set frequency, determining that the data heat type of the operation data is hot data, and otherwise determining that the operation data is cold data;
if the data access time of the operation data falls within a set time period, determining that the data heat type of the operation data is hot data, and otherwise determining that the operation data is cold data;
if the data source of the operation data is a set source, determining that the data heat type of the operation data is hot data, and otherwise determining that the operation data is cold data.
The target service may be user-defined and configured flexibly according to actual requirements. For example, a data screening service may need to read its data frequently, so the data of that service may be regarded as hot data. The data access frequency reflects the heat of the data: a high access frequency indicates hot data, and the set frequency can be chosen flexibly according to the actual situation. The data access time also reflects the heat of the data: a recent access indicates that the data is hot. The set time period may be a period close to the current time, for example the 4 hours before the current time, and data accessed within that period is regarded as hot data. Data from certain set sources may also be regarded as hot data, for example data from databases that are read frequently.
That is, if the operation data satisfies at least one of the above-described judgment modes, it can be regarded as hot data, and if none of the above-described judgment modes is satisfied, it can be regarded as cold data.
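The rule above, under which data is hot if any configured determination is satisfied and cold otherwise, can be sketched in Python as follows. The attribute names, the target service, the set frequency, and the set source are hypothetical values chosen for illustration; the 4-hour window is the example given in this description.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical configuration; only the 4-hour period is taken from the text.
TARGET_SERVICES = {"data_screening"}
SET_FREQUENCY = 100.0          # accesses per hour
SET_PERIOD_HOURS = 4.0         # "4 hours before the current time"
SET_SOURCES = {"orders_db"}


@dataclass
class DataAttributes:
    service: Optional[str] = None
    access_frequency: Optional[float] = None         # accesses per hour
    hours_since_last_access: Optional[float] = None
    source: Optional[str] = None


def heat_type(attrs: DataAttributes) -> str:
    """Return "hot" if any available attribute marks the data as hot, else "cold"."""
    checks = [
        attrs.service is not None and attrs.service in TARGET_SERVICES,
        attrs.access_frequency is not None
        and attrs.access_frequency > SET_FREQUENCY,
        attrs.hours_since_last_access is not None
        and attrs.hours_since_last_access <= SET_PERIOD_HOURS,
        attrs.source is not None and attrs.source in SET_SOURCES,
    ]
    return "hot" if any(checks) else "cold"
```

Attributes left as None are simply skipped, which mirrors the statement that the data attribute includes at least one of the four listed attributes.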
Step S130: match the data heat type with the current storage area to obtain a matching result.
The purpose of the matching is to determine whether hot data is stored in the hot data layer and whether cold data is stored in the cold data layer. In principle, hot data should be stored in the hot data layer and cold data in the cold data layer. However, data is preferentially stored in the hot data layer at initial storage and is stored in the cold data layer once the hot data layer is full, so some data that is actually hot may end up in the cold data layer while data that is actually cold sits in the hot data layer. Such misplaced data needs to be identified.
For example, if the data heat type of the operation data is hot data and the current storage area is the cold data layer, the two do not match; if the data heat type is hot data and the current storage area is the hot data layer, the two match. Likewise, if the data heat type is cold data and the current storage area is the hot data layer, the two do not match; if the data heat type is cold data and the current storage area is the cold data layer, the two match.
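The four cases above reduce to a single comparison, as the following Python sketch shows. The HeatType and Layer enumerations and the is_match function are hypothetical names introduced for illustration.

```python
from enum import Enum


class HeatType(Enum):
    HOT = "hot"
    COLD = "cold"


class Layer(Enum):
    HOT = "hot"
    COLD = "cold"


def is_match(heat_type: HeatType, current_layer: Layer) -> bool:
    """Data matches its storage area when hot data is in the hot data layer
    or cold data is in the cold data layer."""
    return heat_type.value == current_layer.value
```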
Step S140: perform storage management on the operation data according to the matching result.
If the matching determines that the data heat type does not match the current storage area, the operation data is migrated to the other data layer. For example, if the data heat type of the operation data is cold data and the current storage area is the hot data layer, the operation data is migrated to the cold data layer; conversely, if the data heat type is hot data and the current storage area is the cold data layer, the operation data is migrated to the hot data layer. If the matching result is that the data heat type matches the current storage area, the operation data is not migrated. Because only mismatched data is migrated, hot data resides in the hot data layer most of the time rather than sitting in the cold data layer for long periods. The access performance of the hot data layer is much higher than that of the cold data layer, so storing hot data in the hot data layer avoids the performance bottleneck caused by frequently accessing hot data in the cold data layer. The scheme thus effectively realizes tiered storage of hot and cold data and improves data storage performance.
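A minimal Python sketch of this migration step follows. It only updates an in-memory record of where each piece of data lives; the Record class and the manage function are hypothetical, and an actual migration between storage media is outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    name: str
    heat_type: str  # "hot" or "cold", as identified in step S120
    layer: str      # "hot" or "cold", the current storage area


def manage(records: List[Record]) -> List[str]:
    """Migrate every mismatched record to the other data layer and return
    the names of the records that were migrated."""
    migrated = []
    for record in records:
        if record.heat_type != record.layer:
            record.layer = record.heat_type  # move to the layer that matches
            migrated.append(record.name)
    return migrated
```

Records whose heat type already matches their layer are left untouched, so no migration resources are spent on them.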
In some embodiments, to save storage management resources, the method may be executed periodically, that is, the storage management described above is performed on the data once every period of time in order to migrate data. In other words, the above method steps are triggered each time a migration period ends.
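The periodic triggering might be sketched in Python as follows. The run_periodically function, its parameters, and the injectable sleep argument are assumptions made for illustration; the application does not state how the migration period is timed.

```python
import time
from typing import Callable


def run_periodically(
    task: Callable[[], None],
    period_seconds: float,
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run the storage management task once at the end of each migration period."""
    for _ in range(cycles):
        sleep(period_seconds)  # wait for the current migration period to end
        task()                 # then trigger steps S110 to S140
```

Passing a replacement for sleep makes the loop easy to exercise without waiting for real time to pass.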
In the above implementation, the data heat type of the operation data is identified according to the data attributes of the virtual machine's operation data, and the data heat type is matched against the current storage area of the operation data to determine whether hot data is stored in the hot data layer and whether cold data is stored in the cold data layer. The operation data can then be managed according to the matching result, for example by migrating mismatched data, so that data of each heat type is stored in its corresponding storage area. Hot data therefore resides in the hot data layer most of the time rather than sitting in the cold data layer for long periods, which can effectively improve the overall performance of the storage system.
On the basis of the above embodiment, if the above data attribute includes a data access frequency, the data access frequency may be obtained by: monitoring the data access frequency of the operation data through a setting process operated by the virtual machine.
The setting process may refer to a process shared by multiple virtual machines on the host machine, such as a agent process, so that existing services of the virtual machines may not be affected. In the super fusion system, the virtual machine generally runs a agent process, and is mainly used for collecting statistical information of the virtual machine and configuration online modification of the virtual machine.
It will be appreciated that the setting process may also be another process, for example a new process created by the user that runs in each virtual machine and monitors the data access frequency of the data in that virtual machine. Alternatively, the setting process may be another process common to all virtual machines, so that no new process needs to be created and only a new function needs to be added to the existing process.
In some embodiments, if the data attribute includes several of the above attributes, the data may first be identified using the attributes other than the data access frequency. The setting process is started to monitor the data access frequency only when the other attributes cannot identify the data heat type or all of them identify the data as cold data. If the operation data can be identified as hot data through the other attributes, the setting process is not started for monitoring. This reduces the resource occupation of the setting process, leaving it more resources for its main function (for example, the main function of the agent process is to collect statistical information of the virtual machine and to modify the configuration of the virtual machine online).
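The staged identification above can be sketched as follows. This is only an illustration under assumptions: `identify_heat`, `static_checks`, and `monitor_frequency` are hypothetical names, and each cheap check is assumed to return `True` (hot), `False` (cold), or `None` (cannot identify).

```python
from typing import Callable, List, Optional


def identify_heat(static_checks: List[Callable[[], Optional[bool]]],
                  monitor_frequency: Callable[[], float],
                  set_frequency: float) -> str:
    """Return "hot" or "cold"; call monitor_frequency only when needed."""
    for check in static_checks:
        if check() is True:  # any cheap attribute identifies hot data: done
            return "hot"
    # Every cheap attribute was inconclusive (None) or said cold (False),
    # so the setting process is started to measure the access frequency.
    return "hot" if monitor_frequency() > set_frequency else "cold"
```

The point of the ordering is that `monitor_frequency`, which stands in for the monitoring done by the setting process, is never invoked for data that the other attributes already identify as hot.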
In other embodiments, when counting the data access frequency, the setting process may also filter out some data. For example, operation data that is very likely to be cold data, such as system operation data and system logs, may be left out of the statistics, because this data is placed in the cold data layer when first stored and very likely remains cold data. The setting process may exclude such data and count the data access frequency only for the remaining data, which reduces the workload of the setting process and improves its efficiency.
On the basis of the above embodiment, if the data attribute includes at least two of a service to which the data belongs, a data access frequency, a data access time, and a data source, then in the hot data layer the hot data determined according to each attribute is allocated a corresponding storage proportion.
For example, in the hot data layer, hot data belonging to the target service (which may be called important service data) has a storage proportion of a%; hot data whose data access time is within the set time period (which may be called recently accessed data) has a storage proportion of b%; and hot data whose data access frequency is greater than the set frequency (which may be called most frequently used data) has a storage proportion of c%. If the hot data layer stores only these three types of hot data, then a% + b% + c% < 100%, and the remaining space is reserved for storing newly created data blocks.
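As a worked illustration of the proportions, the following Python sketch splits a hot data layer capacity into per-category quotas plus reserved space. The function name, the category names, and the example values of 40, 30, and 20 are assumptions for illustration; the application does not fix a, b, or c.

```python
from typing import Dict


def hot_layer_quotas(capacity: int, proportions: Dict[str, int]) -> Dict[str, int]:
    """Split the hot-layer capacity into per-category quotas plus reserved space.

    `proportions` maps a category name to its percentage (a, b, c, ...).
    """
    if sum(proportions.values()) >= 100:
        raise ValueError("proportions must sum to less than 100%")
    quotas = {name: capacity * pct // 100 for name, pct in proportions.items()}
    # Whatever is left is the reserved space for newly created data blocks.
    quotas["reserved"] = capacity - sum(quotas.values())
    return quotas
```

With a capacity of 1000 units and a = 40, b = 30, c = 20, the categories receive 400, 300, and 200 units, and 100 units remain reserved.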
In this case, a priority may be set for occupying storage space. For example, the hot data corresponding to the target service occupies its own storage space preferentially, and if that storage space is insufficient, it occupies the storage space of the hot data corresponding to the data access frequency. If the storage proportions of all types of hot data in the hot data layer are fully occupied, the hot data has to be stored in the cold data layer; when the next migration period arrives, if the hot data layer has released some storage space, the hot data is then stored in the hot data layer.
In some embodiments, when performing data storage management, if the data heat type of the operation data is determined to be hot data according to a target data attribute and the matching result is that the data heat type does not match the current storage area, the operation data may be migrated to the hot data layer according to the storage proportion, in the hot data layer, of the hot data determined by the target data attribute.
The target data attribute is one of the above data attributes. For example, if the target data attribute is the data access frequency, the corresponding storage proportion is c%. During migration, the remaining storage space corresponding to the target data attribute in the hot data layer may be calculated first. If the storage space occupied by the operation data does not exceed the remaining storage space, all of the operation data may be migrated to the hot data layer. If it exceeds the remaining storage space, part of the operation data may be migrated to the hot data layer and the rest may wait for the next migration; alternatively, if other data attributes still have remaining storage space in the hot data layer, the rest may be migrated to the storage space corresponding to those data attributes in the hot data layer.
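The quota check during migration can be sketched as below. The sketch combines both options mentioned in the text (spilling into other attributes' space, then deferring the rest); the text presents these as alternatives, so that combination, along with the names `plan_migration`, `remaining`, and `deferred`, is an assumption.

```python
from typing import Dict


def plan_migration(size: int, target: str, remaining: Dict[str, int]) -> Dict[str, int]:
    """Return how many units of the data go into each attribute's hot-layer space.

    `remaining` maps an attribute name to the free space left in its share of
    the hot data layer. Units under "deferred" wait for the next migration.
    """
    plan: Dict[str, int] = {}
    # Fill the target attribute's own space first, then other attributes' space.
    order = [target] + [name for name in remaining if name != target]
    for name in order:
        if size == 0:
            break
        take = min(size, remaining.get(name, 0))
        if take > 0:
            plan[name] = take
            size -= take
    plan["deferred"] = size
    return plan
```

For example, 70 units of frequency-hot data with 50 units free under the frequency share and 10 under the service share would be planned as 50 + 10 migrated and 10 deferred.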
In other embodiments, in addition to performing migration management according to the matching result, migration management may also be performed according to the storage proportions of the various types of hot data in the hot data layer (in this manner, there are multiple target data attributes). For example, it is first counted whether the total amount of important service data exceeds the set threshold (a%). If not, all important service data is marked as to-be-migrated data; if so, the important service data is sorted by data access frequency and the top-ranked data is marked as to-be-migrated data. The remaining unmarked data is then sorted by data access time, and the most recently accessed b% is marked as to-be-migrated data. Finally, the remaining data is sorted by data access frequency, and the top-ranked c% is marked as to-be-migrated data. In this way the to-be-migrated hot data can be counted.
It is then determined which of the to-be-migrated hot data is already in the hot data layer and which data in the hot data layer does not belong to the to-be-migrated hot data. Data in the hot data layer that is not to-be-migrated hot data is migrated to the cold data layer for storage, and to-be-migrated hot data in the cold data layer is migrated to the hot data layer. Migration is thus carried out according to the storage proportions of the various types of hot data, so that each type of hot data is stored in the hot data layer according to its corresponding storage proportion.
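The selection and the subsequent comparison with the hot data layer can be sketched as follows. This is a simplified illustration under assumptions: every block has the same size, the a%, b%, and c% shares are expressed as block counts, important service data beyond its share is cut at that share, and the names `Block`, `select_hot`, and `diff_layers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: int
    important: bool     # belongs to the target service
    frequency: int      # access count in the last period
    last_access: float  # timestamp of the latest access


def take_top(candidates, key, quota):
    """Return up to `quota` highest-ranked candidates under `key`."""
    return sorted(candidates, key=key, reverse=True)[:quota]


def select_hot(blocks, quota_a, quota_b, quota_c):
    """Mark to-be-migrated hot blocks; quotas are block counts for a%, b%, c%."""
    important = [b for b in blocks if b.important]
    # Important service data: all of it if it fits, else the most frequent.
    marked = take_top(important, lambda b: b.frequency, quota_a)
    rest = [b for b in blocks if b not in marked]
    marked += take_top(rest, lambda b: b.last_access, quota_b)
    rest = [b for b in blocks if b not in marked]
    marked += take_top(rest, lambda b: b.frequency, quota_c)
    return {b.block_id for b in marked}


def diff_layers(marked_ids, hot_layer_ids):
    """Return (ids to move to the cold layer, ids to move to the hot layer)."""
    return hot_layer_ids - marked_ids, marked_ids - hot_layer_ids
```

Only the two difference sets are moved, so blocks that are marked and already in the hot data layer cause no migration traffic.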
In a hyper-converged system, the virtual hard disk (i.e., the cloud hard disk) used by a virtual machine is block storage provided by the storage system, and its size may range from several GB to several TB. At the bottom layer, the block storage can be regarded as being composed of many small data blocks. In a distributed block storage system, a request sent by the virtual machine to the storage system reads or writes a certain position of the block device, and the storage system converts the received position and length information into reads and writes of small data blocks, thereby realizing the reading and writing of data.
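The conversion from position and length to small data blocks is simple integer arithmetic, sketched below. The function name and the block size passed in are assumptions; the application does not state the size of a small data block.

```python
def blocks_for_range(offset: int, length: int, block_size: int) -> range:
    """Indices of the small data blocks covered by [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)
```

A request that straddles a block boundary therefore touches two consecutive blocks, while a request aligned to one block touches only that block.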
Therefore, when the current storage area of the operation data is acquired, the storage position of the operation data in the cloud hard disk needs to be acquired first, and then the current storage area of the operation data in the storage system is determined according to the storage position of the cloud hard disk.
As shown in fig. 2, the first layer in the figure is the mapping relationship between data and the cloud hard disk, which is established and maintained by the file system in the virtual machine. The second layer is the mapping relationship between the cloud hard disk and the underlying storage system, which is established and maintained by the distributed storage system and can be queried directly through the storage system.
Therefore, in some embodiments, a mapping relationship between data and a storage location of the cloud hard disk may be obtained through a file system, and then the storage location of the running data in the cloud hard disk may be determined according to the mapping relationship.
In this manner, the file system may obtain the mapping relationship directly through certain system calls; for example, the xfs and ext4 file systems can obtain the above mapping relationship through the fiemap ioctl call.
In other cases, the file system may be unable to obtain the mapping relationship through such a call. For this scenario, an embodiment of the present application provides the following implementation: a new access service is created in the host machine corresponding to the virtual machine, the new access service being used to detect the storage position of data in the cloud hard disk; a read request for the operation data is received through the file system; the read request is converted into an IO request for the cloud hard disk and sent to the new access service; and the storage position of the cloud hard disk corresponding to the IO request is acquired through the new access service and determined as the storage position of the cloud hard disk where the operation data is located.
As shown in fig. 3, the left side is the IO path for normal data reading and writing in the virtual machine. In order to obtain the position mapping relationship between the cloud hard disk and the data, in this scheme the cloud hard disk is also generated on the host machine and the file system is mounted on the host machine, which can be understood as providing an interface for the host machine to access the cloud hard disk; the host machine and the virtual machine see the same cloud hard disk and file system. The virtual machine already has an access service, which handles normal reading and writing of data in the virtual machine.
A new access service is then created on the host machine. The new access service is the same as the access service of the virtual machine, except that it carries an identifier when started, indicating that it is used only for detecting the storage position of data in the cloud hard disk and cannot be used for ordinary IO reading and writing.
The user's read and write requests for the data are sent to the new access service, so the access service of the virtual machine is not affected, and the new access service does not receive IO requests sent by the virtual machine. Because the data read through the two file systems is the same, it is only necessary to read the data in the file system on the host machine: the file system receives the read request, converts it into an IO request for the cloud hard disk, and sends the IO request to the new access service. After receiving the IO request, the new access service records the cloud hard disk position of the IO request; these positions are the storage positions of the cloud hard disk where the operation data is located, and once all the data has been read, the mapping relationship between the data and the cloud hard disk positions is obtained. After the new access service acquires the position information, it returns the IO request directly without forwarding it to the storage system to read the real data, so the whole detection process consumes no disk bandwidth of the storage system and the detection speed is greatly improved.
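The detection path above can be modeled with a toy Python sketch. `AccessService`, `ToyFileSystem`, and the `probe_only` flag (standing in for the identifier carried at startup) are hypothetical; a real access service and file system are far more involved.

```python
class AccessService:
    """Toy access service; `probe_only` stands in for the startup identifier."""

    def __init__(self, backend_read, probe_only=False):
        self.backend_read = backend_read  # callable(offset, length) -> bytes
        self.probe_only = probe_only
        self.recorded = []                # (offset, length) positions seen

    def handle_read(self, offset, length):
        if self.probe_only:
            # Record the cloud hard disk position and return at once; the
            # request is never forwarded, so no storage bandwidth is used.
            self.recorded.append((offset, length))
            return b""
        return self.backend_read(offset, length)


class ToyFileSystem:
    """Toy file system that turns a file read into cloud hard disk IO requests."""

    def __init__(self, layout):
        self.layout = layout  # path -> list of (offset, length) on the disk

    def read(self, path, service):
        for offset, length in self.layout[path]:
            service.handle_read(offset, length)
```

Reading a file through the probe-only service leaves the file's cloud hard disk positions in `recorded` without a single call to the storage backend.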
In some embodiments, when acquiring the mapping relationship between the data and the cloud hard disk storage position, the mapping relationship may preferentially be acquired through the file system, and acquired through the new access service only if the file system fails to obtain it. Alternatively, the new access service may be created in advance but not started, and started only when the mapping relationship cannot be obtained through the file system, so that the mapping relationship can be obtained quickly.
After the hot data is identified as in the above embodiment, the storage positions of the hot data in the cloud hard disk may be obtained first and then sent to the storage system. The storage system can determine, according to the storage positions, which small data blocks the data is stored in, and mark those small data blocks as hot data blocks, i.e., the data blocks where hot data is located. During subsequent migration, the position of the hot data is therefore known, so the storage area where the hot data is located can be determined directly, along with whether migration is needed. For example, it is determined whether a hot data block is in the cold data layer; if so, migration is needed, and if not, it is not.
Because data keeps growing and occupies more positions, the position information of all hot data on the cloud hard disk may be periodically re-acquired through the agent process and sent to the storage system, and the storage system updates the storage positions of the corresponding hot data, i.e., updates the corresponding hot data blocks, according to the position information of the cloud hard disk.
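The marking and periodic refresh of hot data blocks on the storage system side might look like the following sketch. `HotBlockIndex` and its fields are hypothetical, and positions are assumed to arrive as `(offset, length)` pairs with positive lengths.

```python
class HotBlockIndex:
    """Tracks which small data blocks are marked as hot data blocks."""

    def __init__(self, block_size, cold_blocks):
        self.block_size = block_size
        self.cold_blocks = set(cold_blocks)  # block indices in the cold layer
        self.hot_blocks = set()

    def update(self, positions):
        """Rebuild the hot-block marks from freshly reported positions."""
        marked = set()
        for offset, length in positions:
            first = offset // self.block_size
            last = (offset + length - 1) // self.block_size
            marked.update(range(first, last + 1))
        self.hot_blocks = marked

    def needs_migration(self):
        """Hot data blocks that currently sit in the cold data layer."""
        return sorted(self.hot_blocks & self.cold_blocks)
```

Calling `update` again with the positions re-acquired in a later period replaces the previous marks, so blocks added as the data grows are picked up.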
In the implementation process, the mapping relation between the data and the cloud hard disk storage position is acquired by newly creating a new access service, so that the method is applicable to a scene in which the mapping relation cannot be acquired through a file system.
Therefore, in the embodiments of the present application, by identifying the data heat type of the data, important service data in the virtual machine, such as database files, can be identified automatically and stored in the hot data layer. This ensures that reads and writes of important service data take place in the hot data layer, so that the read and write performance of the service meets the required standard. In addition, the present application operates on the service data in the virtual machine rather than on the whole virtual machine, which reduces the amount of service-irrelevant data that falls in the hot data layer and improves space utilization. For example, a virtual machine may be 100 GB in size, of which only 10 GB is important service data and the rest is system operation data and system logs. By identifying these data, each kind of data can be placed in the corresponding storage area instead of storing all the data of the whole virtual machine in the hot data layer. This avoids the problem of important service performance being affected by inaccurate hot data statistics, makes better use of the storage space of the hot data layer, effectively improves the overall performance of the storage system, and allows more virtual machine services to be supported.
Referring to fig. 4, fig. 4 is a block diagram illustrating a storage management device 200 according to an embodiment of the present application, where the device 200 may be a module, a program segment, or a code on an electronic device. It should be understood that the apparatus 200 corresponds to the above embodiment of the method of fig. 1, and is capable of performing the steps involved in the embodiment of the method of fig. 1, and specific functions of the apparatus 200 may be referred to in the above description, and detailed descriptions thereof are omitted herein as appropriate to avoid redundancy.
Optionally, the apparatus 200 includes:
a storage area determining module 210, configured to obtain a current storage area of stored operation data of a virtual machine, where the storage area includes a hot data layer and a cold data layer, and the operation data is preferentially stored in the hot data layer during initial storage;
a heat type determining module 220, configured to identify a data heat type of the operation data according to a data attribute of the operation data, where the data heat type includes hot data and cold data;
a matching module 230, configured to match the data heat type with the current storage area, so as to obtain a matching result;
and the management module 240 is configured to store and manage the operation data according to the matching result.
Optionally, the data attribute includes at least one of a service to which the data belongs, a data access frequency, a data access time, and a data source; the heat type determining module 220 is configured to identify the heat type of the data according to at least one of the following manners:
if the service to which the operation data belongs is a target service, determining that the data heat type of the operation data is hot data;
if the data access frequency of the operation data is greater than the set frequency, determining that the data heat type of the operation data is hot data;
if the data access time of the operation data is within a set time period, determining that the data heat type of the operation data is hot data;
and if the data source of the operation data is a set source, determining that the data heat type of the operation data is hot data.
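The four determination manners listed above can be sketched as a single rule check. Combining the rules with a logical OR, and treating data that meets no rule as cold, are assumptions for illustration; the function name, parameter names, and the dictionary keys are hypothetical.

```python
def heat_type(attrs, *, target_services=(), set_frequency=None,
              set_period=None, set_sources=()):
    """Return "hot" if any configured rule is met by `attrs`, else "cold".

    `attrs` may hold "service", "frequency", "access_time", and "source";
    `set_period` is an inclusive (start, end) pair of timestamps.
    """
    rules = [
        # Rule 1: the data belongs to a target service.
        attrs.get("service") in target_services,
        # Rule 2: the access frequency exceeds the set frequency.
        set_frequency is not None and attrs.get("frequency", 0) > set_frequency,
        # Rule 3: the access time falls within the set time period.
        set_period is not None and "access_time" in attrs
        and set_period[0] <= attrs["access_time"] <= set_period[1],
        # Rule 4: the data comes from a set source.
        attrs.get("source") in set_sources,
    ]
    return "hot" if any(rules) else "cold"
```

Rules whose threshold is left unset are simply skipped, which mirrors the "at least one of" wording: an embodiment may configure any subset of the four.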
Optionally, if the data attribute includes a data access frequency, the data access frequency is obtained by:
and monitoring the data access frequency of the operation data through the setting process run by the virtual machine.
Optionally, if the data attribute includes at least two of a service to which the data belongs, a data access frequency, a data access time, and a data source, then in the hot data layer the hot data determined according to each attribute is allocated a corresponding storage proportion;
if the data heat type of the operation data is determined to be hot data according to a target data attribute and the matching result is that the data heat type does not match the current storage area, the management module 240 is configured to migrate the operation data to the hot data layer according to the storage proportion, in the hot data layer, of the hot data determined by the target data attribute.
Optionally, the storage area determining module 210 is configured to obtain a storage location of the operation data in a cloud hard disk; and determining the current storage area of the operation data in the storage system according to the storage position of the cloud hard disk.
Optionally, the storage area determining module 210 is configured to obtain, through a file system, a mapping relationship between data and a storage location of the cloud hard disk; and determining the storage position of the operation data in the cloud hard disk through the file system according to the mapping relation.
Optionally, the storage area determining module 210 is configured to create a new access service in a host machine corresponding to the virtual machine, where the new access service is used to detect a storage location of data in a cloud hard disk; receiving a read request for the operation data through a file system; converting the read request into an IO request for the cloud hard disk through the file system, and sending the IO request to the new access service; and acquiring a storage position of the cloud hard disk corresponding to the IO request through the new access service, and determining the storage position as the storage position of the cloud hard disk where the operation data are located.
Optionally, the management module 240 is configured to migrate the operation data to another data layer if the matching result is that the data heat type does not match the current storage area.
It should be noted that, for convenience and brevity, a person skilled in the art will clearly understand that, for the specific working procedure of the apparatus described above, reference may be made to the corresponding procedure in the foregoing method embodiment, and the description will not be repeated here.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device for executing a storage management method according to an embodiment of the present application, where the electronic device may include: at least one processor 310, such as a CPU, at least one communication interface 320, at least one memory 330, and at least one communication bus 340. Wherein the communication bus 340 is used to enable direct connection communication of these components. The communication interface 320 of the device in the embodiment of the present application is used for performing signaling or data communication with other node devices. The memory 330 may be a high-speed RAM memory or a nonvolatile memory (non-volatile memory), such as at least one disk memory. Memory 330 may also optionally be at least one storage device located remotely from the aforementioned processor. The memory 330 has stored therein computer readable instructions which, when executed by the processor 310, perform the method process described above in fig. 1.
It will be appreciated that the configuration shown in fig. 5 is merely illustrative, and that the electronic device may also include more or fewer components than shown in fig. 5, or have a different configuration than shown in fig. 5. The components shown in fig. 5 may be implemented in hardware, software, or a combination thereof.
Embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs a method process performed by an electronic device in the method embodiment shown in fig. 1.
The present embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing the methods provided by the above-described method embodiments, for example, comprising:
acquiring a current storage area of stored operation data of a virtual machine, wherein the storage area comprises a hot data layer and a cold data layer, and the operation data is preferentially stored in the hot data layer during initial storage;
identifying a data heat type of the operation data according to the data attribute of the operation data, wherein the data heat type comprises hot data and cold data;
matching the data heat type with the current storage area to obtain a matching result;
and carrying out storage management on the operation data according to the matching result.
In summary, the embodiments of the present application provide a storage management method, an apparatus, an electronic device, and a storage medium, where the method identifies a data heat type of operation data according to a data attribute of the operation data of a virtual machine, and matches the data heat type with a current storage area of the operation data to determine whether hot data is stored in a hot data layer and cold data is stored in a cold data layer, so that storage management can be performed on the operation data according to a matching result, for example, unmatched data is migrated, so that data with different data heat types can be stored in corresponding storage areas, so that some hot data is in the hot data layer in most cases, and is not in the cold data layer for a long time, and thus the overall performance of a storage system can be effectively improved.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is merely a division by logical function, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interface, device, or unit, and may be electrical, mechanical, or in another form.
Further, the units described as separate units may or may not be physically separate, and units displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, functional modules in various embodiments of the present application may be integrated together to form a single portion, or each module may exist alone, or two or more modules may be integrated to form a single portion.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (11)

1. A storage management method, the method comprising:
acquiring a current storage area of stored operation data of a virtual machine, wherein the storage area comprises a hot data layer and a cold data layer, and the operation data is preferentially stored in the hot data layer during initial storage;
identifying a data heat type of the operation data according to the data attribute of the operation data, wherein the data heat type comprises hot data and cold data;
matching the data heat type with the current storage area to obtain a matching result;
and carrying out storage management on the operation data according to the matching result.
2. The method of claim 1, wherein the data attributes comprise at least one of a service to which the data belongs, a data access frequency, a data access time, and a data source;
the identifying the data heat type of the operation data according to the data attribute of the operation data comprises at least one of the following determination manners:
if the service to which the operation data belongs is a target service, determining that the data heat type of the operation data is hot data;
if the data access frequency of the operation data is greater than the set frequency, determining that the data heat type of the operation data is hot data;
if the data access time of the operation data is within a set time period, determining that the data heat type of the operation data is hot data;
and if the data source of the operation data is a set source, determining that the data heat type of the operation data is hot data.
3. The method of claim 2, wherein if the data attribute comprises a data access frequency, the data access frequency is obtained by:
and monitoring the data access frequency of the operation data through the setting process operated by the virtual machine.
4. The method according to claim 2, wherein if the data attributes include at least two of a service to which the data belongs, a data access frequency, a data access time, and a data source, the hot data determined according to each attribute is allocated a corresponding storage proportion in the hot data layer;
if the data heat type of the operation data is determined to be hot data according to a target data attribute and the matching result is that the data heat type does not match the current storage area, the performing storage management on the operation data according to the matching result comprises:
migrating the operation data to the hot data layer according to the storage proportion, in the hot data layer, of the hot data determined by the target data attribute.
5. The method of claim 1, wherein the acquiring a current storage area of stored operation data of a virtual machine comprises:
acquiring a storage position of the operation data in a cloud hard disk;
and determining the current storage area of the operation data in the storage system according to the storage position of the cloud hard disk.
6. The method of claim 5, wherein the acquiring a storage position of the operation data in a cloud hard disk comprises:
acquiring a mapping relation between data and a cloud hard disk storage position through a file system;
and determining the storage position of the operation data in the cloud hard disk through the file system according to the mapping relation.
7. The method of claim 5, wherein the acquiring a storage position of the operation data in a cloud hard disk comprises:
creating a new access service in a host machine corresponding to the virtual machine, wherein the new access service is used for detecting the storage position of data in a cloud hard disk;
receiving a read request for the operation data through a file system;
converting the read request into an IO request for the cloud hard disk through the file system, and sending the IO request to the new access service;
and acquiring a storage position of the cloud hard disk corresponding to the IO request through the new access service, and determining the storage position as the storage position of the cloud hard disk where the operation data are located.
8. The method of claim 1, wherein the performing storage management on the operation data according to the matching result comprises:
and if the matching result is that the data heat type is not matched with the current storage area, migrating the operation data to another data layer.
9. A storage management device, the device comprising:
the storage area determining module is used for obtaining a current storage area of the stored operation data of the virtual machine, wherein the storage area comprises a hot data layer and a cold data layer, and the operation data is preferentially stored in the hot data layer during initial storage;
the heat type determining module is used for identifying the data heat type of the operation data according to the data attribute of the operation data, wherein the data heat type comprises hot data and cold data;
the matching module is used for matching the data heat type with the current storage area to obtain a matching result;
and the management module is used for carrying out storage management on the operation data according to the matching result.
10. An electronic device comprising a processor and a memory storing computer readable instructions that, when executed by the processor, perform the method of any of claims 1-8.
11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, performs the method according to any of claims 1-8.
CN202310722926.0A 2023-06-16 2023-06-16 Storage management method, storage management device, electronic equipment and storage medium Pending CN116643704A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310722926.0A CN116643704A (en) 2023-06-16 2023-06-16 Storage management method, storage management device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310722926.0A CN116643704A (en) 2023-06-16 2023-06-16 Storage management method, storage management device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116643704A (en) 2023-08-25

Family

ID=87640019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310722926.0A Pending CN116643704A (en) 2023-06-16 2023-06-16 Storage management method, storage management device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116643704A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116956363A (en) * 2023-09-20 2023-10-27 微网优联科技(成都)有限公司 Data management method and system based on cloud computer technology
CN116956363B (en) * 2023-09-20 2023-12-05 微网优联科技(成都)有限公司 Data management method and system based on cloud computer technology

Similar Documents

Publication Publication Date Title
US8521986B2 (en) Allocating storage memory based on future file size or use estimates
CN108039964B (en) Fault processing method, device and system based on network function virtualization
CN106502587B (en) Hard disk data management method and hard disk control device
CN107015872A (en) The processing method and processing device of monitoring data
US9612760B2 (en) Modular block-allocator for data storage systems
US9792231B1 (en) Computer system for managing I/O metric information by identifying one or more outliers and comparing set of aggregated I/O metrics
CN104598495A (en) Hierarchical storage method and system based on distributed file system
CN111061752B (en) Data processing method and device and electronic equipment
CN108073352B (en) Virtual disk processing method and device
US20200125473A1 (en) Hybrid log viewer with thin memory usage
CN101751470B (en) System for storing and/or retrieving a data-set and method thereof
CN116643704A (en) Storage management method, storage management device, electronic equipment and storage medium
CN111291018B (en) Data management method, device, equipment and storage medium
CN109542841B (en) Method for creating data snapshot in cluster and terminal equipment
CN111399765A (en) Data processing method and device, electronic equipment and readable storage medium
CN104915376B (en) A kind of archival compression method of file in cloud storage
US9893972B1 (en) Managing I/O requests
CN116700634B (en) Garbage recycling method and device for distributed storage system and distributed storage system
US11210236B2 (en) Managing global counters using local delta counters
CN113064553A (en) Data storage method, device, equipment and medium
CN105893150B (en) Interface calling frequency control method and device and interface calling request processing method and device
CN105847329B (en) Management equipment and method based on stock data server
US7680921B2 (en) Management system, management computer, managed computer, management method and program
CN114518848B (en) Method, device, equipment and medium for processing stored data
CN113835613B (en) File reading method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination