WO2020134609A1 - Method and device for data storage - Google Patents

Method and device for data storage

Info

Publication number
WO2020134609A1
Authority
WO
WIPO (PCT)
Prior art keywords
heat
configuration information
service
monitoring configuration
data
Prior art date
Application number
PCT/CN2019/115774
Other languages
English (en)
French (fr)
Inventor
王波
屠要峰
黄震江
韩银俊
洪建峰
郭斌
丁毅
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2020134609A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • This application relates to, but is not limited to, the field of data storage, and in particular to a method and device for data storage.
  • A distributed storage system architecture is generally composed of three parts: a file access client module, a metadata server module, and a storage server module.
  • Figure 1 is a structural model diagram of a distributed storage system according to the related art.
  • The file access client is the agent through which an application accesses the file system; it provides functions such as the application file operation interface and heat statistics reporting. The metadata server module provides configuration data management, file metadata management, and hierarchical storage management. The storage server module actually stores the file data in the storage system.
  • A distributed storage system (Distributed Storage System, DSS) generally mixes mechanical hard drives and SSD (solid state drive) flash memory to meet the needs of large capacity and high performance.
  • New SSD flash memory, such as the NVMe protocol type, etc.
  • The storage system uses hierarchical storage to manage different types of hard drives so as to balance storage performance and capacity requirements.
  • The main function of SSD flash memory in hierarchical storage is to serve as a cache for hotspot data, storing the latest or hottest data of the current service.
  • Hot and cold data are distinguished by indicators such as data value, data access frequency, retention time, and data access size, collectively called data access heat.
  • Hierarchical storage combines these indicators, stores shard copies on different types of hard disks, and automatically migrates them between the different types of hard disks according to hot spots.
  • Embodiments of the present application provide a data storage method and device, to at least solve the problem in the related art that hierarchical storage of hotspot data is unsatisfactory because heat values are collected by a single statistical method.
  • A data storage method, including: acquiring a plurality of heat monitoring configuration information set for a first service; monitoring the heat value of the first service separately according to each heat monitoring configuration information, wherein the heat value is used to indicate the frequency with which the first service is accessed; and, according to the plurality of heat values corresponding to the plurality of heat monitoring configuration information, selecting a location for storing data corresponding to the first service and storing the data.
  • A data storage device, including: a first acquisition module configured to acquire a plurality of heat monitoring configuration information set for a first service; a second acquisition module configured to monitor the heat value of the first service separately according to each heat monitoring configuration information, wherein the heat value is used to indicate the frequency with which the first service is accessed; and a selection module configured to select, according to the plurality of heat values corresponding to the plurality of heat monitoring configuration information, a location for storing data corresponding to the first service, and to store the data.
  • A storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the above method embodiments when run.
  • An electronic device including a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the steps in any one of the above method embodiments.
  • Figure 1 is a structural model diagram of a distributed storage system according to the related art
  • FIG. 2 is a diagram of a hierarchical storage structure model according to the related art
  • FIG. 3 is a block diagram of a hardware structure of a computer terminal of a data storage method according to an embodiment of the present application
  • FIG. 4 is a flowchart of a data storage method according to an embodiment of the present application.
  • FIG. 5 is an interaction diagram of a multi-service hierarchical storage improvement module according to an embodiment of the present application.
  • FIG. 6 is an interaction diagram of newly added modules for multi-service hierarchical storage according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a multi-service heat monitoring configuration information interface according to Example 1 of the present application.
  • FIG. 8 is a schematic diagram of a hierarchical storage multi-service list according to Example 2 of the present application.
  • FIG. 9 is a schematic diagram of a weight management process according to Example 3 of the present application.
  • FIG. 10 is a structural diagram of hierarchical storage multi-directory configuration heat management and elimination according to Example 4 of the present application.
  • FIG. 11 is a schematic diagram of the main flow of shard elimination according to Example 4 of the present application.
  • FIG. 2 is a hierarchical storage structure model diagram according to the related art. As shown in FIG. 2, it includes an access client, a metadata server, a storage server, a heat configuration module, a heat management module, a heat statistics module, a shard elimination module, a heat scheduling module, a weight management module, and a coordination scheduling module.
  • When an application program calls an interface (such as read or sendfile) to access a file shard, the file access client collects information such as the number of reads and writes and the number of bytes read and written, and reports it to the heat management module of the metadata server.
  • The metadata server receives the raw information reported for the shard, combines the historical heat with the currently reported heat, calculates the shard heat according to the formula, and saves it in the metadata.
  • The heat management module regularly scans the shard metadata. If the shard heat is greater than the configured heat threshold and all copies of the shard are located on mechanical hard disks, the relevant metadata is inserted into the to-be-upgraded list and the list is re-sorted. If the shard heat is less than the heat threshold and a copy exists on SSD flash memory, the relevant metadata is inserted into the to-be-downgraded list and that list is re-sorted. Here the heat threshold means that shards whose data access heat exceeds this value can be upgraded to SSD flash memory as candidate shards.
  • The to-be-upgraded list is sorted by heat in descending order and contains the information of shards whose heat meets the heat threshold; the to-be-downgraded list is sorted by heat in ascending order and contains the information of shards whose heat is less than the heat threshold.
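  • The scan described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the related art: the `Shard` record, the media labels `"hdd"` and `"ssd"`, and the function name `build_migration_lists` are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Shard:
    # Hypothetical shard metadata record (field names are assumptions).
    shard_id: str
    heat: float
    replica_media: List[str]  # medium of each copy: "hdd" or "ssd"


def build_migration_lists(shards: List[Shard],
                          heat_threshold: float) -> Tuple[List[Shard], List[Shard]]:
    """Return (to_upgrade, to_downgrade) following the rules in the text."""
    # Upgrade candidates: hot shards whose copies are all on mechanical disks.
    to_upgrade = [s for s in shards
                  if s.heat > heat_threshold
                  and all(m == "hdd" for m in s.replica_media)]
    # Downgrade candidates: cold shards with at least one copy on SSD.
    to_downgrade = [s for s in shards
                    if s.heat < heat_threshold
                    and any(m == "ssd" for m in s.replica_media)]
    to_upgrade.sort(key=lambda s: s.heat, reverse=True)  # hottest first
    to_downgrade.sort(key=lambda s: s.heat)              # coldest first
    return to_upgrade, to_downgrade
```

  • A shard that is hot but already has a copy on SSD appears in neither list, which matches the two conditions stated above.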
  • the heat scheduling module regularly checks the system configuration, and takes out the eligible fragments in the list to be upgraded and the list to be degraded to issue instructions to the storage server module to transfer copies of the fragments.
  • the metadata server modifies the new hard disk location after the shard copy is migrated.
  • The related art counts the heat of a file or object over several historical time periods as historical heat, uses it to predict the heat of the file in a future period, and uses this as the basis on which tiered storage migrates files of different heat to hard disks of different performance.
  • The hierarchical storage technology in the related art has many limitations.
  • A single storage system often needs to provide storage services for multiple services.
  • Different services have different hot content and hot time periods. Heat statistics based uniformly on historical file access cause data identified as hot to not actually be hot, so the effect of hierarchical storage is unsatisfactory.
  • The third limitation is the difficulty of configuring and managing the hotspot statistics period: manually setting the hotspot period makes it hard to adapt to changes in hotspot content and time periods.
  • FIG. 3 is a block diagram of a hardware structure of a computer terminal of a data storage method according to an embodiment of the present application.
  • The computer terminal may include one or more processors 302 (only one is shown in FIG. 3; the processor 302 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 304 configured to store data. Optionally, the computer terminal may further include a transmission device 306 for communication functions and an input and output device 308.
  • FIG. 3 is merely an illustration, which does not limit the structure of the computer terminal described above.
  • The computer terminal may also include more or fewer components than those shown in FIG. 3, or have a configuration different from that shown in FIG. 3.
  • The memory 304 may be configured to store software programs and modules of application software, such as the program instructions/modules corresponding to the data storage method in the embodiments of the present application. The processor 302 runs the software programs and modules stored in the memory 304 to execute various functional applications and data processing, that is, to implement the above method.
  • the memory 304 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 304 may further include memories remotely provided with respect to the processor 302, and these remote memories may be connected to a computer terminal through a network. Examples of the aforementioned network include, but are not limited to, the Internet, intranet, local area network, mobile communication network, and combinations thereof.
  • the transmission device 306 is configured to receive or transmit data via a network.
  • A specific example of the network described above may include a wireless network provided by the communication provider of the computer terminal.
  • The transmission device 306 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • The transmission device 306 may be a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.
  • FIG. 4 is a flowchart of a data storage method according to an embodiment of the present application. As shown in FIG. 4, the process includes the following steps:
  • Step S402: Acquire multiple heat monitoring configuration information set for the first service;
  • Step S404: Monitor the heat value of the first service separately according to each heat monitoring configuration information, where the heat value is used to indicate the frequency with which the first service is accessed;
  • Step S406: Select a location to store data corresponding to the first service according to the multiple heat values corresponding to the multiple heat monitoring configuration information, and store the data.
  • The metadata information can be modified accordingly.
  • Because a service is configured with multiple heat monitoring configuration information, the hot data of the service can be migrated to the solid-state drive more accurately and promptly, which greatly improves tiered storage efficiency and solves the problem in the related art that hierarchical storage of hot data is unsatisfactory because heat values are collected by a single statistical method.
  • Acquiring multiple heat monitoring configuration information set for the first service includes: obtaining at least one of the following information included in the heat monitoring configuration information: heat update period, heat statistics start time, heat statistics end time.
  • Separately monitoring the heat value of the first service according to each heat monitoring configuration information includes: counting, for each heat update period between the heat statistics start time and the heat statistics end time of each heat monitoring configuration information, a first number of times that the first service is accessed; and obtaining the heat value of the first service corresponding to each heat monitoring configuration information according to the first number of times.
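  • As a rough illustration of this counting step, the sketch below buckets access timestamps into heat update periods inside the statistics window. The text does not say how the heat value is derived from the counts, so `heat_from_counts` uses the mean count per period purely as an assumption; all function and variable names are hypothetical.

```python
import math
from datetime import datetime, timedelta


def count_accesses_per_cycle(access_times, stat_start, stat_end, update_period):
    """Count accesses in each heat update period of [stat_start, stat_end)."""
    n_cycles = math.ceil((stat_end - stat_start) / update_period)
    counts = [0] * n_cycles
    for t in access_times:
        if stat_start <= t < stat_end:
            # Integer index of the update period that contains t.
            counts[(t - stat_start) // update_period] += 1
    return counts


def heat_from_counts(counts):
    """Assumption: heat is the mean number of accesses per update period."""
    return sum(counts) / len(counts) if counts else 0.0


start = datetime(2020, 1, 1, 9)
end = datetime(2020, 1, 1, 12)
accesses = [
    datetime(2020, 1, 1, 9, 10),
    datetime(2020, 1, 1, 9, 50),
    datetime(2020, 1, 1, 10, 5),
    datetime(2020, 1, 1, 11, 59),
    datetime(2020, 1, 1, 12, 0),  # outside the window, ignored
]
counts = count_accesses_per_cycle(accesses, start, end, timedelta(hours=1))
print(counts)  # [2, 1, 1]
```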
  • Separately monitoring the heat value of the first service according to each heat monitoring configuration information includes: when first heat monitoring configuration information among the plurality of heat monitoring configuration information is directed to a first business directory of the first service, counting the heat value of one or more data shards in the first business directory according to the first heat monitoring configuration information.
  • Selecting a location to store data corresponding to the first service according to the multiple heat values corresponding to the multiple heat monitoring configuration information, and storing the data, includes: when the multiple heat monitoring configuration information are associated heat monitoring configuration information, obtaining the product of the heat value corresponding to each heat monitoring configuration information and its preset weight; obtaining the sum of the products over the plurality of heat monitoring configuration information; and selecting, according to the sum, a location for storing the data corresponding to the first service and storing the data.
  • Selecting a location to store data corresponding to the first service according to the sum and storing the data includes: when the sum is greater than a heat threshold, migrating the data corresponding to the first service from the mechanical hard disk to the solid-state drive; when the sum is less than the heat threshold, migrating the data corresponding to the first service from the solid-state drive to the mechanical hard disk.
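  • The weighted sum and the threshold decision can be sketched as below. The function names and the tier labels `"ssd"` and `"hdd"` are illustrative assumptions; the text does not say what happens when the sum equals the threshold, so this sketch leaves the data where it is in that case.

```python
def combined_heat(heats, weights):
    """Sum of heat value times preset weight over the associated configurations."""
    if len(heats) != len(weights):
        raise ValueError("one weight is required per heat value")
    return sum(h * w for h, w in zip(heats, weights))


def choose_tier(total_heat, heat_threshold, current_tier):
    """Return the tier on which the service data should reside."""
    if total_heat > heat_threshold:
        return "ssd"   # migrate (or keep) on the solid-state drive
    if total_heat < heat_threshold:
        return "hdd"   # migrate (or keep) on the mechanical hard disk
    return current_tier  # assumption: no migration when exactly at the threshold


total = combined_heat([10.0, 4.0], [0.5, 0.5])
print(total, choose_tier(total, 5.0, "hdd"))  # 7.0 ssd
```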
  • Selecting a location to store data corresponding to the first service and storing the data includes: selecting a solid-state drive or a mechanical hard disk to store a copy of a first data shard of the first service; and storing the copy on the selected solid-state drive or mechanical hard disk.
  • The preset weights of the plurality of heat monitoring configuration information are adjusted to increase the proportion of times corresponding to the next heat update period.
  • The number of times is detected to have reached the maximum value; when the maximum value is still less than the preset ratio, a statistical report and an alarm are generated.
  • Selecting a location to store the data corresponding to the first service according to the plurality of heat values corresponding to the plurality of heat monitoring configuration information includes: when the plurality of heat monitoring configuration information are mutually independent heat monitoring configuration information, selecting the location for storing the data corresponding to the first service according to the heat value corresponding to each heat monitoring configuration information.
  • After the location for storing data corresponding to the first service is selected according to the plurality of heat values corresponding to the plurality of heat monitoring configuration information and the data is stored, a second number of times is counted in real time; when the second number of times meets a preset condition, second heat monitoring configuration information of the first service is automatically generated.
  • The second heat monitoring configuration information is automatically generated for subsequent heat monitoring of the first service; its specific configuration may be learned from the heat monitoring configuration information of other services.
  • The storage space of the first hard disk is released in at least one of the following ways: migrating out a second service stored on the first hard disk whose heat value is below the heat threshold or is the smallest; migrating out the data shards with the smallest heat value of the second service stored on the first hard disk.
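  • A minimal sketch of the second option (migrating out the coldest data shards) follows. The tuple layout, the `bytes_needed` parameter, and stopping as soon as enough space has been freed are assumptions made for illustration; the text only states that the shards with the smallest heat value are migrated out.

```python
def select_shards_to_evict(shards, bytes_needed):
    """Pick the coldest shards on a disk until bytes_needed would be freed.

    shards: iterable of (shard_id, heat, size_bytes) for the disk being freed.
    Returns (victim_ids, bytes_freed).
    """
    victims, freed = [], 0
    for shard_id, _heat, size in sorted(shards, key=lambda s: s[1]):
        if freed >= bytes_needed:
            break
        victims.append(shard_id)
        freed += size
    return victims, freed


on_disk = [("a", 5.0, 100), ("b", 1.0, 50), ("c", 3.0, 80)]
print(select_shards_to_evict(on_disk, 120))  # (['b', 'c'], 130)
```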
  • The present application discloses a method for improving the efficiency of hierarchical storage in a distributed storage system. It is applicable to multi-service scenarios and, by adaptively performing statistical analysis of hotspots in multiple time periods at the same time, can solve the hierarchical storage problems of distributed storage systems in the above scenarios.
  • A distributed storage system carries multiple services; different services have different access hotspots and peak time periods; and, whether for historical or current heat, the heat of different time periods contributes differently.
  • The related art has a problem of low scheduling efficiency in terms of heat management.
  • This solution proposes a hierarchical storage method and device that can be flexibly deployed in a distributed storage system. It supports setting different peak time periods for multiple services with independent heat management; uses the heat and performance data of different time periods of a service to automatically generate multiple associated heat monitoring configuration information for those time periods, improving hierarchical storage heat management; and, based on automatically generated heat statistics, provides a method for automatically adjusting the weights of associated configurations, reducing the burden on O&M personnel.
  • This solution adds several functional modules (the dashed-frame modules of the metadata server in Figure 2) and optimizes the implementation of multiple modules to achieve flexible heat management with multi-service support and to improve the scheduling of hotspot shards during peak hours.
  • The storage system can, according to its operating situation, automatically generate associated heat monitoring configuration information for multiple time periods for each service based on statistical data; multiple heat monitoring configuration information within a service can be configured independently for independent heat management or configured as associated for shared heat management.
  • For the associated heat monitoring configuration information of different time periods of a service, a method is provided for automatically adjusting the associated configuration weights during system operation.
  • For the hierarchical storage system to support heat management and scheduling for multiple services and multiple time periods, the existing architecture needs to be adjusted and optimized.
  • The following details the heat monitoring configuration information module, the heat management module, the heat scheduling module, the heat statistics module, and the shard elimination module.
  • The relevant fields of the heat monitoring configuration information include a service identifier, a heat scheduling time, a heat calculation formula, a shard retention time in the SSD flash memory, a maximum space occupied in the SSD flash memory, a heat statistics start time, and a heat statistics end time.
  • Table 1 is a table explaining the meaning of each main field in the heat monitoring configuration information according to this application, as shown in Table 1:
  • The basic fields of this configuration are a combination of service identifier, heat update time period, and weight.
  • The service identifier described here identifies the resources used by a service running in the storage system. Different service types can be distinguished by directory name, full path, relative path, file prefix or suffix format, etc.
  • A service identifier can contain multiple directories or full paths.
  • The heat update time period can be several time periods of the day, such as 10 o'clock to 14 o'clock, or it can be configured as holidays (every Saturday, Sunday, May 1st, and 11th). Different time periods of the same service can be configured as independent configurations for independent management, or the system can automatically generate associated configurations for shared heat management.
  • The associated configuration weights can be configured manually, or they can be assigned initial values automatically and adjusted automatically when associated configurations are generated during system operation. The configuration can also include a combination of preferred fields such as the association label, the shard retention time in SSD, and a compilable heat calculation formula to complete this heat management scheme.
  • Table 2 is a schematic table of heat monitoring configuration information according to an embodiment of the present application. The four heat monitoring configuration information are as follows:
  • Configuration 1 and configuration 2 are associated configurations, both of which act on the business directory HOT and share the same to-be-upgraded and to-be-downgraded lists and the same heat management task.
  • Configuration 3 and configuration 4 are independent configurations. Each has its own to-be-upgraded and to-be-downgraded lists and a separate heat management task.
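  • One way to picture the distinction is to group configurations into heat management tasks, as in the sketch below. The `HeatConfig` fields are a simplified subset of the fields listed earlier, the `assoc_label` field stands in for the association label, and the concrete hours and the directories used for configurations 3 and 4 are hypothetical because Table 2 is not reproduced in this text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeatConfig:
    # Simplified, hypothetical view of one heat monitoring configuration.
    config_id: int
    directory: str                     # service identifier (business directory)
    start_hour: int                    # heat statistics start time
    end_hour: int                      # heat statistics end time
    weight: float = 1.0
    assoc_label: Optional[str] = None  # None means an independent configuration


def group_into_tasks(configs):
    """Associated configs share one task; each independent config gets its own."""
    tasks = defaultdict(list)
    for c in configs:
        if c.assoc_label is not None:
            key = ("associated", c.assoc_label)
        else:
            key = ("independent", c.config_id)
        tasks[key].append(c.config_id)
    return dict(tasks)


configs = [
    HeatConfig(1, "/HOT", 9, 12, weight=0.5, assoc_label="hot"),
    HeatConfig(2, "/HOT", 10, 14, weight=0.5, assoc_label="hot"),
    HeatConfig(3, "/TV", 18, 23),      # hypothetical values
    HeatConfig(4, "/NEWS", 6, 9),      # hypothetical values
]
print(group_into_tasks(configs))
```

  • With these inputs the four configurations map to three tasks: one shared by configurations 1 and 2, and one each for configurations 3 and 4.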
  • the structure diagram is shown in Figure 10.
  • the associated configuration of different time periods can be automatically generated by the system during operation.
  • the generation rules are as follows:
  • The prerequisite for automatic generation is that a configuration for the relevant business directory already exists. Using the heat statistics module, the system identifies time periods in which the read performance of the business directory is high, that is, the performance in the period exceeds 1 or 2 times the preset value for normal operation.
  • An associated configuration can then be generated from the configuration of this business directory and this time period, and an initial weight can be set. In this way, the service has multiple heat monitoring configuration information in the storage system, each with a certain weight.
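  • The detection of candidate time periods could look like the following sketch. It assumes hourly read-performance samples for one business directory and a single `factor` (the text mentions 1 or 2 times the preset value); the function name and the hourly granularity are assumptions.

```python
def find_peak_periods(hourly_reads, baseline, factor=2.0):
    """Return [(start_hour, end_hour), ...] where reads exceed factor * baseline.

    hourly_reads: 24 read-performance samples, one per hour of the day.
    Each returned range is half-open: start_hour inclusive, end_hour exclusive.
    """
    periods, start = [], None
    for hour, value in enumerate(hourly_reads):
        hot = value > factor * baseline
        if hot and start is None:
            start = hour                   # a peak period begins
        elif not hot and start is not None:
            periods.append((start, hour))  # the peak period ends
            start = None
    if start is not None:                  # peak runs to the end of the day
        periods.append((start, len(hourly_reads)))
    return periods


reads = [1.0] * 24
for h in (10, 11, 12, 13):
    reads[h] = 5.0
reads[20] = 3.0
print(find_peak_periods(reads, baseline=1.0, factor=2.0))  # [(10, 14), (20, 21)]
```

  • Each detected range could then seed a new associated configuration for the same business directory with an initial weight.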
  • the storage system obtains data in multiple statistical periods according to the heat statistics module, and can automatically adjust the associated configuration weight.
  • each business directory can be configured with multiple heat monitoring configuration information.
  • The metadata server adds several raw heat fields (such as h1, h2, h3) to the shard-related file metadata, used to store the raw heat information of different heat monitoring configuration information in the same reporting period; it also adds several heat monitoring configuration information tags (such as tag1, tag2, tag3), indicating which heat monitoring configuration information each raw heat field corresponds to.
  • When an application reads a file through interfaces such as read and sendfile, the file access client counts the raw number of reads and writes and the number of bytes read and written, and sends them to the metadata server.
  • The metadata server receives the shard heat update message, reads the current time, and finds the directory to which the corresponding file belongs. It then recursively searches the upper-level directories; for each directory level, it checks whether business directory heat monitoring configuration information is configured, and obtains the number of each configuration whose heat statistics range contains the current time. It then obtains an idle heat field in the shard-related metadata and fills in the current configuration number and the heat calculated according to this configuration.
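  • The lookup and the slot update can be sketched as follows. The dictionary layout of `configs_by_dir`, the hour-based statistics window (assumed not to cross midnight), and the `shard_meta` structure with three tag/heat slots are illustrative assumptions modelled on the h1 to h3 and tag1 to tag3 fields described above.

```python
from pathlib import PurePosixPath


def active_configs(file_path, hour, configs_by_dir):
    """Walk from the file's directory up to the root and collect the numbers of
    configurations whose statistics window [start, end) contains `hour`.

    configs_by_dir: {directory: [(config_id, start_hour, end_hour), ...]}
    """
    found = []
    for directory in PurePosixPath(file_path).parents:
        for config_id, start, end in configs_by_dir.get(str(directory), []):
            if start <= hour < end:
                found.append(config_id)
    return found


def record_heat(shard_meta, config_id, heat):
    """Write heat into the slot tagged with config_id, else the first idle slot."""
    tags, heats = shard_meta["tags"], shard_meta["heats"]
    if config_id in tags:
        slot = tags.index(config_id)
    elif None in tags:
        slot = tags.index(None)
    else:
        return False  # no idle heat field left
    tags[slot], heats[slot] = config_id, heat
    return True


configs_by_dir = {"/HOT": [(1, 9, 12), (2, 10, 14)], "/TV": [(3, 18, 23)]}
meta = {"tags": [None, None, None], "heats": [0.0, 0.0, 0.0]}
for cid in active_configs("/HOT/movies/a.ts", 10, configs_by_dir):
    record_heat(meta, cid, 4.0)
print(meta["tags"])  # [1, 2, None]
```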
  • The heat management module periodically scans the heat monitoring configuration information and starts a separate heat management task for each independent configuration; for associated configurations, only one heat management task needs to be started.
  • After the heat management task enters its running time, it scans the relevant metadata of the shards in the current business directory and obtains the current time. For example, if the current time falls between 9 o'clock and 12 o'clock and the heat is updated every hour, then when the heat update task runs, configuration 1 and configuration 2 are both in effect, and the heat is calculated for each according to its calculation formula, giving benefit1 and benefit2. The actual benefit of the current shard is then corrected by the following formula (1):
  • Here w1 is the weight of associated configuration 1 and w2 is the weight of associated configuration 2.
  • The initial value of w1 and w2 is 0.5; that is, by default associated configuration 1 and configuration 2 have equal standing.
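  • Formula (1) itself is not reproduced in this text. Given that the method takes the sum of the products of each heat value and its preset weight, and that w1 and w2 are the weights of the two associated configurations, it presumably has the following form (a reconstruction under that assumption, not the original typeset formula):

```latex
\begin{equation}
\mathrm{benefit} = w_1 \cdot \mathrm{benefit}_1 + w_2 \cdot \mathrm{benefit}_2
\tag{1}
\end{equation}
```

  • Under this reading, with the initial weights w1 = w2 = 0.5, benefit1 = 10 and benefit2 = 4 would give an actual benefit of 7.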
  • the weight of each configuration in the associated configuration can be adjusted automatically by the system.
  • The configuration whose heat is closest to the actual heat is identified, and the statistics count for this configuration is increased by 1.
  • The heat statistics module collects the performance data of this service's reads from SSD flash memory and mechanical hard disk, and obtains the actual efficiency of the current round of heat scheduling (volume of the service's data actually read from SSD flash memory / total volume of data read by the service). The actual scheduling efficiency is compared with the preset desired scheduling efficiency, such as 80%. If the actual efficiency is lower than the preset scheduling efficiency, the weight of the most relevant configuration among the associated configurations is increased by 10%. In this way, over several cycles of heat scheduling and heat statistics, the weights of the associated configurations are adjusted in each cycle according to the adjustment rules.
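  • One adjustment cycle might be sketched as below. Two points are assumptions rather than statements from the text: the 10% increase is applied relative to the current weight, and the weights are renormalised afterwards so that they still sum to 1. Function and parameter names are hypothetical.

```python
def scheduling_efficiency(ssd_bytes_read, total_bytes_read):
    """Share of the service's read volume that was served from SSD flash memory."""
    return ssd_bytes_read / total_bytes_read if total_bytes_read else 0.0


def adjust_weights(weights, closest_config, actual_eff, target_eff=0.8, step=0.10):
    """Raise the weight of the most relevant configuration when efficiency is low.

    weights: {config_id: weight} for one group of associated configurations.
    closest_config: the configuration whose heat was closest to the actual heat.
    """
    if actual_eff >= target_eff:
        return dict(weights)              # efficient enough, leave unchanged
    raised = dict(weights)
    raised[closest_config] *= 1.0 + step  # assumption: relative 10% increase
    total = sum(raised.values())
    return {cid: w / total for cid, w in raised.items()}  # assumption: renormalise


eff = scheduling_efficiency(600, 1000)    # 0.6, below the 80% target
print(adjust_weights({1: 0.5, 2: 0.5}, closest_config=1, actual_eff=eff))
```

  • Repeating this over several scheduling cycles shifts weight toward the configuration that best predicts actual access, which is the effect the adjustment rule aims for.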
  • If heat scheduling and actual data statistics find that the scheduling efficiency is less than the preset scheduling efficiency, a statistical report and an alarm are generated, warning the operation and maintenance personnel to re-evaluate the scheduling plan by adjusting the heat statistics time and the calculation formula.
  • After the heat management task calculates the current shard heat, it determines whether the heat is greater than the heat threshold; if so, the shard is added to the to-be-upgraded list.
  • The heat management task also checks whether the heat of shards already upgraded to SSD flash memory is less than the heat threshold; if so, the shard is added to the to-be-downgraded queue. Details are not repeated here.
  • This module periodically fetches each heat monitoring configuration information. It first checks the to-be-upgraded list corresponding to the heat monitoring configuration information, takes out the hottest shard information in order, and checks whether all copies of the shard are on mechanical hard disks. For a shard copy that meets the upgrade conditions, it sends a request to the storage server to migrate the copy from the mechanical disk to SSD flash memory; after the copy upgrade is complete, it sets the current upgrade time point of the shard.
  • The heat statistics module functions as follows. In each scheduling cycle, it counts the number and size of reads of all shards from mechanical hard disks and SSD flash memory in each business directory of the heat monitoring configuration information; it calculates the percentage of shard reads in the heat monitoring configuration information directory that hit SSD flash memory, that is, the heat scheduling efficiency; and it outputs the SSD space in the system and the space that the shards of different business directories occupy in SSD. This statistical information is used to evaluate tiered storage efficiency and is fed back to the heat management module to improve the heat monitoring configuration information.
  • The following describes a method for shard elimination under multiple services and multiple pieces of heat monitoring configuration information.
  • The main process is:
  • The heat management and heat scheduling of different services are independent of each other, so different services use the tiered storage system at the same time and share CPU, SSD flash memory, mechanical hard disk, and network resources. For example, heavy heat scheduling of many shards for the service of the TV directory during the peak access period of the service of the HOT directory affects the performance stability of the HOT directory. The multiple independent heat management tasks are therefore scheduled in coordination, so that background scheduling for other services does not affect service stability.
  • When the heat statistics module finds that the IO capability of the SSD flash memory or mechanical hard disks reaches a threshold within a certain period, or that the performance reported by the storage system reaches the performance threshold, it notifies each service scheduler to limit the speed of shard replica migration.
  • FIG. 5 is an interaction diagram of an improved module for multi-service hierarchical storage according to an embodiment of the present application.
  • By adding several management modules and optimizing functions, this solution better supports multiple services using the same tiered storage system. Based on the statistics module, it automatically generates associated heat monitoring configuration information for different time periods and provides a method for automatically adjusting associated configuration weights, which reduces operation and maintenance complexity and improves scheduling efficiency.
  • After the related modules of the metadata server are optimized, it supports heat management for multiple services. The main flow is as follows (see Figure 5):
  • After receiving shard heat information, the metadata server finds the service directory to which the shard's file belongs, reads the current time, and recursively searches the parent directories to check whether each directory has heat monitoring configuration information. It then obtains the number of each configuration whose heat statistics range contains the current time and updates the heat corresponding to that configuration.
  • the heat management module periodically scans the fragments of the metadata to obtain the service identifier of the fragmented file and the current time, and then obtains all independent and associated configurations of the business. Check whether the independent configuration or the associated configuration is in effect at the current time, and then calculate the shard heat.
  • FIG. 6 is an interaction diagram of newly added modules for multi-service hierarchical storage according to an embodiment of the present application. As shown in FIG. 6, the weight management, coordinated scheduling, and shard elimination modules newly added in this solution are background functions. They have been described above; the interaction between the new modules and the existing modules is as follows:
  • The weight management module obtains all associated heat monitoring configuration information of the service, retrieves the corresponding hotspot statistics of the service, and calculates the associated configuration weights;
  • the coordination scheduling module interacts with the heat statistics, heat scheduling, and heat monitoring configuration information.
  • the main functional processes are:
  • The heat statistics task finds a service peak period in which the volume exceeds the configured threshold and notifies the coordination scheduling module.
  • the coordination scheduling module obtains the business peak time period, checks all the heat monitoring configuration information of the business, automatically generates associated heat monitoring configuration information, initializes initial weights, and stores it in the heat monitoring configuration information.
  • A distributed storage system carries multiple services. Different services have different access hotspots and peak time periods, and, for both historical and current heat, heat from different time periods contributes differently.
  • When a storage system carries multiple services with different peak time periods, existing technology suffers from low scheduling efficiency in heat management. In response, this solution proposes a tiered storage device that can be flexibly deployed in a distributed storage system. It supports multiple services, automatically generates associated heat monitoring configuration information for different time periods from heat statistics to improve tiered storage heat management, and provides a method for automatically adjusting associated configuration weights, reducing the burden on operation and maintenance personnel.
  • Example 1: Multi-service heat monitoring configuration and management
  • The tiered storage system described above can also carry services such as web video caching, mini-program applications, and mailbox backup. These services differ considerably from video on demand in user groups, access rates, and peak access periods, so their shard replicas cannot be migrated under a single unified heat management scheme. Each service directory is therefore configured with one basic heat monitoring configuration and several associated heat monitoring configurations.
  • The service identifier described here identifies the resources used by a service running in the storage system. A service can also distinguish files of different service types by full path, relative path, or file prefix or suffix format.
  • The time period can be a fixed interval every day (for example, 9 am to 11 pm), or it can be configured by day as holidays, such as Saturday, Sunday, and National Day (October 1 to October 7).
  • the following configuration is added to the same storage system:
  • the heat statistics period is 8 o'clock to 18 o'clock every day, and the heat update cycle is every hour.
  • Configuration 5 as the associated configuration of configuration 4, the heat statistics period is from 8 am to 9:30 am, and the heat update period is every 30 minutes.
  • the specific configuration method is through human-computer interactive commands or interactive interfaces.
  • When a new service is added to the storage system, in addition to adding the service path, the multi-service tiered storage heat monitoring configuration information must also be set.
  • The following introduces the interactive interface for multi-service heat monitoring configuration information in the storage system.
  • FIG. 7 is a schematic diagram of the multi-service heat monitoring configuration information interface according to Example 1 of the present application.
  • FIG. 8 is a schematic diagram of a hierarchical storage multi-service list according to Example 2 of the present application. As shown in FIG. 8, the following shows that the storage system includes multiple service configuration lists.
  • Heat monitoring configuration 1 belongs to the Mail service; configurations 2 and 3 belong to the TV service and are associated heat monitoring configurations.
  • Configuration 1 is independent heat monitoring configuration information.
  • Configurations 2 and 3 are associated configurations that share heat management.
  • the heat management and heat scheduling between different services are independent of each other, so that different services can use the tiered storage system at the same time.
  • the hierarchical storage system needs to coordinate and schedule multiple independent heat management. Multiple services share CPU, SSD flash memory, mechanical hard disk, and network resources in the storage system, and business stability cannot be reduced due to background heat scheduling.
  • When the heat statistics module finds that the IO capability of the SSD flash memory or mechanical hard disks reaches a threshold within a certain period, or that the performance reported by the storage system reaches the performance threshold, it notifies each service scheduler to limit the speed of shard replica migration.
  • More commonly, during one service's peak period, for example when TV viewers order programs between 19:00 and 20:00, other services need to slow down their heat management and scheduling.
  • Example 2: Generation of associated heat monitoring configuration information
  • the content distribution network usually provides services such as user live broadcast and on-demand, and uses a hierarchical storage system to provide high-performance read IO and large-capacity capabilities.
  • the main requirements of the business on the storage system are: a lot of read bandwidth, lower latency and larger storage capacity.
  • This period of time is called the peak period.
  • users order programs, and the storage system has a stable business.
  • the heat management during the peak period is very different from the normal business heat and cannot be judged by a set of criteria.
  • more than three heat monitoring configuration information can be configured for the HOT directory, as follows:
  • The heat statistics period (start time and end time, the same below) is configured as 11:00 to 12:00 every day, and the heat update time is every half hour. Unless otherwise noted, the calculation formula and other fields use the default configuration in this example.
  • The heat statistics period is 8:00 to 23:00, and the heat update time is every hour.
  • The heat statistics period is 18:00 to 22:00, and the heat update time is every half hour.
  • Configuration 1 is an independent configuration.
  • Configuration 2 and configuration 3 are set to the associated configuration, and the initial weights are 0.2 and 0.8 respectively.
  • the occupation of SSD storage space depends on the business plan. It is not necessary to set this value accurately for the same business-related heat monitoring configuration information, and the same configuration data can be used. Other configurations are not repeated here.
  • the logical configuration of the three configurations of the HOT service in the system is shown in Figure 8.
  • the storage system allocates the corresponding resources: generates the corresponding list to be upgraded, the list to be downgraded, and creates scheduling tasks.
  • configuration 1 has a separate list of to-be-upgraded, to-be-downgraded and heat management tasks.
  • Configuration 2 and Configuration 3 share a list of upgrades and downgrades, and they have a common heat management task that will be executed according to the configuration 2 and configuration 3 rules.
  • This example also provides a method for automatically generating associated heat monitoring configuration information after the storage system senses the peak business period during system operation.
  • The system determines the service's peak hours from the heat statistics module, generates a new associated configuration for the service directory, and sets the weights of the existing and newly added heat monitoring configuration information. This helps users analyze service peak hours, generates new associated heat monitoring configuration information, schedules by heat automatically, and reduces the configuration effort for operation and maintenance personnel.
  • the main steps are:
  • the coordination scheduling module obtains the business catalog and the existing heat monitoring configuration information, and the statistical information of this time period, to generate a newly associated configuration.
  • the newly-added associated configuration heat statistics time is set to the peak business time period, and the heat update time and other parameters refer to the existing heat monitoring configuration information to set the weight of the newly associated configuration.
  • the coordination scheduling module adds the newly associated configuration to the configuration table.
  • Example 3: Weight management of associated heat monitoring configuration information
  • The weights of the configurations associated with the same service directory are specified when the storage system is initialized. They can be modified during operation and maintenance, or adjusted automatically while the system runs, based on data from the heat statistics module. Applying this example can reduce parameter tuning and frequent upgrades during operation and maintenance.
  • In each statistical period, the heat statistics module collects the SSD space and mechanical hard disk space occupied, the number of times and number of bytes the service reads from SSD flash memory and mechanical hard disks, and the number of upgrade shards calculated by each associated configuration.
  • the weight range of the associated configuration is [0,1], and the default value of the initial weight is equal to 1/the number of associated configurations.
  • FIG. 9 is a schematic diagram of a weight management process according to Example 3 of the present application. As shown in FIG. 9, the following steps are included:
  • Step one the initial weight
  • Step 2 After the heat statistics task completes the statistics of the entire system, it starts the weight monitoring task;
  • Step 3 Find the most relevant heat configuration in each group of related configurations in the heat configuration
  • Step 4 Set the weight of the most relevant heat configuration to its original value plus the incremental weight W_d;
  • Step 5 Repeat the above steps in the next heat statistics period.
  • a threshold such as 1
  • a statistical report or alarm is generated.
  • the specific process of the weight management process may include: the storage system statistics module notifies the coordination scheduling module, starts the weight monitoring task, and adjusts the weight of the associated configuration in a fixed mode. For example, it is adjusted with a fixed step size of 0.1 to find the most relevant heat monitoring configuration information in the statistical period among the related heat monitoring configuration information.
  • The most relevant heat monitoring configuration information is the configuration whose calculated number of upgrade shards is closest to the number of shards actually upgraded by this heat management task within a preset statistical period. The weight of the most relevant heat monitoring configuration information is then increased by the incremental weight W_d of 0.1. In the next statistical period, the heat statistics are analyzed and the weights adjusted again. If, after several operating cycles (that is, once the most relevant weight reaches 1), heat scheduling and actual statistics show that the scheduling efficiency is below the preset scheduling efficiency, a statistical report and an alarm are generated, and an associated configuration for the peak time period is generated automatically.
  • Example 4 demonstrates the shard elimination module.
  • This solution supports multiple services and multiple heat monitoring configuration information for a single service. They share SSD flash memory during actual operation, and have independent thermal management and thermal scheduling, which will cause certain problems in the use and release of SSD space. Therefore, the allocation and elimination module is added as an auxiliary to smoothly adapt to multiple heat management and heat scheduling.
  • FIG. 11 is a schematic diagram of the main flow of fragment elimination according to Example 4 of the present application. As shown in FIG. 11, the basic flow of the fragment elimination module of the storage system is as follows:
  • Step 1 Traverse all heat configurations and heat statistics of the current service directory and sort them by SSD space occupation. Take the heat configuration that occupies the most SSD space and set it as the current heat configuration.
  • Step 2 Traverse the to-be-downgraded list of the current heat configuration, sort the shards by heat, and add them to the to-be-eliminated list.
  • Step 3 Create a new heat scheduling task immediately. After scheduling completes, check whether the SSD flash space released satisfies the condition; if it does, exit.
  • Step 4 If the SSD space released does not satisfy the condition, sort all service directories in the storage system that have heat configurations by SSD space occupation, and repeat the above steps for each such directory.
  • Step 5 Traverse the service directories that have heat configurations to find shards that are in SSD flash memory. Add shards whose retention time exceeds the configured SSD retention time to the expired to-be-eliminated list; add unexpired shards to the unexpired list and calculate the SSD space they occupy.
  • Step 6 Determine whether SSD flash occupation satisfies the condition. If not, add the lowest-heat shards from the unexpired list to the elimination list in turn, triggering creation of a new heat scheduling task.
  • the fragment elimination process can further include the following steps:
  • The method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation.
  • The technical solution of the present application can essentially be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or a CD-ROM) and includes several instructions that enable a terminal device (which may be a mobile phone, computer, server, network device, or the like) to execute the methods described in the embodiments of the present application.
  • a data storage device is also provided.
  • the device is configured to implement the above-mentioned embodiments and preferred embodiments, and descriptions that have already been described will not be repeated.
  • As used below, the term "module" may be a combination of software and/or hardware that performs predetermined functions.
  • Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • a data storage device including:
  • the first obtaining module is configured to obtain multiple heat monitoring configuration information set for the first service
  • the second obtaining module is configured to separately monitor the heat value of the first service according to each heat monitoring configuration information, wherein the heat value is used to indicate the frequency with which the first service is accessed;
  • the selection module is configured to select a location to store data corresponding to the first service according to the plurality of heat values corresponding to the plurality of heat monitoring configuration information, and store the data.
  • The above modules can be implemented by software or hardware. For the latter, this can be done in the following ways, among others: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
  • the embodiments of the present application also provide a storage medium.
  • the above storage medium may be set to store program code for performing the following steps:
  • S3 Select a location to store data corresponding to the first service according to multiple heat values corresponding to the multiple heat monitoring configuration information, and store the data.
  • The above storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
  • An embodiment of the present application further provides an electronic device, including a memory and a processor, where the computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps in any one of the foregoing method embodiments.
  • the electronic device may further include a transmission device and an input-output device, where the transmission device is connected to the processor, and the input-output device is connected to the processor.
  • the above processor may be configured to perform the following steps through a computer program:
  • S3 Select a location to store data corresponding to the first service according to multiple heat values corresponding to the multiple heat monitoring configuration information, and store the data.
  • Multiple pieces of heat monitoring configuration information are configured for the first service, and the heat of the first service is monitored according to the configuration in each of them to obtain the heat value corresponding to each. The location for storing the data corresponding to the first service, such as a solid-state drive or a mechanical hard disk, is then selected according to the multiple heat values. The data may be migrated after considering the multiple heat values together, or independently on the basis of a single heat value.
  • Because one service is configured with multiple pieces of heat monitoring configuration information, the service's hot data can be migrated to the solid-state drive more accurately and promptly. This greatly improves tiered storage efficiency and solves the problem in the related art that tiered storage of hot data is unsatisfactory because heat values are collected in only a single way.
  • The modules or steps of the present application can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be performed in a different order than described here, or they can each be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. The present application is therefore not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

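A data storage method and device, wherein the method includes: configuring multiple pieces of heat monitoring configuration information for a first service; monitoring the heat of the first service according to the configuration in each piece of heat monitoring configuration information to obtain the heat value corresponding to each; and then selecting, according to the multiple heat values, the location at which to store the data corresponding to the first service.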

Description

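Data Storage Method and Device
Technical Field
This application relates to, but is not limited to, the field of data storage, and in particular to a data storage method and device.
Background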
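In the related art, a distributed storage system architecture usually consists of three parts: a file access client module, a metadata server module, and a storage server module. FIG. 1 is a structural model diagram of a distributed storage system in the related art. As shown in FIG. 1, the file access client is the agent through which applications access the file system; it provides the application file operation interface, heat statistics reporting, and other functions. The metadata server module manages configuration data and file metadata and performs tiered storage management. The storage server module actually stores the file data in the storage system.
Distributed storage systems (DSS) commonly mix mechanical hard disks and SSD (solid-state drive) flash memory to meet both large-capacity and high-performance requirements. In recent years, new types of SSD flash memory, such as NVMe devices, have offered extremely high performance and ultra-low latency and are increasingly widely used in enterprise storage. Storage systems use tiered storage to manage different types of disks and to balance performance and capacity requirements. In tiered storage, SSD flash memory mainly serves as a cache for hot data, storing the newest or hottest data of the current service. Whether data is hot or cold is judged mainly by indicators such as data value, access frequency, retention time, and access size, collectively called the access heat of the data. Tiered storage combines these factors, stores shard replicas on different types of disks, and automatically migrates them between disk types according to how hot they are.
No effective solution has yet been proposed for the problem in the related art that tiered storage of hot data is unsatisfactory because heat values are collected in only a single way.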
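Summary
Embodiments of the present application provide a data storage method and device, to at least solve the problem in the related art that tiered storage of hot data is unsatisfactory because heat values are collected in only a single way.
According to an embodiment of the present application, a data storage method is provided, including: obtaining multiple pieces of heat monitoring configuration information set for a first service; monitoring the heat value of the first service separately according to each piece of heat monitoring configuration information, where the heat value indicates how frequently the first service is accessed; and selecting, according to the multiple heat values corresponding to the multiple pieces of heat monitoring configuration information, a location at which to store the data corresponding to the first service, and storing the data.
According to another embodiment of the present application, a data storage device is also provided, including: a first obtaining module, configured to obtain multiple pieces of heat monitoring configuration information set for a first service; a second obtaining module, configured to monitor the heat value of the first service separately according to each piece of heat monitoring configuration information, where the heat value indicates how frequently the first service is accessed; and a selection module, configured to select, according to the multiple heat values corresponding to the multiple pieces of heat monitoring configuration information, a location at which to store the data corresponding to the first service, and to store the data.
According to yet another embodiment of the present application, a storage medium is also provided. A computer program is stored in the storage medium, and the computer program is configured to perform, when run, the steps in any one of the above method embodiments.
According to yet another embodiment of the present application, an electronic device is also provided, including a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps in any one of the above method embodiments.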
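Brief Description of the Drawings
The drawings described here provide a further understanding of the present application and form part of it. The illustrative embodiments of the present application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
FIG. 1 is a structural model diagram of a distributed storage system in the related art;
FIG. 2 is a structural model diagram of tiered storage in the related art;
FIG. 3 is a block diagram of the hardware structure of a computer terminal for a data storage method according to an embodiment of the present application;
FIG. 4 is a flowchart of a data storage method according to an embodiment of the present application;
FIG. 5 is an interaction diagram of the improved modules for multi-service tiered storage according to an embodiment of the present application;
FIG. 6 is an interaction diagram of the newly added modules for multi-service tiered storage according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the multi-service heat monitoring configuration information interface according to Example 1 of the present application;
FIG. 8 is a schematic diagram of a tiered storage multi-service list according to Example 2 of the present application;
FIG. 9 is a schematic flowchart of weight management according to Example 3 of the present application;
FIG. 10 is a structural diagram of tiered storage multi-directory heat management and elimination according to Example 4 of the present application;
FIG. 11 is a schematic diagram of the main flow of shard elimination according to Example 4 of the present application.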
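Detailed Description
The present application is described in detail below with reference to the drawings and in conjunction with the embodiments. Note that, where no conflict arises, the embodiments in the present application and the features in the embodiments can be combined with each other.
Note that the terms "first", "second", and so on in the specification, claims, and drawings of the present application are used to distinguish similar objects and do not necessarily describe a specific order or sequence.
The main functional modules of the tiered storage architecture are as follows: heat statistics and reporting in the file access client; and the configuration management module, heat management module, heat scheduling module, and statistics module in the metadata server. FIG. 2 is a structural model diagram of tiered storage in the related art. As shown in FIG. 2, it includes the access client, metadata server, storage server, heat configuration module, heat management module, heat statistics module, shard elimination module, heat scheduling module, weight management module, and coordinated scheduling module.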
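The general flow of tiered storage heat management is:
(1) When an application calls an interface (such as read or sendfile) to access a file shard, the file access client collects the shard's read/write counts, read/write byte counts, and similar information and reports them to the heat management module of the metadata server.
(2) The metadata server receives the raw information reported for the shard, combines the historical heat with the currently reported heat, calculates the shard's heat according to a formula, and saves it in the metadata.
(3) The heat management module periodically scans the shards in the metadata. If a shard's heat is greater than the configured heat threshold and all of its replicas are on mechanical hard disks, the relevant metadata is inserted into the to-be-upgraded list and the list is re-sorted. If a shard's heat is less than the heat threshold and it has a replica on SSD flash memory, the relevant metadata is inserted into the to-be-downgraded list and that list is re-sorted. Here, the heat threshold is the value above which a shard's access heat makes it a candidate for upgrade to SSD flash memory. The to-be-upgraded list holds information on shards whose heat exceeds the heat threshold, sorted by heat in descending order; the to-be-downgraded list holds information on shards whose heat is below the heat threshold, sorted by heat in ascending order.
(4) The heat scheduling module periodically checks the system configuration, takes qualifying shards from the to-be-upgraded and to-be-downgraded lists, and instructs the storage server module to migrate replicas of those shards.
(5) After successfully migrating a shard replica, the storage server reports to the metadata server.
(6) The metadata server updates the replica's location to the new disk.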
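The list-building rule in the scan step of this flow can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the `Shard` type, the tier labels `"hdd"` and `"ssd"`, and the function name `build_candidate_lists` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Shard:
    """Hypothetical shard metadata record (names are illustrative)."""
    shard_id: str
    heat: float
    replica_tiers: List[str]  # one entry per replica: "hdd" or "ssd"


def build_candidate_lists(shards: List[Shard],
                          heat_threshold: float) -> Tuple[List[Shard], List[Shard]]:
    """Return (to_upgrade, to_downgrade) as described in the scan step."""
    # Upgrade candidates: hot shards whose replicas are all on mechanical disks.
    to_upgrade = [s for s in shards
                  if s.heat > heat_threshold
                  and all(t == "hdd" for t in s.replica_tiers)]
    # Downgrade candidates: cold shards with at least one replica on SSD.
    to_downgrade = [s for s in shards
                    if s.heat < heat_threshold
                    and any(t == "ssd" for t in s.replica_tiers)]
    # Upgrade list is sorted hottest first, downgrade list coldest first.
    to_upgrade.sort(key=lambda s: s.heat, reverse=True)
    to_downgrade.sort(key=lambda s: s.heat)
    return to_upgrade, to_downgrade
```

A shard that is hot but already has a replica on SSD lands in neither list, which matches the two conditions stated in the flow.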
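The related art collects the heat of files or objects over several historical time periods as historical heat, uses it to predict the heat of files over a coming period, and uses that as the basis for tiered storage heat decisions, migrating files of different heat to disks of different performance.
The tiered storage technology in the related art has several limitations. First, multi-service support is poor: one storage system often has to serve multiple services, and different services have different hot content and hot time periods, so coarse statistics based on historical file access heat leave the identified hotspots not actually hot and make tiered storage ineffective. Second, support for hotspots in different time periods is poor: even a single service often has different hot content in different time periods, so statistics based on a single past period misplace the hotspots and greatly reduce tiered storage efficiency. Third, configuring and managing the hotspot statistics period is difficult: a manually set hotspot period adapts poorly to changes in hot content and time periods.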
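Embodiment 1
The method embodiment provided in Embodiment 1 of the present application may be executed in a computer terminal or a similar computing device. Taking execution on a computer terminal as an example, FIG. 3 is a block diagram of the hardware structure of a computer terminal for a data storage method according to an embodiment of the present application. As shown in FIG. 3, the computer terminal may include one or more (only one is shown in FIG. 3) processors 302 (the processor 302 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 304 configured to store data. Optionally, the computer terminal may further include a transmission device 306 configured for communication and an input/output device 308. A person of ordinary skill in the art will understand that the structure shown in FIG. 3 is only illustrative and does not limit the structure of the computer terminal. For example, the computer terminal may include more or fewer components than shown in FIG. 3, or have a configuration different from that shown in FIG. 3.
The memory 304 may be configured to store software programs and modules of application software, such as the program instructions/modules corresponding to the data storage method in the embodiments of the present application. The processor 302 runs the software programs and modules stored in the memory 304 to execute various functional applications and data processing, that is, to implement the above method. The memory 304 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 304 may further include memory located remotely from the processor 302, and such remote memory may be connected to the computer terminal through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 306 is configured to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the computer terminal. In one example, the transmission device 306 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station and thereby communicate with the Internet. In one example, the transmission device 306 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.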
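This embodiment provides a data storage method that runs on the above computer terminal. FIG. 4 is a flowchart of a data storage method according to an embodiment of the present application. As shown in FIG. 4, the flow includes the following steps:
Step S402: obtain multiple pieces of heat monitoring configuration information set for a first service;
Step S404: monitor the heat value of the first service separately according to each piece of heat monitoring configuration information, where the heat value indicates how frequently the first service is accessed;
Step S406: select, according to the multiple heat values corresponding to the multiple pieces of heat monitoring configuration information, a location at which to store the data corresponding to the first service, and store the data.
After the storage location is changed, the metadata information can be modified accordingly.
Through the above steps, multiple pieces of heat monitoring configuration information are configured for the first service, the heat of the first service is monitored according to the configuration in each of them, and the heat value corresponding to each is obtained. The location for storing the data corresponding to the first service, such as a solid-state drive or a mechanical hard disk, is then selected according to the multiple heat values. The data may be migrated after considering the multiple heat values together, or independently on the basis of a single heat value. Because one service is configured with multiple pieces of heat monitoring configuration information, the service's hot data can be migrated to the solid-state drive more accurately and promptly, which greatly improves tiered storage efficiency and solves the problem in the related art that tiered storage of hot data is unsatisfactory because heat values are collected in only a single way.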
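Optionally, obtaining the multiple pieces of heat monitoring configuration information set for the first service includes obtaining at least one of the following items included in the heat monitoring configuration information: the heat update period, the heat statistics start time, and the heat statistics end time.
Optionally, monitoring the heat value of the first service separately according to each piece of heat monitoring configuration information includes: between the heat statistics start time and the heat statistics end time corresponding to each piece of heat monitoring configuration information, counting a first number of times the first service is accessed in each heat update period; and obtaining, according to the first number, the heat value of the first service corresponding to each piece of heat monitoring configuration information.
Optionally, monitoring the heat value of the first service separately according to each piece of heat monitoring configuration information includes: when first heat monitoring configuration information among the multiple pieces targets a first service directory of the first service, collecting, according to the first heat monitoring configuration information, the heat values of one or more data shards in the first service directory.
Optionally, selecting the storage location and storing the data according to the multiple heat values includes: when the multiple pieces of heat monitoring configuration information are associated, obtaining, for each piece, the product of its heat value and its preset weight; obtaining the sum of these products; and selecting, according to the sum, the location at which to store the data corresponding to the first service, and storing the data.
Optionally, selecting the storage location according to the sum and storing the data includes: when the sum is greater than the heat threshold, migrating the data corresponding to the first service from the mechanical hard disk to the solid-state drive; and when the sum is less than the heat threshold, migrating the data corresponding to the first service from the solid-state drive to the mechanical hard disk.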
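The weighted-sum rule in the two optional features above can be written as a small Python sketch. The function names, the tier labels, and the choice to keep the current tier when the sum exactly equals the threshold are assumptions for illustration; the source does not say what happens at equality.

```python
from typing import Sequence


def combined_heat(heats: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of heat_i * weight_i over the associated configurations."""
    if len(heats) != len(weights):
        raise ValueError("heats and weights must have the same length")
    return sum(h * w for h, w in zip(heats, weights))


def choose_tier(current_tier: str, heats: Sequence[float],
                weights: Sequence[float], heat_threshold: float) -> str:
    """Pick "ssd" or "hdd" from the weighted sum of the per-configuration heats."""
    total = combined_heat(heats, weights)
    if total > heat_threshold:
        return "ssd"   # hot: migrate (or keep) on solid-state storage
    if total < heat_threshold:
        return "hdd"   # cold: migrate (or keep) on mechanical disk
    return current_tier  # assumption: no migration when exactly at the threshold
```

With two associated configurations weighted 0.5 each, heats of 10.0 and 2.0 give a sum of 6.0, so a threshold of 5.0 would place the data on SSD.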
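Optionally, selecting the storage location and storing the data includes: selecting a solid-state drive or mechanical hard disk on which to store a replica of a first data shard of the first service; and storing the replica on the selected solid-state drive or mechanical hard disk.
Optionally, after the replica is migrated to the solid-state drive, within one heat update period, the ratio of the number of reads from the solid-state drive to the number of reads from the mechanical hard disk while executing the first service is counted; when this ratio is below a preset ratio, the preset weights of the multiple pieces of heat monitoring configuration information are adjusted to increase the ratio in the next heat update period.
Optionally, after the preset weights are adjusted over multiple heat update periods, it is detected that the ratio has reached its maximum; if the maximum is still below the preset ratio, a statistical report is generated and an alarm is raised.
Optionally, selecting the storage location according to the multiple heat values includes: when the multiple pieces of heat monitoring configuration information are all independent of one another, selecting the location at which to store the data corresponding to the first service separately according to the heat value corresponding to each piece.
Optionally, after the storage location is selected and the data is stored, a second number of times the first service is accessed is counted in real time, and when the second number meets a preset condition, second heat monitoring configuration information for the first service is generated automatically. That is, when it is detected that, under the current multiple pieces of heat monitoring configuration information, the data of the first service is not being accessed efficiently, second heat monitoring configuration information is generated automatically for subsequent heat monitoring of the first service. The specific settings of the second heat monitoring configuration information may be learned from the heat monitoring configuration information of other services.
Optionally, after the storage location is selected and the data is stored, when the storage state of a first disk holding data meets a preset state, the storage space of the first disk is released in at least one of the following ways: migrating out a second service stored on the first disk whose heat value is below the heat threshold or is the lowest; or migrating out the data shard with the lowest heat value of a second service stored on the first disk.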
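This is further described below in conjunction with another embodiment of the present application.
In view of the limitations of the related art described above, the present application discloses a method for improving tiered storage efficiency in a distributed storage system. It applies to multi-service scenarios and, by adaptively collecting and analyzing hotspot statistics for multiple time periods at the same time, solves the problems of tiered storage in distributed storage systems in those scenarios.
The technical problem to be solved by the present application is as follows. A distributed storage system carries multiple services. Different services have different access hotspots and peak time periods, and, for both historical and current heat, heat from different time periods contributes differently. When a storage system carries multiple services with different peak time periods, the related art suffers from low scheduling efficiency in heat management. This solution therefore proposes a tiered storage method and device that can be flexibly deployed in a distributed storage system. It lets multiple services set different peak time periods and perform independent heat management; it uses the heat and performance data of a service in different time periods to automatically generate associated heat monitoring configuration information for those periods, improving tiered storage heat management; and it automatically produces statistics from heat data and provides a method for automatically adjusting associated configuration weights, reducing the burden on operation and maintenance personnel.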
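Technical solution:
On top of the basic architecture described above, this solution adds several functional modules (the dashed-box modules in the metadata server in FIG. 2) and optimizes the implementation of several modules, to provide flexible heat management with multi-service support and to improve the scheduling of hot shards in peak time periods. It proposes support for the following:
(1) Multiple services perform independent heat management and scheduling in tiered storage.
(2) Based on operating conditions and statistics, the storage system can automatically generate associated heat monitoring configuration information for multiple time periods for each service. The multiple pieces of heat monitoring configuration information within a service can be independent configurations, with independent heat management, or associated configurations, with shared heat management.
(3) For the associated heat monitoring configuration information of a service's different time periods, a method is provided for automatically adjusting the associated configuration weights while the system is running.
For the tiered storage system to support heat management and scheduling for multiple services and multiple time periods, the existing architecture needs to be adjusted and optimized. The heat monitoring configuration information module, heat management module, heat scheduling module, heat statistics module, and shard elimination module are described in turn below.
To support this solution, the heat monitoring configuration information module is extended with several fields. A single service can add multiple pieces of heat monitoring configuration information, which are distinguished by configuration number; the tiered storage system also supports configuring multiple services. The fields of each piece of heat monitoring configuration information therefore include the service identifier, heat scheduling time, heat calculation formula, shard retention time in SSD flash memory, maximum SSD flash space occupation, heat statistics start time, and heat statistics end time.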
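Table 1 explains the meaning of the main fields in the heat monitoring configuration information according to the present application:
Table 1
[Table 1 is provided as images in the original publication: PCTCN2019115774-appb-000001, PCTCN2019115774-appb-000002]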
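The basic fields of this configuration are the combination of service identifier, heat update time period, and weight. The service identifier here identifies the resources used by a service running in the storage system. A service can distinguish service types by directory name, full path, relative path, file prefix or suffix format, and so on, any of which can serve as an implementation example of this solution. One service identifier can cover multiple directories or full paths. The heat update time period can be one or more periods within a day, such as 10:00 to 14:00, or can be configured as holidays (every Saturday and Sunday, May Day, National Day), and so on. Different time periods of the same service can either be configured as independent configurations and managed independently, or rely on the system to automatically generate associated configurations for shared heat management. The associated configuration weight can be set manually, or, when associated configurations are generated automatically during operation, the system assigns an initial value and adjusts it automatically. Optional fields such as an association tag, the shard retention time on SSD, and a compilable heat calculation formula can also be combined to complete this heat management solution.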
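One way to picture such a configuration record is the Python sketch below. The class name `HeatMonitorConfig`, its field names, and the use of a weekday set to stand in for "every Saturday and Sunday" are assumptions for illustration; the actual field layout is given in Table 1 of the original publication, which is not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class HeatMonitorConfig:
    """Hypothetical heat monitoring configuration record."""
    config_id: int
    service_id: str                       # e.g. a directory name such as "HOT"
    stat_start: time                      # heat statistics start time
    stat_end: time                        # heat statistics end time (exclusive)
    weight: float = 1.0                   # associated configuration weight
    association_tag: Optional[str] = None  # None means an independent config
    weekdays: Optional[FrozenSet[int]] = None  # Monday=0; None means every day

    def is_active(self, now: datetime) -> bool:
        """True if `now` falls inside this configuration's statistics window."""
        if self.weekdays is not None and now.weekday() not in self.weekdays:
            return False
        return self.stat_start <= now.time() < self.stat_end
```

For example, a daily 10:00 to 14:00 window is `HeatMonitorConfig(1, "HOT", time(10), time(14))`, and a weekend-only window passes `weekdays=frozenset({5, 6})`. Fixed-date holidays such as National Day would need an extra date-range field, which this sketch leaves out.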
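For example, the tiered storage system has two independent service directories, HOT and TV. Table 2 is an example table of heat monitoring configuration information according to an embodiment of the present application. As shown in the table below, the four pieces of heat monitoring configuration information are as follows:
Table 2
[Table 2 is provided as images in the original publication: PCTCN2019115774-appb-000003, PCTCN2019115774-appb-000004]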
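Configuration 1 and configuration 2 are associated configurations; both apply to the service directory HOT and share one to-be-upgraded list, one to-be-downgraded list, and one heat management task. Configuration 3 and configuration 4 are independent configurations; each has its own to-be-upgraded and to-be-downgraded lists and its own heat management task. The structure is shown in FIG. 10.
Associated configurations for different time periods can be generated automatically while the system is running. The generation rule is as follows. Automatic generation requires that a configuration for the relevant service directory already exists. From the heat statistics module, the system obtains the time periods in which the read performance of the service directory is high, that is, periods in which performance exceeds the preset normal-operation value by a factor of 1 or 2. While the system is running, it can generate an associated configuration from the existing configuration of this service directory and such a time period, and set its initial weight. The service then has multiple pieces of heat monitoring configuration information in the storage system, each with a certain weight. The storage system can adjust the associated configuration weights automatically, based on data the heat statistics module collects over multiple statistical periods.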
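A minimal Python sketch of this generation rule follows. It assumes hourly read statistics, interprets "exceeds the preset value by a factor" as `value > factor * baseline`, and represents a configuration as a plain dictionary; the function names and dictionary keys are hypothetical.

```python
from typing import Dict, List, Tuple


def find_peak_periods(hourly_reads: List[float], baseline: float,
                      factor: float = 2.0) -> List[Tuple[int, int]]:
    """Return half-open (start_hour, end_hour) runs where reads > factor * baseline."""
    periods = []
    start = None
    for hour, value in enumerate(hourly_reads):
        if value > factor * baseline:
            if start is None:
                start = hour          # a new peak run begins
        elif start is not None:
            periods.append((start, hour))  # the run ended at this hour
            start = None
    if start is not None:             # a run that lasts until the final hour
        periods.append((start, len(hourly_reads)))
    return periods


def make_associated_config(base_config: Dict, period: Tuple[int, int],
                           initial_weight: float) -> Dict:
    """Copy the existing configuration and narrow it to the peak period."""
    new_config = dict(base_config)    # other parameters follow the existing config
    new_config["stat_start_hour"], new_config["stat_end_hour"] = period
    new_config["associated"] = True
    new_config["weight"] = initial_weight
    return new_config
```

The existing configuration is copied rather than modified, so the original window stays in force alongside the newly generated peak-period configuration.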
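Heat management module:
The tiered storage system has multiple service directories, and each service directory can be configured with multiple pieces of heat monitoring configuration information. For a given service directory, heat data generated by multiple configurations therefore has to be updated and saved at a given point in time. To this end, the metadata server adds several raw heat fields (such as h1, h2, h3) to the shard-related metadata of a file, to store the raw heat information of different heat monitoring configurations within the same reporting period, and adds several heat monitoring configuration tags (such as tag1, tag2, tag3) that indicate which heat monitoring configuration each raw heat field belongs to.
When an application reads a file through interfaces such as read and sendfile, the file access client calculates the raw read/write counts and read/write shard byte counts and sends them to the metadata server. On receiving the shard heat update message, the metadata server reads the current time, finds the directory to which the file belongs, and then recursively searches the parent directories. For each directory level, it checks whether service directory heat monitoring configuration information is set and obtains the numbers of the configurations whose heat statistics range contains the current time. It takes a free heat field in the shard-related metadata and fills in the current configuration number and the heat calculated according to that configuration.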
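The recursive directory lookup described above can be sketched in Python as follows. The mapping `configs_by_dir`, the dictionary keys, and the hour-granularity windows are assumptions for illustration; the sketch uses POSIX-style paths.

```python
import posixpath
from typing import Dict, List


def active_config_ids(file_path: str, hour: int,
                      configs_by_dir: Dict[str, List[dict]]) -> List[int]:
    """Walk from the file's directory up to the root and collect the ids of
    configurations whose statistics window [start_hour, end_hour) contains `hour`."""
    found = []
    directory = posixpath.dirname(file_path)
    while True:
        for cfg in configs_by_dir.get(directory, []):
            if cfg["start_hour"] <= hour < cfg["end_hour"]:
                found.append(cfg["id"])
        parent = posixpath.dirname(directory)
        if parent == directory:   # reached the root (or an empty relative path)
            break
        directory = parent
    return found
```

Each returned id would then be written, together with the heat computed under that configuration, into a free raw heat field and tag of the shard metadata.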
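A service directory can have multiple associated configurations, and they share one heat management task. The heat management module periodically scans the heat monitoring configuration information, starts a separate heat management task for each independent configuration, and starts only one heat management task for a group of associated configurations. When a heat management task reaches its run time, it scans the shard-related metadata under the current service directory and obtains the current time. For example, if the current time is between 9:00 and 12:00 and heat is updated every hour, then when the heat update task runs, configuration 1 and configuration 2 both take effect, and heat is calculated according to each calculation formula, denoted benefit1 and benefit2. The actual heat, benefit, of the current shard is then corrected by formula (1):
benefit = benefit1 * w1 + benefit2 * w2,    Formula (1)
In the above formula, w1 is the associated configuration weight of configuration 1 and w2 is the associated configuration weight of configuration 2. The initial value of w1 and w2 is 0.5; that is, by default, associated configurations 1 and 2 have equal standing.
The weight of each configuration in an associated group can be adjusted automatically by the system. When the actual heat is calculated, the configuration whose heat is closest to the actual heat is identified, and its count is increased by 1. After the heat statistics module finishes running for the current period, it collects the performance data of this service's reads from SSD flash memory and mechanical hard disks and derives the actual efficiency of this round of heat scheduling (for example, the amount of data the service actually read from SSD flash memory divided by the total amount of data the service read). The actual scheduling efficiency is compared with the preset scheduling efficiency, such as 80%. If the actual efficiency is lower, the weight of the most associated configuration in the group is raised by 10%. In this way, over several periods of heat scheduling and heat statistics, the associated configuration weights are adjusted in each period according to the adjustment rule. If, after several operating periods (that is, once the most associated weight reaches 1), heat scheduling and actual statistics show that the scheduling efficiency is still below the preset scheduling efficiency, a statistical report and an alarm are generated, warning operation and maintenance personnel to re-evaluate the scheduling plan by adjusting the heat statistics time and the calculation formula.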
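The adjustment loop described in this paragraph might look like the Python sketch below. It assumes an additive step of 0.1 for "raised by 10%", leaves the other weights in the group unchanged because the source does not say they are rebalanced, and uses hypothetical function and parameter names.

```python
from typing import Dict, Tuple


def scheduling_efficiency(ssd_read_bytes: int, total_read_bytes: int) -> float:
    """Share of the service's read volume that was served from SSD flash."""
    if total_read_bytes == 0:
        return 0.0
    return ssd_read_bytes / total_read_bytes


def adjust_weights(weights: Dict[str, float], most_associated: str,
                   efficiency: float, target: float = 0.8,
                   step: float = 0.1) -> Tuple[Dict[str, float], bool]:
    """Return (new_weights, alarm) for one statistics period.

    If efficiency meets the target, nothing changes. Otherwise the weight of
    the most associated configuration is raised by `step`, capped at 1.0.
    If that weight is already 1.0, an alarm is signalled instead.
    """
    new_weights = dict(weights)
    if efficiency >= target:
        return new_weights, False
    if new_weights[most_associated] >= 1.0:
        return new_weights, True   # weights exhausted: report and alarm
    raised = round(new_weights[most_associated] + step, 10)  # avoid float drift
    new_weights[most_associated] = min(1.0, raised)
    return new_weights, False
```

Calling this once per statistics period reproduces the described behaviour: the weight climbs by one step per underperforming period, and the alarm fires only once the most associated weight has reached 1 and efficiency is still below target.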
热度管理任务计算出当前分片热度后,判断热度是否大于热度阈值,若满足条件,则将其加入待升级列表。热度管理任务同时还处理已升级到SSD闪存的分片热度是否小于热度阈值,若满足条件则加入待降级队列。此处不再赘述。
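The threshold test above amounts to splitting shards into two sorted lists. The Python sketch below assumes a simplified `Shard` record with a single `on_ssd` flag; real metadata tracks several replicas per shard, so this is an illustration rather than the application's data model.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    shard_id: str
    heat: float
    on_ssd: bool   # simplification: one flag instead of per-replica placement


def classify(shards, threshold):
    """Return (pending_promotion, pending_demotion) for one heat threshold."""
    promote = sorted((s for s in shards if s.heat > threshold and not s.on_ssd),
                     key=lambda s: s.heat, reverse=True)   # hottest first
    demote = sorted((s for s in shards if s.heat < threshold and s.on_ssd),
                    key=lambda s: s.heat)                  # coldest first
    return promote, demote


shards = [Shard("a", 9, False), Shard("b", 5, False), Shard("c", 1, True),
          Shard("d", 8, True), Shard("e", 12, False)]
promote, demote = classify(shards, threshold=6)
```

Shards that are hot and already on SSD, or cold and already on HDD, appear in neither list.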
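Heat scheduling module
This module periodically takes each heat monitoring configuration in turn. It first examines the pending-promotion list of the configuration, takes shards in descending order of heat, checks that all replicas of the shard reside only on HDDs, and, for a shard that meets the promotion condition, sends the storage server a request to migrate one replica from HDD to SSD flash; once the replica has been promoted, it records the shard's promotion time. It then takes shards from the pending-demotion list, checks whether the shard's replica has already been demoted to HDD and whether the SSD retention time has been exceeded, and, for a shard that meets the conditions, sends the storage server a request to migrate one replica from SSD flash to HDD.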
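One scheduling pass of this kind might look like the Python sketch below. The record layout, the request tuples, and the use of seconds for time are assumptions; the demotion check is read here as "a replica is still on SSD and its retention time has elapsed", which is an interpretation of the text rather than a quotation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShardState:
    shard_id: str
    heat: float
    replicas_on_ssd: int
    promoted_at: Optional[float] = None   # promotion time in seconds, if promoted


def plan_migrations(promote_list, demote_list, now, ssd_retention):
    """Build the migration requests for one scheduling pass."""
    requests = []
    # Promotion: hottest first, only shards whose replicas are all on HDD.
    for s in sorted(promote_list, key=lambda s: s.heat, reverse=True):
        if s.replicas_on_ssd == 0:
            requests.append(("HDD->SSD", s.shard_id))
    # Demotion: a replica is still on SSD and the retention time has elapsed.
    for s in demote_list:
        expired = s.promoted_at is not None and now - s.promoted_at >= ssd_retention
        if s.replicas_on_ssd > 0 and expired:
            requests.append(("SSD->HDD", s.shard_id))
    return requests


promote_list = [ShardState("a", 5, 0), ShardState("b", 9, 0), ShardState("c", 7, 1)]
demote_list = [ShardState("d", 1, 1, promoted_at=0),
               ShardState("e", 1, 1, promoted_at=900),
               ShardState("f", 1, 0, promoted_at=0)]
requests = plan_migrations(promote_list, demote_list, now=1000, ssd_retention=600)
```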
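The heat statistics module has the following functions: for each scheduling period, it counts the number and size of reads from HDD and from SSD flash for all shards under each service directory in the heat monitoring configurations; it calculates the percentage of shard reads in a configured directory that hit SSD flash, that is, the heat scheduling efficiency; and it reports the SSD space of the different service directories in the system and the SSD space occupied by shards. These statistics are used to evaluate tiered-storage efficiency and are fed back to the heat management module to improve the heat monitoring configurations.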
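The heat scheduling efficiency is a plain hit ratio. A minimal Python sketch, assuming the statistics are available as byte counters for one service directory and one period (the function and parameter names are illustrative):

```python
def scheduling_efficiency(ssd_bytes_read, hdd_bytes_read):
    """Share of read traffic served from SSD flash in one statistics period."""
    total = ssd_bytes_read + hdd_bytes_read
    return ssd_bytes_read / total if total else 0.0


efficiency = scheduling_efficiency(ssd_bytes_read=80, hdd_bytes_read=20)
```

Returning 0.0 for a period without reads is a choice made for this sketch; the text does not say how an idle period is treated.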
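Shard eviction module
In a tiered storage system, multiple service directories perform heat scheduling at the same time and one directory may have several heat scheduling tasks, while SSD flash space is limited. SSD flash can therefore fill up: some service directories need heat scheduling, but SSD space is occupied by other services and storage space is insufficient. There are two solutions:
① For each service or heat monitoring configuration, set the maximum SSD space manually, and ensure that the sum of the maximum SSD space of all configurations is less than the total SSD space. This method requires the services' demands on the storage system to be planned in advance.
② When multiple services use the tiered storage system, or the SSD flash usage of associated heat monitoring configurations covering several time periods cannot be determined precisely, and only the maximum space from service planning is used, SSD flash usage in the storage system may exceed the SSD space threshold, and forced eviction must be started. For example, the storage system has 24 TB of SSD flash, the HOT service is planned to use at most 13 TB of SSD space, and the TV service at most 14 TB; or the maximum SSD space of several associated TV configurations exceeds 24 TB. When actual SSD usage exceeds the SSD space threshold, shard usage across all services and heat monitoring configurations in the system must be analyzed and space released. Several eviction policies are possible: preferably, shards in each service whose heat is below the heat threshold are evicted first, followed by the shards with the lowest heat values in each service.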
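The two situations above can be checked with a few lines of Python. The figures 24 TB, 13 TB, and 14 TB come from the example in the text; the 0.9 threshold ratio and the function names are assumptions made for illustration.

```python
def oversubscribed(quotas_tb, capacity_tb):
    """True if the planned per-service maxima add up to more than the SSD capacity."""
    return sum(quotas_tb.values()) > capacity_tb


def needs_forced_eviction(used_tb, capacity_tb, threshold_ratio):
    """True if actual SSD usage exceeds the SSD space threshold."""
    return used_tb > capacity_tb * threshold_ratio


# HOT (13 TB) plus TV (14 TB) is planned against 24 TB of SSD flash.
planned_too_much = oversubscribed({"HOT": 13, "TV": 14}, capacity_tb=24)
```

Method ① keeps `oversubscribed` false by construction; method ② tolerates oversubscription and relies on forced eviction when usage crosses the threshold.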
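The following describes a method of shard eviction with multiple services and multiple heat monitoring configurations. The main process is:
(1) First examine the service directory that currently exceeds the SSD space threshold and traverse all heat monitoring configurations of that directory. Add the shards in the demotion list of each heat monitoring configuration to the shard eviction module, and immediately trigger creation of a new heat scheduling task.
(2) If SSD usage still does not satisfy the condition, look up all heat monitoring configurations of the other service directories and repeat step (1).
(3) If SSD usage still does not satisfy the condition, some shards in SSD flash that have not yet expired must be evicted. After sorting the service directories by SSD usage, search the shards of the files in each directory in turn, and add shards that have exceeded the SSD retention time to the shard eviction module.
(4) Finally, evict in turn the shards with the lowest heat among those exceeding the SSD space threshold.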
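The eviction order in the steps above can be approximated by a tiered sort. The Python sketch below is a simplification: it ignores the per-directory ordering and treats all SSD-resident shards as one pool, and the record fields and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SsdShard:
    shard_id: str
    size_gb: int
    heat: float
    below_threshold: bool     # already on a demotion list
    retention_expired: bool   # held on SSD longer than the retention time


def pick_evictions(shards, need_gb):
    """Choose shards until at least `need_gb` would be freed.

    Tier 0: shards below the heat threshold; tier 1: shards past their SSD
    retention time; tier 2: everything else. Coldest first within each tier.
    """
    def tier(s):
        if s.below_threshold:
            return 0
        return 1 if s.retention_expired else 2

    chosen, freed = [], 0
    for s in sorted(shards, key=lambda s: (tier(s), s.heat)):
        if freed >= need_gb:
            break
        chosen.append(s.shard_id)
        freed += s.size_gb
    return chosen


pool = [SsdShard("a", 10, 9, False, False), SsdShard("b", 10, 2, True, False),
        SsdShard("c", 10, 7, False, True), SsdShard("d", 10, 5, False, False)]
victims = pick_evictions(pool, need_gb=25)
```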
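Coordinated scheduling
Heat management and heat scheduling of different services are independent of each other, so different services can use the tiered storage system at the same time. They share CPU, SSD flash, HDD, and network resources. For example, if a large amount of shard heat scheduling for the TV directory's service is carried out during the peak access period of the HOT service, the performance stability of the HOT directory suffers. The multiple independent heat management tasks are therefore scheduled in a coordinated manner, to prevent background scheduling for other services from affecting service stability. There are two main functions:
(1) Receive notification of a service's peak time period from the heat statistics module, check all heat monitoring configurations of the service, automatically generate an associated heat monitoring configuration, and initialize its weight.
(2) When the heat statistics module finds that the IO capacity of SSD flash or HDD has reached a threshold within some time period, or that the performance reported by the storage system has reached a performance threshold, it notifies each service's scheduler to limit the migration speed of shard replicas.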
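Figure 5 is an interaction diagram of the improved modules for multi-service tiered storage according to an embodiment of the present application. As shown in Figure 5, this solution adds several management modules and optimizes existing functions to better support multiple services on one tiered storage system; it automatically generates associated heat monitoring configurations for different time periods based on the statistics module, and provides a method for automatically adjusting associated-configuration weights, so as to reduce operational complexity and improve scheduling efficiency. Once the relevant modules of the metadata server have been optimized, it supports heat management for multiple services. The main flow is as follows (see Figure 5):
(1) After receiving shard heat information, the metadata server finds the service directory to which the shard's file belongs, reads the current time, recursively looks up the parent directories, checks whether heat monitoring configuration information has been set for each directory, obtains the number of the configuration whose heat statistics range contains the current time, and updates the heat corresponding to that configuration.
(2) The heat management module periodically scans the shards in the metadata, obtains the service identifier of the shard's file and the current time, and from these obtains all independent and associated configurations of the service. It checks whether an independent configuration or an associated configuration is in effect at the current time and calculates the shard heat accordingly.
(3) Based on the service identifier and the current heat monitoring configuration, find the promotion and demotion lists of the configuration. If the shard heat is greater than the configured heat threshold and all replicas of the shard are on HDDs, insert the relevant metadata into the pending-promotion list and re-sort the list. If the shard heat is less than the heat threshold and a replica is on SSD flash, insert the relevant metadata into the pending-demotion list and re-sort that list.
(4) For each group of independent heat monitoring configurations, periodically start several heat scheduling tasks that check the corresponding promotion and demotion lists in turn and send shard replica migration requests to the storage server.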
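Figure 6 is an interaction diagram of the newly added modules for multi-service tiered storage according to an embodiment of the present application. As shown in Figure 6, the weight management, coordinated scheduling, and shard eviction modules added by this solution are background functions. Their implementation has been described above; the interaction between the new modules and the existing modules is shown here:
Interaction of the weight management module with heat statistics and heat monitoring configuration information:
(1) Heat statistics completes the service statistics for the period and sends a notification to the weight management module;
(2) Weight management obtains all associated heat monitoring configurations of the service, retrieves the service's hotspot statistics, and calculates the associated-configuration weights;
(3) The associated-configuration weights are updated and written into the heat monitoring configuration information for persistent storage.
The coordinated scheduling module interacts with heat statistics, heat scheduling, and heat monitoring configuration information. The main functional flow is:
(1) Periodically check storage system performance and SSD and HDD hit rates; when the system is busy, notify the running heat scheduling tasks of all services to reduce migration speed.
(2) When a heat statistics task detects a traffic peak that exceeds the configured threshold, it notifies the coordinated scheduling module.
(3) The coordinated scheduling module obtains the peak time period of the service, checks all heat monitoring configurations of the service, automatically generates an associated heat monitoring configuration, initializes its weight, and stores it in the heat monitoring configuration information.
The technical problem to be solved by the present application is as follows: one distributed storage system carries multiple services; different services have different access hotspots and peak time periods; and, whether for historical heat or current heat, the heat contribution of different time periods differs. When a storage system has multiple services and different peak time periods, the prior art falls short in heat management and suffers from low scheduling efficiency. To address these problems, a tiered storage apparatus is proposed that can be flexibly deployed in a distributed storage system. It supports multiple services, automatically generates associated heat monitoring configurations for different time periods based on heat statistics, improves tiered-storage heat management, and provides a method for automatically adjusting associated-configuration weights, reducing the burden on operations staff.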
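Example 1: heat monitoring configuration information and management for multiple services
Besides video-on-demand and live broadcast services for large video, the tiered storage system described above can also carry services such as web video caching, mini-program applications, and mailbox backup. These services differ considerably from video on demand in user base, access frequency, peak access periods, and so on, and their shard replicas cannot be migrated under one uniform heat management scheme. Each service directory is therefore given one basic heat monitoring configuration and several associated heat monitoring configurations. The service identifier described here identifies the resources a service uses when running in the storage system; files of different service types can also be distinguished by full path, relative path, file prefix, or suffix format. In addition, a time period can be not only an interval within each day (9:00-11:00 every day) but also whole days configured as holidays, such as Saturday, Sunday, and National Day (October 1 to October 7). For example, for a further mailbox service, the following configurations are added to the same storage system:
Configuration 4 serves as the basic heat monitoring configuration of the MAIL application; the heat statistics period is 8:00-18:00 every day and the heat update period is one hour.
Configuration 5 is an associated configuration of configuration 4; the heat statistics period is 8:00-9:30 and the heat update period is 30 minutes.
Configuration is done through man-machine commands or an interactive interface. When a service is added to the storage system, besides adding the service path, the multi-service tiered-storage heat monitoring configuration must also be carried out. The interactive interface for multi-service heat monitoring configuration in the storage system is introduced below. For example, when the service TV is added, some of the parameters of the added heat monitoring configuration are as shown in Figure 7, which is a schematic diagram of the multi-service heat monitoring configuration interface according to Example 1 of the present application.
Figure 8 is a schematic diagram of the tiered-storage multi-service list according to Example 2 of the present application. As shown in Figure 8, the storage system contains a configuration list for multiple services.
Heat monitoring configuration 1 belongs to the Mail service; configurations 2 and 3 are TV configurations and are associated heat monitoring configurations. Configuration 1 is an independent heat monitoring configuration, while configurations 2 and 3 are associated configurations that share heat management. Heat management and heat scheduling of different services are independent of each other, so different services can all use the tiered storage system at the same time. To provide stable access performance and better control of system hardware, the tiered storage system must coordinate the multiple independent heat management tasks. Multiple services share CPU, SSD flash, HDD, and network resources in the storage system, and background heat scheduling must not degrade the stability of running services.
When the heat statistics module finds that the IO capacity of SSD flash or HDD has reached a threshold within some time period, or that the performance reported by the storage system has reached a performance threshold, it notifies each service's scheduler to limit the migration speed of shard replicas. A more common case is a peak period of one service, for example when television viewers request programs between 19:00 and 20:00; during that time the other services need to slow down their heat management and scheduling.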
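Example 2: generation of associated heat monitoring configuration information
Take a content delivery network as an example. It typically provides users with live broadcast, video on demand, and similar services, and uses a tiered storage system to provide high-performance read IO and large capacity. The main requirements of the service on the storage system are high read bandwidth, low latency, and large storage capacity. A common scenario: in ordinary periods viewing and on-demand requests are fairly steady, but viewing concentrates in a few periods each day and in special periods such as weekends, triggering service peaks in the storage system. Taking the service directory HOT as an example, users frequently request programs from 11:00 to 12:00 and from 19:00 to 21:00, and the storage system is under heavy load at those times. If the hottest shards can be scheduled onto SSD flash, the throughput of the storage system rises and latency falls. We call these periods peak periods. In other periods on-demand traffic is steady. Heat management in peak periods differs greatly from ordinary service heat and cannot be judged by a single standard. Applying this solution, three or more heat monitoring configurations can be set for the HOT directory, as follows:
Configuration 1: the heat statistics period (start time and end time, likewise below) is 11:00-12:00 every day and the heat update time is every half hour; the calculation formula and other settings are not specially described and take the default configuration.
Configuration 2: the basic heat monitoring configuration of the HOT directory, mainly for ordinary-period traffic; the heat statistics period is 8:00-23:00 and the heat update time is every hour.
Configuration 3: the heat statistics period is 18:00-22:00 and the heat update time is every half hour.
Note: configuration 1 is an independent configuration. Configurations 2 and 3 are set as associated configurations, with initial weights of 0.2 and 0.8 respectively. SSD space usage follows service planning; associated heat monitoring configurations of the same service need not set this value precisely and can use the same configuration data. Other settings are not described further.
The logical structure of the three HOT configurations in the system is shown in Figure 8. After the service directory HOT is configured as above, the storage system allocates the corresponding resources: it generates the pending-promotion and pending-demotion lists, creates scheduling tasks, and so on. Configuration 1 has its own pending-promotion list, pending-demotion list, and heat management task. Configurations 2 and 3 share one pending-promotion list and one pending-demotion list, and have one common heat management task that runs according to the rules of configurations 2 and 3.
This example also provides for automatic generation of associated heat monitoring configurations once the storage system detects a service peak period at run time. When a service directory already has a basic heat monitoring configuration, the system uses the peak periods identified by the heat statistics module to generate a new associated configuration for the service directory, and sets the weights of the existing and new heat monitoring configurations. This helps users identify service peak periods, generates new associated heat monitoring configurations, and performs heat scheduling automatically, reducing configuration complexity for operations staff. The main steps are:
(1) After the system has run a complete heat scheduling period and statistics period, a service peak period appears in which traffic exceeds N times the preset value for ordinary access, and traversing the heat monitoring configurations finds no associated heat monitoring configuration for that time period.
(2) The coordinated scheduling module is notified to generate a new associated configuration.
(3) The coordinated scheduling module obtains the service directory, the existing heat monitoring configurations, and the statistics for the time period, and generates a new associated configuration. Its heat statistics time is set to the peak service period, parameters such as heat update time follow the existing heat monitoring configuration, and the weight of the new associated configuration is set.
(4) The coordinated scheduling module adds the new associated configuration to the configuration table.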
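The peak detection behind steps (1) to (4) can be sketched in Python as follows. The hourly granularity, the exact-match test against existing windows, and all function names are assumptions; the text only states that traffic must exceed N times a preset ordinary value and that no associated configuration yet covers the period.

```python
def find_peak_hours(hourly_reads, baseline, factor):
    """Hours whose read count exceeds `factor` times the ordinary baseline."""
    return [h for h, n in sorted(hourly_reads.items()) if n > baseline * factor]


def merge_hours(hours):
    """Merge consecutive hours into [start, end) windows."""
    windows = []
    for h in hours:
        if windows and windows[-1][1] == h:
            windows[-1] = (windows[-1][0], h + 1)   # extend the open window
        else:
            windows.append((h, h + 1))
    return windows


def new_associated_windows(hourly_reads, baseline, factor, existing):
    """Peak windows not yet covered by an existing configuration (exact match only)."""
    peaks = merge_hours(find_peak_hours(hourly_reads, baseline, factor))
    return [w for w in peaks if w not in existing]


reads = {10: 100, 11: 250, 12: 90, 19: 300, 20: 320, 21: 110}
to_create = new_associated_windows(reads, baseline=100, factor=2, existing=[(11, 12)])
```

Each returned window would become the heat statistics period of a new associated configuration, with its other parameters copied from the existing basic configuration.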
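Example 3: weight management for associated heat monitoring configuration information
The associated-configuration weights of a service directory are specified when the storage system is initialized; they can be modified during operations, and can also be adjusted automatically at run time based on data from the heat statistics module. Applying this example reduces parameter tuning and frequent version upgrades during operations.
Within a statistics period, the heat statistics module records the SSD space and HDD space occupied by the service directory, the number and bytes of the service's reads from SSD flash and HDD, the number of promoted shards calculated by each associated configuration, and so on.
The weight of an associated configuration lies in the range [0, 1], and the default initial weight equals 1 / (number of associated configurations). Figure 9 is a schematic flowchart of weight management according to Example 3 of the present application. As shown in Figure 9, it includes the following steps:
Step 1: set the initial weights;
Step 2: after the heat statistics task has completed all statistics for the whole system, start the weight monitoring task;
Step 3: in the heat configurations, find the most relevant heat configuration in each group of associated configurations;
Step 4: set the weight of the most relevant heat configuration to its original value plus the incremental weight W_d;
Step 5: repeat the above steps in the next heat statistics period; when the weight of a heat configuration reaches the threshold (for example 1) but the preset scheduling efficiency is still not met, generate a statistics report or an alarm.
Specifically, the weight management flow may include the following: the storage system's statistics module notifies the coordinated scheduling module, which starts the weight monitoring task and adjusts the associated-configuration weights in a fixed pattern. For example, it adjusts with a fixed step of 0.1 and finds, among the associated heat monitoring configurations, the most relevant one for the current statistics period. The most relevant heat monitoring configuration is the one for which, within a preset statistics period, the number of promoted shards calculated under that configuration is closest to the number of shards actually promoted by the heat management task. The weight of the most relevant configuration is then increased by the incremental weight W_d, that is, by 0.1. In the next statistics period, the heat statistics are analyzed and the weights adjusted again. If, after several periods (that is, once the weight of the most relevant configuration has reached 1), heat scheduling and actual statistics show that the scheduling efficiency is below the preset scheduling efficiency, a statistics report and an alarm are generated and an associated configuration for the peak time period is generated automatically.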
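One round of the weight adjustment can be sketched in Python as below. The 0.1 step and the 80% target appear in the text; leaving the other weights unchanged, capping at 1.0, and the function signature are assumptions of this sketch, since the application does not say how the remaining weights are treated.

```python
def adjust_weights(weights, predicted_promotions, actual_promotions,
                   efficiency, target=0.8, step=0.1):
    """Run one adjustment round and return (new_weights, alarm).

    The most relevant configuration is the one whose predicted number of
    promoted shards is closest to the number actually promoted.
    """
    if efficiency >= target:
        return dict(weights), False           # target met, nothing to adjust
    best = min(weights,
               key=lambda c: abs(predicted_promotions[c] - actual_promotions))
    if weights[best] >= 1.0:
        return dict(weights), True            # weight exhausted: report and alarm
    new = dict(weights)
    new[best] = min(1.0, round(new[best] + step, 10))
    return new, False


weights = {"cfg2": 0.2, "cfg3": 0.8}
predicted = {"cfg2": 40, "cfg3": 95}
new_weights, alarm = adjust_weights(weights, predicted, 100, efficiency=0.6)
```

Calling the function once per statistics period reproduces the loop of steps 2 to 5; an `alarm` of `True` corresponds to the report-and-alarm branch.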
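In addition, when the heat scheduling efficiency of a service directory is fairly stable and exceeds what the service's performance requires, the weights of its associated configurations can be marked as needing no adjustment for a certain period, and a fixed value is used.
Example 4: the shard eviction module
This solution supports multiple services, and multiple heat monitoring configurations per service. At run time they share SSD flash while each has independent heat management and heat scheduling, which causes problems for the use and release of SSD space. A shard eviction module is therefore added as an auxiliary, to accommodate multiple heat management and heat scheduling tasks smoothly.
Figure 11 is a schematic diagram of the main shard eviction flow according to Example 4 of the present application. As shown in Figure 11, the basic flow of the storage system's shard eviction module is as follows:
Step 1: traverse all heat configurations and heat statistics of the current service directory and sort them by SSD space usage. Take the heat configuration with the largest SSD usage and set it as the current heat configuration.
Step 2: traverse the demotion list of the current heat configuration, sort the shards by heat, and add them to the pending-eviction list.
Step 3: immediately trigger creation of a new heat scheduling task. When scheduling finishes, check whether enough SSD flash space has been released; if so, exit.
Step 4: if not enough SSD space has been released, sort all directories with heat configurations in the storage system by SSD space, and repeat the above steps for each such directory.
Step 5: traverse the service directories of the heat configurations and determine for each shard whether it is on SSD flash and has been held longer than the configured SSD retention time; add expired shards to the pending-eviction expired list; add unexpired shards to the unexpired list and calculate the SSD space they occupy.
Step 6: determine whether SSD flash usage satisfies the condition; if not, add the shards with the lowest heat from the unexpired list to the eviction list in turn, and trigger creation of a new heat scheduling task.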
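The shard eviction flow may further include the following steps:
(1) Traverse all heat monitoring configurations and heat statistics of the current service directory and sort them by actual SSD space usage.
(2) Take the heat monitoring configuration with the largest SSD space usage. Traverse its pending-demotion queue, sort the shards by heat, and add shards that have exceeded the SSD retention time to the pending-eviction queue (for the eviction queue see Figure 10, which is a structure diagram of tiered-storage multi-directory heat management and eviction according to Example 4 of the present application).
(3) Immediately trigger creation of a new heat scheduling task, which migrates those shards from SSD flash to HDD through the heat scheduling module.
(4) Take the next heat monitoring configuration of the current service directory and repeat step (2).
(5) Sort all directories with tiered-storage heat monitoring configurations by actual SSD usage; traverse the sorted service directories, setting each in turn as the current service directory, and repeat step (1).
(6) Trigger creation of a new heat scheduling task. If enough SSD space has been released, exit.
(7) Take the service directory that currently uses the most SSD space, search the shards of the files in the directory, check whether a replica of the shard is on SSD flash, and compare the replica's promotion time with the SSD retention time to determine whether it has expired. Add shards whose replicas have expired to the expired eviction candidate queue; add unexpired shards to the unexpired eviction candidate queue, calculate the SSD space they occupy, and sort them by heat in ascending order.
(8) Take shards from the expired eviction candidate queue and add them to the demotion queue of the heat scheduling module. Go to step (6).
(9) Take shards from the unexpired eviction candidate queue; while the space of the shards in this queue is larger than the space that still needs to be freed, evict the shard with the lowest heat in the queue each time. Go to step (6).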
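From the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.
Embodiment 2
This embodiment also provides a data storage apparatus, which is configured to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
According to another embodiment of the present application, a data storage apparatus is also provided, including:
a first obtaining module, configured to obtain a plurality of pieces of heat monitoring configuration information set for a first service;
a second obtaining module, configured to monitor a heat value of the first service separately according to each piece of heat monitoring configuration information, wherein the heat value indicates the frequency with which the first service is accessed;
a selection module, configured to select, according to the plurality of heat values corresponding to the plurality of pieces of heat monitoring configuration information, a location for storing data corresponding to the first service, and to store the data.
With the present application, a plurality of pieces of heat monitoring configuration information are configured for the first service, the heat of the first service is monitored according to the settings in each piece, and the heat value corresponding to each piece is obtained. A location for storing the data corresponding to the first service, such as a solid-state drive or a mechanical hard disk, is then selected according to the plurality of heat values. The data may be migrated after considering several heat values together, or according to one heat value on its own. Because one service has several pieces of heat monitoring configuration information, the hot data of the service can be migrated to the solid-state drive more accurately and promptly, which greatly improves tiered-storage efficiency and solves the problem in the related art that a single way of collecting heat values leads to unsatisfactory tiered storage of hot data.
It should be noted that each of the above modules can be implemented by software or hardware. For the latter, this can be done in the following ways, without limitation: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.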
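Embodiment 3
An embodiment of the present application also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for executing the following steps:
S1, obtaining a plurality of pieces of heat monitoring configuration information set for a first service;
S2, monitoring a heat value of the first service separately according to each piece of heat monitoring configuration information, wherein the heat value indicates the frequency with which the first service is accessed;
S3, selecting, according to the plurality of heat values corresponding to the plurality of pieces of heat monitoring configuration information, a location for storing data corresponding to the first service, and storing the data.
Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
An embodiment of the present application also provides an electronic apparatus, including a memory and a processor, wherein a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps through the computer program:
S1, obtaining a plurality of pieces of heat monitoring configuration information set for a first service;
S2, monitoring a heat value of the first service separately according to each piece of heat monitoring configuration information, wherein the heat value indicates the frequency with which the first service is accessed;
S3, selecting, according to the plurality of heat values corresponding to the plurality of pieces of heat monitoring configuration information, a location for storing data corresponding to the first service, and storing the data.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.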
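With the above embodiments of the present application, a plurality of pieces of heat monitoring configuration information are configured for the first service, the heat of the first service is monitored according to the settings in each piece, and the heat value corresponding to each piece is obtained. A location for storing the data corresponding to the first service, such as a solid-state drive or a mechanical hard disk, is then selected according to the plurality of heat values. The data may be migrated after considering several heat values together, or according to one heat value on its own. Because one service has several pieces of heat monitoring configuration information, the hot data of the service can be migrated to the solid-state drive more accurately and promptly, which greatly improves tiered-storage efficiency and solves the problem in the related art that a single way of collecting heat values leads to unsatisfactory tiered storage of hot data.
Obviously, those skilled in the art should understand that the modules or steps of the present application described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps shown or described can be executed in an order different from the one given here, or they can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present application is thus not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application may have various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its scope of protection.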

Claims (15)

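  1. A data storage method, comprising:
    obtaining a plurality of pieces of heat monitoring configuration information set for a first service;
    monitoring a heat value of the first service separately according to each piece of heat monitoring configuration information, wherein the heat value indicates the frequency with which the first service is accessed;
    selecting, according to a plurality of heat values corresponding to the plurality of pieces of heat monitoring configuration information, a location for storing data corresponding to the first service, and storing the data.
  2. The method according to claim 1, wherein obtaining the plurality of pieces of heat monitoring configuration information set for the first service comprises:
    obtaining at least one of the following items included in the heat monitoring configuration information:
    a heat update period, a heat statistics start time, and a heat statistics end time.
  3. The method according to claim 1, wherein monitoring the heat value of the first service separately according to each piece of heat monitoring configuration information comprises:
    within the period from the heat statistics start time to the heat statistics end time corresponding to each piece of heat monitoring configuration information, counting a first number of times the first service is accessed in each heat update period;
    obtaining, according to the first number, the heat value of the first service corresponding to each piece of heat monitoring configuration information.
  4. The method according to claim 1, wherein monitoring the heat value of the first service separately according to each piece of heat monitoring configuration information comprises:
    when first heat monitoring configuration information among the plurality of pieces of heat monitoring configuration information is directed to a first service directory of the first service, counting, according to the first heat monitoring configuration information, heat values of one or more data shards in the first service directory.
  5. The method according to claim 1, wherein selecting, according to the plurality of heat values corresponding to the plurality of pieces of heat monitoring configuration information, the location for storing the data corresponding to the first service, and storing the data, comprises:
    when the plurality of pieces of heat monitoring configuration information are associated heat monitoring configuration information, obtaining the product of the heat value corresponding to each piece of heat monitoring configuration information and a preset weight;
    obtaining the sum of the products for the plurality of pieces of heat monitoring configuration information, selecting, according to the sum, the location for storing the data corresponding to the first service, and storing the data.
  6. The method according to claim 5, wherein selecting, according to the sum, the location for storing the data corresponding to the first service, and storing the data, comprises:
    when the sum is greater than a heat threshold, migrating the data corresponding to the first service from a mechanical hard disk to a solid-state drive;
    when the sum is less than the heat threshold, migrating the data corresponding to the first service from the solid-state drive to the mechanical hard disk.
  7. The method according to claim 1, wherein selecting the location for storing the data corresponding to the first service, and storing the data, comprises:
    selecting a solid-state drive or a mechanical hard disk for storing a replica of a first data shard of the first service;
    storing the replica on the selected solid-state drive or mechanical hard disk.
  8. The method according to claim 7, wherein after the replica is migrated to the solid-state drive, the method further comprises:
    within one heat update period, counting the ratio of the number of reads from the solid-state drive to the number of reads from the mechanical hard disk when the first service is executed;
    when the ratio is lower than a preset ratio, adjusting the preset weights of the plurality of pieces of heat monitoring configuration information so as to increase the ratio for the next heat update period.
  9. The method according to claim 8, wherein after adjusting the preset weights of the plurality of pieces of heat monitoring configuration information so as to increase the ratio for the next heat update period, the method comprises:
    detecting, after the preset weights have been adjusted over a plurality of heat update periods, that the ratio has reached a maximum value;
    when the maximum value is still less than the preset ratio, generating a statistics report and raising an alarm.
  10. The method according to claim 1, wherein selecting, according to the plurality of heat values corresponding to the plurality of pieces of heat monitoring configuration information, the location for storing the data corresponding to the first service comprises:
    when the plurality of pieces of heat monitoring configuration information are all independent of one another, selecting the location for storing the data corresponding to the first service separately according to the heat value corresponding to each piece of heat monitoring configuration information.
  11. The method according to claim 1, wherein after selecting, according to the plurality of heat values corresponding to the plurality of pieces of heat monitoring configuration information, the location for storing the data corresponding to the first service, and storing the data, the method further comprises:
    counting in real time a second number of times the first service is accessed, and, when the second number meets a preset condition, automatically generating second heat monitoring configuration information for the first service.
  12. The method according to claim 1, wherein after selecting, according to the plurality of heat values corresponding to the plurality of pieces of heat monitoring configuration information, the location for storing the data corresponding to the first service, and storing the data, the method further comprises:
    when the storage state of a first hard disk storing data meets a preset state, releasing storage space of the first hard disk in at least one of the following ways:
    migrating out a second service stored on the first hard disk whose heat value is below the heat threshold or is the smallest;
    migrating out the data shard with the smallest heat value of a second service stored on the first hard disk.
  13. A data storage apparatus, comprising:
    a first obtaining module, configured to obtain a plurality of pieces of heat monitoring configuration information set for a first service;
    a second obtaining module, configured to monitor a heat value of the first service separately according to each piece of heat monitoring configuration information, wherein the heat value indicates the frequency with which the first service is accessed;
    a selection module, configured to select, according to a plurality of heat values corresponding to the plurality of pieces of heat monitoring configuration information, a location for storing data corresponding to the first service, and to store the data.
  14. A storage medium, wherein a computer program is stored in the storage medium and the computer program is configured to execute, when run, the method according to any one of claims 1 to 12.
  15. An electronic apparatus, comprising a memory and a processor, wherein a computer program is stored in the memory and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 12.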
PCT/CN2019/115774 2018-12-27 2019-11-05 数据存储的方法及装置 WO2020134609A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811616160.3 2018-12-27
CN201811616160.3A CN110209345A (zh) 2018-12-27 2018-12-27 数据存储的方法及装置

Publications (1)

Publication Number Publication Date
WO2020134609A1 true WO2020134609A1 (zh) 2020-07-02

Family

ID=67780027

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/115774 WO2020134609A1 (zh) 2018-12-27 2019-11-05 数据存储的方法及装置

Country Status (2)

Country Link
CN (1) CN110209345A (zh)
WO (1) WO2020134609A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209345A (zh) * 2018-12-27 2019-09-06 中兴通讯股份有限公司 数据存储的方法及装置
CN111309251A (zh) * 2020-01-21 2020-06-19 青梧桐有限责任公司 数据存储方法、系统、电子设备及可读存储介质
CN111400318B (zh) * 2020-03-09 2023-09-15 北京易华录信息技术股份有限公司 一种数据存储的调度策略的生成方法及装置
CN111427969B (zh) * 2020-03-18 2022-05-27 清华大学 一种分级存储系统的数据替换方法
CN113297005B (zh) * 2020-07-27 2024-01-05 阿里巴巴集团控股有限公司 数据处理方法、装置和设备
CN112559504A (zh) * 2020-12-09 2021-03-26 北京思特奇信息技术股份有限公司 一种基于数据热度的数据清理方法、装置及存储介质
CN112734103A (zh) * 2021-01-05 2021-04-30 烽火通信科技股份有限公司 一种基于时空轮序的视频冷片预测方法与装置
CN113032369A (zh) * 2021-03-26 2021-06-25 山东英信计算机技术有限公司 一种数据迁移方法、装置及介质
CN113885797B (zh) * 2021-09-24 2023-12-22 济南浪潮数据技术有限公司 一种数据存储方法、装置、设备及存储介质
CN114666121A (zh) * 2022-03-21 2022-06-24 山东鼎夏智能科技有限公司 数据监控方法及装置
CN116189896B (zh) * 2023-04-24 2023-08-08 北京快舒尔医疗技术有限公司 一种基于云端的糖尿病健康数据预警方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150347A (zh) * 2013-02-07 2013-06-12 浙江大学 基于文件热度的动态副本管理方法
CN103186350A (zh) * 2011-12-31 2013-07-03 北京快网科技有限公司 混合存储系统及热点数据块的迁移方法
CN106709068A (zh) * 2017-01-22 2017-05-24 郑州云海信息技术有限公司 一种热点数据识别方法及其装置
CN108121802A (zh) * 2017-12-22 2018-06-05 东软集团股份有限公司 网页访问的热力分析方法、装置及其设备
CN110209345A (zh) * 2018-12-27 2019-09-06 中兴通讯股份有限公司 数据存储的方法及装置

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9037791B2 (en) * 2013-01-22 2015-05-19 International Business Machines Corporation Tiered caching and migration in differing granularities
CN104133643A (zh) * 2014-08-04 2014-11-05 浪潮电子信息产业股份有限公司 一种自动数据分级存储框架下提高数据迁移效率的方法

Also Published As

Publication number Publication date
CN110209345A (zh) 2019-09-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19903496

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.11.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19903496

Country of ref document: EP

Kind code of ref document: A1