WO2023155703A1 - Workload feature extraction method and apparatus - Google Patents

Workload feature extraction method and apparatus

Info

Publication number
WO2023155703A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
workload
distribution
file system
data
Application number
PCT/CN2023/074657
Other languages
French (fr)
Chinese (zh)
Inventor
金季焜
鲁鹏
方维
黄茜
张
李融尚
史启权
康宁
王波超
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023155703A1


Classifications

    • G06F3/061: Interfaces specially adapted for storage systems; improving I/O performance
    • G06F3/064: Organizing or formatting or addressing of data; management of blocks
    • G06F3/0671: Interfaces adopting a particular infrastructure; in-line storage system
    • G06F12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F3/00, G06F3/06: Input/output arrangements for transferring data to or from record carriers, e.g. RAID, emulated or networked record carriers

Description

  • the present application relates to the field of intelligent storage, and in particular to a workload feature extraction method and device.
  • Storage intelligence can better meet users' demands for high capacity, high reliability, high throughput, and low latency, enabling storage devices to achieve self-adaptive optimization in complex business scenarios.
  • the workload information of the storage device is indispensable, and the workload information is an important input for memory allocation, load balancing, data migration and other functional applications.
  • However, existing approaches to obtaining the workload information are not only inefficient, but also have a great impact on the performance of storage devices, and cannot meet the need for direct use by subsequent applications.
  • the present application discloses a method and device for extracting workload features, which can realize online extraction of workload features of storage devices, effectively improve the security of user data, and also improve the extraction efficiency of load features.
  • the present application provides a method for extracting workload features, the method comprising: acquiring a first workload feature of a storage device during an input/output IO execution process; and storing the first workload feature, where the first workload feature is used for memory allocation of the storage device, data migration, network attached storage NAS load balancing, identification of hot and cold data blocks, prefetch policy tuning, performance bottleneck perception, load forecasting, or load change perception.
  • the storage device may be, for example, a storage node in a centralized storage system or a storage node in a distributed storage system, which is not specifically limited herein.
  • the step of obtaining the first workload feature of the storage device during the IO execution process may be performed by a processor (for example, a central processing unit, CPU) in the storage device running software instructions, or may be performed by another chip independent of the processor, where the chip may be a processing chip such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a graphics processing unit (GPU), or a neural-network processing unit (NPU).
  • the chip may be integrated in the storage device, or placed in the storage device by inserting a card, which is not specifically limited here.
  • the storage device extracts the workload features of the storage device online during the IO execution process, which not only avoids the problem of low security caused by directly leaking the user's workload data (for example, IO data), but also improves the extraction efficiency of the workload features.
  • the extracted workload characteristics can be directly used by each module of the storage device itself, so as to realize self-adaptive optimization of the storage device and help improve the intelligence of the storage system.
  • acquiring the first workload feature of the storage device includes: obtaining the first workload feature by a processor of the storage device according to the workload data in the memory.
  • the processor directly reads the workload data from the memory, and extracts the first workload feature according to the read workload data, realizing the online extraction of the workload feature.
  • In addition, the online workload feature extraction method in this application reduces the impact on the performance of the storage device itself.
  • the first workload feature does not need to be extracted offline by an external device.
  • the first workload feature does not need to be extracted offline by an external device, but is extracted online by the storage device itself, which not only reduces the impact on the performance of the storage device, but also improves the extraction efficiency of the workload feature.
  • the first workload feature includes at least one of a time feature, a stream feature, and a hotspot distribution feature, where the time feature is used to indicate the time interval corresponding to the IO of the block service, the stream feature is used to indicate the access mode of the IO stream of the block service, and the hotspot distribution feature is used to indicate the reuse distance distribution of the address blocks in the block service.
  • the time feature may be, for example, the total time interval, the distribution of IO time intervals, and the like.
  • the total time interval is used to indicate the total duration corresponding to the batch IO of the block service
  • the IO time interval distribution is used to indicate the time interval between IOs in the batch IO. Based on the time characteristics, the access time distribution of IO in the block business can be known.
  • Stream features include at least one of the following features: the number of IO streams, the IO stream length distribution, the IO stream bandwidth distribution, the IO stream interval distribution, the IO stream space concurrency distribution, the proportion of sequential streams and interval streams in the IO streams, and the number of out-of-order IO streams.
  • the stream characteristics reflect the characteristics of IO stream access in the block business scenario.
  • the reuse distance of an address block refers to the number of non-duplicated address blocks between two adjacent accesses to the same address block.
  • the distribution of hotspots can reflect the regularity of repeated access to address blocks by IOs of the block business.
  • the first workload feature further includes at least one of the following features: IO size distribution, read-write ratio parameters, total read-write bandwidth, and the number of unaligned IOs.
  • the first workload feature includes a short-term access feature, which is used to indicate the access pattern of the batch IO of the file system service in at least one dimension.
  • the short-term access features include at least one of the following features: number of files accessed by batch IO, file size distribution, file reuse distance distribution, file concurrent operation number distribution, etc.
  • the short-term access features include at least one of the following features: the number of directories accessed by the batch IO, the directory depth and width distribution, the directory reuse distance distribution, the total read and write bandwidth distribution within a directory, the distribution of the number of directory operations, the distribution of the number of concurrent directory operations, the sequential access order of files in a directory, etc.
  • the short-term access feature includes at least one of the following features: the total time interval corresponding to the batch of IOs and the distribution of time intervals.
  • the short-term access features include at least one of the following features: the size distribution of IOs within a file, the read and write bandwidth of IOs within a file, the reuse distance distribution of address blocks within a file, the IO stream characteristics within a file, the sequentiality of IO streams within a file, etc.
  • the short-term access features include at least one of the following features: the proportion of operation commands, the distribution of host operations, and the distribution of operation modes, where a host operation refers to a user operation on the host that is identified based on the batch IO of the file system service, and may also be called an aggregation operation, a user operation, and so on.
  • the host operation can be, for example, cp (indicating copy), rm (indicating deleting a directory or file), rmdir (indicating deleting an empty directory), grep (indicating query) etc. on the Linux system.
  • Operation modes include but are not limited to sequential read, sequential write, random read, random write, create write, append write, overwrite, file lock, protocol lock, file system lock, etc.
  • the first workload feature also includes a global feature
  • the global feature is used to indicate the hierarchical structure distribution of the file system
  • the global feature includes at least one of the following features: the number of files in the file system, the number of directories in the file system, the distribution of directory depth in the file system, the distribution of the number of files in each directory of the file system, the distribution of file access frequency in the file system, and the distribution of directory access frequency in the file system.
  • In this way, the hierarchical structure layout of the file system (for example, the number of directories, the number of files, and the number of files in each directory) and the overall access frequency of the directories and files of the file system are described from a global perspective, which enriches the expression of the workload of the file system service and improves the accuracy of the workload features extracted in the file system business scenario.
  • the method further includes: compressing the first workload feature to obtain a second workload feature and a compression parameter, where the number of features included in the second workload feature is less than the number of features included in the first workload feature, and the compression parameter is used to restore the second workload feature to the first workload feature; and performing, according to the second workload feature and the compression parameter, NAS load balancing, memory allocation, data migration, hot and cold data block identification, prefetch policy optimization, performance bottleneck perception, load forecasting, or load change perception.
  • In this way, the storage device can directly use the first workload feature for NAS load balancing, memory allocation, data migration, hot and cold data block identification, prefetch policy optimization, load prediction, and the like, so as to achieve adaptive optimization of each module and improve the degree of intelligence of the storage device.
  • the method further includes: sending the second workload feature and the compression parameter to the simulation device, so that the simulation device performs, according to the second workload feature and the compression parameter, memory allocation, data migration, NAS load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load forecasting, or load change perception.
  • the simulation device can implement various applications according to the received second workload characteristics and compression parameters, such as memory allocation, data migration, NAS load balancing, etc., and realize offline testing and scene simulation of various scenarios.
  • the second workload feature and compression parameters are also used for the simulation device to acquire IO simulation data and verify whether the first workload feature is credible based on the IO simulation data.
  • Specifically, the simulation device may first restore the first workload feature based on the second workload feature and the compression parameter, obtain IO simulation data according to the first workload feature, and then verify whether the first workload feature is credible according to the IO simulation data: a third workload feature is re-extracted from the IO simulation data and compared with the first workload feature; when the third workload feature is consistent with the first workload feature, the first workload feature is considered credible, and the IO simulation data can be used as the real workload data of the user device, enabling the reproduction of the user scenario.
  • In this way, the simulation device can further verify the extracted first workload feature, and can reproduce the on-site data of the user device through simulation using the first workload feature transmitted by the storage device.
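  • As a purely illustrative sketch (not part of the disclosure), the credibility check described above can be expressed as a field-by-field comparison between the first workload feature and the re-extracted third workload feature; the dictionary representation, the field names, and the 5% relative tolerance below are assumptions.

```python
def features_consistent(original, re_extracted, rel_tol=0.05):
    """Treat the first workload feature as credible when every numeric field of the
    re-extracted (third) workload feature matches it within a relative tolerance."""
    for name, value in original.items():
        other = re_extracted.get(name)
        if other is None:
            return False
        if abs(other - value) > rel_tol * max(abs(value), 1e-9):
            return False
    return True

# Hypothetical feature dictionaries for illustration only.
print(features_consistent({"read_ratio": 0.70, "iops": 1200},
                          {"read_ratio": 0.69, "iops": 1180}))   # True
```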
  • the present application provides a workload feature extraction apparatus, which includes: a processing unit, configured to acquire a first workload feature of a storage device during an input/output IO execution process; and a storage unit, configured to store the first workload feature, where the first workload feature is used for memory allocation of the storage device, data migration, network attached storage NAS load balancing, hot and cold data block identification, prefetch policy optimization, performance bottleneck perception, load prediction, or load change perception.
  • the processing unit is specifically configured to: use the processor to obtain the first workload feature according to the workload data in the memory.
  • the first workload feature does not need to be extracted offline by an external device.
  • the first workload feature includes at least one of a time feature, a stream feature, and a hotspot distribution feature, where the time feature is used to indicate the time interval corresponding to the IO of the block service, the stream feature is used to indicate the access mode of the IO stream of the block service, and the hotspot distribution feature is used to indicate the reuse distance distribution of the address blocks in the block service.
  • the first workload feature further includes at least one of the following features: IO size distribution, read-write ratio parameters, total read-write bandwidth, and the number of unaligned IOs.
  • the first workload feature includes a short-term access feature, which is used to indicate the access pattern of the batch IO of the file system service in at least one dimension.
  • the first workload feature also includes a global feature
  • the global feature is used to indicate the hierarchical structure distribution of the file system
  • the global feature includes at least one of the following features: the number of files in the file system, the number of directories in the file system, the directory depth distribution of the file system, the distribution of the number of files in each directory of the file system, the file access frequency distribution of the file system, and the directory access frequency distribution of the file system.
  • the processing unit is further configured to: compress the first workload feature to obtain a second workload feature and a compression parameter, where the number of features included in the second workload feature is less than the number of features included in the first workload feature, and the compression parameter is used to restore the second workload feature to the first workload feature; and perform, according to the second workload feature and the compression parameter, NAS load balancing, memory allocation, data migration, hot and cold data block identification, prefetch policy optimization, performance bottleneck perception, load forecasting, or load change perception.
  • the apparatus further includes: a sending unit, configured to send the second workload feature and the compression parameter to the simulation device, so that the simulation device performs, according to the second workload feature and the compression parameter, memory allocation, data migration, network attached storage NAS load balancing, identification of hot and cold data blocks, prefetch policy tuning, performance bottleneck perception, load forecasting, or load change perception.
  • the second workload feature and compression parameters are also used for the simulation device to acquire IO simulation data and verify whether the first workload feature is credible based on the IO simulation data.
  • the present application provides a device, which includes a processor and a memory, where the memory is used to store program instructions; the processor invokes the program instructions in the memory, so that the device executes the method in the first aspect or any possible implementation of the first aspect.
  • the present application provides a computer-readable storage medium, including computer instructions.
  • When the computer instructions are executed by a processor, the method in the above first aspect or any possible implementation of the first aspect is implemented.
  • the present application provides a computer program product.
  • When the computer program product is executed by a processor, the method in the above first aspect or any possible implementation of the first aspect is implemented.
  • the computer program product can be, for example, a software installation package. If the method provided by any possible design of the first aspect needs to be used, the computer program product can be downloaded and executed on the processor, so as to implement the method in the first aspect or any possible implementation of the first aspect.
  • the present application provides a system, the system includes a processor, a chip, and a memory, where the processor and/or the chip is used to obtain a first workload feature of a storage device during an input/output IO execution process;
  • the memory is used to store the first workload feature, and the first workload feature is used for memory allocation of storage devices, data movement, network attached storage NAS load balancing, hot and cold data block identification, prefetch policy optimization, performance bottleneck perception, load prediction or load change sensing.
  • the processor and/or the chip is further configured to compress the first workload feature to obtain a second workload feature and a compression parameter, where the number of features included in the second workload feature is less than the number of features included in the first workload feature, and the compression parameter is used to restore the second workload feature to the first workload feature; and to perform, according to the second workload feature and the compression parameter, NAS load balancing, memory allocation, data migration, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load forecasting, or load change perception.
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of modules of a storage device provided in an embodiment of the present application.
  • FIG. 3 is a flow chart of a workload feature extraction method provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a block access sequence provided by an embodiment of the present application.
  • FIG. 5 is a flow chart of a workload feature extraction method provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a functional structure of a storage device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a storage device provided by an embodiment of the present application.
  • In this application, when a description such as "at least one (or at least one piece) of a1, a2, ..., and an" is used, it includes the case where any one of a1, a2, ..., and an exists alone, and also includes any combination of any number of a1, a2, ..., and an, where each case may exist alone.
  • For example, the description of "at least one of a, b, and c" includes the case of a alone, b alone, c alone, the combination of a and b, the combination of a and c, the combination of b and c, or the combination of a, b, and c.
  • Block storage means that data is stored in fixed-size data blocks, and each data block is assigned a number for addressing.
  • Block storage often adopts a storage area network (Storage Area Network, SAN) architecture.
  • SAN is a storage architecture that connects storage devices and application servers through a network, and this network is dedicated to access between hosts and storage devices. When there is a demand for data access, the data can be transmitted at high speed between the server and the background storage device through the storage area network.
  • SAN provides block-level storage services, which can effectively improve data transmission efficiency and read/write speed.
  • File storage refers to the way of storing data in the form of files.
  • File storage often adopts a network-attached storage (Network-Attached Storage, NAS) architecture to provide file-level data access and sharing services.
  • NAS is implemented by installing a file system on a storage device and sharing storage space in the form of a file directory.
  • the feature of NAS is that it includes a file system and an operating system, and can run completely independently. It is a file-level shared storage device with low cost and integrated software and hardware.
  • Files are used to organize data in a computer. Data for the same purpose can be composed into different types of files according to the structures required by different applications; different suffixes are usually used to indicate different types, and each file has a corresponding file name. When there are many files, the files can be grouped, and each group of files is placed in the same directory (or folder). Besides files, a directory may also contain subdirectories (subfolders), and all the files and directories form a tree structure.
  • This tree structure is called a file system. The file system defines the data structures and disk data management methods necessary for storing files on a disk.
  • Common file systems include, for example, FAT/FAT32/NTFS of Windows, EXT2/EXT3/EXT4/XFS/BtrFS of Linux, the HDFS file system of Hadoop, etc.
  • the reuse distance refers to the number of unique data separated between two adjacent accesses to the same memory data.
  • the reuse distance from the current access to the next access is called the forward reuse distance (next reuse distance, NRD), and the reuse distance from the current access to the previous access is called the backward reuse distance (previous reuse distance, PRD).
  • the reuse distance generally refers to the forward reuse distance.
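  • As a purely illustrative aid (not part of the disclosure), the following Python sketch computes forward reuse distances for an access sequence; the quadratic, set-based counting is chosen for clarity and is not the single-pass extraction method described later in this application.

```python
def forward_reuse_distances(accesses):
    """For each access, the forward reuse distance (NRD) is the number of distinct
    other items accessed before the same item is accessed again; None if the item
    is never accessed again."""
    nrd = [None] * len(accesses)
    last_pos = {}  # item -> index of its most recent access
    for i, item in enumerate(accesses):
        if item in last_pos:
            j = last_pos[item]
            # distinct items accessed strictly between the two adjacent accesses
            nrd[j] = len(set(accesses[j + 1:i]))
        last_pos[item] = i
    return nrd

# Example: the block access sequence {BCACDABCBCEA} used in FIG. 4 below.
print(forward_reuse_distances(list("BCACDABCBCEA")))
```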
  • a stream is a sequence of bytes with a starting point and an ending point, and information is sent in a first-in, first-out manner.
  • the IO stream can be divided into input stream (InputStream) and output stream (OutputStream) according to the flow direction.
  • the input stream means that the data in the hard disk is input to the memory
  • the output stream means that the data in the memory is output to the hard disk.
  • the IO stream can be divided into a byte stream and a character stream according to the size of the data processing unit.
  • the byte stream is a stream for reading and writing data in units of bytes
  • the character stream is a stream for reading and writing data in units of characters.
  • the IO stream may also have other classification methods, which are not specifically limited here.
  • collecting workload data of the user equipment is one of the effective ways to perceive the workload of the user equipment. For example, a log storing workload data (for example, IO data) is copied from the storage device, and a workload feature of the storage device is obtained based on the workload data in the log.
  • the data copy process takes a long time, which not only affects the performance of the storage device itself, but also involves customer privacy and security issues.
  • the workload characteristics of storage devices obtained based on workload data are relatively simple, such as IO read/write ratio, stream ratio, stream bandwidth, etc., which cannot accurately represent the workload of storage devices.
  • In view of this, the embodiment of this application proposes a workload feature extraction method, which can efficiently and accurately extract the workload features of a storage device on the premise of reducing performance consumption and storage space consumption as much as possible, and the method has good applicability.
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the system can be used to extract the workload characteristics of the storage device and perform IO flow simulation based on the workload characteristics of the storage device.
  • the system includes a storage device and an emulation device, where the storage device and the emulation device can communicate in a wireless or wired manner.
  • the storage device may be a storage node in a centralized storage system, or a storage node in a distributed storage system.
  • the simulated device may be a device with a computing function, for example, a server deployed on the network side, or a component or chip in the server.
  • the network-side device may be deployed in a cloud environment, that is, a cloud computing server, or the network-side device may also be deployed in an edge environment, that is, an edge computing server.
  • the network side device may be one integrated device, or multiple distributed devices, which is not specifically limited in this embodiment of the present application.
  • the storage device is used to extract its own workload features online, and based on the extracted workload features, it can perform at least one of operations such as optimal memory allocation, NAS load balancing, data migration, hot and cold data block identification, prefetch policy optimization, performance bottleneck perception, load forecasting, or load change sensing.
  • the storage device is also used to send the extracted workload characteristics to the simulation device, so that the simulation device performs IO flow simulation according to the extracted workload characteristics, and reproduces the load scenario of the storage device.
  • the storage device may first compress the extracted workload features to obtain compressed workload features; the compressed workload features still retain the field name of each feature and remain interpretable. The compressed workload features are then sent to the simulation device.
  • the simulation device is used to receive the workload characteristics sent by the storage device, and perform IO flow simulation based on the received workload characteristics to generate the IO simulation data of the storage device, so as to realize the reproduction of the real business scenario of the storage device.
  • If the simulation device receives compressed workload features from the storage device, the simulation device needs to decompress and restore the compressed workload features to obtain the uncompressed workload features before performing the IO flow simulation operation.
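  • As an illustrative sketch only (the disclosure does not specify a compression algorithm), one way to compress a distribution-type feature while keeping it restorable is to keep a fixed-size histogram as the compressed feature and the bin parameters as the compression parameter; the bin count and the uniform restoration below are assumptions.

```python
import random

def compress_distribution(values, n_bins=16):
    """Compress a distribution feature to a fixed-size histogram plus the parameters
    (minimum value, bin width, sample count) needed to approximately restore it."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts, {"min": lo, "bin_width": width, "n_samples": len(values)}

def restore_distribution(counts, params):
    """Approximately restore sample values by drawing uniformly inside each bin."""
    lo, w = params["min"], params["bin_width"]
    restored = []
    for i, c in enumerate(counts):
        restored.extend(random.uniform(lo + i * w, lo + (i + 1) * w) for _ in range(c))
    return restored
```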
  • the simulation device can also perform memory allocation, load balancing, data migration, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception according to the received workload features.
  • When a storage device is deployed in a distributed storage system, if the storage device is a management node in the distributed storage system, in addition to extracting the workload features of its own device, the storage device is also used to summarize the workload features of other storage nodes in the distributed storage system, for example, to receive the workload features extracted by other storage nodes, compress the workload features of each storage node separately, and send the compressed workload features to the simulation device.
  • FIG. 1 is only an exemplary architecture diagram, but does not limit the number of network elements included in the system shown in FIG. 1 .
  • FIG. 1 may also include other functional entities.
  • the method provided in the embodiment of the present application can be applied to the system shown in FIG. 1 , and of course the method provided in the embodiment of the present application can also be applied to other systems, which is not limited in the embodiment of the present application.
  • FIG. 2 exemplarily shows a block diagram of a storage device.
  • the storage device can be used to extract workload characteristics.
  • the storage device at least includes a data acquisition module, an online feature extraction module, a feature query interface and an optimization module.
  • the data collection module is used to obtain the workload data from the memory of the storage device and input the obtained workload data to the online feature extraction module.
  • the online feature extraction module includes a feature calculation module and a feature processing module.
  • the feature calculation module can perform multi-type (for example, block service and file system service) and multi-granularity (for example, logical unit number, file system, etc.) feature extraction and calculation based on the workload data, obtain the workload features of the storage device, and input the workload features of the storage device to the feature processing module; the feature processing module is used to compress the workload features to obtain compressed workload features.
  • the feature query interface is used as a unified entrance for the tuning module to access the online feature extraction module, and the feature query interface is used to realize data transmission between the tuning module and the online feature extraction module.
  • the tuning module can obtain workload features through the feature query interface and perform algorithm tuning, such as optimal memory allocation, NAS load balancing, identification of hot and cold data, and optimization of data migration strategies.
  • the tuning module includes but is not limited to a memory allocation module, a NAS load regulation module, a garbage collection module, and the like.
  • the memory allocation module in the tuning module obtains the workload feature from the online feature extraction module through the feature query interface, and optimizes the resource allocation strategy based on the workload feature to obtain an optimal memory allocation strategy.
  • the online feature extraction module may also perform persistent storage on the workload features, or store the workload features in a corresponding database.
  • the storage device further includes a data return module.
  • the data return module is used to receive the workload features sent by the online feature extraction module and send the workload features to an external device (for example, a simulation device), so that the external device can perform offline IO flow simulation of the real business based on the workload features, which can be used to assist fault location on the live network, test various offline scenarios, and evaluate algorithm strategies.
  • the workload feature input by the online feature extraction module to the data return module can be: the workload feature compressed by the feature processing module, or the uncompressed workload feature output by the feature calculation module.
  • the embodiments of the present application do not make specific limitations.
  • each module of the storage device shown in FIG. 2 mainly consumes computing power of a core central processing unit (central processing unit, CPU).
  • the storage device shown in FIG. 2 may also include a service distribution deployment module, which is used to relieve the computing pressure on the core CPU, for example, by distributing and deploying network cards, data processing units (DPUs), dedicated chips, etc. to the storage device to expand the computing power of the storage device.
  • each module shown in FIG. 2 may be implemented by software, hardware, or a combination of hardware and software.
  • FIG. 3 is a flowchart of a workload feature extraction method provided by an embodiment of the present application, which is applied to a storage device. The method includes but is not limited to the following steps:
  • S101 Acquire a first workload feature of a storage device during an input/output IO execution process.
  • the storage device acquires the first workload feature during the IO execution process, which means that the storage device extracts the first workload feature of its own device in an online manner.
  • the storage device extracts different workload characteristics in different business scenarios.
  • the embodiment of the present application mainly introduces the extraction of workload features of the storage device in two business scenarios, one of which is a block business, and the other is a file system business.
  • The first type: block service.
  • the first workload features include at least one of time features, flow features, and hotspot distribution features:
  • the time feature is used to indicate the time interval corresponding to the input and output IO of the block service.
  • the time feature specifically includes: a total time interval and a distribution of IO time intervals.
  • the total time interval is used to indicate the total duration corresponding to the batch IO of the block service
  • the IO time interval distribution is used to indicate the time interval between IOs in the batch IO of the block service.
  • For example, the total time interval may be 10 ms, the batch IO refers to the IOs contained within the 10 ms, and the IO time interval distribution indicates the time interval between the IOs within the 10 ms.
  • the total time interval may also be 12ms, 20ms or other values.
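  • As an illustration only (not part of the disclosure), the time features above can be computed from IO timestamps as follows; anchoring each window at its first IO and the millisecond unit are assumptions made for the sketch.

```python
def time_features(timestamps_ms, window_ms=10):
    """Split IO timestamps into windows of 'window_ms' (the total time interval) and,
    for each window, record the batch IO count and the time gaps between adjacent IOs
    (the IO time interval distribution)."""
    features = []
    ts = sorted(timestamps_ms)
    if not ts:
        return features
    start, batch = ts[0], []
    for t in ts:
        if t - start >= window_ms:
            gaps = [b - a for a, b in zip(batch, batch[1:])]
            features.append({"window_start": start, "io_count": len(batch), "gaps_ms": gaps})
            start, batch = t, []
        batch.append(t)
    gaps = [b - a for a, b in zip(batch, batch[1:])]
    features.append({"window_start": start, "io_count": len(batch), "gaps_ms": gaps})
    return features
```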
  • the flow feature is used to indicate the access mode of the IO flow of the block service.
  • the IO stream includes multiple IOs.
  • the distance interval between any two adjacent IOs is smaller than a preset interval threshold and the access time interval between any two adjacent IOs is smaller than a preset time threshold.
  • the IO stream can be a sequential stream or an interval stream.
  • Each IO in the sequential stream is a sequential IO.
  • the sequential IO means that the read and write operations access data from adjacent addresses one by one based on the logical block.
  • The distance interval between any two adjacent IOs in a sequential stream is 0. Each IO in an interval stream may be a random IO, and the distance interval between adjacent IOs in an interval stream is not 0. In other words, the IOs in a sequential stream are continuous and uninterrupted in logical address, while the IOs in an interval stream are intermittent in logical address.
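  • Purely as an illustration (not part of the disclosure), the following sketch groups IOs into streams using the distance and time thresholds described above and tags each stream as sequential or interval; the greedy first-match assignment, the field names, and the threshold values are assumptions.

```python
def segment_streams(ios, max_gap_blocks=64, max_gap_ms=5):
    """Group IOs into streams: an IO joins a stream when both its logical-address gap
    and its time gap to the stream's last IO are below the thresholds. Each IO is a
    dict with 'lba' (start address in blocks), 'len' (blocks) and 'ts' (milliseconds).
    A stream whose address gaps are all zero is a sequential stream, otherwise an
    interval stream."""
    streams = []
    for io in sorted(ios, key=lambda x: x["ts"]):
        placed = False
        for s in streams:
            last = s["ios"][-1]
            gap = io["lba"] - (last["lba"] + last["len"])
            if 0 <= gap < max_gap_blocks and io["ts"] - last["ts"] < max_gap_ms:
                s["ios"].append(io)
                s["sequential"] = s["sequential"] and gap == 0
                placed = True
                break
        if not placed:
            streams.append({"ios": [io], "sequential": True})
    return streams
```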
  • the stream features include at least one of the following features:
  • Number of IO streams: used to indicate the number of IO streams in the batch IO.
  • IO stream length distribution: used to indicate the length of each IO stream in the batch IO, where the length of an IO stream is the number of IOs contained in the IO stream.
  • IO stream bandwidth distribution: used to indicate the bandwidth of each IO stream in the batch IO.
  • IO stream interval distribution: used to indicate the distance between the IO streams in the batch IO.
  • Proportion of sequential streams and interval streams: used to indicate the ratio of the number of sequential streams to the number of interval streams in the batch IO.
  • Number of out-of-order IO streams: used to indicate the number of out-of-order IO streams in the batch IO.
  • Spatial concurrency distribution of IO streams: used to indicate the spatial concurrency of the IO streams in the batch IO.
  • the concurrency distribution of the IO stream space may be expressed as at least one of the following: the distribution of the number of IO streams contained in an address block of a preset size; the number of address blocks of a preset size accessed by an IO stream.
  • the address block with a preset size may be, for example, a block with a size of 4M (abbreviated as a 4M block), and the spatial concurrency distribution feature of IO streams specifically includes: the distribution of the number of IO streams contained in a 4M block, and the number of 4M blocks accessed by the IO streams in the batch IO.
  • the hot spot distribution feature is used to indicate the reuse distance distribution of the address blocks accessed by the IO in the block business.
  • the hotspot distribution feature includes the distribution of reuse distances of address blocks, and the reuse distance of address blocks refers to the number of non-duplicated address blocks between two adjacent accesses to the same address block.
  • the reuse distance distribution please refer to the following description about the reuse distance. For the sake of brevity, details are not repeated here.
  • Optionally, the reuse distance distribution of address blocks can be classified according to an index. For example, the reuse distance distribution of the address blocks accessed by IOs within an interval distance of 2048, the reuse distance distribution of the address blocks accessed by IOs within an interval distance of 4096, the reuse distance distribution of the address blocks accessed by IOs within an interval distance of 8192, and so on, can be counted in sequence, so that the distribution of the address blocks frequently accessed by IOs can be characterized at different interval distance scales.
  • the hotspot distribution feature may also include the number of non-duplicated IOs in the batch of IOs.
  • the number of non-duplicated IOs refers to the number of IOs that access different address blocks.
  • the first workload feature further includes at least one of the following features: IO size distribution, read-write ratio parameter, total read-write bandwidth, number of unaligned IOs, and IO operations per second (Input/Output Per Second, IOPS).
  • the IO size distribution is used to indicate the size of each IO in the batch IO
  • the read-write ratio parameter is used to indicate the ratio of the number of read IOs in the batch IO to the number of write IOs
  • the number of unaligned IOs is used to indicate the number of IOs whose length is not aligned and/or the number of IOs whose offset is not aligned.
  • unaligned IO may cause read amplification or write amplification during the read and write process, resulting in increased consumption of disk IO, which not only reduces the read and write efficiency of the disk, but also affects the performance of the disk.
  • IO alignment (that is, length alignment and offset alignment) can effectively save disk IO consumption and improve disk read and write efficiency. It can be seen that the number of unaligned IOs can be used to analyze the performance of the disk.
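  • As an illustration only (not part of the disclosure), several of the block-service features above can be computed from a batch of IO records as follows; the record layout and the 4 KiB alignment unit are assumptions.

```python
from collections import Counter

def block_misc_features(ios, align=4096):
    """IO size distribution, read/write ratio and unaligned-IO count for a batch.
    Each IO is a dict with 'offset' and 'size' in bytes and 'op' ('read' or 'write')."""
    sizes = Counter(io["size"] for io in ios)
    reads = sum(1 for io in ios if io["op"] == "read")
    writes = len(ios) - reads
    unaligned = sum(1 for io in ios if io["offset"] % align or io["size"] % align)
    return {"io_size_distribution": dict(sizes),
            "read_write_ratio": (reads, writes),
            "unaligned_io_count": unaligned}
```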
  • The second type: file system service.
  • In the file system business scenario, the first workload feature includes a short-term access feature, which is used to indicate the access pattern of the batch IO of the file system business in at least one of the following dimensions: directory, file, time, IO, and operation.
  • short-term access characteristics include at least one of the following characteristics: number of directories, distribution of directory depth and width, distribution of directory reuse distance, distribution of total read and write bandwidth in a directory, distribution of metadata operations corresponding to a directory, The distribution of the number of directory operations, the distribution of the number of concurrent directory operations, and the sequential access sequence of files in the directory.
  • the directory depth refers to the length of the subdirectories nested in the directory
  • the directory width refers to the number of files in the directory
  • the directory reuse distance is used to indicate the number of distinct directories between two consecutive accesses to the same directory.
  • the so-called same directory means that the identifiers (IDs) of the directories are the same, and each directory in the file system has a corresponding directory ID.
  • Operations on directories include but are not limited to viewing, querying, copying, switching, creating, deleting, cutting, renaming, changing attributes, etc.
  • the number of concurrent directory operations refers to the number of clients operating on the same directory at the same time.
  • the sequential access sequence of files in a directory can be used to analyze the correlation between files in the directory, which is beneficial to the prefetching of the sequential flow of files in a directory in the file system business, and improves the hit rate of file access in the directory.
  • the short-term access characteristics include at least one of the following characteristics: file quantity, file size distribution, file reuse distance distribution, file operation quantity distribution and file concurrent operation quantity distribution.
  • the number of files includes but is not limited to the total number of files accessed by the batch IO, the number of files corresponding to read IOs in the batch IO, the number of files corresponding to write IOs in the batch IO, and the number of files corresponding to each operation mode in the batch IO, where the operation modes include sequential read, random read, sequential write, random write, create write, append write, overwrite, file lock, etc.
  • the file reuse distance is used to indicate the number of unique files between two adjacent accesses to the same file.
  • the so-called same file refers to the same file ID, and each file in the file system has a corresponding file ID.
  • Operations on files include but are not limited to: viewing, querying, copying, switching, creating, deleting, cutting, renaming, changing attributes, etc.
  • the number of concurrent file operations refers to the number of clients operating on the same file at the same time.
  • the short-term access feature includes at least one of the following features: total time interval and time interval distribution, wherein the total time interval is used to indicate the total duration corresponding to the batch IO of the file system business, and the time interval The distribution is used to indicate the time interval between each file IO in the batch IO of the file system service.
  • short-term access features include at least one of the following features: the IO size distribution within a file, the read and write bandwidth of IOs within a file, the reuse distance distribution of address blocks within a file, the IO stream characteristics within a file, and the sequentiality of IO streams within a file.
  • the read-write bandwidth of the IO in the file includes at least one of the total read-write bandwidth and the distribution of the read-write bandwidth of a single file.
  • the reuse distance of the address block in the file is used to indicate the number of non-duplicated address blocks between two adjacent accesses to the same address block in the same file.
  • the characteristics of the IO streams in the file include at least one of the number of IO streams in the file, the length distribution of the IO streams in the file, the bandwidth distribution of the IO streams in the file, the interval distribution of the IO streams in the file, and the spatial concurrency distribution of the IO streams in the file. It should be noted that, for details about the characteristics of the IO flow in the file, reference may be made to the relevant description of the above-mentioned flow characteristics of the block service, which will not be repeated here.
  • the sequence degree of an IO stream in a file is used to indicate the sequentiality of the IO stream in the file.
  • the sequence degree of an IO stream in a file can be obtained by a weighted calculation based on the length of the IO stream in the file and the interval of the IO stream in the file: with other parameters unchanged, the greater the length of the IO stream in the file, the greater the sequence degree and the stronger the sequentiality of the IO stream in the file; with other parameters unchanged, the larger the interval of the IO stream in the file, the smaller the sequence degree and the weaker the sequentiality of the IO stream in the file.
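  • As a purely illustrative sketch, one weighting consistent with the monotonic behaviour described above is given below; the logarithmic form and the weight values are assumptions, since the disclosure does not specify the exact weighted calculation.

```python
import math

def stream_sequence_degree(stream_length, mean_gap_blocks, w_len=1.0, w_gap=1.0):
    """Sequence degree grows with the length of the IO stream in the file and shrinks
    as the mean address gap between its IOs grows."""
    return w_len * math.log1p(stream_length) - w_gap * math.log1p(mean_gap_blocks)
```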
  • the short-term access feature includes at least one of the following features: operation command proportion, host operation distribution, and operation mode distribution.
  • the host operation refers to the user's operation on the host identified based on the batch IO of the file system service, and may also be referred to as aggregation operation, user operation, and the like.
  • the host operation can be, for example, cp (copy), rm (delete a directory or file), rmdir (delete an empty directory), grep (query), cd (switch directory), ls (view the current directory), ll (view the current files), mkdir (create a directory), mv (cut or rename), etc. on a Linux system.
  • the operating mode distribution is used to indicate the quantity of each operating mode corresponding to the batch IO.
  • Operation modes include but are not limited to sequential read, sequential write, random read, random write, create write, append write, overwrite, file lock, protocol lock, file system lock, etc.
  • the proportion of operation commands can be the proportion of IO operations, the proportion of host operations, and so on.
  • the proportion of IO operations is the ratio of the number of a given type of IO operation in the batch IO (the corresponding fields are, for example, read, write, or lookup) to the total number of IOs in the batch IO, and the proportion of host operations is the ratio of the number of host operations identified from the batch IO to the total number of IOs in the batch IO.
  • the first workload feature also includes a global feature, which is used to indicate the hierarchical structure distribution of the file system, and the global feature includes at least one of the following features: the number of files in the file system, the number of directories in the file system, the distribution of directory depth in the file system, the distribution of the number of files in each directory of the file system, the distribution of file access frequency in the file system, and the distribution of directory access frequency in the file system.
  • the global feature may also include the distribution of the number of subdirectories at each directory depth, the distribution of the number of files at each directory depth, and the distribution of file sizes at each directory depth.
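  • As an illustration only (not part of the disclosure), the structural part of the global feature can be collected by walking the directory tree, as in the sketch below; access-frequency distributions would additionally need an IO trace and are omitted.

```python
import os
from collections import Counter

def global_features(root):
    """Count files and directories and build the directory-depth and files-per-directory
    distributions for the tree rooted at 'root'."""
    depth_dist, files_per_dir = Counter(), Counter()
    n_files = n_dirs = 0
    root_depth = root.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        n_dirs += 1
        n_files += len(filenames)
        depth = dirpath.rstrip(os.sep).count(os.sep) - root_depth
        depth_dist[depth] += 1
        files_per_dir[len(filenames)] += 1
    return {"file_count": n_files, "dir_count": n_dirs,
            "dir_depth_distribution": dict(depth_dist),
            "files_per_dir_distribution": dict(files_per_dir)}
```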
  • obtaining the first workload feature of the storage device may be: the processor of the storage device obtains the first workload feature of the storage device based on the workload data in the memory. It can be seen that the first workload feature is extracted online by the storage device rather than offline by an external device. In this way, the impact of copying IO data by an external device on the performance of the storage device is effectively avoided, and the extraction efficiency of the workload feature is improved.
  • the workload data includes: the logical unit number (Logical Unit Number, LUN) of the IO access of the block business, the offset of the starting position of the IO, the size of the IO, and the type of the IO operation (for example, read operation or write operation) and other data.
  • the workload data includes: the identification ID of the file system, the client IP address corresponding to the IO, the file ID accessed by the IO, the directory ID accessed by the IO, the start position offset of the IO, the IO size, the IO operation type (for example, read operation, write operation, or metadata operation), and other data.
  • the workload features of the above block service and the workload features of the file system service both include features obtained based on the reuse distance, such as the reuse distance distribution feature of the block service, and the file reuse distance distribution, the directory reuse distance distribution, and the reuse distance distribution of address blocks within a file of the file system service.
  • Reuse distance is an important feature that can effectively characterize the distribution of IO hotspots.
  • the block access sequence is traversed once to obtain the reuse distance distribution of the block access sequence, which is beneficial to improve the extraction efficiency of the reuse distance distribution feature.
  • FIG. 4 is a schematic diagram of a block access sequence provided by an embodiment of the present application.
  • Figure 4 shows the block access sequence {BCACDABCBCEA}. It can be seen that block B is accessed 3 times, block C is accessed 4 times, block A is accessed 3 times, block D is accessed 1 time, and block E is accessed 1 time.
  • During the traversal, the access information of each logical address block (hereinafter referred to as an address block) is recorded. Take a pair of accesses to A in the block access sequence (that is, A² and A³, where the superscript of A indicates the ordinal number of the access to A) as an example:
  • the interval distance distribution information between repeated blocks, the total number of address blocks corresponding to the current address block, and the number of globally repeated address blocks can be recorded respectively.
  • the probability density function of the reuse distance is fitted. Based on the probability density function, the probability that any interval distance is repeated in the interval composed of repeated blocks can be estimated.
  • the distribution feature of the reuse distance of the block service can be extracted, and the above method is also suitable for extracting the feature related to the reuse distance of the file system service.
  • For example, when extracting the file reuse distance distribution of the file system service, the block access sequence shown in Figure 4 can be replaced by a file ID access sequence; when extracting the directory reuse distance distribution, the block access sequence shown in Figure 4 can be replaced by a directory ID access sequence; and when extracting the reuse distance distribution of address blocks within a file, each address block in the block access sequence shown in FIG. 4 is an address block in the same file.
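  • Purely as an illustration (not part of the disclosure), a rough single-pass approximation of the reuse distance distribution is sketched below; the disclosure only states that interval-distance statistics are recorded in one traversal and a probability density function is fitted, so the specific scaling heuristic used here is an assumption.

```python
def approx_reuse_distance_distribution(accesses):
    """Single pass: record the interval distance (number of accesses between two adjacent
    occurrences of the same block) and scale it by the globally observed fraction of
    distinct blocks to estimate the reuse distance of each repeated access."""
    last_pos, seen, intervals = {}, set(), []
    for i, blk in enumerate(accesses):
        if blk in last_pos:
            intervals.append(i - last_pos[blk] - 1)
        last_pos[blk] = i
        seen.add(blk)
    distinct_ratio = len(seen) / max(len(accesses), 1)
    return sorted(round(d * distinct_ratio) for d in intervals)

print(approx_reuse_distance_distribution(list("BCACDABCBCEA")))
```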
  • the first workload feature may be stored in a memory of the storage device. It should be noted that the first workload feature extracted over a short period can be stored in the memory of the storage device, so that other modules of the storage device can directly obtain it from the memory when they need to use the corresponding workload feature, which is conducive to improving the data transmission rate.
  • Optionally, the first workload features of each period in the memory may also be transferred to and stored on the hard disk of the storage device.
  • the storage device may also generate a visualized load profile according to the first workload feature, where the load profile includes various load features and performance bottleneck information.
  • In the block business scenario, the load profile is the load profile of the block service; in the file system business scenario, the load profile is the load profile of the file system service.
  • the load profile can be displayed at different granularities based on user requirements. For example, the granularity can be divided into storage devices, controllers, and LUNs.
  • S103 Perform memory allocation, data migration, NAS load balancing, hot and cold data block identification, prefetch policy optimization, performance bottleneck perception, load prediction or load change perception according to the first workload feature.
  • the first workload feature can provide support for the optimization of multiple modules of the storage device, and the multiple modules can be, for example, a memory allocation module, a hot and cold data identification module, a NAS load regulation module, and the like.
  • the memory allocation module can perform memory allocation, prefetch strategy optimization, etc.
  • the hot and cold data identification module can perform hot and cold data block identification, data migration, etc.
  • the NAS load control module can perform NAS load balancing, load forecasting, etc.
  • modules in the storage device may rely on the same or different workload characteristics.
  • the workload features that the module relies on when performing the optimization operation may be some or all of the first workload features, which are not specifically limited here.
  • the memory allocation module mainly performs memory allocation according to the above-mentioned hotspot distribution feature, the IO size distribution, and other features. Specifically, the memory allocation module can determine the interval distances corresponding to the hotspot IOs in the batch of IOs based on the hotspot distribution feature, and obtain, based on these interval distances, the memory cache size at which the maximum memory cache hit rate is achieved, thereby realizing the allocation of memory resources.
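A minimal sketch of how a cache size might be chosen from such a reuse distance histogram is given below; the hit-rate estimate (an access is assumed to hit when its reuse distance is smaller than the cache size) and the marginal-gain stopping rule are illustrative assumptions, not the claimed allocation algorithm.

```python
def estimate_hit_rate(rd_histogram, cache_blocks, total_accesses):
    """Estimated hit rate of an LRU-like cache holding `cache_blocks` blocks:
    an access hits if its reuse distance is smaller than the cache size."""
    hits = sum(count for rd, count in rd_histogram.items() if rd < cache_blocks)
    return hits / total_accesses

def choose_cache_size(rd_histogram, total_accesses, candidate_sizes, min_gain=0.01):
    """Pick the smallest candidate cache size beyond which the hit-rate gain
    falls below `min_gain`, i.e. stop growing the cache once extra memory
    stops paying off. `candidate_sizes` must be sorted in increasing order."""
    best_size = candidate_sizes[0]
    best_rate = estimate_hit_rate(rd_histogram, best_size, total_accesses)
    for size in candidate_sizes[1:]:
        rate = estimate_hit_rate(rd_histogram, size, total_accesses)
        if rate - best_rate < min_gain:
            break
        best_size, best_rate = size, rate
    return best_size, best_rate
```

The design choice here is that cache memory is only grown while each increment still buys a noticeable hit-rate improvement, which mirrors the idea of maximizing the cache hit rate under a memory budget.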
  • the cache hit ratio refers to the ratio of the cache hits to the total cache accesses.
  • a cache hit means that the logical address to be read is located in the cache, so the data can be read quickly from the cache.
  • prefetch policy optimization may be performed according to the proportion of sequential streams and interval streams, the IO stream length distribution, the IO stream bandwidth distribution, and the IO stream interval distribution in the stream features.
  • a unified prefetching strategy is often adopted; however, when the overall prefetching has a large waste rate, the prefetching function of each LUN will be stopped.
  • the benefit of enabling the prefetch function for each LUN is evaluated.
  • based on the benefit corresponding to each LUN, the cache prefetch strategy is dynamically adjusted to reduce the read amplification caused by prefetching as much as possible while ensuring the prefetch hit rate, thereby preventing resource waste.
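The following hypothetical sketch illustrates the idea of evaluating the prefetch benefit per LUN rather than globally; the statistics layout and the waste-ratio threshold are assumptions for illustration only.

```python
def adjust_prefetch(lun_stats, max_waste_ratio=0.3):
    """lun_stats: {lun_id: {"prefetched": int, "prefetch_hits": int}}.
    Keep prefetch enabled per LUN only while the fraction of prefetched but
    never-read data (the waste ratio) stays below the threshold, instead of
    disabling prefetch globally when the overall waste is high."""
    decisions = {}
    for lun_id, stats in lun_stats.items():
        prefetched = stats["prefetched"]
        if prefetched == 0:
            decisions[lun_id] = True   # no evidence yet, keep prefetching
            continue
        waste_ratio = 1.0 - stats["prefetch_hits"] / prefetched
        decisions[lun_id] = waste_ratio <= max_waste_ratio
    return decisions
```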
  • the NAS load balancing module is used to implement load balancing of file system services. Specifically, according to the short-term access features and global features of the file system business, the NAS load balancing module can pre-evaluate the data volume and access frequency of a directory or file when the directory or file is created, so as to determine the processor that will handle the directory or file to be created, thereby matching an appropriate processor to each newly created directory or file. In addition, when it is determined that the workloads of the processors differ greatly, frequently accessed directories and files can be moved, according to the short-term access features and global features of the file system business, to processors that are relatively idle or lightly loaded. In this way, NAS load balancing is realized.
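A simplified sketch of such processor assignment and rebalancing is shown below; the data layouts, the imbalance threshold, and the greedy move strategy are illustrative assumptions rather than the method actually claimed.

```python
def assign_new_object(processor_load, predicted_load):
    """Pick the processor for a newly created directory/file: choose the
    least-loaded processor and account for the object's predicted load."""
    target = min(processor_load, key=processor_load.get)
    processor_load[target] += predicted_load
    return target

def rebalance(processor_load, hot_objects, imbalance_threshold=0.2):
    """hot_objects: {processor_id: {object_id: access_frequency}}.
    If the busiest processor exceeds the average load by more than the
    threshold, move its hottest directories/files to the idlest processor."""
    avg = sum(processor_load.values()) / len(processor_load)
    busiest = max(processor_load, key=processor_load.get)
    idlest = min(processor_load, key=processor_load.get)
    moves = []
    for obj, freq in sorted(hot_objects.get(busiest, {}).items(),
                            key=lambda kv: kv[1], reverse=True):
        if processor_load[busiest] <= avg * (1 + imbalance_threshold):
            break
        moves.append((obj, busiest, idlest))
        processor_load[busiest] -= freq
        processor_load[idlest] += freq
    return moves
```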
  • the historical access frequency and reuse distance distribution of each data block may also be determined according to the hotspot distribution feature in the first workload feature, and the historical access frequency and reuse distance distribution of a data block are used to identify the data block as a hot or cold data block.
  • data migration may also be performed. For example, in converged storage, hot data is preferably stored on high-performance solid-state drives (SSD) while cold data is placed on lower-performance mechanical hard disk drives (HDD), which saves the storage space of high-performance SSDs. Therefore, if data block A is identified as a hot data block but is currently placed on an HDD, the data corresponding to data block A can be migrated to an SSD to improve data reading efficiency.
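As a hedged illustration of this tiering decision, the sketch below marks blocks as hot or cold based on their historical access frequency and reuse distance and schedules migrations between SSD and HDD; the field names and thresholds are assumptions introduced for this example.

```python
def classify_and_migrate(blocks, freq_threshold, rd_threshold):
    """blocks: iterable of dicts with keys
       'id', 'access_freq', 'median_reuse_distance', 'tier' ('SSD' or 'HDD').
    A block is treated as hot when it is accessed often and reused soon;
    hot blocks on HDD are scheduled for promotion to SSD, and cold blocks
    on SSD can be demoted to HDD."""
    promote, demote = [], []
    for blk in blocks:
        hot = (blk["access_freq"] >= freq_threshold and
               blk["median_reuse_distance"] <= rd_threshold)
        if hot and blk["tier"] == "HDD":
            promote.append(blk["id"])
        elif not hot and blk["tier"] == "SSD":
            demote.append(blk["id"])
    return promote, demote
```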
  • the storage device may also perform load prediction or load change perception according to the extracted first workload feature, so as to estimate the future load change trend and better implement load control.
  • the storage device may also perform performance bottleneck detection, stabilize write bandwidth, etc. according to the first workload feature, which is not specifically limited here.
  • FIG. 5 is a flow chart of a workload feature extraction method provided by an embodiment of the present application, which is applied to a communication system composed of a storage device and an emulation device.
  • the embodiment in FIG. 5 can be a supplement to the embodiment in FIG. 3, or it can be an embodiment independent of FIG. 3.
  • the method includes but is not limited to the following steps:
  • S201: The storage device acquires its own first workload feature during the IO execution process. For details of this step, reference may be made to the related description of S101 in the embodiment in FIG. 3.
  • S202: The storage device sends the first workload feature to the emulation device.
  • the simulation device receives the first workload feature sent by the storage device.
  • S203: The simulation device performs memory allocation, data migration, NAS load balancing, hot and cold data block identification, prefetch policy optimization, performance bottleneck perception, load prediction or load change perception according to the first workload feature.
  • the simulation device can execute at least one of the following applications based on the first workload feature: memory allocation, data movement, network attached storage NAS load balancing, identification of hot and cold data blocks, prefetch strategy optimization, performance bottleneck sensing, load forecasting, or load change sensing.
  • for applications such as memory allocation, data migration, and NAS load balancing, please refer to the relevant description of S103 in the embodiment of FIG. 3 above.
  • the simulation device may first verify whether the extracted first workload feature is authentic, and then execute the above application if the first workload feature is determined to be authentic.
  • the simulation device may determine that the extracted first workload feature is credible as follows: the simulation device performs simulation on the first workload feature to obtain IO simulation data, re-extracts a third workload feature from the IO simulation data, and compares the third workload feature with the first workload feature; when the third workload feature is consistent with the first workload feature, the first workload feature is determined to be credible, which means that the IO simulation data can be used as the real IO data corresponding to the first workload feature. It can be understood that when the first workload feature is credible, the first workload feature can well characterize the workload of the storage device, and both the first workload feature and the IO simulation data obtained from it are of reference value.
  • the consistency between the third workload feature and the first workload feature may mean that the similarity between the third workload feature and the first workload feature satisfies a preset condition, for example, that the similarity is greater than or equal to a preset threshold.
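One simple way such a similarity check could be realized is sketched below, using cosine similarity between numeric feature vectors; encoding the workload features as equal-length numeric vectors, the similarity metric, and the 0.95 threshold are illustrative assumptions only.

```python
import math

def cosine_similarity(v1, v2):
    """Both inputs are equal-length numeric vectors encoding workload features."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def is_credible(first_feature, third_feature, threshold=0.95):
    """first_feature: features extracted on the storage device;
    third_feature: features re-extracted from the IO simulation data.
    The first workload feature is considered credible when the two feature
    vectors are sufficiently similar."""
    return cosine_similarity(first_feature, third_feature) >= threshold
```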
  • alternatively, the simulation device may determine that the extracted first workload feature is credible as follows: the simulation device performs simulation on the first workload feature to obtain IO simulation data, observes the performance indicators of the IO simulation data when running on the device, and obtains the difference between the performance indicators corresponding to the IO simulation data and the performance indicators corresponding to the real workload data; when the difference is less than a preset difference threshold, the first workload feature is determined to be credible, and the IO simulation data can be used as the real workload data corresponding to the first workload feature.
  • the performance indicators may be IOPS, IO delay, cache hit IO quantity, cache prefetch IO quantity, and so on.
  • the simulation device can also assist performance fault location, offline testing, algorithm strategy evaluation, etc. based on the first workload feature.
  • the storage device may also compress the first workload feature, and then send the compressed workload feature to the simulation device.
  • alternatively, the above S202 and S203 may not be executed, and the following S204-S206 are executed instead:
  • S204: The storage device compresses the first workload feature to obtain a second workload feature and compression parameters.
  • the number of features included in the second workload feature is smaller than the number of features included in the first workload feature, and the compression parameters are used to restore the second workload feature to the first workload feature. It can be seen that compressing the first workload feature can not only reduce the consumption of storage space for storing the first workload feature, but also increase the data transmission rate.
  • the second workload feature is a compressed workload feature of the first workload feature.
  • each feature in the second workload feature is a feature that is retained after the first workload feature is compressed. It can be understood that the number of features in the second workload feature is smaller than the number of features in the first workload feature, so the data volume of the second workload feature is also smaller than the data volume of the first workload feature.
  • each feature in the second workload features corresponds to one or more features in the first workload features.
  • if feature a1 in the second workload feature corresponds to feature a2 and feature a3 in the first workload feature, it means that feature a2 and feature a3 can be restored or recovered based on the compression parameters and feature a1.
  • the field names of each feature in the second workload feature remain unchanged before and after compression, so that it can be ensured that each feature in the second workload feature is still interpretable.
  • for example, the first workload feature includes feature 1, feature 2, and feature 3, and the second workload feature obtained after compressing the first workload feature includes only feature 1.
  • feature 1 is the feature retained after the first workload feature is compressed, and the field name of feature 1 in the first workload feature is the same as the field name of feature 1 in the second workload feature.
  • compressing the first workload features includes: compressing the first workload features according to the similarity and/or predictability between features in the first workload features.
  • the similarity between features in the first workload feature may refer to the similarity between different features of the same batch of IOs in the first workload feature.
  • the compression process may be: for any two different features in the first workload feature, calculate the similarity between the two features; when the similarity between the two features meets the preset similarity condition, delete either one of the two features corresponding to that similarity. It should be noted that before deleting either feature, the mapping relationship between the two features corresponding to the similarity needs to be recorded in the compression parameters.
  • for feature A and feature B in the first workload feature, calculate the similarity between feature A and feature B; when the similarity between feature A and feature B is greater than or equal to the preset similarity threshold, feature A can be considered similar to feature B, and either feature A or feature B can be deleted, thereby effectively reducing the data volume of the first workload feature and realizing compression of the first workload feature.
  • the similarity between features in the first workload feature may refer to the similarity between features of different batches of IOs in the first workload feature.
  • for example, the first workload features include feature 1, feature 2, feature 3, feature 4, feature 5, and feature 6, where feature 1, feature 2, and feature 3 belong to the first batch of IOs, and feature 4, feature 5, and feature 6 belong to the second batch of IOs
  • the field name of feature 1 is the same as that of feature 4
  • the field name of feature 2 is the same as that of feature 5
  • the field name of feature 3 is the same as that of feature 6
  • the content of feature 1 is the same as that of feature 4 (that is, the similarity meets the preset similarity condition)
  • the content of feature 2 is the same as that of feature 5 (that is, the similarity meets the preset similarity condition)
  • the content of feature 3 is not the same as that of feature 6 (that is, the similarity does not meet the preset similarity condition)
  • the three features of the first batch of IOs are all retained; because feature 4 is the same as feature 1, feature 5 is the same as feature 2, and feature 1 and feature 2 have already been retained, the redundant feature 4 and feature 5 can be deleted
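A minimal sketch of similarity-based compression along these lines is given below; the dictionary layout, the `similar` callback, and the mapping used as compression parameters are assumptions made for illustration.

```python
def compress_by_similarity(features, similar):
    """features: {field_name: value}; similar(a, b) -> bool decides whether
    two feature values meet the preset similarity condition.
    Returns the retained features plus compression parameters that map every
    dropped field to the retained field it can be restored from."""
    retained = {}
    mapping = {}   # dropped field -> retained field (part of the compression parameters)
    for name, value in features.items():
        twin = next((kept for kept, kept_val in retained.items()
                     if similar(value, kept_val)), None)
        if twin is None:
            retained[name] = value
        else:
            mapping[name] = twin
    return retained, mapping

def decompress(retained, mapping):
    """Restore the first workload features from the retained features."""
    restored = dict(retained)
    for dropped, kept in mapping.items():
        restored[dropped] = retained[kept]
    return restored
```

Note that field names are carried through unchanged, so every retained feature stays interpretable, matching the behavior described above.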
  • compressing the first workload features according to the predictability among the features in the first workload features includes: determining the predictable features in the first workload features through an artificial intelligence model, obtaining the parameters of the artificial intelligence model, and removing the predictable features from the workload features, where the parameters of the artificial intelligence model are included in the compression parameters.
  • the first workload feature includes feature b1, feature b2, feature b3, and feature b4, and the artificial intelligence model determines that feature b2 can be predicted based on feature b1 and that feature b4 can be predicted based on feature b3; then feature b2 and feature b4 can be deleted from the first workload feature, or feature b1 and feature b3 can be deleted from the first workload feature.
  • the obtained compression parameters include conversion parameters between feature b1 and feature b2, and conversion parameters between feature b3 and feature b4.
  • the artificial intelligence model may also be a single-layer neural network, a random forest (Random Forest, RF) model, a support vector machine (Support Vector Machine, SVM) model, or another prediction algorithm, which is not specifically limited herein.
  • the prediction of features can be one-to-one, that is, predicting another feature based on one feature; many-to-one, that is, predicting one feature based on multiple features; or one-to-many, that is, predicting multiple features based on one feature, which is not specifically limited in this embodiment of the present application.
  • the features of the first workload may also be compressed in combination with the similarity and predictability between features.
  • for details about the similarity and predictability, please refer to the related descriptions above, which will not be repeated here.
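To make the predictability-based compression concrete, the sketch below uses a simple linear fit as a stand-in for the artificial intelligence model: predictable features are dropped and only the fitted conversion parameters are kept. The choice of a linear predictor, the data layout, and the function names are assumptions for illustration, not the claimed model.

```python
import numpy as np

def fit_predictor(kept_values, dropped_values):
    """Fit a simple linear predictor dropped ≈ a * kept + b.
    The coefficients (a, b) play the role of the conversion parameters
    stored in the compression parameters."""
    a, b = np.polyfit(kept_values, dropped_values, deg=1)
    return a, b

def compress(features, pairs):
    """features: {name: np.ndarray}; pairs: [(kept_name, dropped_name), ...].
    Drop each predictable feature and keep only its conversion parameters."""
    params = {}
    compressed = dict(features)
    for kept, dropped in pairs:
        params[dropped] = (kept, fit_predictor(features[kept], features[dropped]))
        del compressed[dropped]
    return compressed, params

def restore(compressed, params):
    """Rebuild the dropped features from the retained ones."""
    restored = dict(compressed)
    for dropped, (kept, (a, b)) in params.items():
        restored[dropped] = a * compressed[kept] + b
    return restored
```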
  • S205: The storage device sends the second workload feature and the compression parameters to the emulation device.
  • the simulation device receives the second workload feature and compression parameters sent by the storage device. In this way, transmitting the second workload characteristics and compression parameters to the simulation device can effectively improve the data transmission efficiency between the storage device and the simulation device.
  • S206: The simulation device performs memory allocation, data migration, NAS load balancing, identification of hot and cold data blocks, prefetch policy optimization, performance bottleneck perception, load prediction or load change perception according to the second workload feature and the compression parameters.
  • the simulation device can first use the compression parameters to restore the second workload feature to the first workload feature, and then, according to the first workload feature, perform the above-mentioned memory allocation, data migration, NAS load balancing, hot and cold data block identification, prefetch strategy tuning, performance bottleneck perception, load prediction or load change perception. It should be noted that for details about memory allocation, data migration, and NAS load balancing performed by the simulation device according to the first workload feature, please refer to the relevant description of S203 above. For the sake of brevity, details are omitted here.
  • the storage device extracts different workload characteristics for the block service and the file system service respectively, which can more accurately characterize the load characteristics of the storage device in different service scenarios. Compressing the extracted workload characteristics and then transmitting them to the simulation device can effectively improve the data transmission efficiency between the storage device and the simulation device, and also enable the simulation device to realize the reproduction of business scenarios based on the workload characteristics.
  • FIG. 6 is a schematic diagram of a functional structure of a storage device provided by an embodiment of the present application.
  • the storage device 30 includes a processing unit 310 and a storage unit 312 .
  • the storage device 30 may be implemented by hardware, software, or a combination of software and hardware.
  • the processing unit 310 is configured to obtain the first workload feature of the storage device during the execution of the input and output IO; the storage unit 312 is used to store the first workload feature, and the first workload feature is used for memory allocation of the storage device , data migration, network attached storage NAS load balancing, identification of hot and cold data blocks, prefetch policy tuning, performance bottleneck perception, load forecasting or load change perception.
  • the storage device 30 further includes a sending unit 314, and the sending unit 314 is configured to send the second workload characteristics and compression parameters to the simulation device, wherein the compression parameters are used to restore the second workload characteristics to the first workload feature.
  • the sending unit 314 may also be configured to send the first workload feature to the simulation device.
  • Each functional module of the storage device 30 may be used to implement the method described in the embodiment of FIG. 3 .
  • the processing unit 310 may be used to execute S101 and S103
  • the storage unit 312 may be used to execute S102
  • the sending unit 314 may be used to execute S202 or S205 in FIG. 5 .
  • Each functional module of the storage device 30 can also be used to implement the method on the storage device side described in the embodiment of FIG. 5 , and details are not repeated here for the sake of brevity.
  • each unit in the above embodiment shown in FIG. 6 may be realized by software, hardware, firmware or a combination thereof.
  • the software or firmware includes but is not limited to computer program instructions or codes, and can be executed by a hardware processor.
  • the hardware includes but is not limited to various integrated circuits, such as a central processing unit (central processing unit, CPU), a digital signal processor (digital signal processor, DSP), a field-programmable gate array (field-programmable gate array, FPGA), or an application-specific integrated circuit (ASIC).
  • the present application also provides a storage device.
  • the storage device 40 includes: a processor 401 , a communication interface 402 , a memory 403 and a bus 404 .
  • the processor 401 , the memory 403 and the communication interface 402 communicate through a bus 404 .
  • Storage device 40 may be a server or a storage device. It should be understood that the present application does not limit the number of processors and memories in the storage device 40 .
  • the bus 404 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus or the like.
  • the bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one line is used in FIG. 7 , but it does not mean that there is only one bus or one type of bus.
  • the bus 404 may include a pathway for transferring information between various components of the storage device 40 (eg, memory 403 , processor 401 , communication interface 402 ).
  • the processor 401 may include any one or more of processors such as a central processing unit (central processing unit, CPU), a microprocessor (micro processor, MP), or a digital signal processor (digital signal processor, DSP).
  • the memory 403 is used to provide a storage space, in which data such as operating systems and computer programs can be stored.
  • The memory 403 may be one or a combination of random access memory (random access memory, RAM), erasable programmable read-only memory (erasable programmable read only memory, EPROM), read-only memory (read-only memory, ROM), portable read-only memory (compact disc read-only memory, CD-ROM), and the like.
  • the memory 403 may exist independently, or may be integrated inside the processor 401 .
  • Communication interface 402 may be used to provide information input or output to processor 401 .
  • the communication interface 402 can be used to receive data sent from the outside and/or send data to the outside, and can be a wired link interface such as an Ethernet cable, or a wireless link (such as Wi-Fi, Bluetooth, general wireless transmission, etc.) interface.
  • the communication interface 402 may further include a transmitter (such as a radio frequency transmitter, an antenna, etc.) or a receiver coupled with the interface.
  • the processor 401 in the storage device 40 is configured to read the computer program stored in the memory 403 to execute the aforementioned method, such as the method described in FIG. 3 or the method on the storage device side described in FIG. 5 .
  • the storage device 40 may be one or more modules in the execution subject of the method shown in FIG. 3, and the processor 401 may be used to read one or more computer programs stored in the memory to perform the following operations:
  • the first workload feature is stored in the storage unit 312, and the first workload feature is used for memory allocation of storage devices, data migration, network attached storage NAS load balancing, hot and cold data block identification, prefetch policy optimization, performance bottleneck perception, and load prediction or load change sensing.
  • the storage medium includes read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), programmable read-only memory (Programmable Read-only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM), one-time programmable read-only memory (One-time Programmable Read-Only Memory, OTPROM), electrically erasable programmable read-only memory (Electrically-Erasable Programmable Read-Only Memory, EEPROM), compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disk storage, magnetic disk storage, tape storage, or any other computer-readable medium that can be used to carry or store data.
  • the essence of the technical solutions of the present application, or the part contributing to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product.
  • the computer program product is stored in a storage medium and includes several instructions for enabling a device (which may be a personal computer, a server, a network device, a robot, a single-chip microcomputer, a chip, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a workload feature extraction method and apparatus. The method comprises: a storage device obtaining a first workload feature thereof during an input/output (IO) execution process; and storing the first workload feature, the first workload feature being used for memory allocation, data movement, network-attached storage (NAS) load balancing, cold and hot data block identification, prefetching strategy tuning, performance bottleneck perception, load prediction or load change perception of the storage device. By implementing the present application, the storage device extracts the workload feature online, thereby facilitating improvement of the security of user data, and improving the extraction efficiency of the load feature.

Description

A workload feature extraction method and apparatus

Technical Field

The present application relates to the field of intelligent storage, and in particular to a workload feature extraction method and apparatus.

Background

With the development of artificial intelligence technology and the improvement of hardware capabilities, intelligent storage has gradually become a trend. Storage intelligence can better meet users' demands for high capacity, high reliability, high throughput, and low latency, enabling storage devices to achieve adaptive tuning in complex business scenarios.

To build an intelligent storage system, the workload information of the storage device is indispensable; workload information is an important input for implementing functions such as memory allocation, load balancing, and data migration. However, current ways of obtaining workload information are not only inefficient and have a large impact on the performance of the storage device, but also cannot meet the needs of future direct use.

Summary

The present application discloses a workload feature extraction method and apparatus, which can realize online extraction of the workload features of a storage device, effectively improving the security of user data and the extraction efficiency of load features.
In a first aspect, the present application provides a workload feature extraction method. The method includes: acquiring a first workload feature of a storage device during input/output (IO) execution; and storing the first workload feature, where the first workload feature is used for memory allocation of the storage device, data migration, network attached storage (NAS) load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception.

The foregoing method is applied to a storage device. The storage device may be, for example, a storage node in a centralized storage system or a storage node in a distributed storage system, which is not specifically limited herein.

It should be noted that the step of acquiring the first workload feature of the storage device during IO execution may be performed by a processor (for example, a central processing unit, CPU) in the storage device running software instructions, or by another chip independent of that processor. The chip may be a processing chip such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a graphics processing unit (GPU), or a neural-network processing unit (NPU). The chip may be integrated in the storage device, or placed in the storage device by means of a plug-in card, which is not specifically limited here.

In the above method, the storage device extracts its workload features online during IO execution, which not only avoids the low security caused by direct leakage of the user's workload data (for example, IO data), but also improves the extraction efficiency of workload features. In addition, the extracted workload features can be directly used by the modules of the storage device itself to realize adaptive tuning of the storage device, which helps improve the intelligence of the storage system.

Optionally, acquiring the first workload feature of the storage device includes: obtaining, by a processor of the storage device, the first workload feature according to workload data in a memory.

In this implementation, the processor in the storage device reads workload data directly from the memory and extracts the first workload feature from the read workload data, realizing online extraction of workload features. Compared with an approach in which an external device spends a large amount of time copying all the workload data from the storage device and extracting workload features offline from that data, the online workload feature extraction method in this application reduces the impact on the performance of the storage device itself.

Optionally, the first workload feature does not need to be extracted offline by an external device.

In this implementation, the first workload feature does not need to be extracted offline by an external device, but is extracted online by the storage device itself, which not only reduces the impact on the performance of the storage device, but also improves the extraction efficiency of workload features.

Optionally, in a block service scenario, the first workload feature includes at least one of a time feature, a stream feature, and a hotspot distribution feature, where the time feature indicates the time intervals corresponding to the IOs of the block service, the stream feature indicates the access pattern of the IO streams of the block service, and the hotspot distribution feature indicates the reuse distance distribution of address blocks in the block service.

The time feature may be, for example, the total time interval, the IO time interval distribution, and the like. The total time interval indicates the total duration corresponding to a batch of IOs of the block service, and the IO time interval distribution indicates the time intervals between the IOs in the batch. Based on the time feature, the access time distribution of IOs in the block service can be known.

The stream features include at least one of the following: the number of IO streams, the IO stream length distribution, the IO stream bandwidth distribution, the IO stream interval distribution, the IO stream spatial concurrency distribution, the proportion of sequential streams and interval streams among the IO streams, and the number of out-of-order IO streams. The stream features reflect the streaming access characteristics of IOs in the block service scenario.

The reuse distance of an address block refers to the number of distinct address blocks between two adjacent accesses to the same address block. The hotspot distribution feature can reflect the pattern of repeated access to address blocks by the IOs of the block service.
Optionally, in the block service scenario, the first workload feature further includes at least one of the following features: IO size distribution, read/write ratio parameters, total read/write bandwidth, and the number of unaligned IOs.

Optionally, in a file system service scenario, the first workload feature includes a short-term access feature, which indicates the access pattern of a batch of IOs of the file system service in at least one of the following dimensions: file, directory, time, IO within a file, and operation.

In the file dimension, the short-term access feature includes at least one of the following: the number of files accessed by the batch of IOs, the file size distribution, the file reuse distance distribution, and the distribution of the number of concurrent file operations.

In the directory dimension, the short-term access feature includes at least one of the following: the number of directories accessed by the batch of IOs, the directory depth and width distribution, the directory reuse distance distribution, the distribution of total read/write bandwidth within directories, the distribution of the number of directory operations, the distribution of the number of concurrent directory operations, and the sequential access sequence of files within a directory.

In the time dimension, the short-term access feature includes at least one of the following: the total time interval corresponding to the batch of IOs and the time interval distribution.

In the dimension of IO within a file, the short-term access feature includes at least one of the following: the size distribution of IOs within a file, the read/write bandwidth of IOs within a file, the reuse distance distribution of address blocks within a file, the IO stream features within a file, and the sequentiality of IO streams within a file.

In the operation dimension, the short-term access feature includes at least one of the following: the proportion of operation commands, the host operation distribution, and the operation mode distribution, where a host operation is a user operation on the host identified based on the batch of IOs of the file system service, and may also be called an aggregated operation or a user operation. A host operation may be, for example, cp (copy), rm (delete a directory or file), rmdir (delete an empty directory), or grep (query) on a Linux system. Operation modes include, but are not limited to, sequential read, sequential write, random read, random write, create write, append write, overwrite, file lock, protocol lock, and file system lock.

In this implementation, in the file system service scenario, a batch of IOs within a short period is described from the dimensions of file, directory, time, IO within a file, and operation, providing a multi-angle, multi-level description of the workload in the file system service scenario and improving the accuracy of the workload features extracted in this scenario.

Optionally, in the file system service scenario, the first workload feature further includes a global feature, which indicates the hierarchical structure distribution of the file system. The global feature includes at least one of the following: the number of files in the file system, the number of directories in the file system, the directory depth distribution of the file system, the distribution of the number of files under the directories of the file system, the file access frequency distribution of the file system, and the directory access frequency distribution of the file system.

In this implementation, the hierarchical layout of the file system is described from a global perspective, for example, the number of directories, the number of files, and the number of files within directories, as well as the overall access frequency of the directories and files of the file system, which enriches the expression of the workload of the file system service and improves the accuracy of the workload features extracted in the file system service scenario.
Optionally, the method further includes: compressing the first workload feature to obtain a second workload feature and compression parameters, where the number of features included in the second workload feature is smaller than the number of features included in the first workload feature, and the compression parameters are used to restore the second workload feature to the first workload feature; and performing, according to the second workload feature and the compression parameters, NAS load balancing, memory allocation, data migration, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception.

In this implementation, compressing the first workload feature not only reduces the storage space required for storing the workload features, but also reduces the amount of data transmitted, improving the data transmission efficiency between the storage device and the simulation device. In addition, the storage device can directly use the first workload feature, for example, for NAS load balancing, memory allocation, data migration, hot and cold data block identification, prefetch policy tuning, and load prediction, realizing adaptive tuning of its own modules and improving the intelligence of the storage device.

Optionally, the method further includes: sending the second workload feature and the compression parameters to a simulation device, so that the simulation device performs memory allocation, data migration, NAS load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception according to the second workload feature and the compression parameters.

In this implementation, the simulation device can implement multiple applications according to the received second workload feature and compression parameters, for example, memory allocation, data migration, and NAS load balancing, realizing offline testing and simulation of various scenarios.

Optionally, the second workload feature and the compression parameters are also used by the simulation device to obtain IO simulation data and to verify, based on the IO simulation data, whether the first workload feature is credible.

For example, the simulation device may first obtain the first workload feature based on the second workload feature and the compression parameters, obtain IO simulation data according to the first workload feature, and further verify whether the first workload feature is credible according to the IO simulation data. Specifically, a third workload feature is re-extracted from the IO simulation data and compared with the first workload feature; when the third workload feature is consistent with the first workload feature, the first workload feature is credible, and the IO simulation data can be used as the real workload data of the user device, thereby realizing reproduction of the user scenario.

In this implementation, the simulation device can further corroborate the extracted first workload feature. When the first workload feature is determined to be credible, the simulation device can successfully obtain the field data of the user device through simulation using the first workload feature transmitted by the storage device.
In a second aspect, the present application provides a workload feature extraction apparatus. The apparatus includes: a processing unit, configured to acquire a first workload feature of a storage device during input/output (IO) execution; and a storage unit, configured to store the first workload feature, where the first workload feature is used for memory allocation of the storage device, data migration, network attached storage NAS load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception.

Optionally, the processing unit is specifically configured to obtain, through a processor, the first workload feature according to workload data in a memory.

Optionally, the first workload feature does not need to be extracted offline by an external device.

Optionally, in a block service scenario, the first workload feature includes at least one of a time feature, a stream feature, and a hotspot distribution feature, where the time feature indicates the time intervals corresponding to the IOs of the block service, the stream feature indicates the access pattern of the IO streams of the block service, and the hotspot distribution feature indicates the reuse distance distribution of address blocks in the block service.

Optionally, in the block service scenario, the first workload feature further includes at least one of the following features: IO size distribution, read/write ratio parameters, total read/write bandwidth, and the number of unaligned IOs.

Optionally, in a file system service scenario, the first workload feature includes a short-term access feature, which indicates the access pattern of a batch of IOs of the file system service in at least one of the following dimensions: file, directory, time, IO within a file, and operation.

Optionally, in the file system service scenario, the first workload feature further includes a global feature, which indicates the hierarchical structure distribution of the file system. The global feature includes at least one of the following: the number of files in the file system, the number of directories in the file system, the directory depth distribution of the file system, the distribution of the number of files under the directories of the file system, the file access frequency distribution of the file system, and the directory access frequency distribution of the file system.

Optionally, the processing unit is further configured to: compress the first workload feature to obtain a second workload feature and compression parameters, where the number of features included in the second workload feature is smaller than the number of features included in the first workload feature, and the compression parameters are used to restore the second workload feature to the first workload feature; and perform, according to the second workload feature and the compression parameters, NAS load balancing, memory allocation, data migration, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception.

Optionally, the apparatus further includes: a sending unit, configured to send the second workload feature and the compression parameters to a simulation device, so that the simulation device performs memory allocation, data migration, network attached storage NAS load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception according to the second workload feature and the compression parameters.

Optionally, the second workload feature and the compression parameters are also used by the simulation device to obtain IO simulation data and to verify, based on the IO simulation data, whether the first workload feature is credible.
In a third aspect, the present application provides an apparatus. The apparatus includes a processor and a memory, where the memory is configured to store program instructions, and the processor invokes the program instructions in the memory so that the apparatus performs the method in the first aspect or any possible implementation of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium, including computer instructions which, when run by a processor, implement the method in the first aspect or any possible implementation of the first aspect.

In a fifth aspect, the present application provides a computer program product which, when executed by a processor, implements the method in the first aspect or any possible embodiment of the first aspect. The computer program product may be, for example, a software installation package; when the method provided by any possible design of the first aspect needs to be used, the computer program product may be downloaded and executed on a processor to implement the method in the first aspect or any possible embodiment of the first aspect.

In a sixth aspect, the present application provides a system. The system includes a processor, a chip, and a memory, where the processor and/or the chip are configured to acquire a first workload feature of a storage device during input/output IO execution, and the memory is configured to store the first workload feature, where the first workload feature is used for memory allocation of the storage device, data migration, network attached storage NAS load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception.

Optionally, the processor and/or the chip are further configured to: compress the first workload feature to obtain a second workload feature and compression parameters, where the number of features included in the second workload feature is smaller than the number of features included in the first workload feature, and the compression parameters are used to restore the second workload feature to the first workload feature; and perform, according to the second workload feature and the compression parameters, NAS load balancing, memory allocation, data migration, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception.

For the specific content of the first workload feature, refer to the related description of the first workload feature in the first aspect.

The technical effects of the second to sixth aspects are the same as those of the first aspect, and details are not repeated here.
Brief Description of Drawings

FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of modules of a storage device provided by an embodiment of the present application;

FIG. 3 is a flowchart of a workload feature extraction method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a block access sequence provided by an embodiment of the present application;

FIG. 5 is a flowchart of a workload feature extraction method provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of a functional structure of a storage device provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a storage device provided by an embodiment of the present application.

Detailed Description
The terms used in the embodiments of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application. The terms "first", "second", and the like in the specification and claims of the embodiments of the present application are used to distinguish different objects, not to describe a specific order.

It should be noted that, in the embodiments of the present application, a description such as "at least one (or at least one) of a1, a2, ..., and an" covers the case where any one of a1, a2, ..., and an exists alone, as well as any combination of any number of a1, a2, ..., and an, and each case may exist alone. For example, the description "at least one of a, b, and c" covers a alone, b alone, c alone, a combination of a and b, a combination of a and c, a combination of b and c, or a combination of a, b, and c.

For ease of understanding, related terms that may be involved in the embodiments of the present application are introduced first.

(1) Block storage

Block storage means that data is stored in fixed-size data blocks, and each data block is assigned a number used for addressing. Block storage usually adopts a storage area network (Storage Area Network, SAN) architecture. A SAN is a storage architecture that connects storage devices and application servers through a network dedicated to access between hosts and storage devices. When data needs to be accessed, the data can be transmitted at high speed between the server and the back-end storage device through the storage area network. A SAN provides block-level storage services and can effectively improve data transmission efficiency and read/write rates.

(2) File storage

File storage refers to storing data in the form of files. File storage usually adopts a network-attached storage (Network-Attached Storage, NAS) architecture to provide file-level data access and sharing services. In addition to providing file sharing services to users, it can also control users' access permissions (for example, add, delete, and modify). NAS is implemented by installing a file system on a storage device and sharing storage space in the form of a file directory. NAS is characterized by containing a file system and an operating system, being able to run completely independently, being a file-level shared storage device, having a low cost, and integrating software and hardware.

The concept of a "file" is used to organize data in a computer; data used for the same purpose can be organized into different types of files according to the structures required by different applications. Different suffixes are usually used to indicate different types, and each file has a corresponding file name. When there are many files, the files can be grouped, and each group of files is placed in the same directory (also called a folder). In addition to files, a directory may also contain next-level directories (called subdirectories or subfolders), and all files and directories form a tree structure. This tree structure is called a file system (File System). The file system defines the data structures necessary for storing files on disk and the way disk data is managed. There are many kinds of file systems, for example, FAT/FAT32/NTFS of Windows, EXT2/EXT3/EXT4/XFS/BtrFS of Linux, and the hdfs file system of Hadoop.
(3) Reuse distance
The reuse distance (reuse distance, RD) refers to the number of distinct data items accessed between two adjacent accesses to the same memory data. The reuse distance from the current access to the next access is called the forward reuse distance (next reuse distance, NRD), and the reuse distance from the current access back to the previous access is called the backward reuse distance (previous reuse distance, PRD). Unless otherwise specified, the reuse distance generally refers to the forward reuse distance.
(4) Input/output (IO) stream
A stream (stream) is an ordered set of bytes with a start point and an end point, and information is sent in a first-in-first-out manner. According to the flow direction, IO streams can be divided into input streams (InputStream) and output streams (OutputStream), where an input stream refers to data being read from the hard disk into the memory, and an output stream refers to data being written from the memory to the hard disk. According to the size of the data processing unit, IO streams can be divided into byte streams and character streams, where a byte stream reads and writes data in units of bytes, and a character stream reads and writes data in units of characters. In addition, IO streams may be classified in other ways, which are not specifically limited here.
In one implementation, collecting workload data of a user device is one of the effective ways to perceive the workload of the user device. For example, a log storing workload data (for example, IO data) is copied from the storage device, and the workload features of the storage device are obtained based on the workload data in the log. However, because the amount of workload data is large, the data copying process takes a long time, which not only affects the performance of the storage device itself but also raises customer privacy and security issues. In addition, the workload features obtained from such data are relatively simple, for example, the IO read/write ratio, the stream ratio and the stream bandwidth, and cannot accurately characterize the workload of the storage device.
To address the problems that the above data collection process affects device performance, is inefficient, and extracts the workload only at a coarse granularity, an embodiment of this application proposes a workload feature extraction method, which can extract the workload features of a storage device efficiently and accurately while keeping performance consumption and storage space consumption as low as possible, and the method has good applicability.
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of this application. The system can be used to extract the workload features of a storage device and to perform IO flow simulation based on those workload features. As shown in FIG. 1, the system includes a storage device and a simulation device, where the storage device and the simulation device can communicate in a wireless or wired manner.
The storage device may be a storage node in a centralized storage system or a storage node in a distributed storage system. The simulation device may be a device with computing capability, for example, a server deployed on the network side, or a component or chip in such a server. The network-side device may be deployed in a cloud environment, that is, as a cloud computing server, or in an edge environment, that is, as an edge computing server. The network-side device may be one integrated device or multiple distributed devices, which is not specifically limited in the embodiments of this application.
The storage device is configured to extract its own workload features online, and may perform, based on the extracted workload features, at least one of operations such as optimal memory allocation, NAS load balancing, data migration, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction or load change perception. The storage device is further configured to send the extracted workload features to the simulation device, so that the simulation device performs IO flow simulation according to the extracted workload features and reproduces the load scenario of the storage device.
Exemplarily, in order to reduce the amount of data transmitted by the storage device and further improve the security of the storage device's data, after extracting the workload features, the storage device may first compress them to obtain compressed workload features. The compressed workload features still retain the field name of each feature and remain interpretable. The storage device then sends the compressed workload features to the simulation device.
The simulation device is configured to receive the workload features sent by the storage device, and to perform IO flow simulation based on the received workload features to generate IO simulation data of the storage device, thereby reproducing the real business scenarios of the storage device. In some possible embodiments, when the simulation device receives compressed workload features from the storage device, the simulation device needs to decompress and restore them to obtain the uncompressed workload features before performing the IO flow simulation operation. In some possible embodiments, the simulation device may also perform memory allocation, load balancing, data migration, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction or load change perception according to the received workload features.
In some possible embodiments, when the storage device is deployed in a distributed storage system and is a management node of that system, in addition to extracting the workload features of its own device, the storage device is further configured to aggregate the workload features of the other storage nodes in the distributed storage system, for example, to receive the workload features extracted by the other storage nodes, compress the workload features of each storage node separately, and send the compressed workload features to the simulation device.
It should be noted that FIG. 1 is only an exemplary architecture diagram and does not limit the number of network elements included in the system shown in FIG. 1. Although not shown in FIG. 1, the system may include other functional entities in addition to those shown in FIG. 1. In addition, the method provided in the embodiments of this application may be applied to the system shown in FIG. 1, and may of course also be applied to other systems, which is not limited in the embodiments of this application.
Referring to FIG. 2, FIG. 2 exemplarily shows a module diagram of a storage device. The storage device can be used to extract workload features. As shown in FIG. 2, the storage device includes at least a data collection module, an online feature extraction module, a feature query interface and tuning modules.
The data collection module is configured to obtain workload data from the memory of the storage device and input the obtained workload data to the online feature extraction module. The online feature extraction module includes a feature calculation module and a feature processing module. Based on the workload data, the feature calculation module can perform feature extraction and calculation for multiple business types (for example, block business and file system business) and at multiple granularities (for example, logical unit number and file system) to obtain the workload features of the storage device, and inputs the workload features to the feature processing module. The feature processing module is configured to compress the workload features to obtain compressed workload features. For the specific content of the workload features, reference may be made to the related descriptions in the following method embodiments; details are not repeated here.
The feature query interface serves as the unified entry through which the tuning modules access the online feature extraction module, and is used to implement data transmission between the tuning modules and the online feature extraction module. For example, a tuning module can obtain workload features through the feature query interface and perform algorithm tuning, such as optimal memory allocation, NAS load balancing, hot and cold data identification, and data migration policy optimization. The tuning modules include but are not limited to a memory allocation module, a NAS load regulation module, a garbage collection module, and the like.
Exemplarily, the memory allocation module among the tuning modules obtains workload features from the online feature extraction module through the feature query interface, and optimizes the resource allocation policy based on the workload features to obtain an optimal memory allocation policy.
In some possible embodiments, after obtaining the workload features, the online feature extraction module may also persist the workload features, or store them in a corresponding database.
In some possible embodiments, the storage device further includes a data return module. The data return module is configured to receive the workload features sent by the online feature extraction module and send them to an external device (for example, a simulation device), so that the external device can perform offline IO flow simulation of the real business based on the workload features, which assists live-network fault location, various offline scenario tests, algorithm policy evaluation, and the like.
It should be noted that the workload features input by the online feature extraction module to the data return module may be the workload features compressed by the feature processing module, or the uncompressed workload features output by the feature calculation module, which is not specifically limited in the embodiments of this application.
In the embodiments of this application, the modules of the storage device shown in FIG. 2 mainly consume the computing power of the core central processing unit (central processing unit, CPU). In some possible embodiments, as the amount of workload feature data extracted by the storage device grows, the storage device shown in FIG. 2 may further include a service distribution deployment module, which is configured to offload core CPU computing, for example, by distributing deployment to the network interface card, the data processing unit (data processing unit, DPU) or chips of the storage device, so as to expand the computing power of the storage device.
It should be noted that the module diagram of the storage device shown in FIG. 2 is only an example. In some possible embodiments, the storage device may include more or fewer modules than those shown in FIG. 2. In addition, each module shown in FIG. 2 may be implemented in software, in hardware, or in a combination of hardware and software.
Referring to FIG. 3, FIG. 3 is a flowchart of a workload feature extraction method provided by an embodiment of this application, applied to a storage device. The method includes but is not limited to the following steps:
S101: During input/output (IO) execution, obtain a first workload feature of the storage device.
In this embodiment of this application, the storage device obtains the first workload feature during IO execution, which means that the storage device extracts the first workload feature of its own device in an online manner.
In order to accurately characterize the workload of the storage device, the storage device extracts different workload features in different business scenarios. The embodiments of this application mainly describe the extraction of workload features by the storage device in two business scenarios: one is block business, and the other is file system business.
The workload features extracted in these two business scenarios are described below.
First type: block business
In this embodiment of this application, in the block business scenario, the first workload feature includes at least one of a time feature, a flow feature and a hotspot distribution feature.
(1) Time feature:
The time feature is used to indicate the time intervals corresponding to the input/output IOs of the block business.
In this embodiment of this application, the time feature specifically includes a total time interval and an IO time interval distribution, where the total time interval is used to indicate the total duration corresponding to a batch of IOs of the block business, and the IO time interval distribution is used to indicate the time intervals between the individual IOs in the batch of IOs of the block business.
For example, the total time interval may be 10 ms, the batch of IOs is the set of IOs contained within the 10 ms, and the IO time interval distribution indicates the time intervals between the IOs within the 10 ms. In some possible embodiments, the total time interval may also be 12 ms, 20 ms or another value.
(2) Flow feature
The flow feature is used to indicate the access pattern of the IO streams of the block business.
It should be noted that an IO stream includes multiple IOs. Within an IO stream, the distance interval between any two adjacent IOs is smaller than a preset interval threshold and the access time interval between any two adjacent IOs is smaller than a preset time threshold. An IO stream may be a sequential stream or an interval stream. Each IO in a sequential stream is a sequential IO, where sequential IO means that read and write operations access data from adjacent addresses, logical block by logical block, so the distance interval between any two adjacent IOs in a sequential stream is 0. The IOs in an interval stream may be random IOs, where random IO means that the read/write operations are continuous in time but the accessed logical addresses are not continuous, so the distance interval between any two adjacent IOs in an interval stream is not 0. In other words, the IOs in a sequential stream are continuous and uninterrupted in logical address, while the IOs in an interval stream have gaps in logical address.
In this embodiment of this application, the flow feature includes at least one of the following features:
a. Number of IO streams: used to indicate the number of IO streams in the batch of IOs.
b. IO stream length distribution: used to indicate the length of each IO stream in the batch of IOs, where the length of an IO stream is the number of IOs it contains.
c. IO stream bandwidth distribution: used to indicate the bandwidth of each IO stream in the batch of IOs.
d. IO stream interval distribution: used to indicate the distance intervals between the IO streams in the batch of IOs.
e. Ratio of sequential streams to interval streams: used to indicate the ratio of the number of sequential streams to the number of interval streams in the batch of IOs.
f. Number of out-of-order IO streams: used to indicate the number of IO streams in the batch of IOs whose IOs arrive out of order.
An out-of-order IO stream is illustrated with a specific example. Assume that the start-position offsets of the IOs in IO stream 1 are 10, 20, 30, 40, ... in turn, that is, for any two adjacent IOs the offset of the next IO is greater than the offset of the previous IO; then IO stream 1 is not out of order. Assume that the start-position offsets of the IOs in IO stream 2 are 20, 40, 30, 50, 10, ... in turn, that is, the offsets in IO stream 2 show neither an increasing trend nor a decreasing trend; then IO stream 2 is an out-of-order IO stream. An illustrative check of this property is included in the code sketch after item g below.
g. IO stream spatial concurrency distribution: used to indicate the spatial concurrency of the IO streams in the batch of IOs.
In a specific implementation, the IO stream spatial concurrency distribution may be expressed as at least one of the following: the distribution of the number of IO streams contained in address blocks of a preset size; the number of address blocks of the preset size accessed by the IO streams.
For example, the address block of the preset size may be a block with a size of 4 MB (4M block for short). In this case, the IO stream spatial concurrency distribution feature specifically includes the distribution of the number of IO streams contained in each 4M block and the number of 4M blocks accessed by the IO streams in the batch of IOs.
(3) Hotspot distribution feature
The hotspot distribution feature is used to indicate the reuse distance distribution of the address blocks accessed by the IOs of the block business.
In this embodiment of this application, the hotspot distribution feature includes the reuse distance distribution of address blocks, where the reuse distance of an address block refers to the number of distinct address blocks accessed between two adjacent accesses to the same address block. For the calculation of the reuse distance distribution, reference may be made to the following description of the reuse distance; for brevity, details are not repeated here.
It should be noted that the reuse distance distribution of address blocks may be counted in exponentially graded levels, for example, counting in turn the reuse distance distribution of address blocks accessed by IOs within an interval distance of 2048, within 4096, within 8192, and so on. In this way, the distribution of address blocks frequently accessed by IOs can be characterized at different interval-distance scales.
In some possible embodiments, the hotspot distribution feature may further include the number of non-repeated IOs in the batch of IOs, where the number of non-repeated IOs refers to the number of IOs that access distinct address blocks.
In some possible embodiments, in the block business scenario, the first workload feature further includes at least one of the following features: IO size distribution, read/write ratio parameter, total read/write bandwidth, number of unaligned IOs, and input/output operations per second (Input/Output Per Second, IOPS).
The IO size distribution is used to indicate the size of each IO in the batch of IOs, the read/write ratio parameter is used to indicate the ratio of the number of read IOs to the number of write IOs in the batch of IOs, and the number of unaligned IOs is used to indicate the number of IOs whose length is unaligned and/or whose offset is unaligned.
It should be noted that unaligned IOs may cause read amplification or write amplification during reads and writes, increasing disk IO consumption, reducing the read/write efficiency of the disk and affecting disk performance. IO alignment (that is, both length alignment and offset alignment) can effectively reduce disk IO consumption and improve the read/write efficiency of the disk. Therefore, the number of unaligned IOs can be used to analyze disk performance.
Second type: file system business
In this embodiment of this application, in the file system business scenario, the first workload feature includes a short-term access feature, which is used to indicate the access pattern of a batch of IOs of the file system business in at least one of the following dimensions: directory, file, time, IO within a file, and operation.
The specific content of the short-term access feature is described below for each of these five dimensions:
(1) Directory
In the directory dimension, the short-term access feature includes at least one of the following features: number of directories, directory depth and width distribution, directory reuse distance distribution, distribution of total read/write bandwidth within a directory, distribution of metadata operations corresponding to a directory, distribution of the number of directory operations, distribution of the number of concurrent directory operations, and sequential access sequence of files in a directory.
The directory depth refers to the length of the chain of subdirectories nested under the directory, and the directory width refers to the number of files in the directory.
The directory reuse distance is used to indicate the number of distinct directories accessed between two adjacent accesses to the same directory. "The same directory" means directories with the same identifier (ID); each directory in the file system has a corresponding directory ID.
Operations on a directory include but are not limited to viewing, querying, copying, switching, creating, deleting, cutting, renaming and changing attributes.
The number of concurrent directory operations refers to the number of clients operating on the same directory at the same time.
The sequential access sequence of files in a directory can be used to analyze the correlation between the files in that directory, which facilitates prefetching of sequential file streams within a directory in the file system business and improves the hit rate of file accesses in the directory.
(2)文件(2) Documents
基于文件这一维度,短时访问特征包括下述特征中的至少一种:文件数量、文件大小分布、文件重用距离分布、文件操作数量分布和文件并发操作数量分布。Based on the dimension of files, the short-term access characteristics include at least one of the following characteristics: file quantity, file size distribution, file reuse distance distribution, file operation quantity distribution and file concurrent operation quantity distribution.
其中,文件数量包括但不限于批次IO访问的文件总数,批次IO中读IO对应的文件数量、批次IO中写IO对应的文件数量、以及批次IO中各操作模式对应的文件数量,操作模式包括顺序读、随机读、顺序写、随机写、创建写、追加写、覆盖写、文件锁等。Among them, the number of files includes but is not limited to the total number of files accessed by batch IO, the number of files corresponding to read IO in batch IO, the number of files corresponding to write IO in batch IO, and the number of files corresponding to each operation mode in batch IO , the operation modes include sequential read, random read, sequential write, random write, create write, append write, overwrite write, file lock, etc.
文件重用距离用于指示同一文件的相邻两次访问之间所间隔的不重复文件的个数。所谓同一文件是指文件ID相同,在文件系统中每个文件有对应的文件ID。The file reuse distance is used to indicate the number of unique files between two adjacent accesses to the same file. The so-called same file refers to the same file ID, and each file in the file system has a corresponding file ID.
对文件的操作包括但不限于:查看、查询、拷贝、切换、创建、删除、剪切、重命名、更改属性等。Operations on files include but are not limited to: viewing, querying, copying, switching, creating, deleting, cutting, renaming, changing attributes, etc.
文件并发操作数量是指同时对同一文件进行操作的客户端的数量。The number of concurrent file operations refers to the number of clients operating on the same file at the same time.
(3) Time
In the time dimension, the short-term access feature includes at least one of the following features: total time interval and time interval distribution, where the total time interval is used to indicate the total duration corresponding to the batch of IOs of the file system business, and the time interval distribution is used to indicate the time intervals between the individual file IOs in the batch of IOs of the file system business.
(4)文件内的IO(4) IO in the file
基于文件内的IO这一维度,短时访问特征包括下述特征中的至少一种:文件内IO大小分布、文件内IO的读写带宽、文件内地址块的重用距离分布、文件内IO流特征和文件内IO流的顺序度。Based on the dimension of IO in a file, short-term access features include at least one of the following features: IO size distribution in a file, read and write bandwidth of IO in a file, reuse distance distribution of address blocks in a file, and IO flow in a file Characteristic and sequentiality of IO streams within a file.
其中,文件内IO的读写带宽包括总读写带宽和单个文件的读写带宽分布中的至少一项。Wherein, the read-write bandwidth of the IO in the file includes at least one of the total read-write bandwidth and the distribution of the read-write bandwidth of a single file.
文件内地址块的重用距离用于指示对同一文件内的同一地址块的相邻两次访问之间所间隔的不重复的地址块的数量。The reuse distance of the address block in the file is used to indicate the number of non-duplicated address blocks between two adjacent accesses to the same address block in the same file.
文件内IO流特征包括文件内IO流数量、文件内IO流长度分布、文件内IO流带宽分布、文件内IO流间隔分布、文件内IO流空间并发分布等特征中的至少一种。需要说明的是,有关文件内IO流特征具体可参考上述块业务的流特征的相关叙述,在此不再赘述。The characteristics of the IO streams in the file include at least one of the number of IO streams in the file, the length distribution of the IO streams in the file, the bandwidth distribution of the IO streams in the file, the interval distribution of the IO streams in the file, and the spatial concurrency distribution of the IO streams in the file. It should be noted that, for details about the characteristics of the IO flow in the file, reference may be made to the relevant description of the above-mentioned flow characteristics of the block service, which will not be repeated here.
文件内IO流的顺序度用于指示文件内IO流的顺序性。示例性地,文件内IO流的顺序度可以根据文件内IO流长度和文件内IO流间隔进行加权计算获得,其中,在其他参数不变的情况下,文件内IO流长度越大,文件内IO流的顺序度越大,则文件内IO流的顺序性越强;在其他参数不变的情况下,文件内IO流间隔越大,文件内IO流的顺序度越小,则文件内IO 流的顺序性越弱。The sequence degree of the IO stream in the file is used to indicate the sequence of the IO stream in the file. Exemplarily, the order degree of the IO stream in the file can be obtained by weighted calculation according to the length of the IO stream in the file and the interval of the IO stream in the file, wherein, when other parameters remain unchanged, the greater the length of the IO stream in the file, the greater the length of the IO stream in the file. The greater the sequence degree of the IO stream, the stronger the sequence of the IO stream in the file; when other parameters remain unchanged, the larger the interval of the IO stream in the file, the smaller the sequence degree of the IO stream in the file, the stronger the sequence of the IO stream in the file. The less sequential the stream is.
(5) Operation
In the operation dimension, the short-term access feature includes at least one of the following features: proportion of operation commands, host operation distribution, and operation mode distribution.
A host operation is an operation performed by a user on the host that is identified based on the batch of IOs of the file system business, and may also be referred to as an aggregated operation or a user operation. A host operation may be, for example, cp (copy), rm (delete a directory or file), rmdir (delete an empty directory), grep (query), cd (switch directory), ls (view the current directory), ll (view the current files), mkdir (create), or mv (cut or rename) on a Linux system.
The operation mode distribution is used to indicate the number of IOs of each operation mode corresponding to the batch of IOs. Operation modes include but are not limited to sequential read, sequential write, random read, random write, create write, append write, overwrite, file lock, protocol lock, file system lock, and so on.
The proportion of operation commands may be the proportion of IO operations, the proportion of host operations, and so on. The proportion of IO operations is the ratio of the number of IO operations in the batch of IOs (whose corresponding fields are read, write, look up, etc.) to the total number of IOs in the batch, and the proportion of host operations is the ratio of the number of host operations identified from the batch of IOs to the total number of IOs in the batch.
In this embodiment of this application, in the file system business scenario, the first workload feature further includes a global feature, which is used to indicate the hierarchical structure distribution of the file system. The global feature includes at least one of the following features: the number of files in the file system, the number of directories in the file system, the directory depth distribution of the file system, the distribution of the number of files under the directories of the file system, the file access frequency distribution of the file system, and the directory access frequency distribution of the file system.
In some possible embodiments, the global feature may further include the distribution of the number of subdirectories at each directory depth, the distribution of the number of files at each directory depth, and the distribution of file sizes at each directory depth.
In a specific implementation, obtaining the first workload feature of the storage device may be as follows: the processor of the storage device obtains the first workload feature of the storage device based on the workload data in the memory. It can be seen that the first workload feature is extracted online by the storage device rather than offline by an external device. In this way, the impact on the performance of the storage device caused by an external device copying IO data is effectively avoided, and the extraction efficiency of the workload features is improved.
Exemplarily, in the block business scenario, the workload data includes data such as the logical unit number (Logical Unit Number, LUN) accessed by the IOs of the block business, the start-position offset of each IO, the IO size, and the IO operation type (for example, read operation or write operation).
Exemplarily, in the file system business scenario, the workload data includes the identifier (ID) of the file system, the client IP address corresponding to the IO, the ID of the file accessed by the IO, the ID of the directory accessed by the IO, the start-position offset of the IO, the IO size, the IO operation type (for example, read operation, write operation or metadata operation), and so on.
It can be seen that both the workload features of the block business and the workload features of the file system business described above include features obtained based on the reuse distance, for example, the reuse distance distribution feature of the block business, and the file reuse distance distribution, directory reuse distance distribution and reuse distance distribution of address blocks within a file of the file system business.
The reuse distance is an important feature that can effectively characterize the distribution of IO hotspots. In this embodiment of this application, the reuse distance distribution of a block access sequence is obtained by traversing the block access sequence once based on the historical distribution information of repeated blocks in the sequence, which helps improve the extraction efficiency of the reuse distance distribution feature.
Referring to FIG. 4, FIG. 4 is a schematic diagram of a block access sequence provided by an embodiment of this application. FIG. 4 shows the block access sequence {BCACDABCBCEA}. It can be seen that block B is accessed 3 times, block C is accessed 4 times, block A is accessed 3 times, block D is accessed once, and block E is accessed once. In order to accurately calculate the reuse distance of each pair of accesses, the access information of each logical address block (address block for short below) needs to be tracked. A pair of accesses to A in the block access sequence (that is, A2 and A3) is taken as an example for description, where the superscript of A indicates which access to A it is (A2 is the second access to A, A3 the third):
Let D_total(A2, A3) denote the number of address-block accesses in the interval between the second access to A (denoted A2) and the third access to A (denoted A3). The block access sequence between A2 and A3 is {BCBCE}, so D_total(A2, A3) = 5.
Let D_local(A2, A3) denote the number of locally repeated address-block accesses in the interval between A2 and A3. Since the block access sequence between A2 and A3 is {BCBCE}, only block B and block C are each accessed twice within the interval, that is, block B is locally repeated and block C is locally repeated, so D_local(A2, A3) = 2.
Let D_global(A2, A3) denote the number of globally repeated address-block accesses in the interval between A2 and A3. It can be seen that before A2, block B and block C had each already been accessed once, and between A2 and A3 block B and block C each receive two further accesses, so all four of these accesses are global repeats and D_global(A2, A3) = 4.
Let RD(A2, A3) denote the reuse distance between the second access and the third access to A. It should be noted that, based on the definition of the reuse distance, the distinct blocks between A2 and A3 are {B, C, E}, so RD(A2, A3) = 3.
The quantities D_total, D_local and RD satisfy the relation RD(A2, A3) = D_total(A2, A3) − D_local(A2, A3); that is, once D_total(A2, A3) and D_local(A2, A3) are known, RD(A2, A3) can be obtained. To improve the efficiency of reuse distance calculation, while traversing the logical address blocks of the block access sequence once, the interval-distance distribution information between repeated blocks, the running total number of address-block accesses corresponding to the current address block, and the running number of globally repeated address-block accesses can each be recorded. A probability density function of the reuse distance is fitted according to the interval-distance distribution information between repeated blocks, and based on this probability density function the probability that a repeat at any given interval distance falls inside the interval formed by the repeated blocks can be estimated.
Taking the reuse distance RD(A2, A3) as an example, the procedure for obtaining it is as follows: obtain D_total(A2, A3) from the total access counts recorded at A2 and at A3; obtain D_global(A2, A3) from the global repeat counts recorded at A2 and at A3; estimate D_local(A2, A3) from the probability density function and D_global(A2, A3); and finally obtain RD(A2, A3) from D_total(A2, A3) and D_local(A2, A3).
The reuse distance distribution feature of the block business can be extracted in the above manner, and the above manner is also applicable to extracting the reuse-distance-related features of the file system business. For example, when the file reuse distance distribution feature is extracted, the block access sequence shown in FIG. 4 can be replaced by a file ID access sequence; when the directory reuse distance distribution feature is extracted, the block access sequence shown in FIG. 4 can be replaced by a directory ID access sequence; and when the reuse distance distribution feature of address blocks within a file is extracted, the address blocks in the block access sequence shown in FIG. 4 are the address blocks within the same file.
S102: Store the first workload feature.
In this embodiment of this application, the first workload feature may be stored in the memory of the storage device. The first workload features extracted within a short period may be kept in the memory of the storage device, so that other modules of the storage device can obtain the corresponding workload features directly from the memory when needed, which helps improve the data transmission rate.
In some possible embodiments, when the total data volume of the first workload features of the periods stored in the memory of the storage device reaches an upper threshold, the first workload features of those periods may also be transferred from the memory to the hard disk of the storage device for storage.
In some possible embodiments, the storage device may also generate a visualized load profile according to the first workload feature, where the load profile includes the individual load features, performance bottleneck information, and the like. In the block business scenario, the load profile is a load profile of the block business; in the file system business scenario, the load profile is a load profile of the file system business. It should be noted that the load profile can be displayed at different granularities selected according to user requirements; for example, the granularity may be storage device, controller, LUN, and so on.
Optionally, in some possible embodiments, the following may also be performed:
S103: According to the first workload feature, perform memory allocation, data migration, NAS load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction or load change perception.
In this embodiment of this application, the first workload feature can support the tuning of multiple modules of the storage device, where the multiple modules may be, for example, a memory allocation module, a hot and cold data identification module, a NAS load regulation module, and the like. The memory allocation module can perform memory allocation, prefetch policy tuning and the like; the hot and cold data identification module can perform hot and cold data block identification, data migration and the like; and the NAS load regulation module can perform NAS load balancing, load prediction and the like.
It should be noted that, when different modules in the storage device perform tuning operations, the workload features they rely on may be the same or different. In addition, the workload features relied on by a module when performing an optimization operation may be some or all of the features in the first workload feature, which is not specifically limited here.
Exemplarily, in the block business scenario, the memory allocation module performs memory allocation mainly according to features such as the above hotspot distribution feature and the IO size distribution. Specifically, based on the hotspot distribution feature, the memory allocation module can determine the interval distances corresponding to the hotspot IOs in the batch of IOs, and based on those interval distances obtain the memory cache size at which the maximum memory cache hit ratio is reached, thereby allocating memory resources. The cache hit ratio refers to the ratio of the number of cache hits to the total number of cache accesses; a cache hit means that the logical address to be read is already in the cache and can therefore be read quickly from the cache.
Exemplarily, prefetch policy tuning may be performed according to the ratio of sequential streams to interval streams, the IO stream length distribution, the IO stream bandwidth distribution and the IO stream interval distribution among the flow features. For different LUNs in the block business, a unified prefetch policy is often adopted, and when the overall prefetching has a high waste rate, the prefetch function of every LUN is stopped. In this embodiment of this application, by contrast, the benefit of enabling the prefetch function for each LUN is evaluated according to features such as that LUN's ratio of sequential streams to interval streams, IO stream length distribution, IO stream bandwidth distribution and IO stream interval distribution, and the cache prefetch policy is dynamically adjusted according to the benefit corresponding to each LUN, so that the read amplification caused by the prefetch policy is reduced as much as possible while the prefetch hit ratio is guaranteed, preventing the waste of resources.
Exemplarily, the NAS load balancing module is configured to implement load balancing of the file system business. Specifically, according to the short-term access feature and the global feature of the file system, when a directory or file is created, the NAS load balancing module can evaluate in advance the data volume and access frequency of that directory or file, so as to determine the processor to which the directory or file to be created should belong, thereby matching an appropriate processor to the newly created directory or file. In addition, when it is determined that the workloads of the processors differ significantly, frequently accessed directories and files can be moved, according to the short-term access feature and the global feature of the file system business, to a processor that is relatively idle or has a smaller workload. In this way, NAS load balancing is achieved.
Exemplarily, the historical access frequency and the reuse distance distribution of each data block (also called an address block) may also be determined according to the hotspot distribution feature in the first workload feature, and the data block is identified as a hot or cold data block according to its historical access frequency and reuse distance distribution.
In some possible embodiments, after the hot or cold attribute of a data block is determined, data migration may also be performed. For example, in converged storage, hot data should as far as possible be stored on high-performance solid state drives (SSD) and cold data placed on lower-performance hard disk drives (HDD), which saves the storage space of the high-performance SSDs. Therefore, when data block A is identified as a hot data block but is currently placed on an HDD, the data corresponding to data block A can be migrated to an SSD to improve data read efficiency.
In some possible embodiments, the storage device may also perform load prediction or load change perception according to the extracted first workload feature, so as to estimate the change trend of the future load and achieve better load regulation. In some possible embodiments, the storage device may also perform performance bottleneck perception, write bandwidth stabilization and the like according to the first workload feature, which is not specifically limited here.
It can be seen that, by implementing the embodiments of this application, different workload features are extracted from multiple dimensions for the block business and the file system business respectively, so that the workload of the storage device in different business scenarios can be characterized more accurately. In addition, the online extraction of the workload features by the storage device not only improves the extraction efficiency of the workload features, but also effectively avoids direct leakage of users' IO data and improves the security of user data. The extracted workload features can be directly used by the modules of the storage device itself to achieve adaptive tuning of the storage device, which helps improve the intelligence of the storage system.
Referring to FIG. 5, FIG. 5 is a flowchart of a workload feature extraction method provided by an embodiment of this application, applied to a communication system composed of a storage device and a simulation device. The embodiment of FIG. 5 may supplement the embodiment of FIG. 3, or may stand independently of the embodiment of FIG. 3. The method includes but is not limited to the following steps:
S201: During IO execution, the storage device obtains its own first workload feature. For details of this step, reference may be made to the related description of S101 in the embodiment of FIG. 3; for brevity, details are not repeated here.
S202: The storage device sends the first workload feature to the simulation device. Correspondingly, the simulation device receives the first workload feature sent by the storage device.
S203: The simulation device performs memory allocation, data migration, NAS load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction or load change perception according to the first workload feature.
In this embodiment of this application, the simulation device may perform at least one of the following applications based on the first workload feature: memory allocation, data migration, network-attached storage NAS load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction or load change perception. It should be noted that, for the execution process of applications such as memory allocation, data migration and NAS load balancing, reference may be made to the related description of S103 in the above embodiment of FIG. 3; for brevity, details are not repeated here.
In some possible embodiments, before performing the above applications, the simulation device may first verify whether the extracted first workload feature is trustworthy, and perform the above applications only after determining that the first workload feature is trustworthy.
Exemplarily, the simulation device may determine that the extracted first workload feature is trustworthy as follows: the simulation device simulates the first workload feature to obtain IO simulation data; it re-extracts a third workload feature from the IO simulation data and compares the third workload feature with the first workload feature; when the third workload feature is consistent with the first workload feature, it determines that the first workload feature is trustworthy, which means that the IO simulation data can serve as real IO data corresponding to the first workload feature. It can be understood that the first workload feature being trustworthy indicates that it characterizes the workload of the storage device well, and that both the first workload feature and the IO simulation data obtained from it are of reference value.
It should be noted that the third workload feature being consistent with the first workload feature may mean that the similarity between the third workload feature and the first workload feature satisfies a preset condition, for example, the similarity between the third workload feature and the first workload feature is greater than or equal to a preset threshold.
For example, the simulation device may alternatively determine that the extracted first workload feature is credible in the following manner: the simulation device simulates the first workload feature to obtain IO simulation data, observes performance indicators of the IO simulation data when it runs on the device, and obtains the difference between the performance indicators corresponding to the IO simulation data and the performance indicators corresponding to the real workload data; when the difference is smaller than a preset difference threshold, the first workload feature is determined to be credible, and the IO simulation data can serve as the real workload data corresponding to the first workload feature. It should be noted that the performance indicators may be IOPS, IO latency, the number of cache-hit IOs, the number of cache-prefetched IOs, and so on.
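As an illustration only, a minimal sketch of this performance-indicator variant is shown below; the indicator names, the dict layout, and the 10% relative-difference rule are assumptions made for the example rather than requirements of this application.

```python
def is_credible_by_performance(sim_metrics, real_metrics, max_relative_diff=0.1):
    """Compare performance indicators (for example IOPS, IO latency, cache-hit IO
    count, cache-prefetch IO count) observed when replaying the IO simulation data
    with the indicators of the real workload; treat the first workload feature as
    credible only if every indicator deviates by less than the preset threshold."""
    for name, real_value in real_metrics.items():
        sim_value = sim_metrics.get(name)
        if sim_value is None or real_value == 0:
            return False  # missing or degenerate indicator: cannot confirm credibility
        if abs(sim_value - real_value) / abs(real_value) >= max_relative_diff:
            return False
    return True

# Example usage with made-up numbers:
real = {"iops": 12000, "io_latency_us": 850, "cache_hit_ios": 9000}
sim = {"iops": 11800, "io_latency_us": 880, "cache_hit_ios": 9150}
print(is_credible_by_performance(sim, real))  # True under the 10% assumption
```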
In some possible embodiments, the simulation device may further assist, based on the first workload feature, in performance fault location, offline testing, algorithm and policy evaluation, and the like.
Optionally, to increase the data transmission rate between the storage device and the simulation device, the storage device may also compress the first workload feature and then send the compressed workload feature to the simulation device. In this case, S202 and S203 above may be skipped, and S204 to S206 below may be performed instead:
S204: The storage device compresses the first workload feature to obtain a second workload feature and compression parameters.
The number of features included in the second workload feature is smaller than the number of features included in the first workload feature, and the compression parameters are used to restore the second workload feature to the first workload feature. It can be seen that compressing the first workload feature not only reduces the storage space consumed to store the first workload feature, but also increases the data transmission rate.
It should be noted that the second workload feature is the workload feature obtained by compressing the first workload feature; in other words, each feature in the second workload feature is a feature retained after the first workload feature is compressed. It can be understood that, because the number of features in the second workload feature is smaller than the number of features in the first workload feature, the data volume of the second workload feature is also smaller than that of the first workload feature.
In this embodiment of this application, each feature in the second workload feature corresponds to one or more features in the first workload feature.
For example, if feature a1 in the second workload feature corresponds to feature a2 and feature a3 in the first workload feature, feature a2 and feature a3 can be restored or recovered based on the compression parameters and feature a1.
In this embodiment of this application, the field name of each feature in the second workload feature remains unchanged before and after compression; in this way, each feature in the second workload feature is still interpretable.
For example, assume that the first workload feature includes feature 1, feature 2, and feature 3, and that the second workload feature obtained by compressing the first workload feature includes only feature 1. In this case, feature 1 is the feature retained after the first workload feature is compressed, and the field name of feature 1 in the first workload feature is the same as the field name of feature 1 in the second workload feature.
In an implementation of this application, compressing the first workload feature includes: compressing the first workload feature according to the similarity and/or predictability between features in the first workload feature.
In a specific implementation, the similarity between features in the first workload feature may refer to the similarity between different features of a same batch of IOs in the first workload feature. In this case, the compression process may be: for any two different features in the first workload feature, computing the similarity between the two features, and when the similarity between the two features satisfies a preset similarity condition, deleting either of the two features. It should be noted that, before either of the two features is deleted, the mapping relationship between the two features needs to be recorded in the compression parameters.
For example, taking feature A and feature B in the first workload feature as an example, the similarity between feature A and feature B is computed. When the similarity between feature A and feature B is greater than or equal to a preset similarity threshold, feature A can be considered similar to feature B, and either of them may be deleted. This effectively reduces the data volume of the first workload feature, thereby compressing the first workload feature.
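The following Python sketch illustrates one possible form of this within-batch, similarity-based compression. The feature layout (a dict of field name to feature content), the callable similarity measure, and the 0.9 threshold are assumptions for the example only.

```python
def compress_by_similarity(features, similarity, threshold=0.9):
    """features: dict mapping field name -> feature content (e.g. a numeric list).
    Returns (kept_features, compression_params); the compression parameters record
    which deleted field can be restored from which retained field."""
    kept = {}
    mapping = {}          # deleted field name -> retained field name it is similar to
    for name, content in features.items():
        duplicate_of = None
        for kept_name, kept_content in kept.items():
            if similarity(content, kept_content) >= threshold:
                duplicate_of = kept_name
                break
        if duplicate_of is None:
            kept[name] = content          # keep the feature; the field name is unchanged
        else:
            mapping[name] = duplicate_of  # record the mapping before dropping the feature
    return kept, {"similar_to": mapping}
```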
In another specific implementation, the similarity between features in the first workload feature may refer to the similarity between the corresponding features of different batches of IOs in the first workload feature.
For example, assume that the first workload feature includes feature 1, feature 2, feature 3, feature 4, feature 5, and feature 6, where feature 1, feature 2, and feature 3 belong to a first batch of IOs, and feature 4, feature 5, and feature 6 belong to a second batch of IOs. The field name of feature 1 is the same as that of feature 4, the field name of feature 2 is the same as that of feature 5, and the field name of feature 3 is the same as that of feature 6. The content of feature 1 is the same as that of feature 4 (that is, the similarity satisfies the preset similarity condition), and the content of feature 2 is the same as that of feature 5 (that is, the similarity satisfies the preset similarity condition), but the content of feature 3 differs from that of feature 6 (that is, the similarity does not satisfy the preset similarity condition). In this case, all three features of the first batch of IOs are retained. Because feature 4 is the same as feature 1, feature 5 is the same as feature 2, and feature 1 and feature 2 have already been retained, the redundant feature 4 and feature 5 can be deleted; that is, only feature 6 is retained for the second batch of IOs. The first workload feature is thereby compressed.
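A corresponding sketch for the cross-batch case is given below, again with an assumed layout (a list of per-batch dicts mapping field name to content) and exact equality standing in for "the similarity satisfies the preset condition".

```python
def compress_across_batches(batches):
    """batches: list of dicts, one per IO batch, each mapping field name -> content.
    A feature of a later batch that matches the already-retained feature with the
    same field name is dropped; the mapping is recorded so it can be restored."""
    retained = []                 # compressed batches
    last_kept = {}                # field name -> (batch index, content) of the retained copy
    restore_map = {}              # (batch index, field name) -> batch index to copy from
    for i, batch in enumerate(batches):
        kept_batch = {}
        for name, content in batch.items():
            prev = last_kept.get(name)
            if prev is not None and prev[1] == content:
                restore_map[(i, name)] = prev[0]   # redundant: restorable from an earlier batch
            else:
                kept_batch[name] = content
                last_kept[name] = (i, content)
        retained.append(kept_batch)
    return retained, restore_map
```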
In this embodiment of this application, compressing the first workload feature according to the predictability between features in the first workload feature includes: determining predictable features in the first workload feature by using an artificial intelligence model, obtaining parameters of the artificial intelligence model, and deleting the predictable features from the workload feature, where the parameters of the artificial intelligence model are included in the compression parameters.
For example, assume that the first workload feature includes feature b1, feature b2, feature b3, and feature b4, and that the artificial intelligence model determines that feature b2 can be predicted from feature b1 and that feature b4 can be predicted from feature b3. In this case, feature b2 and feature b4 may be deleted from the first workload feature, or feature b1 and feature b3 may be deleted from the first workload feature. In addition, the obtained compression parameters include the conversion parameters between feature b1 and feature b2 and the conversion parameters between feature b3 and feature b4.
For example, the artificial intelligence model may be a single-layer neural network, a random forest (Random Forest, RF) model, a support vector machine (Support Vector Machine, SVM) model, or another prediction algorithm; this is not specifically limited here.
It should be noted that feature prediction may be one-to-one, that is, predicting one feature from another feature; many-to-one, that is, predicting one feature from multiple features; or one-to-many, that is, predicting multiple features from one feature. This is not specifically limited in this embodiment of this application.
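As a minimal sketch of the predictability-based variant, and only under the stated assumptions, the fragment below uses one-to-one prediction and a simple least-squares linear fit in place of the artificial intelligence model (an implementation could equally use a single-layer neural network, RF, or SVM). The feature layout, the candidate pairs, and the error tolerance are all assumptions for the example.

```python
def fit_linear(x, y):
    # Least-squares fit y ~ a * x + b over two equal-length numeric sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var = sum((xi - mx) ** 2 for xi in x)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = cov / var if var else 0.0
    return a, my - a * mx

def compress_by_predictability(features, pairs, max_error=1e-3):
    """pairs: candidate (predictor, predicted) field-name pairs, e.g. [("b1", "b2")].
    Features that prove predictable are deleted, and the fitted model parameters
    are kept as part of the compression parameters."""
    kept = dict(features)
    model_params = {}
    for src, dst in pairs:
        a, b = fit_linear(features[src], features[dst])
        err = max(abs(a * xi + b - yi) for xi, yi in zip(features[src], features[dst]))
        if err <= max_error:                       # dst is predictable from src
            model_params[dst] = {"from": src, "a": a, "b": b}
            kept.pop(dst, None)
    return kept, {"models": model_params}
```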
It can be understood that the more predictable features there are in the first workload feature, the higher the achievable compression ratio of the first workload feature, and the smaller the data volume of the second workload feature obtained by compressing the first workload feature.
In some possible embodiments, the first workload feature may also be compressed by combining the similarity and the predictability between features. For details, reference may be made to the foregoing descriptions of similarity and predictability; the details are not repeated here.
Optionally, in some possible embodiments, the following steps may also be performed:
S205: The storage device sends the second workload feature and the compression parameters to the simulation device.
Correspondingly, the simulation device receives the second workload feature and the compression parameters sent by the storage device. Transmitting the second workload feature and the compression parameters to the simulation device can effectively improve the data transmission efficiency between the storage device and the simulation device.
S206: The simulation device performs memory allocation, data migration, NAS load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception according to the second workload feature and the compression parameters.
In this embodiment of this application, the simulation device may first restore the second workload feature to the first workload feature by using the compression parameters, and then perform the foregoing memory allocation, data migration, NAS load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception according to the first workload feature. It should be noted that, for details about how the simulation device performs memory allocation, data migration, NAS load balancing, and the like according to the first workload feature, reference may be made to the related description of S203 above; for brevity of the specification, the details are not repeated here.
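For illustration, and continuing the representations assumed in the sketches above (a "similar_to" mapping and a "models" entry in the compression parameters, both of which are hypothetical), the restoration step at the simulation device could simply re-materialise every deleted feature before the applications are run:

```python
def restore_first_feature(second_feature, compression_params):
    """second_feature: dict of retained field name -> content.
    compression_params may hold a 'similar_to' mapping and/or fitted model
    parameters, as produced by the compression sketches above."""
    restored = dict(second_feature)
    for deleted, source in compression_params.get("similar_to", {}).items():
        restored[deleted] = restored[source]                     # copy the similar feature back
    for deleted, p in compression_params.get("models", {}).items():
        src = restored[p["from"]]
        restored[deleted] = [p["a"] * v + p["b"] for v in src]   # predict the deleted feature
    return restored
```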
It can be seen that, by implementing the embodiments of this application, the storage device extracts different workload features for block services and for file system services, which characterizes the load characteristics of the storage device in different service scenarios more accurately. Compressing the extracted workload features before transmitting them to the simulation device effectively improves the data transmission efficiency between the storage device and the simulation device, and also enables the simulation device to reproduce service scenarios based on the workload features.
Referring to FIG. 6, FIG. 6 is a schematic diagram of a functional structure of a storage device according to an embodiment of this application. The storage device 30 includes a processing unit 310 and a storage unit 312. The storage device 30 may be implemented by hardware, software, or a combination of software and hardware.
The processing unit 310 is configured to acquire a first workload feature of the storage device during input/output (IO) execution. The storage unit 312 is configured to store the first workload feature, where the first workload feature is used for memory allocation, data migration, network attached storage (NAS) load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception of the storage device.
In some possible embodiments, the storage device 30 further includes a sending unit 314. The sending unit 314 is configured to send the second workload feature and the compression parameters to the simulation device, where the compression parameters are used to restore the second workload feature to the first workload feature. In some possible embodiments, the sending unit 314 may also be configured to send the first workload feature to the simulation device.
The functional modules of the storage device 30 may be configured to implement the method described in the embodiment in FIG. 3. In the embodiment in FIG. 3, the processing unit 310 may be configured to perform S101 and S103, the storage unit 312 may be configured to perform S102, and the sending unit 314 may be configured to perform S202 or S205 in FIG. 5. The functional modules of the storage device 30 may also be configured to implement the method on the storage device side described in the embodiment in FIG. 5; for brevity of the specification, the details are not repeated here.
One or more of the units in the embodiment shown in FIG. 6 may be implemented by software, hardware, firmware, or a combination thereof. The software or firmware includes but is not limited to computer program instructions or code, and may be executed by a hardware processor. The hardware includes but is not limited to various integrated circuits, such as a central processing unit (CPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC).
This application further provides a storage device. As shown in FIG. 7, the storage device 40 includes a processor 401, a communication interface 402, a memory 403, and a bus 404. The processor 401, the memory 403, and the communication interface 402 communicate through the bus 404. The storage device 40 may be a server or a storage device. It should be understood that this application does not limit the number of processors or memories in the storage device 40.
The bus 404 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be classified into an address bus, a data bus, a control bus, and so on. For ease of representation, only one line is used in FIG. 7, but this does not mean that there is only one bus or only one type of bus. The bus 404 may include a path for transferring information between the components of the storage device 40 (for example, the memory 403, the processor 401, and the communication interface 402).
The processor 401 may include any one or more of processors such as a central processing unit (CPU), a microprocessor (MP), or a digital signal processor (DSP).
The memory 403 is configured to provide storage space, in which data such as an operating system and computer programs may be stored. The memory 403 may be one or a combination of a random access memory (RAM), an erasable programmable read-only memory (EPROM), a read-only memory (ROM), a compact disc read-only memory (CD-ROM), and the like. The memory 403 may exist independently, or may be integrated inside the processor 401.
The communication interface 402 may be configured to provide information input or output for the processor 401. Alternatively, the communication interface 402 may be configured to receive data sent from the outside and/or send data to the outside, and may be a wired link interface such as an Ethernet cable, or a wireless link interface (such as Wi-Fi, Bluetooth, or general wireless transmission). Alternatively, the communication interface 402 may further include a transmitter (such as a radio frequency transmitter or an antenna) or a receiver coupled to the interface.
The processor 401 in the storage device 40 is configured to read the computer programs stored in the memory 403 to perform the foregoing methods, for example, the method described in FIG. 3 or the method on the storage device side described in FIG. 5.
In a possible design, the storage device 40 may be one or more modules in the execution body of the method shown in FIG. 3, and the processor 401 may be configured to read one or more computer programs stored in the memory to perform the following operations:
acquiring a first workload feature of the storage device during input/output (IO) execution; and
storing the first workload feature by using the storage unit 312, where the first workload feature is used for memory allocation, data migration, network attached storage (NAS) load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception of the storage device.
In the foregoing embodiments herein, the description of each embodiment has its own emphasis. For parts that are not described in detail in an embodiment, reference may be made to the related descriptions of other embodiments.
It should be noted that a person of ordinary skill in the art may understand that all or some of the steps in the methods of the foregoing embodiments may be completed by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and the storage medium includes a read-only memory (ROM), a random access memory (RAM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), a one-time programmable read-only memory (OTPROM), an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
The technical solutions of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer program product is stored in a storage medium and includes several instructions for enabling a device (which may be a personal computer, a server, a network device, a robot, a single-chip microcomputer, a chip, or the like) to perform all or some of the steps of the methods described in the embodiments of this application.

Claims (22)

  1. A workload feature extraction method, applied to a storage device, wherein the method comprises:
    acquiring a first workload feature of the storage device during input/output (IO) execution; and
    storing the first workload feature, wherein the first workload feature is used for memory allocation, data migration, network attached storage (NAS) load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception of the storage device.
  2. The method according to claim 1, wherein the acquiring a first workload feature of the storage device comprises:
    obtaining, by a processor of the storage device, the first workload feature according to workload data in a memory.
  3. The method according to claim 2, wherein the first workload feature does not need to be extracted offline by an external device.
  4. The method according to any one of claims 1 to 3, wherein, in a block service scenario, the first workload feature comprises at least one of a time feature, a stream feature, and a hotspot distribution feature, wherein the time feature indicates a time interval corresponding to IOs of the block service, the stream feature indicates an access pattern of IO streams of the block service, and the hotspot distribution feature indicates a reuse distance distribution of address blocks in the block service.
  5. The method according to claim 4, wherein the first workload feature further comprises at least one of the following features: an IO size distribution, a read/write ratio parameter, a total read/write bandwidth, and a number of unaligned IOs.
  6. The method according to any one of claims 1 to 3, wherein, in a file system service scenario, the first workload feature comprises a short-term access feature, and the short-term access feature indicates an access pattern of batch IOs of the file system service in at least one of the following dimensions: file, directory, time, intra-file IO, and operation.
  7. The method according to claim 6, wherein the first workload feature further comprises a global feature, the global feature indicates a hierarchical structure distribution of a file system, and the global feature comprises at least one of the following features:
    the number of files in the file system;
    the number of directories in the file system;
    the directory depth distribution of the file system;
    the distribution of the number of files under directories of the file system;
    the file access frequency distribution of the file system; and
    the directory access frequency distribution of the file system.
  8. The method according to any one of claims 1 to 7, wherein the method further comprises:
    compressing the first workload feature to obtain a second workload feature and compression parameters, wherein the number of features comprised in the second workload feature is smaller than the number of features comprised in the first workload feature, and the compression parameters are used to restore the second workload feature to the first workload feature; and
    performing NAS load balancing, memory allocation, data migration, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception according to the second workload feature and the compression parameters.
  9. The method according to claim 8, wherein the method further comprises:
    sending the second workload feature and the compression parameters to a simulation device, so that the simulation device performs memory allocation, data migration, network attached storage (NAS) load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception according to the second workload feature and the compression parameters.
  10. The method according to claim 9, wherein the second workload feature and the compression parameters are further used by the simulation device to obtain IO simulation data and to verify, based on the IO simulation data, whether the first workload feature is credible.
  11. A workload feature extraction apparatus, wherein the apparatus comprises:
    a processing unit, configured to acquire a first workload feature of a storage device during input/output (IO) execution; and
    a storage unit, configured to store the first workload feature, wherein the first workload feature is used for memory allocation, data migration, network attached storage (NAS) load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception of the storage device.
  12. The apparatus according to claim 11, wherein the processing unit is specifically configured to:
    obtain, by a processor, the first workload feature according to workload data in a memory.
  13. The apparatus according to claim 12, wherein the first workload feature does not need to be extracted offline by an external device.
  14. The apparatus according to any one of claims 11 to 13, wherein, in a block service scenario, the first workload feature comprises at least one of a time feature, a stream feature, and a hotspot distribution feature, wherein the time feature indicates a time interval corresponding to IOs of the block service, the stream feature indicates an access pattern of IO streams of the block service, and the hotspot distribution feature indicates a reuse distance distribution of address blocks in the block service.
  15. The apparatus according to claim 14, wherein the first workload feature further comprises at least one of the following features: an IO size distribution, a read/write ratio parameter, a total read/write bandwidth, and a number of unaligned IOs.
  16. The apparatus according to any one of claims 11 to 13, wherein, in a file system service scenario, the first workload feature comprises a short-term access feature, and the short-term access feature indicates an access pattern of batch IOs of the file system service in at least one of the following dimensions: file, directory, time, intra-file IO, and operation.
  17. The apparatus according to claim 16, wherein the first workload feature further comprises a global feature, the global feature indicates a hierarchical structure distribution of a file system, and the global feature comprises at least one of the following features:
    the number of files in the file system;
    the number of directories in the file system;
    the directory depth distribution of the file system;
    the distribution of the number of files under directories of the file system;
    the file access frequency distribution of the file system; and
    the directory access frequency distribution of the file system.
  18. The apparatus according to any one of claims 11 to 17, wherein the processing unit is further configured to:
    compress the first workload feature to obtain a second workload feature and compression parameters, wherein the number of features comprised in the second workload feature is smaller than the number of features comprised in the first workload feature, and the compression parameters are used to restore the second workload feature to the first workload feature; and
    perform NAS load balancing, memory allocation or data migration, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception according to the second workload feature and the compression parameters.
  19. The apparatus according to claim 18, wherein the apparatus further comprises:
    a sending unit, configured to send the second workload feature and the compression parameters to a simulation device, so that the simulation device performs memory allocation, data migration, network attached storage (NAS) load balancing, hot and cold data block identification, prefetch policy tuning, performance bottleneck perception, load prediction, or load change perception according to the second workload feature and the compression parameters.
  20. The apparatus according to claim 19, wherein the second workload feature and the compression parameters are further used by the simulation device to verify whether the first workload feature is credible.
  21. An apparatus, wherein the apparatus comprises a memory and a processor, the memory is configured to store program instructions, and when the processor executes the program instructions in the memory, the apparatus performs the method according to any one of claims 1 to 10.
  22. A computer-readable storage medium, wherein the computer-readable storage medium stores program instructions, and the program instructions are used to implement the method according to any one of claims 1 to 10.
PCT/CN2023/074657 2022-02-18 2023-02-06 Workload feature extraction method and apparatus WO2023155703A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210150753.5A CN116661675A (en) 2022-02-18 2022-02-18 Workload feature extraction method and device
CN202210150753.5 2022-02-18

Publications (1)

Publication Number Publication Date
WO2023155703A1 true WO2023155703A1 (en) 2023-08-24

Family

ID=87577578

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/074657 WO2023155703A1 (en) 2022-02-18 2023-02-06 Workload feature extraction method and apparatus

Country Status (2)

Country Link
CN (1) CN116661675A (en)
WO (1) WO2023155703A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101008907A (en) * 2007-01-26 2007-08-01 清华大学 Load-aware IO performance optimization methods based on Bayesian decision
US20170302738A1 (en) * 2016-04-13 2017-10-19 Netapp, Inc. Methods and systems for managing provisioning requests in a networked storage environment
CN109710195A (en) * 2019-01-08 2019-05-03 郑州云海信息技术有限公司 A kind of full flash memory storage load-balancing method, device and storage system
CN110149395A (en) * 2019-05-20 2019-08-20 华南理工大学 One kind is based on dynamic load balancing method in the case of mass small documents high concurrent
CN111352577A (en) * 2018-12-24 2020-06-30 杭州海康威视系统技术有限公司 Object storage method and device
US20210349749A1 (en) * 2012-02-14 2021-11-11 Aloke Guha Systems and methods for dynamic provisioning of resources for virtualized


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHEN YU-LIANG, XU LU: "Efficient Disk I/O Characteristics Analysis Method Based on Virtual Machine Technology", JOURNAL OF SOFTWARE, vol. 21, no. 4, 6 April 2010 (2010-04-06), pages 849 - 862, XP093085199, ISSN: 1000-9825, DOI: 10.3724/SP.J.1001.2010.03492 *

Also Published As

Publication number Publication date
CN116661675A (en) 2023-08-29


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23755706

Country of ref document: EP

Kind code of ref document: A1