CN114595063A - Method and device for determining access frequency of data - Google Patents


Info

Publication number
CN114595063A
Authority
CN
China
Prior art keywords: block storage, storage instance, target block, data, future
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210231567.4A
Other languages
Chinese (zh)
Inventor
王贯扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd filed Critical Jingdong Technology Information Technology Co Ltd
Priority to CN202210231567.4A
Publication of CN114595063A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3442Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for planning or managing the needed capacity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The disclosure relates to a method and a device for determining the access frequency of data, and a computer storage medium, and relates to the field of computer technology. The method for determining the access frequency of data comprises the following steps: acquiring attribute information related to a target block storage instance in a cloud platform; predicting, by using a machine learning model and according to the attribute information related to the target block storage instance, the future performance achieved by the target block storage instance in a future period; and determining, based on the predicted future performance, the access frequency of data in the target block storage instance in the future period, the determined access frequency being used to allocate physical resources for the target block storage instance. The method and device can improve the accuracy of determining the access frequency of data in the cloud platform, thereby improving the accuracy of physical resource allocation, increasing the utilization rate of physical resources, and reducing the waste of physical resources.

Description

Method and device for determining access frequency of data
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for determining access frequency of data, and a computer-readable storage medium.
Background
In a cloud computing scenario, determining the access frequency of data in a block storage instance in a cloud platform (also referred to as a cloud computing platform) provides important guidance for reasonably allocating physical resources in the cloud platform.
In the related art, a user either determines the access frequency of data in a block storage instance in the cloud platform based on experience, or applies statistical methods to the historical performance of the block storage instance over a historical period to determine how frequently its data was accessed in that period.
Disclosure of Invention
In the related art, the access frequency determined through experience is overly subjective, and because it depends heavily on how much experience the user has accumulated, its accuracy is poor. In addition, the access frequency derived statistically from historical performance describes only historical access and cannot accurately reflect how frequently the data in the block storage instance will be accessed in the future.
In view of the above technical problems, the present disclosure provides a solution that can improve the accuracy of determining the access frequency of data in a cloud platform, thereby improving the accuracy of physical resource allocation, increasing the utilization rate of physical resources, and reducing the waste of physical resources.
According to a first aspect of the present disclosure, there is provided a method of determining access frequency of data, comprising: acquiring attribute information related to a target block storage instance in a cloud platform; predicting future performance achieved by the target block storage instance in a future period by utilizing a machine learning model according to attribute information related to the target block storage instance; determining, from the predicted future performance, a frequency of access of data in the target block storage instance over the future period, the determined frequency of access being used to allocate physical resources for the target block storage instance.
In some embodiments, the future cycle comprises a plurality of future times, the future performance comprises a future performance value of at least one performance of the target block storage instance at each future time, and determining, from the predicted future performance, how frequently data in the target block storage instance is accessed within the future cycle comprises: acquiring the maximum performance value of each performance of the target block storage instance; for each performance, screening a future performance peak corresponding to each performance from a plurality of future performance values corresponding to the plurality of future moments, wherein the future performance peak reflects the general performance condition of each performance in the future cycle; for the target block storage instance, determining access frequency of data in the target block storage instance in the future period according to the maximum performance value of the at least one performance and the corresponding future performance peak value.
In some embodiments, for the target block storage instance, determining, from the maximum performance value of the at least one performance and the corresponding future performance peak, how frequently data in the target block storage instance is accessed in the future cycle comprises: for each performance, determining a ratio of a corresponding future performance peak value to a maximum performance value as a reference ratio; determining the access frequency of the data in the target block storage instance in the future period to be a first degree under the condition that the reference ratio corresponding to each performance is smaller than a first reference ratio threshold; determining the access frequency of the data in the target block storage instance in the future period to be a second degree under the condition that at least one reference ratio value corresponding to one performance is greater than or equal to the first reference ratio threshold and the reference ratio values corresponding to various performances are less than a second reference ratio threshold, wherein the second reference ratio threshold is greater than the first reference ratio threshold and the second degree is higher than the first degree; and determining that the access frequency of the data in the target block storage instance in the future period is a third degree under the condition that the reference ratio value corresponding to at least one performance is greater than or equal to the second reference ratio value threshold, wherein the third degree is higher than the second degree.
In some embodiments, the cloud platform includes at least one target image, the machine learning model includes a first machine learning model corresponding to each target image, the first machine learning model corresponding to each target image is obtained by training according to relevant data of a reference block storage instance mounted on a computing node created based on each target image, and predicting, by using the machine learning model, a future performance value reached by the target block storage instance in a future period includes: and under the condition that the image used when the computing node mounted by the target block storage instance is created belongs to the at least one target image, predicting the future performance value reached by the target block storage instance in the future period by utilizing the first machine learning model corresponding to the image corresponding to the target block storage instance according to the attribute information related to the target block storage instance.
In some embodiments, for each image in the cloud platform, the each image is a target image if a ratio of a total capacity of all block storage instances mounted under a compute node created based on the each image to a total capacity of all block storage instances in the cloud platform is greater than a first capacity ratio threshold.
In some embodiments, the cloud platform includes at least one target image, and in a case that a preset tenant exists among the tenants to which the block storage instances mounted under computing nodes created based on each target image belong, the machine learning model includes a second machine learning model corresponding to each preset tenant, the second machine learning model corresponding to each preset tenant being obtained by training according to data related to a reference block storage instance of each preset tenant, and predicting, according to the attribute information related to the target block storage instance, a future performance value reached by data in the target block storage instance in a future period by using the machine learning model includes: determining whether the tenant to which the target block storage instance belongs is a preset tenant, in a case that the image used when the computing node on which the target block storage instance is mounted was created belongs to the at least one target image; and in a case that the tenant to which the target block storage instance belongs is a preset tenant, predicting a future performance value of the data in the target block storage instance in the future period by using the second machine learning model corresponding to the tenant to which the target block storage instance belongs, according to the attribute information related to the target block storage instance.
In some embodiments, for each tenant in the cloud platform, in a case that a ratio of a total capacity of all block storage instances of the tenant to a total capacity of all block storage instances mounted on a compute node to which the tenant belongs is greater than a second capacity ratio threshold, the tenant is a preset tenant.
In some embodiments, predicting the future performance value of the data in the target block storage instance in the future period by using the machine learning model according to the attribute information related to the target block storage instance further includes: in a case that the tenant to which the target block storage instance belongs is not a preset tenant, predicting the future performance value of the data in the target block storage instance in the future period by using the first machine learning model corresponding to the image corresponding to the target block storage instance, according to the attribute information related to the target block storage instance.
In some embodiments, the method of determining access frequency of data further comprises: and under the condition that the image used when the computing node mounted by the target block storage instance is created does not belong to the at least one target image, determining the access frequency of the target block storage instance in the future period to be a preset degree.
In some embodiments, the period duration of the future period is a duration required for the target block storage instance to migrate an amount of data equal to the capacity of the target block storage instance.
In some embodiments, the future performance value is a performance value related to a read operation; and/or the performance value related to the read operation is measured by at least one performance metric of the throughput related to the read operation and the I/O operations per second (IOPS) related to the read operation.
In some embodiments, the related data of the reference block storage instance comprises attribute information of the reference block storage instance, attribute information of a computing node on which the reference block storage instance is mounted, and historical performance values of data in the reference block storage instance over historical time; and/or the attribute information related to the target block storage instance comprises the attribute information of the target block storage instance and the attribute information of the computing node mounted by the target block storage instance.
In some embodiments, the method of determining the access frequency of data further comprises: for each target image, training a basic machine learning model by using the related data of the reference block storage instances mounted under computing nodes created based on each target image, to obtain a first machine learning model; and/or for each preset tenant, training the first machine learning model corresponding to the target image corresponding to each preset tenant by using the related data of the reference block storage instance of each preset tenant, to obtain a second machine learning model.
According to a second aspect of the present disclosure, there is provided an apparatus for determining access frequency of data, comprising: an acquisition module configured to acquire attribute information related to a target block storage instance in a cloud platform; a prediction module configured to predict, using a machine learning model, a future performance that the target block storage instance achieves over a future period based on attribute information associated with the target block storage instance; a determination module configured to determine, based on the predicted future performance, a frequency of access of data in the target block storage instance in the future period, the determined frequency of access being used to allocate physical resources for the target block storage instance.
According to a third aspect of the present disclosure, there is provided an apparatus for determining access frequency of data, comprising: a memory; and a processor coupled to the memory, the processor configured to perform the method for determining access frequency of data according to any of the above embodiments based on instructions stored in the memory.
According to a fourth aspect of the present disclosure, there is provided a computer-storable medium having stored thereon computer program instructions which, when executed by a processor, implement the method of determining access frequency of data as described in any of the above embodiments.
In the embodiment, the accuracy of determining the access frequency of the data in the cloud platform can be improved, so that the accuracy of physical resource allocation is improved, the utilization rate of physical resources is improved, and the waste of the physical resources is reduced.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 is a flow diagram illustrating a method of determining access frequency of data according to some embodiments of the present disclosure;
FIG. 2 is a block diagram illustrating an apparatus to determine how frequently data is accessed according to some embodiments of the present disclosure;
FIG. 3 is a block diagram illustrating an apparatus to determine access frequency of data according to further embodiments of the present disclosure;
FIG. 4 is a block diagram illustrating a computer system for implementing some embodiments of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Fig. 1 is a flow diagram illustrating a method of determining access frequency of data according to some embodiments of the present disclosure.
As shown in fig. 1, the method for determining the access frequency of data includes: step S1, acquiring attribute information related to a target block storage instance in the cloud platform; step S2, according to the attribute information related to the target block storage instance, predicting the future performance of the target block storage instance in the future period by using a machine learning model; and step S3, determining the access frequency of the data in the target block storage instance in the future period according to the predicted future performance. For example, the method of determining the access frequency of data is performed by an apparatus that determines the access frequency of data.
In this embodiment, the machine learning model learns the characteristics of the attribute information related to the target block storage instance, predicts the future performance of the target block storage instance in the future period based on the learned characteristics, and determines the access frequency of the data in the future period from the predicted future performance. This not only allows the access frequency to be predicted in advance, but also allows it to be determined more objectively, which improves the accuracy of determining the access frequency of the data. Because the access frequency is determined accurately, it provides important guidance for reasonably allocating physical resources to block storage instances, thereby improving the accuracy of physical resource allocation, increasing the utilization rate of physical resources, and reducing the waste of physical resources.
In step S1, attribute information related to the target block storage instance in the cloud platform is acquired. For example, the block storage instance is a cloud hard disk instance. The data in a cloud hard disk instance is actually stored on the physical resources allocated to that instance. A block storage instance is a storage instance in which data is stored in the form of data blocks.
In step S2, the future performance achieved by the target block storage instance in the future cycle is predicted using the machine learning model based on the attribute information associated with the target block storage instance.
In some embodiments, the period duration of the future period is the duration required for the target block storage instance to migrate an amount of data equal to its own capacity. For example, the capacity of the target block storage instance is 800 GB and the single-disk data migration rate is 10 MB/s, so the single-disk daily migration volume is 10 MB/s × 3600 s × 24 ≈ 800 GB per day. The duration required for the target block storage instance to migrate an amount of data equal to its capacity is therefore 800 GB ÷ 800 GB/day = 1 day, and the future period is 1 day. By taking the single-disk daily migration volume into account, the target block storage instance is guaranteed to be able to migrate its data to the newly allocated physical resources within one future period after the access frequency of its data is determined, which further improves the utilization rate of physical resources.
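For concreteness, the calculation can be sketched as follows (a minimal illustration in Python; the function name and the fixed 10 MB/s single-disk migration rate are assumptions taken from the example above, not part of the claimed method):

def future_period_days(capacity_gb: float, migration_rate_mb_per_s: float = 10.0) -> float:
    """Days needed for the instance to migrate an amount of data equal to its own capacity."""
    daily_migration_gb = migration_rate_mb_per_s * 3600 * 24 / 1024  # MB/s -> roughly GB/day
    return capacity_gb / daily_migration_gb

# The example above: an 800 GB instance migrating at 10 MB/s needs roughly one day.
print(round(future_period_days(800), 2))  # ~0.95, i.e. about 1 day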
In some embodiments, the attribute information related to the target block storage instance includes the attribute information of the target block storage instance itself and the attribute information of the computing node on which the target block storage instance is mounted. For example, the attribute information of the target block storage instance itself includes at least one of the specification and the capacity of the target block storage instance. The attribute information of the computing node itself includes at least one of the specification, the number of CPUs, the memory size, and the network bandwidth configuration information of the computing node. Generally, a cloud vendor provides different cloud hard disk service specifications, such as performance-oriented and capacity-oriented, to cloud hard disk users according to the types of storage media and storage servers. Similarly, a cloud vendor provides different computing node specifications, such as general-purpose, memory-optimized, and compute-optimized, to computing node users according to the CPU, the CPU-to-memory ratio, the CPU model, and so on, of the computing server.
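As a purely illustrative example, the attribute information gathered in step S1 might be organized as follows (field names and values are assumptions; the disclosure only lists the kinds of attributes involved):

target_instance_attributes = {
    "instance": {"specification": "performance-type", "capacity_gb": 800},
    "compute_node": {
        "specification": "memory-optimized",
        "cpu_count": 16,
        "memory_gb": 128,
        "network_bandwidth_gbps": 10,
    },
}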
In some embodiments, the cloud platform includes at least one target image, and the machine learning model includes a first machine learning model corresponding to each target image. The first machine learning model corresponding to each target image is obtained by training according to the related data of the reference block storage instances mounted on computing nodes created based on that target image. In a case that the image used when the computing node on which the target block storage instance is mounted was created belongs to the at least one target image, the future performance value reached by the target block storage instance in the future period is predicted by using the first machine learning model corresponding to the image corresponding to the target block storage instance, according to the attribute information related to the target block storage instance. The image used to create a computing node contains the operating system, the application software, and the related configuration within the operating system. Computing nodes that provide the same type of application service are typically created using the same image.
In some embodiments, the related data of the reference block storage instance comprises attribute information of the reference block storage instance, attribute information of a computing node on which the reference block storage instance is mounted, and historical performance values of data in the reference block storage instance over historical time. For example, the attribute information of the target block storage instance itself includes at least one of a specification and a capacity of the target block storage instance. The attribute information of the computing node itself mounted by the target block storage instance includes at least one of specification, number of CPUs, number of memories, and network bandwidth configuration information of the computing node.
In this embodiment, the I/O operation behavior of block storage instances differs greatly across different images in the cloud platform. Training, for each target image, a first machine learning model that matches the characteristics of that image on the data related to that image improves the accuracy of predicting the future performance value, and further improves the accuracy of determining the access frequency of the data, which can further improve the accuracy of physical resource allocation, further increase the utilization rate of physical resources, and further reduce the waste of physical resources.
In addition, in the training process, not only the attribute information of the reference block storage instance and the historical performance value of the data stored in the reference block storage instance in the historical time are considered, but also the attribute information of the mounted computing node is considered, so that the accuracy of the first machine learning model can be further improved, the accuracy of determining the access frequency of the data can be further improved, the accuracy of physical resource allocation can be further improved, the physical resource utilization rate can be further improved, and the physical resource waste can be further reduced.
In some embodiments, for each image in the cloud platform, the image is a target image if the ratio of the total capacity of all block storage instances mounted under computing nodes created based on that image to the total capacity of all block storage instances in the cloud platform is greater than a first capacity ratio threshold. For example, the first capacity ratio threshold is 1%. Different first capacity ratio thresholds may also be set according to actual business requirements. By training the first machine learning model only for images with a larger total mounted storage capacity, the training cost can be reduced and the training efficiency improved.
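A minimal sketch of this selection step, assuming the per-image capacity statistics are already available (the data layout and names are illustrative):

def select_target_images(per_image_capacity_gb: dict, platform_total_capacity_gb: float,
                         first_capacity_ratio_threshold: float = 0.01) -> list:
    """Return ids of the images whose mounted block-storage capacity share exceeds the threshold.

    per_image_capacity_gb maps an image id to the total capacity of all block storage
    instances mounted under compute nodes created from that image.
    """
    return [image_id for image_id, capacity in per_image_capacity_gb.items()
            if capacity / platform_total_capacity_gb > first_capacity_ratio_threshold]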
In some embodiments, the cloud platform includes at least one target image, and when a preset tenant exists among the tenants to which the block storage instances mounted on computing nodes created based on each target image belong, the machine learning model includes a second machine learning model corresponding to each preset tenant. The second machine learning model corresponding to each preset tenant is obtained by training according to the related data of the reference block storage instance of each preset tenant.
In some embodiments, predicting future performance values that are achieved by data in a target block storage instance over a future period using a machine learning model based on attribute information associated with the target block storage instance may be accomplished as follows.
First, in a case where the image used when the computing node on which the target block storage instance is mounted was created belongs to the at least one target image, it is determined whether the tenant to which the target block storage instance belongs is a preset tenant.
Then, in a case that the tenant to which the target block storage instance belongs is a preset tenant, the future performance value of the data in the target block storage instance in the future period is predicted by using the second machine learning model corresponding to that tenant, according to the attribute information related to the target block storage instance.
In the above embodiment, the relevant features (including I/O operation behavior features) of the block storage instances of different tenants under the same image can differ greatly. Training, for all or some tenants, a second machine learning model that better matches the characteristics of the block storage instances at tenant granularity further improves the accuracy of predicting the future performance value, further improves the accuracy of determining the access frequency of the data, and can thus further improve the accuracy of physical resource allocation, further increase the utilization rate of physical resources, and further reduce the waste of physical resources.
In some embodiments, for each tenant in the cloud platform, the tenant is a preset tenant if the ratio of the total capacity of all block storage instances of that tenant to the total capacity of all block storage instances mounted on the computing nodes to which the tenant belongs is greater than a second capacity ratio threshold. For example, the second capacity ratio threshold is 1%. Different second capacity ratio thresholds may also be set according to actual business requirements. By further training the first machine learning model only for tenants with a larger total storage capacity under a given image, the training cost can be reduced and the training efficiency improved.
In some embodiments, taking the case where the machine learning model further includes a first machine learning model corresponding to each target image as an example, in a case that the tenant to which the target block storage instance belongs is not a preset tenant, the future performance value of the data in the target block storage instance in the future period is predicted by using the first machine learning model corresponding to the image corresponding to the target block storage instance, according to the attribute information related to the target block storage instance.
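Putting the per-image and per-tenant cases together, the model selection described above could look roughly like the following sketch (the instance fields, registry dictionaries, and lookup keys are hypothetical, not defined by the disclosure):

def pick_model(instance, target_images: set, preset_tenants: set,
               first_models: dict, second_models: dict):
    """Choose the model used to predict future performance for one target block storage instance.

    first_models maps an image id to that image's first machine learning model;
    second_models maps a tenant id to that preset tenant's second machine learning model.
    Returns None when the instance's image is not a target image; in that case the access
    frequency is simply set to a preset degree, as described further below.
    """
    if instance.image_id not in target_images:
        return None                               # no prediction, a preset degree applies
    if instance.tenant_id in preset_tenants:
        return second_models[instance.tenant_id]  # tenant-granularity second model
    return first_models[instance.image_id]        # image-granularity first model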
In some embodiments, the future performance value is a performance value related to a read operation. For example, the performance value related to a read operation is measured using at least one performance metric of the throughput related to the read operation and the I/O operations per second (IOPS, Input/Output Operations Per Second) related to the read operation. Performance values change over time, and the performance value achieved by a given performance of a block storage instance is measured using a performance metric. The performance related to read operations includes read IOPS performance and read throughput performance. Read IOPS performance is measured by the number of read requests that can be processed per second. Read throughput performance is measured by the read bandwidth that can be processed per second.
In the above embodiment, the access frequency of the data in the block storage instance is closely related to read operations, so using performance values related to read operations to judge the access frequency can further improve the accuracy of determining the access frequency of the data, further improve the accuracy of physical resource allocation, further increase the utilization rate of physical resources, and further reduce the waste of physical resources.
In step S3, the access frequency of the data in the target block storage instance in the future cycle is determined based on the predicted future performance. The determined access frequency is used to allocate physical resources for the target block storage instance.
In some embodiments, the future cycle includes a plurality of future times, and the future performance includes a future performance value of at least one performance of the target block storage instance at each future time. The above step S3 can be implemented as follows.
First, a maximum performance value per performance of the target block storage instance is obtained.
Then, for each performance, a future performance peak corresponding to that performance is selected from the plurality of future performance values corresponding to the plurality of future times. The future performance peak reflects the general condition of that performance over the future period. In some embodiments, the Nth largest of the future performance values of each performance at the future times is selected as the future performance peak of that performance, where N is an integer greater than 0 and less than the total number of future performance values. For example, the future performance values of each performance in the future period are arranged in descending order, and the value at the 95th percentile is selected as the future performance peak (a code sketch of this selection follows the discussion below).
Finally, for the target block storage instance, a frequency of access of data in the target block storage instance in a future cycle is determined based on the maximum performance value of the at least one performance and the corresponding future performance peak.
In the above embodiment, the future performance peak may reflect a general situation of each performance in a future period, and the access frequency of the data is determined by the future performance peak and the maximum performance value, so that the universality and the representativeness of the determined access frequency of the data may be improved, and the influence of the extreme data on the accuracy of determining the access frequency of the data may be avoided, thereby further improving the accuracy of determining the access frequency of the data, further improving the accuracy of physical resource allocation, further improving the physical resource utilization rate, and further reducing the physical resource waste.
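A minimal sketch of the peak selection under the percentile convention mentioned above (the nearest-rank indexing is one reasonable interpretation; the names are illustrative):

def future_performance_peak(future_values: list, percentile: float = 0.95) -> float:
    """Take the value at roughly the given percentile of the predicted future values,
    so that a handful of extreme points do not dominate the peak."""
    ordered = sorted(future_values)               # ascending
    index = int(percentile * (len(ordered) - 1))  # nearest-rank style index
    return ordered[index]

# Example: predicted hourly read-IOPS values for one future period.
peak = future_performance_peak([95, 110, 120, 125, 130, 140, 480, 5000])
print(peak)  # 480 -> the single extreme value 5000 does not become the peak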
In some embodiments, determining how frequently data in the target block storage instance is accessed in future cycles based on the maximum performance value of the at least one performance and the corresponding future performance peak may be accomplished as follows.
First, for each performance, a ratio of the corresponding future performance peak to the maximum performance value is determined as a reference ratio.
Secondly, under the condition that the reference ratio corresponding to various performances is smaller than the first reference ratio threshold, the access frequency of the data in the target block storage instance in the future period is determined to be a first degree. For example, the first degree is a low degree.
Then, in the case that at least one of the reference ratio values corresponding to the performances is greater than or equal to the first reference ratio threshold value and the reference ratio values corresponding to the performances are all less than the second reference ratio threshold value, determining that the access frequency of the data in the target block storage instance in the future period is a second degree. The second reference ratio threshold is greater than the first reference ratio threshold. The second level is higher than the first level. In some embodiments, the second degree is a medium degree. For example, the first reference ratio threshold is 20%. For example, the second reference ratio threshold is 60%. For example, other first reference ratio threshold and second reference ratio threshold may be set according to actual requirements.
And finally, determining the access frequency of the data in the target block storage instance in the future period to be a third degree under the condition that the reference ratio corresponding to at least one performance is greater than or equal to the second reference ratio threshold. The third degree is higher than the second degree. For example, the third degree is a high degree.
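The three-way decision above can be summarized in a short sketch (the threshold values follow the examples in the text; the function and variable names are assumptions):

def access_frequency_degree(reference_ratios: list,
                            first_threshold: float = 0.20,
                            second_threshold: float = 0.60) -> str:
    """Classify access frequency from the per-performance reference ratios
    (each ratio = future performance peak / maximum performance value)."""
    if any(r >= second_threshold for r in reference_ratios):
        return "third degree (high)"
    if any(r >= first_threshold for r in reference_ratios):
        return "second degree (medium)"  # every ratio is already below the second threshold here
    return "first degree (low)"

# Example: read-IOPS ratio 0.35 and read-throughput ratio 0.10 -> medium access frequency.
print(access_frequency_degree([0.35, 0.10]))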
In some embodiments, in a case where the image used when the computing node on which the target block storage instance is mounted was created does not belong to the at least one target image, the access frequency of the target block storage instance in the future period is determined to be a preset degree. For example, for an image used to create computing nodes for which the total number of mounted block storage instances is less than a number threshold and the ratio of the total capacity of the mounted block storage instances to the total capacity of all block storage instances in the cloud platform is less than the first capacity ratio threshold, the image does not belong to a target image, and the access frequency of the target block storage instances mounted there in the future period can be preset to the first degree.
In some embodiments, physical resources are allocated to a target block storage instance based on how frequently data in the target block storage instance is accessed in future cycles. The block storage instance points to the physical resources allocated to the block storage instance in the cloud platform, and stores data on the physical resources allocated to the block storage instance.
In some embodiments, in the case that the data in the target block storage instance is accessed frequently within the future period to a first extent, the data in the target block storage instance belongs to cold data within the future period, and the data in the target block storage instance is allocated a lower-performance physical resource.
In some embodiments, in the case that the data in the target block storage instance is accessed frequently to a second extent in a future period, the data in the target block storage instance belongs to the warm data in the future period, and the data in the target block storage instance is allocated a physical resource with moderate performance.
In some embodiments, in the case that the data in the target block storage instance is accessed frequently to a third extent in a future cycle, the data in the target block storage instance belongs to hot data in the future cycle, and a higher-performance physical resource is allocated to the data in the target block storage instance.
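A hedged sketch of how the determined degree might drive allocation; the disclosure only distinguishes lower-, moderate-, and higher-performance physical resources, so the mapping below is illustrative:

RESOURCE_BY_DEGREE = {
    "first degree (low)":     "lower-performance physical resources (cold data)",
    "second degree (medium)": "moderate-performance physical resources (warm data)",
    "third degree (high)":    "higher-performance physical resources (hot data)",
}

def allocate_physical_resources(degree: str) -> str:
    """Return the kind of physical resources the target block storage instance should be
    migrated to within the coming future period."""
    return RESOURCE_BY_DEGREE[degree]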
In some embodiments, for each target image, the basic machine learning model may be trained by using data related to a reference block storage instance installed under a computing node created based on each target image, so as to obtain the first machine learning model.
In some embodiments, for each preset tenant, the first machine learning model corresponding to the target image corresponding to each preset tenant may be trained by using the relevant data of the reference block storage instance of each preset tenant, so as to obtain a second machine learning model. By training the first machine learning model for the second time, the accuracy of determining the access frequency of the data in the block storage instance of the preset tenant can be further improved, so that the accuracy of physical resource allocation can be further improved, the utilization rate of physical resources can be further improved, and the waste of physical resources can be further reduced.
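A minimal sketch of this two-stage training flow; the disclosure does not name a model family or a fine-tuning mechanism, so a scikit-learn gradient-boosted regressor with warm-start continuation is used here purely as a stand-in:

import copy

from sklearn.ensemble import GradientBoostingRegressor

def train_first_model(image_features, image_targets, n_estimators: int = 100):
    """Stage 1: train the basic model on reference block storage instances mounted under
    compute nodes created from one target image, yielding that image's first model."""
    model = GradientBoostingRegressor(n_estimators=n_estimators, warm_start=True)
    model.fit(image_features, image_targets)
    return model

def train_second_model(first_model, tenant_features, tenant_targets, extra_estimators: int = 50):
    """Stage 2: continue training a copy of the image's first model on one preset tenant's
    reference instances, obtaining the tenant-granularity second model."""
    second_model = copy.deepcopy(first_model)      # keep the per-image first model intact
    second_model.n_estimators += extra_estimators  # extra trees fitted only on tenant data
    second_model.fit(tenant_features, tenant_targets)
    return second_model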
Fig. 2 is a block diagram illustrating an apparatus for determining access frequency of data according to some embodiments of the present disclosure.
As shown in fig. 2, the apparatus 2 for determining access frequency of data includes an obtaining module 21, a predicting module 22, and a determining module 23.
The obtaining module 21 is configured to obtain attribute information related to the target block storage instance in the cloud platform, for example, execute step S1 shown in fig. 1.
The prediction module 22 is configured to predict, using the machine learning model, a future performance that the target block storage instance will achieve in a future cycle based on the attribute information associated with the target block storage instance, for example, performing step S2 as shown in fig. 1.
The determining module 23 is configured to determine, according to the predicted future performance, a frequency of access of data in the target block storage instance in a future period, the determined frequency of access being used for allocating physical resources for the target block storage instance, for example, performing step S3 shown in fig. 1.
FIG. 3 is a block diagram illustrating an apparatus to determine how frequently data is accessed according to further embodiments of the present disclosure.
As shown in fig. 3, the apparatus 3 for determining the access frequency of data includes a memory 31; and a processor 32 coupled to the memory 31. The memory 31 is used for storing instructions for performing the corresponding embodiment of the method for determining the access frequency of data. The processor 32 is configured to perform a method of determining how frequently data is accessed in any of the embodiments of the present disclosure based on instructions stored in the memory 31.
FIG. 4 is a block diagram illustrating a computer system for implementing some embodiments of the present disclosure.
As shown in FIG. 4, computer system 40 may take the form of a general purpose computing device. Computer system 40 includes a memory 410, a processor 420, and a bus 400 that connects the various system components.
The memory 410 may include, for example, system memory, non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader (Boot Loader), and other programs. The system memory may include volatile storage media such as Random Access Memory (RAM) and/or cache memory. The non-volatile storage medium, for instance, stores instructions to perform corresponding embodiments of at least one of the methods of determining access frequency of data. Non-volatile storage media include, but are not limited to, magnetic disk storage, optical storage, flash memory, and the like.
Processor 420 may be implemented as discrete hardware components, such as general purpose processors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other programmable logic devices, discrete gates or transistors, and the like. Accordingly, each module, such as the acquisition module, the prediction module, and the determination module, may be implemented by a central processing unit (CPU) executing instructions in a memory that perform the corresponding steps, or may be implemented by a dedicated circuit performing the corresponding steps.
Bus 400 may use any of a variety of bus architectures. For example, bus structures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, and Peripheral Component Interconnect (PCI) bus.
Computer system 40 may also include an input/output interface 430, a network interface 440, a storage interface 450, and the like. These interfaces 430, 440, 450, the memory 410, and the processor 420 may be connected through the bus 400. The input/output interface 430 provides a connection interface for input/output devices such as a display, a mouse, and a keyboard. The network interface 440 provides a connection interface for various networking devices. The storage interface 450 provides a connection interface for external storage devices such as a floppy disk, a USB flash drive, and an SD card.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable apparatus to produce a machine, such that the execution of the instructions by the processor results in an apparatus that implements the functions specified in the flowchart and/or block diagram block or blocks.
These computer-readable program instructions may also be stored in a computer-readable memory that can direct a computer to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instructions which implement the function specified in the flowchart and/or block diagram block or blocks.
The present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
By the method and the device for determining the access frequency of the data and the computer storage medium in the embodiment, the accuracy of determining the access frequency of the data in the cloud platform can be improved, so that the accuracy of physical resource allocation is improved, the physical resource utilization rate is improved, and the physical resource waste is reduced.
Thus, a method and apparatus, computer-readable storage medium, for determining access frequency of data according to the present disclosure have been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.

Claims (16)

1. A method of determining access frequency of data, comprising:
acquiring attribute information related to a target block storage instance in a cloud platform;
predicting future performance achieved by the target block storage instance in a future period by utilizing a machine learning model according to attribute information related to the target block storage instance;
determining, from the predicted future performance, a frequency of access of data in the target block storage instance over the future period, the determined frequency of access being used to allocate physical resources for the target block storage instance.
2. The method of determining access frequency of data according to claim 1, wherein the future cycle includes a plurality of future times, the future performance includes a future performance value of at least one performance of the target block storage instance at each future time, and determining the access frequency of the data in the target block storage instance within the future cycle based on the predicted future performance includes:
acquiring the maximum performance value of each performance of the target block storage instance;
for each performance, screening a future performance peak corresponding to each performance from a plurality of future performance values corresponding to the plurality of future moments, wherein the future performance peak reflects the general performance condition of each performance in the future cycle;
for the target block storage instance, determining access frequency of data in the target block storage instance in the future period according to the maximum performance value of the at least one performance and the corresponding future performance peak value.
3. The method of determining access frequency of data according to claim 2, wherein determining, for the target block storage instance, the access frequency of data in the target block storage instance in the future period according to the maximum performance value of the at least one performance and the corresponding future performance peak comprises:
for each performance, determining a ratio of a corresponding future performance peak value to a maximum performance value as a reference ratio;
determining the access frequency of the data in the target block storage instance in the future period to be a first degree under the condition that the reference ratio corresponding to each performance is smaller than a first reference ratio threshold;
determining the access frequency of the data in the target block storage instance in the future period to be a second degree under the condition that at least one reference ratio value corresponding to one performance is greater than or equal to the first reference ratio threshold and the reference ratio values corresponding to various performances are less than a second reference ratio threshold, wherein the second reference ratio threshold is greater than the first reference ratio threshold and the second degree is higher than the first degree;
determining that the data in the target block storage instance is accessed frequently in the future period to a third degree, where the reference ratio value corresponding to at least one performance is greater than or equal to the second reference ratio threshold, and the third degree is higher than the second degree.
4. The method for determining access frequency of data according to claim 1, wherein the cloud platform comprises at least one target image, the machine learning model comprises a first machine learning model corresponding to each target image, the first machine learning model corresponding to each target image is obtained by training according to relevant data of a reference block storage instance mounted by a computing node created based on each target image, and according to attribute information relevant to the target block storage instance, predicting a future performance value reached by the target block storage instance in a future period by using the machine learning model comprises:
and under the condition that the image used when the computing node mounted by the target block storage instance is created belongs to the at least one target image, predicting the future performance value reached by the target block storage instance in the future period by utilizing the first machine learning model corresponding to the image corresponding to the target block storage instance according to the attribute information related to the target block storage instance.
5. The method for determining access frequency of data according to claim 4, wherein for each image in the cloud platform, the each image is a target image if a ratio of a total capacity of all block storage instances mounted under a computing node created based on the each image to a total capacity of all block storage instances in the cloud platform is greater than a first capacity ratio threshold.
6. The method for determining the access frequency of data according to claim 1, wherein the cloud platform includes at least one target image, and in a case where a tenant to which a block storage instance installed under a compute node created based on each target image belongs has a preset tenant, the machine learning model includes a second machine learning model corresponding to each preset tenant, the second machine learning model corresponding to each preset tenant is obtained by training according to data related to a reference block storage instance of each preset tenant, and predicting a future performance value reached by data in the target block storage instance in a future cycle by using the machine learning model according to attribute information related to the target block storage instance includes:
determining whether the tenant to which the target block storage instance belongs is a preset tenant, in a case that the image used when the computing node on which the target block storage instance is mounted was created belongs to the at least one target image;
and under the condition that the tenant to which the target block storage instance belongs to a preset tenant, predicting a future performance value of the data in the target block storage instance in a future period by using a second machine learning model corresponding to the tenant to which the target block storage instance belongs according to the attribute information related to the target block storage instance.
7. The method for determining access frequency of data according to claim 6, wherein for each tenant in the cloud platform, the each tenant is a preset tenant if a ratio of a total capacity of all block storage instances of the each tenant to a total capacity of all block storage instances mounted on a computing node to which the each tenant belongs is greater than a second capacity ratio threshold.
8. The method for determining the access frequency of data according to claim 6, wherein the machine learning model further comprises a first machine learning model corresponding to each target image, the first machine learning model corresponding to each target image is trained on data related to reference block storage instances mounted to computing nodes corresponding to that target image, and predicting, according to the attribute information related to the target block storage instance and by using the machine learning model, the future performance value reached by the data in the target block storage instance in the future period further comprises:
in a case where the tenant to which the target block storage instance belongs is not a preset tenant, predicting, according to the attribute information related to the target block storage instance, the future performance value reached by the data in the target block storage instance in the future period by using the first machine learning model corresponding to the image corresponding to the target block storage instance.
9. The method of determining access frequency of data according to claim 4, further comprising:
in a case where the image used to create the computing node to which the target block storage instance is mounted does not belong to the at least one target image, determining that the access frequency of data in the target block storage instance in the future period is a preset level.
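Read together, claims 4, 6, 8, and 9 describe a routing rule for deciding which model, if any, produces the prediction. A minimal sketch of that routing, assuming models with a scikit-learn-style predict method and hypothetical lookup tables:

```python
from typing import Mapping, Optional, Sequence


def predict_future_performance(image_id: str,
                               tenant_id: str,
                               features: Sequence[float],
                               target_images: set,
                               preset_tenants: set,
                               first_models: Mapping[str, object],
                               second_models: Mapping[str, object]) -> Optional[float]:
    # Claim 9: if the computing node's image is not a target image, no
    # prediction is made; the access frequency is later set to a preset level.
    if image_id not in target_images:
        return None
    if tenant_id in preset_tenants:
        # Claim 6: preset tenants use their tenant-specific second model.
        model = second_models[tenant_id]
    else:
        # Claim 8: other tenants fall back to the image-specific first model.
        model = first_models[image_id]
    return float(model.predict([list(features)])[0])
```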
10. The method for determining the access frequency of data according to claim 1, wherein a cycle duration of the future period is the duration required for the target block storage instance to migrate an amount of data equal to the capacity of the target block storage instance.
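Claim 10 ties the length of the future period to the instance's own capacity; a one-line sketch with hypothetical units and migration rate:

```python
def future_period_seconds(capacity_gib: float, migration_rate_gib_per_s: float) -> float:
    # Claim 10: time needed to migrate an amount of data equal to the
    # instance's capacity, e.g. a 500 GiB volume at 0.5 GiB/s gives 1000 s.
    return capacity_gib / migration_rate_gib_per_s
```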
11. The method of determining access frequency of data of claim 1, wherein the future performance value is a performance value associated with a read operation; and/or
The performance value associated with the read operation is measured using at least one of the following performance metrics: a throughput associated with the read operation, and a number of input/output operations per second (IOPS) associated with the read operation.
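The two read-side metrics named in claim 11 can be derived from raw I/O counters sampled over a window; a sketch with hypothetical counter names:

```python
def read_throughput_mib_per_s(bytes_read: int, window_seconds: float) -> float:
    # Throughput associated with read operations over the sampling window.
    return bytes_read / window_seconds / 2**20


def read_iops(read_op_count: int, window_seconds: float) -> float:
    # Input/output operations per second (IOPS) associated with read operations.
    return read_op_count / window_seconds
```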
12. The method for determining the access frequency of data according to any one of claims 4 to 9, wherein the data related to the reference block storage instance comprises attribute information of the reference block storage instance, attribute information of the computing node to which the reference block storage instance is mounted, and historical performance values reached by the data in the reference block storage instance in a historical period; and/or
The attribute information related to the target block storage instance comprises attribute information of the target block storage instance and attribute information of the computing node to which the target block storage instance is mounted.
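Claim 12 names the two attribute groups that form the model input. A sketch of assembling them into a feature vector, with hypothetical field names standing in for the unspecified attribute information:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceFeatures:
    # attribute information of the (target or reference) block storage instance
    capacity_gib: float
    volume_type_code: int
    # attribute information of the computing node it is mounted to
    node_vcpus: int
    node_memory_gib: float

    def as_vector(self) -> List[float]:
        return [self.capacity_gib, float(self.volume_type_code),
                float(self.node_vcpus), self.node_memory_gib]
```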
13. The method of determining access frequency of data according to claim 8, further comprising:
for each target image, training a basic machine learning model by using the data related to the reference block storage instances mounted under computing nodes created based on that target image, to obtain the first machine learning model; and/or
for each preset tenant, training the first machine learning model corresponding to the target image corresponding to that preset tenant by using the data related to the reference block storage instances of that preset tenant, to obtain the second machine learning model.
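A rough sketch of the two training passes in claim 13, assuming scikit-learn regressors stand in for the "basic machine learning model" and the second pass is approximated by refitting a copy of the image-level model on the tenant's own reference data; data collection and feature extraction are out of scope:

```python
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor


def train_first_models(samples_by_image):
    # samples_by_image: {image_id: (X, y)} built from reference block storage
    # instances mounted under computing nodes created from each target image;
    # y holds their historical performance values.
    models = {}
    for image_id, (X, y) in samples_by_image.items():
        model = GradientBoostingRegressor()
        model.fit(X, y)
        models[image_id] = model
    return models


def train_second_models(first_models, samples_by_tenant, image_of_tenant):
    # Per preset tenant, start from the first model of the tenant's target
    # image and retrain it on the tenant's reference data.
    models = {}
    for tenant_id, (X, y) in samples_by_tenant.items():
        model = clone(first_models[image_of_tenant[tenant_id]])
        model.fit(X, y)
        models[tenant_id] = model
    return models
```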
14. An apparatus for determining access frequency of data, comprising:
an acquisition module configured to acquire attribute information related to a target block storage instance in a cloud platform;
a prediction module configured to predict, according to the attribute information related to the target block storage instance and by using a machine learning model, a future performance value reached by the target block storage instance in a future period;
a determination module configured to determine, according to the predicted future performance value, an access frequency of data in the target block storage instance in the future period, the determined access frequency being used to allocate physical resources for the target block storage instance.
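One way the three modules of claim 14 could be stitched together in code; the mapping from the predicted performance value to an access-frequency tier is an assumption for illustration, not part of the claim:

```python
class AccessFrequencyDeterminer:
    def __init__(self, fetch_attributes, model, tiers):
        self._fetch_attributes = fetch_attributes   # acquisition module
        self._model = model                         # prediction module
        self._tiers = tiers                         # determination config,
                                                    # e.g. [("hot", 5000.0), ("warm", 500.0)]

    def determine(self, instance_id: str) -> str:
        features = self._fetch_attributes(instance_id)
        predicted = float(self._model.predict([features])[0])
        for name, threshold in self._tiers:
            if predicted >= threshold:
                return name                         # tier used to allocate physical resources
        return "cold"
```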
15. An apparatus for determining access frequency of data, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of determining access frequency of data according to any one of claims 1 to 13 based on instructions stored in the memory.
16. A computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of determining the access frequency of data according to any one of claims 1 to 13.
CN202210231567.4A (priority date 2022-03-09, filing date 2022-03-09) Method and device for determining access frequency of data. Status: Pending. Publication: CN114595063A.

Priority Applications (1)

CN202210231567.4A (priority date 2022-03-09, filing date 2022-03-09): Method and device for determining access frequency of data

Applications Claiming Priority (1)

CN202210231567.4A (priority date 2022-03-09, filing date 2022-03-09): Method and device for determining access frequency of data

Publications (1)

CN114595063A (published 2022-06-07)

Family

ID=81808980

Family Applications (1)

CN202210231567.4A (priority date 2022-03-09, filing date 2022-03-09, pending, published as CN114595063A): Method and device for determining access frequency of data

Country Status (1)

CN: CN114595063A (en)

Similar Documents

Publication Title
US10649662B2 (en) Methods and apparatus to manage workload memory allocation
CN107239339B (en) System performance optimization parameter determination method, system performance optimization method and device
US20080134191A1 (en) Methods and apparatuses for core allocations
CN111316220A (en) Performance counters for computer memory
CN108205469B (en) MapReduce-based resource allocation method and server
EP3794461B1 (en) Automatic database query load assessment and adaptive handling
CN110209502B (en) Information storage method and device, electronic equipment and storage medium
US6647349B1 (en) Apparatus, method and system for counting logic events, determining logic event histograms and for identifying a logic event in a logic environment
CN108762885B (en) Virtual machine creating method and device, management equipment and terminal equipment
JPWO2014208139A1 (en) Abnormality detection apparatus, control method, and program
US20170337083A1 (en) Dynamic tuning of multiprocessor/multicore computing systems
CN115269108A (en) Data processing method, device and equipment
US6564175B1 (en) Apparatus, method and system for determining application runtimes based on histogram or distribution information
US20110191094A1 (en) System and method to evaluate and size relative system performance
US10587527B1 (en) Systems and methods for apportioning bandwidth in storage systems
CN112947851A (en) NUMA system and page migration method in NUMA system
CN114595063A (en) Method and device for determining access frequency of data
US10019341B2 (en) Using hardware performance counters to detect stale memory objects
CN113238974A (en) Bus bandwidth efficiency statistical method, device, equipment and medium
CN109686396B (en) Performance evaluation device and performance evaluation method
Kim et al. Measuring the optimality of Hadoop optimization
US20230401089A1 (en) Credit-based scheduling using load prediction
CN111913650B (en) Method and device for determining prediction window period
CN111767137B (en) System deployment method and device, electronic equipment and storage medium
Jeon et al. Runtime memory controller profiling with performance analysis for DRAM memory controllers

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination