CN114595063A - Method and device for determining access frequency of data - Google Patents


Info

Publication number
CN114595063A
Authority
CN
China
Prior art keywords: block storage, storage instance, target block, data, future
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210231567.4A
Other languages
Chinese (zh)
Inventor
王贯扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd filed Critical Jingdong Technology Information Technology Co Ltd
Priority to CN202210231567.4A
Publication of CN114595063A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3442Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for planning or managing the needed capacity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The disclosure relates to a method and a device for determining the access frequency of data, and a computer storage medium, and relates to the field of computer technology. The method for determining the access frequency of data comprises the following steps: acquiring attribute information related to a target block storage instance in a cloud platform; predicting, by using a machine learning model and according to the attribute information related to the target block storage instance, the future performance achieved by the target block storage instance in a future period; and determining, based on the predicted future performance, the access frequency of data in the target block storage instance in the future period, the determined access frequency being used to allocate physical resources for the target block storage instance. The method and device can improve the accuracy of determining the access frequency of data in the cloud platform, thereby improving the accuracy of physical resource allocation, increasing the utilization rate of physical resources, and reducing the waste of physical resources.

Description

Method and device for determining access frequency of data
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for determining access frequency of data, and a computer-readable storage medium.
Background
In a cloud computing scenario, determining the access frequency of data in a block storage instance in a cloud platform (also referred to as a cloud computing platform) provides important guidance for reasonably allocating physical resources in the cloud platform.
In the related art, a user either determines the access frequency of data in a block storage instance in the cloud platform based on experience, or applies statistical methods to the historical performance of the block storage instance over a historical period to determine how frequently its data was accessed in that period.
Disclosure of Invention
In the related art, the access frequency determined through experience is overly subjective, and because it depends heavily on how much experience the user has accumulated, its accuracy is poor. In addition, the access frequency derived statistically from historical performance describes only historical access and cannot accurately reflect how frequently the data in the block storage instance will be accessed in the future.
In view of the above technical problems, the present disclosure provides a solution that can improve the accuracy of determining the access frequency of data in a cloud platform, thereby improving the accuracy of physical resource allocation, increasing the utilization rate of physical resources, and reducing the waste of physical resources.
According to a first aspect of the present disclosure, there is provided a method of determining access frequency of data, comprising: acquiring attribute information related to a target block storage instance in a cloud platform; predicting future performance achieved by the target block storage instance in a future period by utilizing a machine learning model according to attribute information related to the target block storage instance; determining, from the predicted future performance, a frequency of access of data in the target block storage instance over the future period, the determined frequency of access being used to allocate physical resources for the target block storage instance.
In some embodiments, the future cycle comprises a plurality of future times, the future performance comprises a future performance value of at least one performance of the target block storage instance at each future time, and determining, from the predicted future performance, how frequently data in the target block storage instance is accessed within the future cycle comprises: acquiring the maximum performance value of each performance of the target block storage instance; for each performance, screening a future performance peak corresponding to each performance from a plurality of future performance values corresponding to the plurality of future moments, wherein the future performance peak reflects the general performance condition of each performance in the future cycle; for the target block storage instance, determining access frequency of data in the target block storage instance in the future period according to the maximum performance value of the at least one performance and the corresponding future performance peak value.
In some embodiments, for the target block storage instance, determining, from the maximum performance value of the at least one performance and the corresponding future performance peak, how frequently data in the target block storage instance is accessed in the future cycle comprises: for each performance, determining a ratio of a corresponding future performance peak value to a maximum performance value as a reference ratio; determining the access frequency of the data in the target block storage instance in the future period to be a first degree under the condition that the reference ratio corresponding to each performance is smaller than a first reference ratio threshold; determining the access frequency of the data in the target block storage instance in the future period to be a second degree under the condition that at least one reference ratio value corresponding to one performance is greater than or equal to the first reference ratio threshold and the reference ratio values corresponding to various performances are less than a second reference ratio threshold, wherein the second reference ratio threshold is greater than the first reference ratio threshold and the second degree is higher than the first degree; and determining that the access frequency of the data in the target block storage instance in the future period is a third degree under the condition that the reference ratio value corresponding to at least one performance is greater than or equal to the second reference ratio value threshold, wherein the third degree is higher than the second degree.
In some embodiments, the cloud platform includes at least one target image, the machine learning model includes a first machine learning model corresponding to each target image, the first machine learning model corresponding to each target image is obtained by training according to relevant data of a reference block storage instance mounted on a computing node created based on each target image, and predicting, by using the machine learning model, a future performance value reached by the target block storage instance in a future period includes: and under the condition that the image used when the computing node mounted by the target block storage instance is created belongs to the at least one target image, predicting the future performance value reached by the target block storage instance in the future period by utilizing the first machine learning model corresponding to the image corresponding to the target block storage instance according to the attribute information related to the target block storage instance.
In some embodiments, for each image in the cloud platform, the each image is a target image if a ratio of a total capacity of all block storage instances mounted under a compute node created based on the each image to a total capacity of all block storage instances in the cloud platform is greater than a first capacity ratio threshold.
In some embodiments, the cloud platform includes at least one target image, and in a case that a preset tenant exists among the tenants to which the block storage instances mounted under computing nodes created based on each target image belong, the machine learning model includes a second machine learning model corresponding to each preset tenant, the second machine learning model corresponding to each preset tenant being obtained by training according to data related to a reference block storage instance of each preset tenant, and predicting, according to the attribute information related to the target block storage instance, a future performance value reached by data in the target block storage instance in a future period by using the machine learning model includes: determining whether the tenant to which the target block storage instance belongs is a preset tenant, in a case that the image used when the computing node on which the target block storage instance is mounted was created belongs to the at least one target image; and in a case that the tenant to which the target block storage instance belongs is a preset tenant, predicting a future performance value of the data in the target block storage instance in the future period by using the second machine learning model corresponding to the tenant to which the target block storage instance belongs, according to the attribute information related to the target block storage instance.
In some embodiments, for each tenant in the cloud platform, in a case that a ratio of a total capacity of all block storage instances of the tenant to a total capacity of all block storage instances mounted on a compute node to which the tenant belongs is greater than a second capacity ratio threshold, the tenant is a preset tenant.
In some embodiments, predicting the future performance value of the data in the target block storage instance in the future period by using the machine learning model according to the attribute information related to the target block storage instance further includes: in a case that the tenant to which the target block storage instance belongs is not a preset tenant, predicting the future performance value of the data in the target block storage instance in the future period by using the first machine learning model corresponding to the image corresponding to the target block storage instance, according to the attribute information related to the target block storage instance.
In some embodiments, the method of determining access frequency of data further comprises: and under the condition that the image used when the computing node mounted by the target block storage instance is created does not belong to the at least one target image, determining the access frequency of the target block storage instance in the future period to be a preset degree.
In some embodiments, the period duration of the future period is a duration required for the target block storage instance to migrate an amount of data equal to the capacity of the target block storage instance.
In some embodiments, the future performance value is a performance value related to a read operation; and/or the performance value related to the read operation is measured by at least one performance metric of the throughput related to the read operation and the I/O operations per second (IOPS) related to the read operation.
In some embodiments, the related data of the reference block storage instance comprises attribute information of the reference block storage instance, attribute information of a computing node on which the reference block storage instance is mounted, and historical performance values of data in the reference block storage instance over historical time; and/or the attribute information related to the target block storage instance comprises the attribute information of the target block storage instance and the attribute information of the computing node mounted by the target block storage instance.
In some embodiments, the method of determining the access frequency of data further comprises: for each target image, training a basic machine learning model by using the related data of the reference block storage instances mounted under computing nodes created based on each target image, to obtain a first machine learning model; and/or for each preset tenant, training the first machine learning model corresponding to the target image corresponding to each preset tenant by using the related data of the reference block storage instance of each preset tenant, to obtain a second machine learning model.
According to a second aspect of the present disclosure, there is provided an apparatus for determining access frequency of data, comprising: an acquisition module configured to acquire attribute information related to a target block storage instance in a cloud platform; a prediction module configured to predict, using a machine learning model, a future performance that the target block storage instance achieves over a future period based on attribute information associated with the target block storage instance; a determination module configured to determine, based on the predicted future performance, a frequency of access of data in the target block storage instance in the future period, the determined frequency of access being used to allocate physical resources for the target block storage instance.
According to a third aspect of the present disclosure, there is provided an apparatus for determining access frequency of data, comprising: a memory; and a processor coupled to the memory, the processor configured to perform the method for determining access frequency of data according to any of the above embodiments based on instructions stored in the memory.
According to a fourth aspect of the present disclosure, there is provided a computer-storable medium having stored thereon computer program instructions which, when executed by a processor, implement the method of determining access frequency of data as described in any of the above embodiments.
In the embodiment, the accuracy of determining the access frequency of the data in the cloud platform can be improved, so that the accuracy of physical resource allocation is improved, the utilization rate of physical resources is improved, and the waste of the physical resources is reduced.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 is a flow diagram illustrating a method of determining access frequency of data according to some embodiments of the present disclosure;
FIG. 2 is a block diagram illustrating an apparatus to determine how frequently data is accessed according to some embodiments of the present disclosure;
FIG. 3 is a block diagram illustrating an apparatus to determine access frequency of data according to further embodiments of the present disclosure;
FIG. 4 is a block diagram illustrating a computer system for implementing some embodiments of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Fig. 1 is a flow diagram illustrating a method of determining access frequency of data according to some embodiments of the present disclosure.
As shown in fig. 1, the method for determining the access frequency of data includes: step S1, acquiring attribute information related to a target block storage instance in the cloud platform; step S2, according to the attribute information related to the target block storage instance, predicting the future performance of the target block storage instance in the future period by using a machine learning model; and step S3, determining the access frequency of the data in the target block storage instance in the future period according to the predicted future performance. For example, the method of determining the access frequency of data is performed by an apparatus that determines the access frequency of data.
In this embodiment, the machine learning model learns the characteristics of the attribute information related to the target block storage instance, predicts the future performance of the target block storage instance in the future period based on the learned characteristics, and determines the access frequency of the data in the future period from the predicted future performance. This not only allows the access frequency to be predicted in advance, but also allows it to be determined more objectively, which improves the accuracy of determining the access frequency of the data. Because the access frequency is determined accurately, it provides important guidance for reasonably allocating physical resources to block storage instances, thereby improving the accuracy of physical resource allocation, increasing the utilization rate of physical resources, and reducing the waste of physical resources.
In step S1, attribute information related to the target block storage instance in the cloud platform is acquired. For example, the block storage instance is a cloud hard disk instance. The data in a cloud hard disk instance is actually stored on the physical resources allocated to that instance. A block storage instance is a storage instance in which data is stored in the form of data blocks.
In step S2, the future performance achieved by the target block storage instance in the future cycle is predicted using the machine learning model based on the attribute information associated with the target block storage instance.
In some embodiments, the period duration of the future period is the duration required for the target block storage instance to migrate an amount of data equal to its own capacity. For example, the capacity of the target block storage instance is 800 GB and the single-disk data migration rate is 10 MB/s, so the single-disk daily migration volume is 10 MB/s × 3600 s × 24 ≈ 800 GB per day. The duration required for the target block storage instance to migrate an amount of data equal to its capacity is therefore 800 GB ÷ 800 GB/day = 1 day, and the future period is 1 day. By taking the single-disk daily migration volume into account, the target block storage instance is guaranteed to be able to migrate its data to the newly allocated physical resources within one future period after the access frequency of its data is determined, which further improves the utilization rate of physical resources.
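For concreteness, the calculation can be sketched as follows (a minimal illustration in Python; the function name and the fixed 10 MB/s single-disk migration rate are assumptions taken from the example above, not part of the claimed method):

def future_period_days(capacity_gb: float, migration_rate_mb_per_s: float = 10.0) -> float:
    """Days needed for the instance to migrate an amount of data equal to its own capacity."""
    daily_migration_gb = migration_rate_mb_per_s * 3600 * 24 / 1024  # MB/s -> roughly GB/day
    return capacity_gb / daily_migration_gb

# The example above: an 800 GB instance migrating at 10 MB/s needs roughly one day.
print(round(future_period_days(800), 2))  # ~0.95, i.e. about 1 day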
In some embodiments, the attribute information related to the target block storage instance includes the attribute information of the target block storage instance itself and the attribute information of the computing node on which the target block storage instance is mounted. For example, the attribute information of the target block storage instance itself includes at least one of the specification and the capacity of the target block storage instance. The attribute information of the computing node itself includes at least one of the specification, the number of CPUs, the memory size, and the network bandwidth configuration information of the computing node. Generally, a cloud vendor provides different cloud hard disk service specifications, such as performance-oriented and capacity-oriented, to cloud hard disk users according to the types of storage media and storage servers. Similarly, a cloud vendor provides different computing node specifications, such as general-purpose, memory-optimized, and compute-optimized, to computing node users according to the CPU, the CPU-to-memory ratio, the CPU model, and so on, of the computing server.
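As a purely illustrative example, the attribute information gathered in step S1 might be organized as follows (field names and values are assumptions; the disclosure only lists the kinds of attributes involved):

target_instance_attributes = {
    "instance": {"specification": "performance-type", "capacity_gb": 800},
    "compute_node": {
        "specification": "memory-optimized",
        "cpu_count": 16,
        "memory_gb": 128,
        "network_bandwidth_gbps": 10,
    },
}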
In some embodiments, the cloud platform includes at least one target image, and the machine learning model includes a first machine learning model corresponding to each target image. The first machine learning model corresponding to each target image is obtained by training according to the related data of the reference block storage instances mounted on computing nodes created based on that target image. In a case that the image used when the computing node on which the target block storage instance is mounted was created belongs to the at least one target image, the future performance value reached by the target block storage instance in the future period is predicted by using the first machine learning model corresponding to the image corresponding to the target block storage instance, according to the attribute information related to the target block storage instance. The image used to create a computing node contains the operating system, the application software, and the related configuration within the operating system. Computing nodes that provide the same type of application service are typically created using the same image.
In some embodiments, the related data of the reference block storage instance comprises attribute information of the reference block storage instance, attribute information of a computing node on which the reference block storage instance is mounted, and historical performance values of data in the reference block storage instance over historical time. For example, the attribute information of the target block storage instance itself includes at least one of a specification and a capacity of the target block storage instance. The attribute information of the computing node itself mounted by the target block storage instance includes at least one of specification, number of CPUs, number of memories, and network bandwidth configuration information of the computing node.
In this embodiment, the I/O operation behavior of block storage instances differs greatly across different images in the cloud platform. Training, for each target image, a first machine learning model that matches the characteristics of that image on the data related to that image improves the accuracy of predicting the future performance value, and further improves the accuracy of determining the access frequency of the data, which can further improve the accuracy of physical resource allocation, further increase the utilization rate of physical resources, and further reduce the waste of physical resources.
In addition, in the training process, not only the attribute information of the reference block storage instance and the historical performance value of the data stored in the reference block storage instance in the historical time are considered, but also the attribute information of the mounted computing node is considered, so that the accuracy of the first machine learning model can be further improved, the accuracy of determining the access frequency of the data can be further improved, the accuracy of physical resource allocation can be further improved, the physical resource utilization rate can be further improved, and the physical resource waste can be further reduced.
In some embodiments, for each image in the cloud platform, the image is a target image if the ratio of the total capacity of all block storage instances mounted under computing nodes created based on that image to the total capacity of all block storage instances in the cloud platform is greater than a first capacity ratio threshold. For example, the first capacity ratio threshold is 1%. Different first capacity ratio thresholds may also be set according to actual business requirements. By training the first machine learning model only for images with a larger total mounted storage capacity, the training cost can be reduced and the training efficiency improved.
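A minimal sketch of this selection step, assuming the per-image capacity statistics are already available (the data layout and names are illustrative):

def select_target_images(per_image_capacity_gb: dict, platform_total_capacity_gb: float,
                         first_capacity_ratio_threshold: float = 0.01) -> list:
    """Return ids of the images whose mounted block-storage capacity share exceeds the threshold.

    per_image_capacity_gb maps an image id to the total capacity of all block storage
    instances mounted under compute nodes created from that image.
    """
    return [image_id for image_id, capacity in per_image_capacity_gb.items()
            if capacity / platform_total_capacity_gb > first_capacity_ratio_threshold]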
In some embodiments, the cloud platform includes at least one target image, and when a preset tenant exists among the tenants to which the block storage instances mounted on computing nodes created based on each target image belong, the machine learning model includes a second machine learning model corresponding to each preset tenant. The second machine learning model corresponding to each preset tenant is obtained by training according to the related data of the reference block storage instance of each preset tenant.
In some embodiments, predicting future performance values that are achieved by data in a target block storage instance over a future period using a machine learning model based on attribute information associated with the target block storage instance may be accomplished as follows.
First, in a case where the image used when the computing node on which the target block storage instance is mounted was created belongs to the at least one target image, it is determined whether the tenant to which the target block storage instance belongs is a preset tenant.
Then, in a case that the tenant to which the target block storage instance belongs is a preset tenant, the future performance value of the data in the target block storage instance in the future period is predicted by using the second machine learning model corresponding to that tenant, according to the attribute information related to the target block storage instance.
In the above embodiment, the relevant features (including I/O operation behavior features) of the block storage instances of different tenants under the same image can differ greatly. Training, for all or some tenants, a second machine learning model that better matches the characteristics of the block storage instances at tenant granularity further improves the accuracy of predicting the future performance value, further improves the accuracy of determining the access frequency of the data, and can thus further improve the accuracy of physical resource allocation, further increase the utilization rate of physical resources, and further reduce the waste of physical resources.
In some embodiments, for each tenant in the cloud platform, the tenant is a preset tenant if the ratio of the total capacity of all block storage instances of that tenant to the total capacity of all block storage instances mounted on the computing nodes to which the tenant belongs is greater than a second capacity ratio threshold. For example, the second capacity ratio threshold is 1%. Different second capacity ratio thresholds may also be set according to actual business requirements. By further training the first machine learning model only for tenants with a larger total storage capacity under a given image, the training cost can be reduced and the training efficiency improved.
In some embodiments, taking the case where the machine learning model further includes a first machine learning model corresponding to each target image as an example, in a case that the tenant to which the target block storage instance belongs is not a preset tenant, the future performance value of the data in the target block storage instance in the future period is predicted by using the first machine learning model corresponding to the image corresponding to the target block storage instance, according to the attribute information related to the target block storage instance.
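Putting the per-image and per-tenant cases together, the model selection described above could look roughly like the following sketch (the instance fields, registry dictionaries, and lookup keys are hypothetical, not defined by the disclosure):

def pick_model(instance, target_images: set, preset_tenants: set,
               first_models: dict, second_models: dict):
    """Choose the model used to predict future performance for one target block storage instance.

    first_models maps an image id to that image's first machine learning model;
    second_models maps a tenant id to that preset tenant's second machine learning model.
    Returns None when the instance's image is not a target image; in that case the access
    frequency is simply set to a preset degree, as described further below.
    """
    if instance.image_id not in target_images:
        return None                               # no prediction, a preset degree applies
    if instance.tenant_id in preset_tenants:
        return second_models[instance.tenant_id]  # tenant-granularity second model
    return first_models[instance.image_id]        # image-granularity first model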
In some embodiments, the future performance value is a performance value related to a read operation. For example, the performance value related to a read operation is measured using at least one performance metric of the throughput related to the read operation and the I/O operations per second (IOPS, Input/Output Operations Per Second) related to the read operation. Performance values change over time, and the performance value achieved by a given performance of a block storage instance is measured using a performance metric. The performance related to read operations includes read IOPS performance and read throughput performance. Read IOPS performance is measured by the number of read requests that can be processed per second. Read throughput performance is measured by the read bandwidth that can be processed per second.
In the above embodiment, the access frequency of the data in the block storage instance is closely related to read operations, so using performance values related to read operations to judge the access frequency can further improve the accuracy of determining the access frequency of the data, further improve the accuracy of physical resource allocation, further increase the utilization rate of physical resources, and further reduce the waste of physical resources.
In step S3, the access frequency of the data in the target block storage instance in the future cycle is determined based on the predicted future performance. The determined access frequency is used to allocate physical resources for the target block storage instance.
In some embodiments, the future cycle includes a plurality of future times, and the future performance includes a future performance value of at least one performance of the target block storage instance at each future time. The above step S3 can be implemented as follows.
First, a maximum performance value per performance of the target block storage instance is obtained.
Then, for each performance, a future performance peak corresponding to that performance is selected from the plurality of future performance values corresponding to the plurality of future times. The future performance peak reflects the general condition of that performance over the future period. In some embodiments, the Nth largest of the future performance values of each performance at the future times is selected as the future performance peak of that performance, where N is an integer greater than 0 and less than the total number of future performance values. For example, the future performance values of each performance in the future period are arranged in descending order, and the value at the 95th percentile is selected as the future performance peak (a code sketch of this selection follows the discussion below).
Finally, for the target block storage instance, a frequency of access of data in the target block storage instance in a future cycle is determined based on the maximum performance value of the at least one performance and the corresponding future performance peak.
In the above embodiment, the future performance peak may reflect a general situation of each performance in a future period, and the access frequency of the data is determined by the future performance peak and the maximum performance value, so that the universality and the representativeness of the determined access frequency of the data may be improved, and the influence of the extreme data on the accuracy of determining the access frequency of the data may be avoided, thereby further improving the accuracy of determining the access frequency of the data, further improving the accuracy of physical resource allocation, further improving the physical resource utilization rate, and further reducing the physical resource waste.
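A minimal sketch of the peak selection under the percentile convention mentioned above (the nearest-rank indexing is one reasonable interpretation; the names are illustrative):

def future_performance_peak(future_values: list, percentile: float = 0.95) -> float:
    """Take the value at roughly the given percentile of the predicted future values,
    so that a handful of extreme points do not dominate the peak."""
    ordered = sorted(future_values)               # ascending
    index = int(percentile * (len(ordered) - 1))  # nearest-rank style index
    return ordered[index]

# Example: predicted hourly read-IOPS values for one future period.
peak = future_performance_peak([95, 110, 120, 125, 130, 140, 480, 5000])
print(peak)  # 480 -> the single extreme value 5000 does not become the peak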
In some embodiments, determining how frequently data in the target block storage instance is accessed in future cycles based on the maximum performance value of the at least one performance and the corresponding future performance peak may be accomplished as follows.
First, for each performance, a ratio of the corresponding future performance peak to the maximum performance value is determined as a reference ratio.
Secondly, under the condition that the reference ratio corresponding to various performances is smaller than the first reference ratio threshold, the access frequency of the data in the target block storage instance in the future period is determined to be a first degree. For example, the first degree is a low degree.
Then, in the case that at least one of the reference ratio values corresponding to the performances is greater than or equal to the first reference ratio threshold value and the reference ratio values corresponding to the performances are all less than the second reference ratio threshold value, determining that the access frequency of the data in the target block storage instance in the future period is a second degree. The second reference ratio threshold is greater than the first reference ratio threshold. The second level is higher than the first level. In some embodiments, the second degree is a medium degree. For example, the first reference ratio threshold is 20%. For example, the second reference ratio threshold is 60%. For example, other first reference ratio threshold and second reference ratio threshold may be set according to actual requirements.
And finally, determining the access frequency of the data in the target block storage instance in the future period to be a third degree under the condition that the reference ratio corresponding to at least one performance is greater than or equal to the second reference ratio threshold. The third degree is higher than the second degree. For example, the third degree is a high degree.
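The three-way decision above can be summarized in a short sketch (the threshold values follow the examples in the text; the function and variable names are assumptions):

def access_frequency_degree(reference_ratios: list,
                            first_threshold: float = 0.20,
                            second_threshold: float = 0.60) -> str:
    """Classify access frequency from the per-performance reference ratios
    (each ratio = future performance peak / maximum performance value)."""
    if any(r >= second_threshold for r in reference_ratios):
        return "third degree (high)"
    if any(r >= first_threshold for r in reference_ratios):
        return "second degree (medium)"  # every ratio is already below the second threshold here
    return "first degree (low)"

# Example: read-IOPS ratio 0.35 and read-throughput ratio 0.10 -> medium access frequency.
print(access_frequency_degree([0.35, 0.10]))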
In some embodiments, in a case where the image used when the computing node on which the target block storage instance is mounted was created does not belong to the at least one target image, the access frequency of the target block storage instance in the future period is determined to be a preset degree. For example, for an image used to create computing nodes for which the total number of mounted block storage instances is less than a number threshold and the ratio of the total capacity of the mounted block storage instances to the total capacity of all block storage instances in the cloud platform is less than the first capacity ratio threshold, the image does not belong to a target image, and the access frequency of the target block storage instances mounted there in the future period can be preset to the first degree.
In some embodiments, physical resources are allocated to a target block storage instance based on how frequently data in the target block storage instance is accessed in future cycles. The block storage instance points to the physical resources allocated to the block storage instance in the cloud platform, and stores data on the physical resources allocated to the block storage instance.
In some embodiments, in the case that the data in the target block storage instance is accessed frequently within the future period to a first extent, the data in the target block storage instance belongs to cold data within the future period, and the data in the target block storage instance is allocated a lower-performance physical resource.
In some embodiments, in the case that the data in the target block storage instance is accessed frequently to a second extent in a future period, the data in the target block storage instance belongs to the warm data in the future period, and the data in the target block storage instance is allocated a physical resource with moderate performance.
In some embodiments, in the case that the data in the target block storage instance is accessed frequently to a third extent in a future cycle, the data in the target block storage instance belongs to hot data in the future cycle, and a higher-performance physical resource is allocated to the data in the target block storage instance.
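A hedged sketch of how the determined degree might drive allocation; the disclosure only distinguishes lower-, moderate-, and higher-performance physical resources, so the mapping below is illustrative:

RESOURCE_BY_DEGREE = {
    "first degree (low)":     "lower-performance physical resources (cold data)",
    "second degree (medium)": "moderate-performance physical resources (warm data)",
    "third degree (high)":    "higher-performance physical resources (hot data)",
}

def allocate_physical_resources(degree: str) -> str:
    """Return the kind of physical resources the target block storage instance should be
    migrated to within the coming future period."""
    return RESOURCE_BY_DEGREE[degree]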
In some embodiments, for each target image, the basic machine learning model may be trained by using data related to a reference block storage instance installed under a computing node created based on each target image, so as to obtain the first machine learning model.
In some embodiments, for each preset tenant, the first machine learning model corresponding to the target image corresponding to each preset tenant may be trained by using the relevant data of the reference block storage instance of each preset tenant, so as to obtain a second machine learning model. By training the first machine learning model for the second time, the accuracy of determining the access frequency of the data in the block storage instance of the preset tenant can be further improved, so that the accuracy of physical resource allocation can be further improved, the utilization rate of physical resources can be further improved, and the waste of physical resources can be further reduced.
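A minimal sketch of this two-stage training flow; the disclosure does not name a model family or a fine-tuning mechanism, so a scikit-learn gradient-boosted regressor with warm-start continuation is used here purely as a stand-in:

import copy

from sklearn.ensemble import GradientBoostingRegressor

def train_first_model(image_features, image_targets, n_estimators: int = 100):
    """Stage 1: train the basic model on reference block storage instances mounted under
    compute nodes created from one target image, yielding that image's first model."""
    model = GradientBoostingRegressor(n_estimators=n_estimators, warm_start=True)
    model.fit(image_features, image_targets)
    return model

def train_second_model(first_model, tenant_features, tenant_targets, extra_estimators: int = 50):
    """Stage 2: continue training a copy of the image's first model on one preset tenant's
    reference instances, obtaining the tenant-granularity second model."""
    second_model = copy.deepcopy(first_model)      # keep the per-image first model intact
    second_model.n_estimators += extra_estimators  # extra trees fitted only on tenant data
    second_model.fit(tenant_features, tenant_targets)
    return second_model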
Fig. 2 is a block diagram illustrating an apparatus for determining access frequency of data according to some embodiments of the present disclosure.
As shown in fig. 2, the apparatus 2 for determining access frequency of data includes an obtaining module 21, a predicting module 22, and a determining module 23.
The obtaining module 21 is configured to obtain attribute information related to the target block storage instance in the cloud platform, for example, execute step S1 shown in fig. 1.
The prediction module 22 is configured to predict, using the machine learning model, a future performance that the target block storage instance will achieve in a future cycle based on the attribute information associated with the target block storage instance, for example, performing step S2 as shown in fig. 1.
The determining module 23 is configured to determine, according to the predicted future performance, a frequency of access of data in the target block storage instance in a future period, the determined frequency of access being used for allocating physical resources for the target block storage instance, for example, performing step S3 shown in fig. 1.
FIG. 3 is a block diagram illustrating an apparatus to determine how frequently data is accessed according to further embodiments of the present disclosure.
As shown in fig. 3, the apparatus 3 for determining the access frequency of data includes a memory 31; and a processor 32 coupled to the memory 31. The memory 31 is used for storing instructions for performing the corresponding embodiment of the method for determining the access frequency of data. The processor 32 is configured to perform a method of determining how frequently data is accessed in any of the embodiments of the present disclosure based on instructions stored in the memory 31.
FIG. 4 is a block diagram illustrating a computer system for implementing some embodiments of the present disclosure.
As shown in FIG. 4, computer system 40 may take the form of a general purpose computing device. Computer system 40 includes a memory 410, a processor 420, and a bus 400 that connects the various system components.
The memory 410 may include, for example, system memory, non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader (Boot Loader), and other programs. The system memory may include volatile storage media such as Random Access Memory (RAM) and/or cache memory. The non-volatile storage medium, for instance, stores instructions to perform corresponding embodiments of at least one of the methods of determining access frequency of data. Non-volatile storage media include, but are not limited to, magnetic disk storage, optical storage, flash memory, and the like.
Processor 420 may be implemented as discrete hardware components, such as general purpose processors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other programmable logic devices, discrete gates or transistors, and the like. Accordingly, each module, such as the acquisition module, the prediction module, and the determination module, may be implemented by a central processing unit (CPU) executing instructions in a memory that perform the corresponding steps, or may be implemented by a dedicated circuit performing the corresponding steps.
Bus 400 may use any of a variety of bus architectures. For example, bus structures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, and Peripheral Component Interconnect (PCI) bus.
Computer system 40 may also include an input/output interface 430, a network interface 440, a storage interface 450, and the like. These interfaces 430, 440, 450, the memory 410, and the processor 420 may be connected through the bus 400. The input/output interface 430 provides a connection interface for input/output devices such as a display, a mouse, and a keyboard. The network interface 440 provides a connection interface for various networking devices. The storage interface 450 provides a connection interface for external storage devices such as a floppy disk, a USB flash drive, and an SD card.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable apparatus to produce a machine, such that the execution of the instructions by the processor results in an apparatus that implements the functions specified in the flowchart and/or block diagram block or blocks.
These computer-readable program instructions may also be stored in a computer-readable memory that can direct a computer to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instructions which implement the function specified in the flowchart and/or block diagram block or blocks.
The present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
By the method and the device for determining the access frequency of the data and the computer storage medium in the embodiment, the accuracy of determining the access frequency of the data in the cloud platform can be improved, so that the accuracy of physical resource allocation is improved, the physical resource utilization rate is improved, and the physical resource waste is reduced.
Thus, a method and apparatus, computer-readable storage medium, for determining access frequency of data according to the present disclosure have been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.

Claims (16)

1. A method of determining access frequency of data, comprising:
acquiring attribute information related to a target block storage instance in a cloud platform;
predicting future performance achieved by the target block storage instance in a future period by utilizing a machine learning model according to attribute information related to the target block storage instance;
determining, from the predicted future performance, a frequency of access of data in the target block storage instance over the future period, the determined frequency of access being used to allocate physical resources for the target block storage instance.
2. The method of determining access frequency of data according to claim 1, wherein the future cycle includes a plurality of future times, the future performance includes a future performance value of at least one performance of the target block storage instance at each future time, and determining the access frequency of the data in the target block storage instance within the future cycle based on the predicted future performance includes:
acquiring the maximum performance value of each performance of the target block storage instance;
for each performance, screening a future performance peak corresponding to each performance from a plurality of future performance values corresponding to the plurality of future moments, wherein the future performance peak reflects the general performance condition of each performance in the future cycle;
for the target block storage instance, determining access frequency of data in the target block storage instance in the future period according to the maximum performance value of the at least one performance and the corresponding future performance peak value.
3. The method of determining access frequency of data according to claim 2, wherein determining, for the target block storage instance, the access frequency of data in the target block storage instance in the future period according to the maximum performance value of the at least one performance and the corresponding future performance peak comprises:
for each performance, determining a ratio of a corresponding future performance peak value to a maximum performance value as a reference ratio;
determining the access frequency of the data in the target block storage instance in the future period to be a first degree under the condition that the reference ratio corresponding to each performance is smaller than a first reference ratio threshold;
determining the access frequency of the data in the target block storage instance in the future period to be a second degree under the condition that at least one reference ratio value corresponding to one performance is greater than or equal to the first reference ratio threshold and the reference ratio values corresponding to various performances are less than a second reference ratio threshold, wherein the second reference ratio threshold is greater than the first reference ratio threshold and the second degree is higher than the first degree;
determining that the data in the target block storage instance is accessed frequently in the future period to a third degree, where the reference ratio value corresponding to at least one performance is greater than or equal to the second reference ratio threshold, and the third degree is higher than the second degree.
4. The method for determining access frequency of data according to claim 1, wherein the cloud platform comprises at least one target image, the machine learning model comprises a first machine learning model corresponding to each target image, the first machine learning model corresponding to each target image is obtained by training according to relevant data of a reference block storage instance mounted by a computing node created based on each target image, and according to attribute information relevant to the target block storage instance, predicting a future performance value reached by the target block storage instance in a future period by using the machine learning model comprises:
and under the condition that the image used when the computing node mounted by the target block storage instance is created belongs to the at least one target image, predicting the future performance value reached by the target block storage instance in the future period by utilizing the first machine learning model corresponding to the image corresponding to the target block storage instance according to the attribute information related to the target block storage instance.
5. The method for determining access frequency of data according to claim 4, wherein for each image in the cloud platform, the each image is a target image if a ratio of a total capacity of all block storage instances mounted under a computing node created based on the each image to a total capacity of all block storage instances in the cloud platform is greater than a first capacity ratio threshold.
6. The method for determining the access frequency of data according to claim 1, wherein the cloud platform includes at least one target image, and in a case where a tenant to which a block storage instance installed under a compute node created based on each target image belongs has a preset tenant, the machine learning model includes a second machine learning model corresponding to each preset tenant, the second machine learning model corresponding to each preset tenant is obtained by training according to data related to a reference block storage instance of each preset tenant, and predicting a future performance value reached by data in the target block storage instance in a future cycle by using the machine learning model according to attribute information related to the target block storage instance includes:
determining whether the tenant to which the target block storage instance belongs is a preset tenant, in a case that the image used when the computing node on which the target block storage instance is mounted was created belongs to the at least one target image;
and under the condition that the tenant to which the target block storage instance belongs to a preset tenant, predicting a future performance value of the data in the target block storage instance in a future period by using a second machine learning model corresponding to the tenant to which the target block storage instance belongs according to the attribute information related to the target block storage instance.
7. The method for determining access frequency of data according to claim 6, wherein for each tenant in the cloud platform, the each tenant is a preset tenant if a ratio of a total capacity of all block storage instances of the each tenant to a total capacity of all block storage instances mounted on a computing node to which the each tenant belongs is greater than a second capacity ratio threshold.
8. The method for determining the access frequency of data according to claim 6, wherein the machine learning model further comprises a first machine learning model corresponding to each target image, the first machine learning model corresponding to each target image is trained on data related to reference block storage instances mounted to computing nodes corresponding to that target image, and predicting, according to the attribute information related to the target block storage instance and by using the machine learning model, the future performance value reached by the data in the target block storage instance in the future period further comprises:
in a case where the tenant to which the target block storage instance belongs is not a preset tenant, predicting, according to the attribute information related to the target block storage instance, the future performance value reached by the data in the target block storage instance in the future period by using the first machine learning model corresponding to the image corresponding to the target block storage instance.
9. The method of determining access frequency of data according to claim 4, further comprising:
in a case where the image used to create the computing node to which the target block storage instance is mounted does not belong to the at least one target image, determining that the access frequency of data in the target block storage instance in the future period is a preset level.
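Read together, claims 4, 6, 8, and 9 describe a routing rule for deciding which model, if any, produces the prediction. A minimal sketch of that routing, assuming models with a scikit-learn-style predict method and hypothetical lookup tables:

```python
from typing import Mapping, Optional, Sequence


def predict_future_performance(image_id: str,
                               tenant_id: str,
                               features: Sequence[float],
                               target_images: set,
                               preset_tenants: set,
                               first_models: Mapping[str, object],
                               second_models: Mapping[str, object]) -> Optional[float]:
    # Claim 9: if the computing node's image is not a target image, no
    # prediction is made; the access frequency is later set to a preset level.
    if image_id not in target_images:
        return None
    if tenant_id in preset_tenants:
        # Claim 6: preset tenants use their tenant-specific second model.
        model = second_models[tenant_id]
    else:
        # Claim 8: other tenants fall back to the image-specific first model.
        model = first_models[image_id]
    return float(model.predict([list(features)])[0])
```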
10. The method for determining the access frequency of data according to claim 1, wherein a cycle duration of the future period is the duration required for the target block storage instance to migrate an amount of data equal to the capacity of the target block storage instance.
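Claim 10 ties the length of the future period to the instance's own capacity; a one-line sketch with hypothetical units and migration rate:

```python
def future_period_seconds(capacity_gib: float, migration_rate_gib_per_s: float) -> float:
    # Claim 10: time needed to migrate an amount of data equal to the
    # instance's capacity, e.g. a 500 GiB volume at 0.5 GiB/s gives 1000 s.
    return capacity_gib / migration_rate_gib_per_s
```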
11. The method of determining access frequency of data of claim 1, wherein the future performance value is a performance value associated with a read operation; and/or
The performance value associated with the read operation is measured using at least one of the following performance metrics: a throughput associated with the read operation, and a number of input/output operations per second (IOPS) associated with the read operation.
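The two read-side metrics named in claim 11 can be derived from raw I/O counters sampled over a window; a sketch with hypothetical counter names:

```python
def read_throughput_mib_per_s(bytes_read: int, window_seconds: float) -> float:
    # Throughput associated with read operations over the sampling window.
    return bytes_read / window_seconds / 2**20


def read_iops(read_op_count: int, window_seconds: float) -> float:
    # Input/output operations per second (IOPS) associated with read operations.
    return read_op_count / window_seconds
```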
12. The method for determining the access frequency of data according to any one of claims 4 to 9, wherein the data related to the reference block storage instance comprises attribute information of the reference block storage instance, attribute information of the computing node to which the reference block storage instance is mounted, and historical performance values reached by the data in the reference block storage instance in a historical period; and/or
The attribute information related to the target block storage instance comprises attribute information of the target block storage instance and attribute information of the computing node to which the target block storage instance is mounted.
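Claim 12 names the two attribute groups that form the model input. A sketch of assembling them into a feature vector, with hypothetical field names standing in for the unspecified attribute information:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceFeatures:
    # attribute information of the (target or reference) block storage instance
    capacity_gib: float
    volume_type_code: int
    # attribute information of the computing node it is mounted to
    node_vcpus: int
    node_memory_gib: float

    def as_vector(self) -> List[float]:
        return [self.capacity_gib, float(self.volume_type_code),
                float(self.node_vcpus), self.node_memory_gib]
```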
13. The method of determining access frequency of data according to claim 8, further comprising:
for each target image, training a basic machine learning model by using the data related to the reference block storage instances mounted under computing nodes created based on that target image, to obtain the first machine learning model; and/or
for each preset tenant, training the first machine learning model corresponding to the target image corresponding to that preset tenant by using the data related to the reference block storage instances of that preset tenant, to obtain the second machine learning model.
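A rough sketch of the two training passes in claim 13, assuming scikit-learn regressors stand in for the "basic machine learning model" and the second pass is approximated by refitting a copy of the image-level model on the tenant's own reference data; data collection and feature extraction are out of scope:

```python
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor


def train_first_models(samples_by_image):
    # samples_by_image: {image_id: (X, y)} built from reference block storage
    # instances mounted under computing nodes created from each target image;
    # y holds their historical performance values.
    models = {}
    for image_id, (X, y) in samples_by_image.items():
        model = GradientBoostingRegressor()
        model.fit(X, y)
        models[image_id] = model
    return models


def train_second_models(first_models, samples_by_tenant, image_of_tenant):
    # Per preset tenant, start from the first model of the tenant's target
    # image and retrain it on the tenant's reference data.
    models = {}
    for tenant_id, (X, y) in samples_by_tenant.items():
        model = clone(first_models[image_of_tenant[tenant_id]])
        model.fit(X, y)
        models[tenant_id] = model
    return models
```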
14. An apparatus for determining access frequency of data, comprising:
an acquisition module configured to acquire attribute information related to a target block storage instance in a cloud platform;
a prediction module configured to predict, according to the attribute information related to the target block storage instance and by using a machine learning model, a future performance value reached by the target block storage instance in a future period;
a determination module configured to determine, according to the predicted future performance value, an access frequency of data in the target block storage instance in the future period, the determined access frequency being used to allocate physical resources for the target block storage instance.
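One way the three modules of claim 14 could be stitched together in code; the mapping from the predicted performance value to an access-frequency tier is an assumption for illustration, not part of the claim:

```python
class AccessFrequencyDeterminer:
    def __init__(self, fetch_attributes, model, tiers):
        self._fetch_attributes = fetch_attributes   # acquisition module
        self._model = model                         # prediction module
        self._tiers = tiers                         # determination config,
                                                    # e.g. [("hot", 5000.0), ("warm", 500.0)]

    def determine(self, instance_id: str) -> str:
        features = self._fetch_attributes(instance_id)
        predicted = float(self._model.predict([features])[0])
        for name, threshold in self._tiers:
            if predicted >= threshold:
                return name                         # tier used to allocate physical resources
        return "cold"
```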
15. An apparatus for determining access frequency of data, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of determining access frequency of data according to any one of claims 1 to 13 based on instructions stored in the memory.
16. A computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of determining the access frequency of data according to any one of claims 1 to 13.
CN202210231567.4A (priority date 2022-03-09, filing date 2022-03-09) Method and device for determining access frequency of data. Status: Pending. Publication: CN114595063A.

Priority Applications (1)

CN202210231567.4A (priority date 2022-03-09, filing date 2022-03-09): Method and device for determining access frequency of data

Applications Claiming Priority (1)

CN202210231567.4A (priority date 2022-03-09, filing date 2022-03-09): Method and device for determining access frequency of data

Publications (1)

CN114595063A (published 2022-06-07)

Family

ID=81808980

Family Applications (1)

CN202210231567.4A (priority date 2022-03-09, filing date 2022-03-09, pending, published as CN114595063A): Method and device for determining access frequency of data

Country Status (1)

CN: CN114595063A (en)

Similar Documents

Publication Title
US10649662B2 (en) Methods and apparatus to manage workload memory allocation
CN107239339B (en) System performance optimization parameter determination method, system performance optimization method and device
US20080134191A1 (en) Methods and apparatuses for core allocations
CN111316220A (en) Performance counters for computer memory
CN108205469B (en) MapReduce-based resource allocation method and server
EP3794461B1 (en) Automatic database query load assessment and adaptive handling
CN110209502B (en) Information storage method and device, electronic equipment and storage medium
US6647349B1 (en) Apparatus, method and system for counting logic events, determining logic event histograms and for identifying a logic event in a logic environment
CN108762885B (en) Virtual machine creating method and device, management equipment and terminal equipment
JPWO2014208139A1 (en) Abnormality detection apparatus, control method, and program
US20170337083A1 (en) Dynamic tuning of multiprocessor/multicore computing systems
CN115269108A (en) Data processing method, device and equipment
US6564175B1 (en) Apparatus, method and system for determining application runtimes based on histogram or distribution information
US20110191094A1 (en) System and method to evaluate and size relative system performance
US10587527B1 (en) Systems and methods for apportioning bandwidth in storage systems
CN112947851A (en) NUMA system and page migration method in NUMA system
CN114595063A (en) Method and device for determining access frequency of data
US10019341B2 (en) Using hardware performance counters to detect stale memory objects
CN113238974A (en) Bus bandwidth efficiency statistical method, device, equipment and medium
CN109686396B (en) Performance evaluation device and performance evaluation method
Kim et al. Measuring the optimality of Hadoop optimization
US20230401089A1 (en) Credit-based scheduling using load prediction
CN111913650B (en) Method and device for determining prediction window period
CN111767137B (en) System deployment method and device, electronic equipment and storage medium
Jeon et al. Runtime memory controller profiling with performance analysis for DRAM memory controllers

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination