CN108762684B

CN108762684B - Hot spot data migration flow control method and device, electronic equipment and storage medium

Info

Publication number: CN108762684B
Application number: CN201810565747.XA
Authority: CN
Inventors: 陈学伟
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-06-04
Filing date: 2018-06-04
Publication date: 2021-03-05
Anticipated expiration: 2038-06-04
Also published as: WO2019232925A1; CN108762684A

Abstract

A hot spot data migration flow control method comprises the following steps: recording a data set accessed by a user at intervals of a preset time period; dividing the data set into a plurality of data blocks; judging whether a data block among the plurality of data blocks is hot spot data; when the data block is determined to be hot spot data, judging whether that data block has been written into a cache; when the data block determined to be hot spot data has not been written into the cache, acquiring a flow control threshold corresponding to the current statistical period in the migration period; and writing the data block determined to be hot spot data into the cache based on the flow control threshold corresponding to the current statistical period. The invention also provides a hot spot data migration flow control device, electronic equipment and a storage medium. The invention can write hot spot data into the cache, save the time needed to read hot spot data, avoid obvious impact on normal input and output service performance, and achieve a good flow control effect.

Description

Hot spot data migration flow control method and device, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of computers, in particular to a hot spot data migration flow control method and device, electronic equipment and a storage medium.

Background

The cache is a buffer area for data exchange. When a piece of hardware, such as a CPU, needs to read data, it first looks for the required data in the cache; if the data is found, it is used directly, and if not, it is fetched from memory. The cache runs much faster than memory, so the cache helps the hardware run faster.

However, the cache holds a copy of only a small portion of the data in memory. When the hardware looks for data that has not been copied from memory into the cache, the lookup misses and the hardware must fetch the data from memory, which slows down the whole system.

Hot spot data is data that the hardware needs frequently. Storing it in the cache in advance lets the hardware obtain it directly from the cache when it is called, which saves data acquisition time.

However, storing hot spot data into the cache may generate a large amount of Input/Output (IO). If this coincides with an IO peak of the user application, the response time of the user application may suffer, giving the user a bad experience.

Disclosure of Invention

In view of the above, it is necessary to provide a method, an apparatus, an electronic device, and a storage medium for hot spot data migration flow control, which can write hot spot data into a cache, save time for reading hot spot data, avoid significant impact on normal input/output service performance, and have a good flow control effect.

The first aspect of the present invention provides a hot spot data migration flow control method, where the method includes:

recording a data set accessed by a user at intervals of a preset time period;

dividing the data set into a plurality of data blocks;

judging whether a data block in the plurality of data blocks is hot data or not;

when determining that the data block is hot data, judging whether the data block determined as the hot data is written into a cache or not;

when the data block determined as the hot point data is not written into the cache, acquiring a flow control threshold corresponding to the current statistical period in the migration period;

and writing the data blocks determined as the hot point data into a cache based on the flow control threshold corresponding to the current statistical period.

Preferably, dividing the data set into a plurality of data blocks comprises:

evenly dividing the data set into a preset number of data blocks; or

randomly dividing the data set into a preset number of data blocks; or

dividing the data set into a plurality of data blocks according to a preset size.

Preferably, judging whether a data block among the plurality of data blocks is hot data is performed by calculating the probability value of the data block being accessed and predicting, based on that probability value, whether the data block is hot data, which includes: counting the number of times each data block is accessed in the preset time period; calculating the probability value of each data block being accessed in the preset time period based on the number of times each data block is accessed in the preset time period; judging whether the probability value of the data block being accessed is greater than a preset probability value; when the probability value is greater than the preset probability value, determining the corresponding data block as hot data; and when the probability value is smaller than or equal to the preset probability value, determining the corresponding data block as non-hot data.

Preferably, the obtaining of the flow control threshold corresponding to the current statistical period in the migration period includes:

judging whether the current statistical period is the first statistical period or not;

when the current statistical period is determined to be the first statistical period, determining a preset flow control threshold as a flow control threshold corresponding to the current statistical period;

and when the current statistical period is determined not to be the first statistical period, obtaining the IO load applied by the user in the last statistical period, and determining the flow control threshold corresponding to the current statistical period according to the IO load applied by the user in the last statistical period.

Preferably, determining the flow control threshold corresponding to the current statistical period according to the IO load applied by the user in the previous statistical period includes:

acquiring the data block size of each IO applied by a user in the previous statistical period, and calculating the average data block size of the IO in the previous statistical period;

acquiring the transmission delay of each data block in the previous statistical period, and calculating the average IO data block delay in the previous statistical period;

acquiring a preset reference value of the size of an IO data block and a reference value of corresponding data block time delay;

calculating the IO load intensity in the last statistical period according to the average data block size, the average data block time delay, the reference value of the data block size and the reference value of the corresponding data block time delay of the IO in the last statistical period;

determining the IO load category in the last statistical period by utilizing a pre-trained load classification model according to the IO load intensity in the last statistical period;

and calculating the flow control threshold corresponding to the current statistical period according to the IO load category in the last statistical period.

Preferably, the calculation formula for calculating the IO load intensity in the previous statistical period according to the average data block size, the average data block delay, the reference value of the data block size, and the reference value of the corresponding data block delay of the IO in the previous statistical period is as follows:

wherein, X is the average data block size of the IO in the previous statistical period, Y is the average data block delay, M is a reference value of the data block size, and N is a reference value of the corresponding data block delay.

Preferably, the calculating the flow control threshold corresponding to the current statistical period according to the IO load class in the previous statistical period includes:

when the IO load category in the previous statistical period is a high load category, reducing the flow control threshold corresponding to the previous statistical period by a first preset amplitude to obtain the flow control threshold corresponding to the current statistical period;

when the IO load category in the previous statistical period is a low load category, increasing the flow control threshold corresponding to the previous statistical period by a second preset amplitude to obtain the flow control threshold corresponding to the current statistical period;

and when the IO load category in the last statistical period is a normal load category, taking the flow control threshold corresponding to the last statistical period as the flow control threshold corresponding to the current statistical period.
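
The three-way adjustment rule above can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: the names `LoadCategory` and `next_threshold` are hypothetical, and the sketch assumes that the first and second preset amplitudes are relative fractions of the previous threshold (the text does not say whether they are relative or absolute).

```python
from enum import Enum


class LoadCategory(Enum):
    """IO load categories named in the text."""
    HIGH = "high"
    NORMAL = "normal"
    LOW = "low"


def next_threshold(previous: float, category: LoadCategory,
                   first_amplitude: float = 0.2,
                   second_amplitude: float = 0.1) -> float:
    """Derive the current period's flow control threshold from the previous one.

    Assumption: both amplitudes are fractions of the previous threshold;
    the default values 0.2 and 0.1 are illustrative only.
    """
    if category is LoadCategory.HIGH:
        # High load in the previous period: reduce the threshold.
        return previous * (1.0 - first_amplitude)
    if category is LoadCategory.LOW:
        # Low load in the previous period: increase the threshold.
        return previous * (1.0 + second_amplitude)
    # Normal load: carry the previous threshold over unchanged.
    return previous
```

Under these assumptions, a previous threshold of 100 becomes roughly 80 after a high-load period, roughly 110 after a low-load period, and stays at 100 after a normal-load period.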

A second aspect of the present invention provides a hot spot data migration flow control apparatus, including:

the recording module is used for recording a data set accessed by a user every other preset time period;

a dividing module for dividing the data set into a plurality of data blocks;

the judging module is used for judging whether a data block in the plurality of data blocks is hot data;

the judging module is further used for judging whether the data block determined to be the hot spot data is written into the cache or not when the data block is determined to be the hot spot data;

the obtaining module is used for obtaining a flow control threshold corresponding to the current statistical period in the migration period when the judging module judges that the data block determined as the hot-point data is not written into the cache;

and the migration module is used for writing the data block determined as the hot point data into a cache based on the flow control threshold corresponding to the current statistical period.

A third aspect of the present invention provides an electronic device, which includes a processor and a memory, where the processor is configured to implement the hot spot data migration flow control method when executing a computer program stored in the memory.

A fourth aspect of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the hot spot data migration flow control method.

According to the hot spot data migration flow control method, the device, the electronic equipment and the storage medium described above, the data set accessed by the user is recorded at intervals of a preset time period and divided into a plurality of data blocks. When a data block is determined to be hot spot data and has not been written into the cache, the flow control threshold corresponding to each statistical period in the migration period is acquired, and the data block is written into the cache based on that threshold. In this way, the efficiency of migrating user data to the cache is improved, the risk of data loss is reduced, obvious impact on normal input and output service performance is avoided, and a good flow control effect is achieved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

Fig. 1 is a flowchart of a hot spot data migration flow control method according to an embodiment of the present invention.

Fig. 2 is a flowchart of a method for determining a flow control threshold corresponding to a current statistical period according to an IO load applied by a user in a previous statistical period according to a second embodiment of the present invention.

Fig. 3 is a functional block diagram of a hot spot data migration flow control apparatus according to a third embodiment of the present invention.

Fig. 4 is a schematic diagram of an electronic device according to a fourth embodiment of the present invention.

The following detailed description will further illustrate the invention in conjunction with the above-described figures.

Detailed Description

In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.

In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. The described embodiments are merely a subset of the embodiments of the present invention, rather than all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

The hot spot data migration flow control method provided by the embodiment of the invention is applied to one or more electronic devices. The hot spot data migration flow control method can also be applied to a hardware environment formed by electronic equipment and a server connected with the electronic equipment through a network. Networks include, but are not limited to: a wide area network, a metropolitan area network, or a local area network. The hot spot data migration flow control method can be executed by a server or an electronic device; or may be performed by both the server and the electronic device.

For the electronic device which needs to perform the hot spot data migration flow control method, the hot spot data migration flow control function provided by the method of the present invention can be directly integrated on the electronic device, or a client for implementing the method of the present invention is installed. For another example, the method provided by the present invention may also be run on a device such as a server in the form of a Software Development Kit (SDK), and an interface providing a flow control function for the migration of hot data is provided in the form of an SDK, so that an electronic device or other devices may implement the function of performing flow control on a background write disk through the provided interface.

Example one

Fig. 1 is a flowchart of a hot spot data migration flow control method according to an embodiment of the present invention. The execution sequence in the flowchart may be changed and some steps may be omitted according to different requirements.

S11, recording the data set accessed by the user every preset time period.

The preset time period is a period of time set in advance, for example, one week or 10 days. It is not particularly limited and can be set according to the hardware of the electronic system or the data access conditions.

When the electronic equipment detects a user's instruction to access data, it responds to the instruction and feeds the accessed data back to the user. The data sets accessed by all users within the preset time period are recorded.

S12, dividing the data set into a plurality of data blocks.

The recorded data set accessed by the user is divided into a plurality of data blocks.

In a preferred embodiment of the invention, the dividing of the data set into a plurality of data blocks may comprise a combination of one or more of:

1) The data set is divided evenly into a preset number of data blocks.

The preset number is a number of data blocks set in advance. For example, the data set is divided evenly into 10 data blocks, each of the same size.

2) The data set is randomly divided into a preset number of data blocks.

For example, the data set is randomly divided into 10 data blocks whose sizes may differ from one another.

3) The data set is divided into a plurality of data blocks according to a preset size.

The preset size is a size of a preset data block, for example, the data set is divided into a plurality of data blocks, and each data block has a size of 1 Mb. The preset size may also be 10Mb or greater.
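
The three division strategies can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the data set is modelled as an in-memory sequence, and blocks are assumed to be contiguous slices, none of which is specified in the text.

```python
import random
from typing import List, Sequence


def split_evenly(data: Sequence, count: int) -> List[Sequence]:
    """Split into `count` contiguous blocks whose lengths differ by at most one."""
    base, extra = divmod(len(data), count)
    blocks, start = [], 0
    for i in range(count):
        end = start + base + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def split_randomly(data: Sequence, count: int, rng: random.Random) -> List[Sequence]:
    """Split into `count` non-empty contiguous blocks at random cut points.

    Requires count <= len(data) so that every block can be non-empty.
    """
    cuts = sorted(rng.sample(range(1, len(data)), count - 1))
    bounds = [0, *cuts, len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]


def split_by_size(data: Sequence, block_size: int) -> List[Sequence]:
    """Split into blocks of `block_size` items; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

For a 100-byte data set, `split_evenly(data, 10)` yields ten 10-byte blocks, `split_randomly(data, 10, random.Random(0))` yields ten blocks of varying size, and `split_by_size(data, 30)` yields blocks of 30, 30, 30 and 10 bytes.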

S13, judging whether any data block in the plurality of data blocks is hot data.

In a preferred embodiment of the present invention, determining whether any of the plurality of data blocks is hot data may be performed by calculating a probability value of the data block being accessed, and predicting whether the data block is hot data based on the probability value.

The determining whether a data block in the plurality of data blocks is hot data may specifically include:

1) counting the number of times each data block is accessed in the preset time period;

2) calculating the probability value of each data block accessed in the preset time period based on the number of times each data block is accessed in the preset time period;

3) judging whether the probability value of the accessed data block is greater than a preset probability value or not;

4) when the probability value of the accessed data block is judged to be larger than the preset probability value, determining the data block corresponding to the accessed probability value larger than the preset probability value as the hot data; and when the probability value of the accessed data block is judged to be smaller than or equal to the preset probability value, determining the data block corresponding to the accessed probability value smaller than or equal to the preset probability value as non-hotspot data.

For example, if the preset time period is one week, the data set accessed by the user in the week is divided into 20 data blocks, data block 1 through data block 20. The number of times each data block was accessed within the week is: data block 1, 10 times; data block 2, 5 times; data block 3, 8 times; data block 4, 20 times; data block 5, 50 times; data block 6, 3 times; data block 7, 20 times; data block 8, 40 times; data block 9, 1 time; data block 10, 5 times; data block 11, 9 times; data block 12, 11 times; data block 13, 10 times; data block 14, 12 times; data block 15, 20 times; data block 16, 30 times; data block 17, 14 times; data block 18, 0 times; data block 19, 2 times; and data block 20, 50 times. The formula for calculating the probability value of each data block being accessed is as follows:

$$P_i = \frac{X_i}{\sum_{j=1}^{20} X_j}$$

wherein X_i indicates the number of times the ith data block is accessed in a week, and P_i is the probability that the ith data block is accessed within one week. The probability value of data block 1 being accessed can thus be calculated as follows:

$$P_1 = \frac{X_1}{\sum_{j=1}^{20} X_j} = \frac{10}{320} = 3.125\%$$

Similarly, the probability value P_2 of data block 2 being accessed is 1.56%, and the probability value P_3 of data block 3 being accessed is 2.5%. The probability values of the other data blocks being accessed are not described in detail.

In a preferred embodiment of the present invention, the preset probability value may be, for example, 20%, so that the data contained in a data block whose access probability value is greater than 20% is regarded as hot data.
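
The probability calculation of step S13 can be sketched in Python using the weekly access counts from the example. The names `WEEKLY_COUNTS`, `access_probabilities` and `hot_blocks` are hypothetical, and the probability is taken to be a block's share of all accesses, which is an assumption that reproduces the 1.56% and 2.5% values stated for data blocks 2 and 3.

```python
from typing import Dict, List

# Weekly access counts for data blocks 1..20, taken from the example in the text.
WEEKLY_COUNTS: Dict[int, int] = {
    1: 10, 2: 5, 3: 8, 4: 20, 5: 50, 6: 3, 7: 20, 8: 40, 9: 1, 10: 5,
    11: 9, 12: 11, 13: 10, 14: 12, 15: 20, 16: 30, 17: 14, 18: 0, 19: 2, 20: 50,
}


def access_probabilities(counts: Dict[int, int]) -> Dict[int, float]:
    """Return each block's share of all accesses in the period."""
    total = sum(counts.values())
    if total == 0:
        # No accesses at all: no block can be hot.
        return {block: 0.0 for block in counts}
    return {block: n / total for block, n in counts.items()}


def hot_blocks(counts: Dict[int, int], preset_probability: float) -> List[int]:
    """Return the ids of blocks whose access probability exceeds the preset value."""
    probabilities = access_probabilities(counts)
    return [block for block, p in probabilities.items() if p > preset_probability]
```

With these counts the total is 320 accesses, so no block exceeds the 20% example value (the largest share is 50/320, about 15.6%). With a hypothetical preset probability value of 10%, `hot_blocks(WEEKLY_COUNTS, 0.10)` selects data blocks 5, 8 and 20.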

When it is determined that a data block is hot data, step S14 is performed; when it is determined that no data block is hot data, the flow may return to step S11.

S14, judging whether the data block determined as the hot-point data has been written into the cache.

When the data block determined as the hot-point data is hit in the cache, it indicates that the data block has already been written into the cache; when the data block determined as the hot-point data is not hit in the cache, it indicates that the data block has not been written into the cache.

When it is judged that the data block determined as the hot-point data has been written into the cache, the flow may be directly ended; when it is judged that the data block determined as the hot-point data has not been written into the cache, step S15 is executed.

S15, acquiring a flow control threshold corresponding to the current statistical period in the migration period.

The whole process, from the start of writing the data block that is determined to be hot data and has not been written into the cache until that writing is complete, is called a migration period. One migration period may be divided into a plurality of statistical periods, and one statistical period may be a preset length of time, for example, 1 second.

Flow control here means controlling the traffic rate. It is commonly implemented in one of two ways: through the QoS module of a router or switch, based on the source address, destination address, source port, destination port and protocol type; or at the application layer, through a dedicated flow control device.

In this preferred embodiment, the acquiring a flow control threshold corresponding to the current statistical period in the migration period may specifically include:

1) Judging whether the current statistical period is the first statistical period.

Whether the current statistical period is the first statistical period can be judged by checking whether the current time falls within the 1st second of the migration period.

2) When the current statistical period is determined to be the first statistical period, determining a preset flow control threshold as a flow control threshold corresponding to the current statistical period;

The flow control threshold corresponding to the first statistical period in the migration period is a preset flow control threshold, which a system manager can set in advance based on experience.

3) And when the current statistical period is determined not to be the first statistical period, obtaining the IO load applied by the user in the last statistical period, and determining the flow control threshold corresponding to the current statistical period according to the IO load applied by the user in the last statistical period.

Each of the remaining statistical periods within the migration period, except the first statistical period, may correspond to a flow control threshold. The flow control threshold corresponding to each of the remaining statistical periods is dynamically adjusted, the flow control threshold corresponding to the current statistical period may be calculated according to the IO load in the previous statistical period, and the flow control threshold corresponding to the next statistical period may be calculated according to the IO load in the current statistical period. Specifically, a flow control threshold corresponding to a second statistical period is calculated according to the IO load in the first statistical period; calculating a flow control threshold corresponding to the third statistical period according to the IO load in the second statistical period; and so on.
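
The period-by-period derivation described above can be sketched as a Python generator. The names `threshold_schedule` and `halve_when_busy` are hypothetical. The adjustment rule is passed in as a function because the text determines it separately (see Fig. 2); the toy rule shown here is a placeholder and not the method of the embodiment.

```python
from typing import Callable, Iterable, Iterator


def threshold_schedule(preset: float,
                       loads: Iterable[float],
                       adjust: Callable[[float, float], float]) -> Iterator[float]:
    """Yield one flow control threshold per statistical period.

    The first period uses the preset threshold. Every later period derives
    its threshold from the previous threshold and the IO load observed in
    the previous period, so `loads[k]` is the load of period k + 1.
    """
    threshold = preset
    yield threshold                      # first statistical period
    for previous_load in loads:
        threshold = adjust(threshold, previous_load)
        yield threshold                  # next statistical period


def halve_when_busy(threshold: float, load: float) -> float:
    """Toy adjustment rule (an assumption): halve the threshold when load exceeds 0.8."""
    return threshold / 2 if load > 0.8 else threshold
```

For a preset threshold of 100 and observed loads of 0.9 and 0.1 in the first two periods, `list(threshold_schedule(100.0, [0.9, 0.1], halve_when_busy))` gives the thresholds for periods one to three.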

The specific process of determining the flow control threshold corresponding to the current statistical period according to the IO load applied by the user in the previous statistical period may refer to fig. 2 and the corresponding description thereof.

S16, writing the data block determined as the hot point data into a cache based on the flow control threshold corresponding to the current statistical period.

The data block determined as the hot point data is written into the cache according to the flow control threshold corresponding to the current statistical period, that is, at the flow rate permitted for the current statistical period. As a result, hot point data is written into the cache neither too fast nor too slow, which avoids obvious impact on normal input and output service performance, and the hot point data written into the cache is made available for user access.
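
A throttled write of this kind might look like the following Python sketch. It assumes that the flow control threshold is a byte budget per statistical period and that the cache is a dictionary keyed by block id. Both are assumptions made for illustration, as is the name `migrate_one_period`.

```python
from collections import deque
from typing import Deque, Dict, Tuple


def migrate_one_period(pending: Deque[Tuple[int, bytes]],
                       cache: Dict[int, bytes],
                       threshold_bytes: int) -> int:
    """Write pending hot blocks into `cache` until the period budget is used up.

    Returns the number of bytes written in this statistical period.
    A block larger than the whole budget would need to be split into
    chunks; that case is left out of this sketch.
    """
    written = 0
    while pending:
        block_id, payload = pending[0]
        if block_id in cache:
            # Already cached, so there is nothing to migrate for this block.
            pending.popleft()
            continue
        if written + len(payload) > threshold_bytes:
            # Budget for this period is exhausted; resume in the next period.
            break
        cache[block_id] = payload
        written += len(payload)
        pending.popleft()
    return written
```

Calling the function once per statistical period, each time with that period's threshold, spreads the migration over the migration period instead of issuing all writes at once.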

Example two

S21, obtaining the data block size of each IO applied by the user in the previous statistical period, and calculating the average data block size of the IO in the previous statistical period.

The average data block size of the IO in the last statistical period may be calculated by using an arithmetic mean algorithm, a geometric mean algorithm, or a root mean square mean algorithm.

For example, suppose that ten IOs are detected in the previous statistical period, with data block sizes of 2M, 1M, 3M, 0.5M, 10M, 4M, 0.1M, 1.2M, 5M and 8M. Calculating the average data block size of the IO in the previous statistical period by using the arithmetic mean algorithm gives:

(2M+1M+3M+0.5M+10M+4M+0.1M+1.2M+5M+8M)/10 = 3.48M.

S22, acquiring the transmission delay of each data block in the previous statistical period, and calculating the average data block delay of the IO in the previous statistical period.

The transmission delay (delay for short) is the time a node needs to push a data block onto the transmission medium when sending data, that is, the total time a transmitting station needs from starting to send a data frame until the frame has been sent completely (or the total time a receiving station needs from starting to receive a data frame until the frame has been received completely).

In a preferred embodiment of the present invention, the transmission delay of the data block may be obtained from a load measurement tool or a performance monitoring tool installed in each storage node.

As described above, the average data block delay of the IO in the previous statistical period may also be calculated by using an arithmetic mean algorithm, a geometric mean algorithm, or a root mean square mean algorithm. Suppose that, in the previous statistical period, the transmission delays of the ten IOs are 1s, 0.8s, 1.5s, 0.4s, 5s, 2s, 0.02s, 0.6s, 3s, and 4.5s. Calculated with the arithmetic mean algorithm, the average data block delay of the IO in the previous statistical period is:

(1s+0.8s+1.5s+0.4s+5s+2s+0.02s+0.6s+3s+4.5s)/10 ≈ 1.88s.

It should be understood that the same averaging algorithm is used for both quantities: if the average data block size of the IO in the previous statistical period is calculated by using the arithmetic mean algorithm, the average data block delay of the IO in the previous statistical period is also calculated by using the arithmetic mean algorithm; if the average data block size is calculated by using the geometric mean algorithm, the average data block delay is also calculated by using the geometric mean algorithm; and if the average data block size is calculated by using the root mean square mean algorithm, the average data block delay is also calculated by using the root mean square mean algorithm.
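
The three averaging algorithms named in steps S21 and S22 can be sketched in Python with the standard library, applied to the sample sizes and delays listed above. The function and constant names are hypothetical, and the sizes and delays are treated as plain numbers in M and seconds.

```python
import math
import statistics
from typing import Sequence

# Sample values from the text: block sizes in M and transmission delays in seconds.
SIZES = [2, 1, 3, 0.5, 10, 4, 0.1, 1.2, 5, 8]
DELAYS = [1, 0.8, 1.5, 0.4, 5, 2, 0.02, 0.6, 3, 4.5]


def arithmetic_mean(values: Sequence[float]) -> float:
    """Sum of the values divided by their count."""
    return statistics.fmean(values)


def geometric_mean(values: Sequence[float]) -> float:
    """n-th root of the product of the values (all values must be positive)."""
    return statistics.geometric_mean(values)


def root_mean_square(values: Sequence[float]) -> float:
    """Square root of the arithmetic mean of the squared values."""
    return math.sqrt(statistics.fmean([v * v for v in values]))
```

Whichever function is chosen for the data block sizes should also be applied to the delays, so that the two averages are comparable. For the sample data, `arithmetic_mean(SIZES)` is 3.48 and `arithmetic_mean(DELAYS)` is 1.882, up to floating-point rounding.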

S23, acquiring a preset reference value of the IO data block size and a corresponding reference value of the data block time delay.

In a preferred embodiment of the present invention, the reference value of the IO data block size and the reference value of the corresponding data block delay may be preset by an administrator of the storage system according to experience. For example, according to experience, when a 4K data block is transmitted, the delay is minimum, and may reach 50ms in an ideal state, then the reference value of the IO data block size may be set to 4K, and the reference value of the corresponding data block delay may be set to 50 ms.

S24, calculating the IO load intensity in the last statistical period according to the average data block size, the average data block time delay, the reference value of the data block size and the reference value of the corresponding data block time delay of the IO in the last statistical period.

For example, assuming that the average data block size of the IO in the previous statistical period is X, the average data block delay is Y, the reference value of the data block size is M, and the reference value of the corresponding data block delay is N, the calculation formula of the IO load intensity in the previous statistical period is as follows:

S25, determining the IO load category in the last statistical period by using a pre-trained load classification model according to the IO load strength in the last statistical period.

In a preferred embodiment of the present invention, the IO load categories include: high load class, normal load class, low load class.

Preferably, the load classification model includes, but is not limited to, a Support Vector Machine (SVM) model. The average data block size of the IO in the last statistical period, the average data block time delay of the IO in the last statistical period and the IO load intensity in the last statistical period are taken as the input of the load classification model, and the model outputs the IO load category in the last statistical period.

In a preferred embodiment of the present invention, the training process of the load classification model includes:

1) Obtaining the IO load data of positive samples and the IO load data of negative samples, and labeling the IO load data of the positive samples with load categories, so that the IO load data of the positive samples carries an IO load category label.

For example, 500 pieces of IO load data are selected for each of the high load category, the normal load category and the low load category, and each piece of IO load data is labeled with its category: "1" may be used as the IO data label for high load, "2" for normal load, and "3" for low load.

2) And randomly dividing the IO load data of the positive sample and the IO load data of the negative sample into a training set with a first preset proportion and a verification set with a second preset proportion, training the load classification model by using the training set, and verifying the accuracy of the trained load classification model by using the verification set.

The training samples in the training sets of different load classes are distributed to different folders. For example, training samples of a high load category are distributed into a first folder, training samples of a normal load category are distributed into a second folder, and training samples of a low load category are distributed into a third folder. Then, training samples with a first preset proportion (for example, 70%) are respectively extracted from different folders and used as total training samples to perform training of the load classification model, and training samples with a remaining second preset proportion (for example, 30%) are respectively extracted from different folders and used as total test samples to perform accuracy verification on the trained load classification model.

3) If the accuracy is greater than or equal to a preset accuracy, ending the training, and identifying the IO load category in the current statistical period by taking the trained load classification model as a classifier; and if the accuracy is smaller than the preset accuracy, increasing the number of positive samples and the number of negative samples to retrain the load classification model until the accuracy is larger than or equal to the preset accuracy.
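The split-and-verify procedure in steps 2) and 3) can be sketched as follows. This is only an illustrative sketch: the function names `stratified_split` and `accuracy`, the fixed random seed, and the generic `predict` callable standing in for the trained SVM are assumptions and are not part of the described embodiment.

```python
import random

def stratified_split(samples_by_class, train_ratio=0.7, seed=0):
    """Split each class's samples into training and verification parts.

    samples_by_class maps a category label (e.g. 1, 2, 3) to its samples,
    mirroring the per-category folders described in the text.
    """
    rng = random.Random(seed)
    train, valid = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)   # e.g. 70% for training
        train += [(s, label) for s in shuffled[:cut]]
        valid += [(s, label) for s in shuffled[cut:]]
    return train, valid

def accuracy(predict, labelled_samples):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for sample, label in labelled_samples
                  if predict(sample) == label)
    return correct / len(labelled_samples)
```

Training would then be repeated with more samples until `accuracy(model_predict, valid)` reaches the preset accuracy, where `model_predict` is whatever prediction function the trained classifier exposes.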

S26, calculating the flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period.

Specifically, the calculating a flow control threshold corresponding to the current statistical period according to the IO load class in the previous statistical period may include:

1) When the IO load category in the previous statistical period is the high load category, reducing the flow control threshold corresponding to the previous statistical period by a first preset amplitude to obtain the flow control threshold corresponding to the current statistical period.

When the IO load in the previous statistical period is high, the flow control threshold is reduced by the first preset amplitude, so that the data blocks determined as hot spot data are written into the cache under a lower flow control threshold in the current statistical period; slowing the data migration in this way keeps access by the user application efficient.

In a preferred embodiment of the present invention, the first preset amplitude may be 1/2 of the flow control threshold corresponding to the previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1/2 of the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1/2 of the flow control threshold corresponding to the current statistical period.

2) When the IO load category in the previous statistical period is the low load category, increasing the flow control threshold corresponding to the previous statistical period by a second preset amplitude to obtain the flow control threshold corresponding to the current statistical period.

When the IO load in the previous statistical period is low, the flow control threshold is increased by the second preset amplitude, so that the data blocks determined as hot spot data are written into the cache under a higher flow control threshold in the current statistical period, which speeds up data migration while still guaranteeing the access quality of the user application.

In a preferred embodiment of the present invention, the second preset amplitude may be 1.5 times of a flow control threshold corresponding to a previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1.5 times of the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1.5 times of the flow control threshold corresponding to the current statistical period.

3) When the IO load category in the previous statistical period is the normal load category, taking the flow control threshold corresponding to the previous statistical period as the flow control threshold corresponding to the current statistical period.
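The three cases above can be sketched as a single update function. This is a minimal sketch: the category constants and the function and parameter names are hypothetical, and the factors 0.5 and 1.5 are the example values given in the preferred embodiment rather than fixed requirements.

```python
# Hypothetical label constants; the embodiment only names the three categories.
HIGH, NORMAL, LOW = "high", "normal", "low"

def next_flow_control_threshold(previous_threshold, previous_load_category,
                                decrease_factor=0.5, increase_factor=1.5):
    """Return the flow control threshold for the current statistical period.

    previous_threshold: threshold used in the previous statistical period.
    previous_load_category: IO load category of the previous period.
    """
    if previous_load_category == HIGH:
        # High load: slow the migration down (example: halve the threshold).
        return previous_threshold * decrease_factor
    if previous_load_category == LOW:
        # Low load: speed the migration up (example: 1.5 times the threshold).
        return previous_threshold * increase_factor
    if previous_load_category == NORMAL:
        # Normal load: keep the threshold unchanged.
        return previous_threshold
    raise ValueError(f"unknown IO load category: {previous_load_category!r}")
```

Calling the function once per statistical period with the threshold it returned last time reproduces the repeated halving or 1.5-fold growth described in the text.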

In summary, the hot spot data migration flow control method according to the present invention records a data set accessed by a user every a preset time period, divides the data set into a plurality of data blocks, and when it is determined that a data block is hot spot data and is not written into a cache, writes the data block determined as hot spot data into the cache by obtaining flow control thresholds corresponding to different statistical periods in a migration period and based on the flow control threshold corresponding to each statistical period, so as to improve efficiency of migrating user data into the cache, reduce a risk of data loss, avoid causing significant impact on normal input and output service performance, and have a good flow control effect.

Secondly, the flow control threshold corresponding to the current statistical period is adjusted automatically and dynamically according to the IO load of the user application in the previous statistical period, without manual adjustment by an administrator, which reduces the administrator's workload and avoids inaccurate adjustment caused by the administrator's subjective judgment.

The above description is only a specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and it will be apparent to those skilled in the art that modifications may be made without departing from the inventive concept of the present invention, and these modifications are within the scope of the present invention.

With reference to fig. 3 to 4, a functional module and a hardware structure of an electronic device for implementing the hot spot data migration flow control method are respectively described below.

EXAMPLE III

Fig. 3 is a functional block diagram of a hot spot data migration flow control apparatus according to a preferred embodiment of the present invention.

In some embodiments, the hot spot data migration flow control apparatus 30 is operated in an electronic device. The hot spot data migration flow control device 30 may include a plurality of functional modules composed of program code segments. The program codes of the respective program segments of the hot spot data migration flow control apparatus 30 may be stored in a memory and executed by at least one processor to execute (see fig. 1-2 and the related description thereof for details) the hot spot data migration flow control method.

In this embodiment, the hot spot data migration flow control device 30 may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a recording module 301, a dividing module 302, a judging module 303, an obtaining module 304, a migration module 305, a calculating module 306, a determining module 307, and a training module 308. A module as referred to herein is a series of computer program segments that is stored in the memory, can be executed by at least one processor, and performs a fixed function. The functions of the modules are described in detail in the following embodiments.

A recording module 301, configured to record a data set accessed by a user every preset time period.

When the electronic device detects a user's instruction to access data, it responds to the instruction and returns the requested data to the user. The recording module 301 records the data sets accessed by all users within the preset time period.

A dividing module 302, configured to divide the data set into a plurality of data blocks.

In a preferred embodiment of the present invention, the dividing module 302 divides the data set into a plurality of data blocks, which may include one or more of the following combinations:

1) The data set is evenly divided into a preset number of data blocks.

2) The data set is randomly divided into a preset number of data blocks.
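The two division strategies can be illustrated with the sketch below. It assumes the data set can be treated as an ordered sequence of records and that blocks are contiguous; the function names and the optional `seed` parameter are hypothetical and only serve the illustration.

```python
import random

def divide_evenly(data_set, block_count):
    """Split data_set into block_count contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(len(data_set), block_count)
    blocks, start = [], 0
    for i in range(block_count):
        size = base + (1 if i < extra else 0)   # spread the remainder over the first blocks
        blocks.append(data_set[start:start + size])
        start += size
    return blocks

def divide_randomly(data_set, block_count, seed=None):
    """Split data_set into block_count non-empty contiguous blocks at random cut points."""
    if not 1 <= block_count <= len(data_set):
        raise ValueError("block_count must be between 1 and len(data_set)")
    rng = random.Random(seed)
    cuts = sorted(rng.sample(range(1, len(data_set)), block_count - 1))
    bounds = [0] + cuts + [len(data_set)]
    return [data_set[a:b] for a, b in zip(bounds, bounds[1:])]
```

Both functions return the same number of blocks; only the block boundaries differ.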

The judging module 303 is configured to judge whether a data block in the plurality of data blocks is hot spot data.

In a preferred embodiment of the present invention, the judging module 303 judges whether any data block in the plurality of data blocks is hot spot data by calculating the probability value of the data block being accessed and predicting, based on the probability value, whether the data block is hot spot data.

Specifically, the judging module 303 judging whether a data block in the plurality of data blocks is hot spot data may include:

For example, if the preset time period is one week, the data set accessed by the user in that week is divided into 20 data blocks, data block 1 through data block 20. In that week, data block 1 was accessed 10 times, data block 2 5 times, data block 3 8 times, data block 4 20 times, data block 5 50 times, data block 6 3 times, data block 7 20 times, data block 8 40 times, data block 9 1 time, data block 10 5 times, data block 11 9 times, data block 12 11 times, data block 13 10 times, data block 14 12 times, data block 15 20 times, data block 16 30 times, data block 17 14 times, data block 18 0 times, data block 19 2 times, and data block 20 50 times. The formula for calculating the probability value of each data block being accessed is as follows:

Similarly, the probability value P₂ of data block 2 being accessed can be calculated as 1.56%, the probability value P₃ of data block 3 being accessed as 2.5%, and so on; the probability values of the other data blocks are not listed one by one here.
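The stated values can be checked with a short calculation. The sketch below assumes that the probability value of a data block is its access count divided by the total access count of all blocks in the period; this assumption is inferred from the figures in the text (the 20 counts sum to 320, 5/320 = 1.5625% and 8/320 = 2.5%). The function names and the 5% preset probability value are hypothetical.

```python
# Access counts for data blocks 1..20 over the one-week period, taken from the example.
access_counts = [10, 5, 8, 20, 50, 3, 20, 40, 1, 5,
                 9, 11, 10, 12, 20, 30, 14, 0, 2, 50]

def access_probabilities(counts):
    """Probability of each block being accessed: its count over the total count."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]

def hot_blocks(counts, preset_probability):
    """Return the 1-based numbers of blocks whose probability exceeds the preset value."""
    probabilities = access_probabilities(counts)
    return [i + 1 for i, p in enumerate(probabilities) if p > preset_probability]

probabilities = access_probabilities(access_counts)
print(f"P2 = {probabilities[1]:.4%}, P3 = {probabilities[2]:.4%}")
print("hot blocks at a 5% preset value:", hot_blocks(access_counts, 0.05))
```

A block whose probability is less than or equal to the preset value is treated as non-hot-spot data, which matches the strict comparison used in `hot_blocks`.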

The judging module 303 is further configured to judge, when a data block is determined to be hot spot data, whether that data block has been written into the cache.

An obtaining module 304, configured to obtain a flow control threshold corresponding to the current statistical period in the migration period when the judging module 303 determines that the data block determined as hot spot data has not been written into the cache.

In this preferred embodiment, the obtaining module 304 obtaining the flow control threshold corresponding to the current statistical period in the migration period may specifically include:

The migration module 305 is configured to write the data block determined as hot spot data into the cache based on the flow control threshold corresponding to the current statistical period.

The data blocks determined as hot spot data are written into the cache according to the flow control threshold corresponding to the current statistical period, that is, at the flow rate permitted for the current statistical period, so that hot spot data is written into the cache neither too fast nor too slowly and is available in the cache for user access.
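One possible reading of this flow-controlled write is sketched below. The text does not state the unit of the flow control threshold, so the sketch assumes it is the maximum number of data blocks that may be migrated in one statistical period; the function name, the use of a Python set as the cache, and the block identifiers are likewise hypothetical.

```python
def migrate_hot_blocks(pending_blocks, cache, flow_control_threshold):
    """Write at most flow_control_threshold pending hot blocks into the cache.

    pending_blocks: identifiers of blocks determined as hot spot data.
    cache: a set of block identifiers already in the cache (stand-in for a real cache).
    Returns the blocks left over for a later statistical period.
    """
    budget = int(flow_control_threshold)
    # Skip blocks that are already cached; only uncached hot blocks need migrating.
    to_write = [b for b in pending_blocks if b not in cache]
    for block_id in to_write[:budget]:
        cache.add(block_id)   # stand-in for the actual cache write
    return to_write[budget:]
```

Blocks returned by the function would be retried in the next statistical period under that period's threshold.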

The obtaining module 304 is further configured to obtain the data block size of each IO of the user application in the previous statistical period, and to calculate the average data block size of the IO in the previous statistical period.

The obtaining module 304 is further configured to obtain the transmission delay of each data block in the previous statistical period, and to calculate the average data block delay of the IO in the previous statistical period.

(1s+0.8s+1.5s+0.4s+5s+2s+0.1s+0.6s+3s+4.4s)/10 = 1.88s.
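The ten delays in the expression above sum to 18.8 s, so their arithmetic mean is 1.88 s. A minimal sketch of the averaging step follows; the variable and function names are hypothetical, and the same helper could be applied to the per-IO data block sizes.

```python
# Transmission delays (in seconds) of the ten data blocks in the example.
delays_s = [1, 0.8, 1.5, 0.4, 5, 2, 0.1, 0.6, 3, 4.4]

def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("cannot average an empty list")
    return sum(values) / len(values)

average_delay_s = average(delays_s)
print(f"average data block delay: {average_delay_s:.2f} s")
```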

The obtaining module 304 is further configured to obtain a preset reference value of the size of the IO data block and a corresponding reference value of the data block delay.

A calculating module 306, configured to calculate the IO load intensity in the last statistical period according to the average data block size, the average data block delay, the reference value of the data block size, and the reference value of the corresponding data block delay of the IO in the last statistical period.

A determining module 307, configured to determine, according to the IO load intensity in the previous statistical period, the IO load category in the previous statistical period by using a pre-trained load classification model.

A training module 308, configured to train the load classification model.

The process of the training module 308 training the load classification model includes:

The calculating module 306 is further configured to calculate a flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period.

Specifically, the calculating module 306 may calculate the flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period, where:

In summary, the hot spot data migration flow control device according to the present invention records a data set accessed by a user every a preset time period, divides the data set into a plurality of data blocks, when it is determined that a data block is hot spot data and is not written into a cache, writes the data block determined as hot spot data into the cache by obtaining flow control thresholds corresponding to different statistical periods in a migration period and based on the flow control threshold corresponding to each statistical period, so as to improve efficiency of migrating user data into the cache, reduce a risk of data loss, avoid causing significant impact on normal input and output service performance, and have a good flow control effect.

The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a dual-screen device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.

Example four

The electronic device 4 includes: a memory 41, at least one processor 42, a computer program 43 stored in said memory 41 and executable on said at least one processor 42, and at least one communication bus 44.

The steps in the above-described method embodiments are implemented when the computer program 43 is executed by the at least one processor 42.

Illustratively, the computer program 43 may be divided into one or more modules/units, which are stored in the memory 41 and executed by the at least one processor 42 to perform the steps in the above-described method embodiments of the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 43 in the electronic device 4.

The electronic device 4 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. It will be understood by those skilled in the art that fig. 4 is merely an example of the electronic device 4 and does not limit it; the electronic device 4 may include more or fewer components than those shown, combine some components, or use different components. For example, the electronic device 4 may further include an input/output device, a network access device, a bus, and the like.

The at least one processor 42 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The processor 42 may be a microprocessor or any conventional processor. The processor 42 is the control center of the electronic device 4 and connects the various parts of the entire electronic device 4 using various interfaces and lines.

The memory 41 may be used for storing the computer program 43 and/or the modules/units, and the processor 42 implements various functions of the electronic device 4 by running or executing the computer program and/or the modules/units stored in the memory 41 and calling data stored in the memory 41. The memory 41 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the electronic device 4, and the like. In addition, the memory 41 may include a high speed random access memory, and may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.

The integrated modules/units of the electronic device 4 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.

In the embodiments provided in the present invention, it should be understood that the disclosed electronic device and method can be implemented in other ways. For example, the above-described embodiments of the electronic device are merely illustrative, and for example, the division of the units is only one logical functional division, and there may be other divisions when the actual implementation is performed.

In addition, functional units in the embodiments of the present invention may be integrated into the same processing unit, or each unit may exist alone physically, or two or more units are integrated into the same unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or that the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit them. Although the present invention is described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A hot spot data migration flow control method is characterized by comprising the following steps:

recording a data set accessed by a user at intervals of a preset time period;

dividing the data set into a plurality of data blocks;

when the data block determined as the hot spot data is not written into the cache, acquiring a flow control threshold corresponding to the current statistical period in the migration period, wherein the acquiring comprises: when the current statistical period is determined not to be the first statistical period, acquiring the data block size of each IO of the user application in the previous statistical period, and calculating the average data block size of the IO in the previous statistical period; acquiring the transmission delay of each data block in the previous statistical period, and calculating the average IO data block delay in the previous statistical period; acquiring a preset reference value of the size of an IO data block and a reference value of the corresponding data block delay; calculating the IO load intensity in the previous statistical period according to the average data block size, the average data block delay, the reference value of the data block size and the reference value of the corresponding data block delay of the IO in the previous statistical period; determining the IO load category in the previous statistical period by utilizing a pre-trained load classification model according to the IO load intensity in the previous statistical period; calculating a flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period;

2. The method of claim 1, wherein dividing the data set into a plurality of data blocks comprises:

evenly dividing the data set into a preset number of data blocks; or

randomly dividing the data set into a preset number of data blocks; or

3. The method of claim 1, wherein the determining whether any of the plurality of data blocks is hot data is performed by calculating a probability value that the data block is accessed, and predicting whether the data block is hot data based on the probability value comprises:

counting the number of times each data block is accessed in the preset time period;

calculating the probability value of each data block accessed in the preset time period based on the number of times each data block is accessed in the preset time period;

judging whether the probability value of the accessed data block is greater than a preset probability value or not;

when the probability value of the accessed data block is judged to be larger than the preset probability value, determining the data block corresponding to the accessed probability value larger than the preset probability value as the hot data;

and when the probability value of the accessed data block is judged to be smaller than or equal to the preset probability value, determining the data block corresponding to the accessed probability value smaller than or equal to the preset probability value as non-hotspot data.

4. The method according to claim 1, wherein the obtaining the flow control threshold corresponding to the current statistical period in the migration period further comprises:

and when the current statistical period is determined to be the first statistical period, determining a preset flow control threshold as a flow control threshold corresponding to the current statistical period.

5. The method according to claim 1, wherein the calculation formula for calculating the IO load intensity in the previous statistical period according to the average data block size, the average data block delay, the reference value of the data block size, and the reference value of the corresponding data block delay of the IO in the previous statistical period is as follows:

6. The method of claim 1, wherein the calculating the flow control threshold corresponding to the current statistical period according to the IO load class in the previous statistical period comprises:

when the IO load category in the last statistical period is a low load category, increasing the flow control threshold corresponding to the last statistical period by a second preset amplitude to obtain the flow control threshold corresponding to the current period;

7. A hot spot data migration flow control device, characterized in that the device comprises:

a dividing module for dividing the data set into a plurality of data blocks;

an obtaining module, configured to obtain a flow control threshold corresponding to the current statistical period in the migration period when the judging module determines that the data block determined as the hot spot data is not written into the cache, wherein the obtaining comprises: when the current statistical period is determined not to be the first statistical period, acquiring the data block size of each IO of the user application in the previous statistical period, and calculating the average data block size of the IO in the previous statistical period; acquiring the transmission delay of each data block in the previous statistical period, and calculating the average IO data block delay in the previous statistical period; acquiring a preset reference value of the size of an IO data block and a reference value of the corresponding data block delay; calculating the IO load intensity in the previous statistical period according to the average data block size, the average data block delay, the reference value of the data block size and the reference value of the corresponding data block delay of the IO in the previous statistical period; determining the IO load category in the previous statistical period by utilizing a pre-trained load classification model according to the IO load intensity in the previous statistical period; calculating a flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period;

8. An electronic device, comprising a processor and a memory, wherein the processor is configured to implement the hot spot data migration flow control method according to any one of claims 1 to 6 when executing a computer program stored in the memory.

9. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the hot spot data migration flow control method according to any one of claims 1 to 6.