WO2019232925A1 - Hot data migration flow control method, device, electronic device, and storage medium - Google Patents

Hot data migration flow control method, device, electronic device, and storage medium

Info

Publication number
WO2019232925A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
data block
flow control
statistical period
control threshold
Prior art date
Application number
PCT/CN2018/100168
Other languages
English (en)
French (fr)
Inventor
陈学伟
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019232925A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0611 Improving I/O performance in relation to response time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements

Definitions

  • The present application relates to the field of computer technology, and in particular to a hot data migration flow control method, device, electronic device, and storage medium.
  • A cache is a buffer for data exchange.
  • When a piece of hardware such as a CPU reads data, it first looks for the required data in the cache. If the data is found, it is used directly; if not, the hardware looks for it in memory.
  • The cache runs much faster than memory, so its role is to help the hardware run faster.
  • The cache holds only a copy of a small part of the data in memory, so a lookup in the cache may miss (because the data has not been copied from memory into the cache). The hardware then has to look for the data in memory, which slows down the entire system.
  • Hot data is data that the hardware uses frequently. Storing hot data in the cache in advance lets the hardware retrieve it directly from the cache, which saves data acquisition time.
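  • A first aspect of the present application provides a hot data migration flow control method, where the method includes:
  • recording a data set accessed by a user every preset time period; dividing the data set into a plurality of data blocks; determining whether any of the data blocks is hot data; when a data block is determined to be hot data, determining whether that data block has been written into the cache; when it has not been written into the cache, obtaining the flow control threshold corresponding to the current statistical period in the migration period; and writing the data block determined to be hot data into the cache based on the flow control threshold corresponding to the current statistical period.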
  • A second aspect of the present application provides a hot data migration flow control device, where the device includes:
  • a recording module for recording a data set accessed by a user every preset time period; a dividing module for dividing the data set into a plurality of data blocks; a judging module for judging whether any of the plurality of data blocks is hot data, the judging module being further configured to judge, when a data block is determined to be hot data, whether that data block has been written into the cache; an obtaining module for obtaining the flow control threshold corresponding to the current statistical period in the migration period when the judging module determines that the data block determined to be hot data has not been written into the cache; and a migration module for writing the data block determined to be hot data into the cache based on the flow control threshold corresponding to the current statistical period.
  • A third aspect of the present application provides an electronic device including a processor and a memory, where the processor implements the hot data migration flow control method when executing computer-readable instructions stored in the memory.
  • A fourth aspect of the present application provides a non-volatile readable storage medium on which computer-readable instructions are stored, where the computer-readable instructions implement the hot data migration flow control method when executed by a processor.
  • The hot data migration flow control method, device, electronic device, and storage medium described in this application record the data set accessed by the user every preset time period, divide the data set into multiple data blocks, and determine whether any data block is hot data. When a data block is determined to be hot data and has not been written into the cache, the flow control thresholds corresponding to the different statistical periods in the migration period are obtained, and the data block is written into the cache based on the flow control threshold corresponding to each statistical period.
  • This improves the efficiency of migrating user data to the cache and reduces the risk of data loss, while avoiding a significant impact on normal I/O service performance, and achieves a good flow control effect.
  • FIG. 1 is a flowchart of a hot data migration flow control method provided in Embodiment 1 of the present application.
  • FIG. 2 is a flowchart, provided in Embodiment 2 of the present application, of a method for determining the flow control threshold corresponding to the current statistical period according to the IO load of the user application in the previous statistical period.
  • FIG. 3 is a functional module diagram of a hot data migration flow control device provided in Embodiment 3 of the present application.
  • FIG. 4 is a schematic diagram of an electronic device according to a fourth embodiment of the present application.
  • the hot data migration flow control method in the embodiment of the present application is applied to one or more electronic devices.
  • the hot data migration flow control method can also be applied to a hardware environment composed of an electronic device and a server connected to the electronic device through a network.
  • the network includes, but is not limited to: a wide area network, a metropolitan area network, or a local area network.
  • the hot data migration flow control method in the embodiment of the present application may be executed by a server or an electronic device; it may also be executed jointly by the server and the electronic device.
  • the hot data migration flow control function provided by the method of the present application may be directly integrated on the electronic device, or a client for implementing the method of the present application may be installed.
  • The method provided in this application can also run on a device such as a server in the form of a Software Development Kit (SDK) that exposes an interface for the hot data migration flow control function; an electronic device or another device can then implement the hot data migration flow control function through the provided interface.
  • FIG. 1 is a flowchart of a hot data migration flow control method provided in Embodiment 1 of the present application. According to different requirements, the execution order in this flowchart can be changed, and some steps can be omitted.
  • The preset time period is a time period set in advance, for example, one week or 10 days. This application does not specifically limit the preset time period; it can be set according to the hardware or data access conditions of the electronic system.
  • When the electronic device detects a user's instruction to access data, it responds to the instruction and returns the accessed data to the user. The data set accessed by all users during the preset time period is recorded.
  • the recorded data set accessed by the user is divided into a plurality of data blocks.
  • Dividing the data set into multiple data blocks may include one or a combination of the following:
  • The data set is evenly divided into a preset number of data blocks.
  • The preset number is the number of data blocks set in advance.
  • For example, the data set is evenly divided into 10 data blocks, and each data block has the same size.
  • Alternatively, the data set is randomly divided into 10 data blocks, each of which has a different size.
  • The data set is divided into a plurality of data blocks according to a preset size.
  • The preset size is the data block size set in advance.
  • For example, the data set is divided into multiple data blocks, and each data block has a size of 1 Mb.
  • The preset size may also be 10 Mb or more.
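  • The two division strategies above (a preset number of blocks, or a preset block size) can be sketched as follows. This is a minimal illustration, assuming the data set is held in memory as a byte or item sequence; the function names `split_evenly` and `split_by_size` are hypothetical and do not come from the application. When the length is not exactly divisible, this sketch lets block sizes differ by one item, which is an assumption of the sketch rather than something the text specifies.

```python
from typing import List, Sequence


def split_evenly(data: Sequence, num_blocks: int) -> List[Sequence]:
    """Split data into num_blocks contiguous blocks of (nearly) equal size."""
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")
    base, extra = divmod(len(data), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        # The first `extra` blocks take one more item when the length
        # is not exactly divisible (assumption of this sketch).
        end = start + base + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def split_by_size(data: Sequence, block_size: int) -> List[Sequence]:
    """Split data into blocks of block_size items; the last may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

  • For instance, `split_evenly(bytes(100), 10)` yields ten 10-byte blocks, matching the "10 data blocks of the same size" example.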
  • Determining whether any of the plurality of data blocks is hot data may specifically include:
  • For example, the data set accessed by the user during one week is divided into 20 data blocks, numbered data block 1 through data block 20.
  • data block 1 is accessed 10 times in a week
  • data block 2 is accessed 5 times in a week
  • data block 3 is accessed 8 times in a week
  • data block 4 is accessed 20 times in a week
  • data block 5 is accessed 50 times in a week
  • data block 6 is accessed 3 times in a week
  • data block 7 is accessed 20 times in a week
  • data block 8 is accessed 40 times in a week
  • data block 9 is accessed once in a week
  • data block 10 is accessed 5 times in a week
  • data block 11 is accessed 9 times in a week
  • data block 12 is accessed 11 times in a week
  • data block 13 is accessed 10 times in a week
  • data block 14 is accessed 12 times in a week
  • data block 15 is accessed 20 times in a week
  • data block 16 is accessed 30 times in a week
  • data block 17 is accessed 14 times in a week
  • data block 18 is accessed 0 times in a week
  • data block 19 is accessed 2 times in a week
  • data block 20 is accessed 50 times in a week.
  • The formula for calculating the probability of each data block being accessed is $P_i = \frac{X_i}{\sum_{j=1}^{n} X_j}$, where $X_i$ is the number of times the i-th data block is accessed in one week, $n$ is the number of data blocks (here $n = 20$), and $P_i$ is the probability that the i-th data block is accessed in one week. With the access counts above, $\sum_{j=1}^{20} X_j = 320$, so the probability of data block 1 being accessed is $P_1 = 10/320 = 3.125\%$.
  • Similarly, the probability $P_2$ of data block 2 being accessed is $5/320 \approx 1.56\%$
  • and the probability $P_3$ of data block 3 being accessed is $8/320 = 2.5\%$.
  • The probabilities of the other data blocks being accessed are calculated in the same way and are not described in detail.
  • the preset threshold may be, for example, 20%, so data contained in a data block with a probability of being accessed greater than 20% may be regarded as hot data.
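  • The hot data test described above can be sketched in a few lines. This is an illustrative sketch only: the access counts are the ones listed in the example, the share formula is the one implied by the stated results ($P_2 \approx 1.56\%$, $P_3 = 2.5\%$), and the names `access_probabilities` and `hot_blocks` are hypothetical.

```python
from typing import Dict, List

# Weekly access counts for data blocks 1..20, taken from the example above.
ACCESS_COUNTS: Dict[int, int] = {
    1: 10, 2: 5, 3: 8, 4: 20, 5: 50, 6: 3, 7: 20, 8: 40, 9: 1, 10: 5,
    11: 9, 12: 11, 13: 10, 14: 12, 15: 20, 16: 30, 17: 14, 18: 0, 19: 2, 20: 50,
}


def access_probabilities(counts: Dict[int, int]) -> Dict[int, float]:
    """Return each block's share of all accesses, P_i = X_i / sum(X_j)."""
    total = sum(counts.values())
    if total == 0:
        return {block: 0.0 for block in counts}
    return {block: n / total for block, n in counts.items()}


def hot_blocks(counts: Dict[int, int], threshold: float) -> List[int]:
    """Return the ids of blocks whose access probability exceeds threshold."""
    probs = access_probabilities(counts)
    return sorted(block for block, p in probs.items() if p > threshold)
```

  • With these particular counts the largest share is 50/320, about 15.6%, so `hot_blocks(ACCESS_COUNTS, 0.20)` returns an empty list; a lower, purely hypothetical threshold such as 0.15 would select data blocks 5 and 20.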
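  • When it is determined that a data block is hot data, step S14 is performed; when it is determined that no data block is hot data, the flow may return to step S11.
  • When the data block determined to be hot data has not been written into the cache, step S15 is performed.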
  • The entire process of writing the data blocks that are determined to be hot data and have not yet been written into the cache, from the start of writing to its completion, is called a migration period.
  • A migration period can be divided into multiple statistical periods, and a statistical period can be a preset length of time. For example, a statistical period is set to 1 second.
  • Flow control here means controlling traffic. There are two ways to implement it: one is to implement flow control based on source address, destination address, source port, destination port, and protocol type through the QoS module of routers and switches; the other is to implement application-based flow control through dedicated flow control equipment.
  • the acquiring a flow control threshold corresponding to a current statistical period within a migration period may specifically include:
  • the flow control threshold corresponding to the first statistical period in the migration period of the present application is a preset flow control threshold, which can be preset by a system administrator according to experience. That is, a preset flow control threshold is adopted as the flow control threshold of the first statistical period in the migration period.
  • Each remaining statistical period except the first statistical period in the migration period may correspond to a flow control threshold.
  • The flow control threshold corresponding to each remaining statistical period is dynamically adjusted: the threshold for the current statistical period is calculated from the IO load in the previous statistical period, and the threshold for the next statistical period is calculated from the IO load in the current statistical period.
  • Specifically, the flow control threshold corresponding to the second statistical period is calculated according to the IO load in the first statistical period, the flow control threshold corresponding to the third statistical period is calculated according to the IO load in the second statistical period, and so on.
  • The data block determined to be hot data is written into the cache under the flow control threshold corresponding to the current statistical period, that is, at the flow rate allowed for that period, so that hot data is written into the cache neither too fast nor too slowly, which avoids a significant impact on normal I/O service performance.
  • the hot data written in the cache can be accessed by users.
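  • The per-period throttling described above can be simulated as follows. This is a sketch under stated assumptions, not the method of the application: it assumes the flow control threshold is a byte budget per statistical period (the text does not state the unit), and the names `migrate_hot_blocks` and `next_threshold` are hypothetical. The `next_threshold` callback stands in for whatever rule recomputes the threshold from the previous period's IO load.

```python
from typing import Callable, Dict, List


def migrate_hot_blocks(
    block_sizes: Dict[str, int],
    initial_threshold: int,
    next_threshold: Callable[[int], int] = lambda t: t,
) -> List[List[str]]:
    """Simulate one migration period.

    block_sizes maps a block id to the bytes that must be written to cache.
    In every statistical period at most `threshold` bytes are written
    (assumed meaning of the flow control threshold). Returns, for each
    statistical period, the ids of the blocks whose write completed in it.
    """
    if initial_threshold <= 0:
        raise ValueError("initial_threshold must be positive")
    remaining = dict(block_sizes)
    pending = list(remaining)          # blocks are migrated in insertion order
    threshold = initial_threshold
    schedule: List[List[str]] = []
    while pending:
        budget = threshold
        completed: List[str] = []
        while pending and budget > 0:
            block = pending[0]
            written = min(budget, remaining[block])
            remaining[block] -= written
            budget -= written
            if remaining[block] == 0:
                completed.append(pending.pop(0))
        schedule.append(completed)
        # Recompute the threshold for the following statistical period;
        # clamp to 1 so the simulation always makes progress.
        threshold = max(1, next_threshold(threshold))
    return schedule
```

  • The length of the returned list is the number of statistical periods the migration period needed, which makes the trade-off between threshold size and migration time easy to inspect.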
  • FIG. 2 is a flowchart, provided in Embodiment 2 of the present application, of a method for determining the flow control threshold corresponding to the current statistical period according to the IO load of the user application in the previous statistical period.
  • S21: Obtain the data block size of each IO issued by the user application in the previous statistical period, and calculate the average data block size of the IO in the previous statistical period.
  • The average data block size of the IO in the previous statistical period may be calculated using an arithmetic mean algorithm, a geometric mean algorithm, or a root mean square algorithm.
  • For example, suppose the user application issues ten IOs in the previous statistical period with data block sizes of 2M, 1M, 3M, 0.5M, 10M, 4M, 0.1M, 1.2M, 5M, and 8M. The arithmetic mean of these sizes is $(2 + 1 + 3 + 0.5 + 10 + 4 + 0.1 + 1.2 + 5 + 8)/10 = 3.48$M.
  • The transmission delay is the time a node needs to push a data block from the node onto the transmission medium when sending data, that is, the total time a sending station needs from the start of sending a data frame until the frame has been completely sent, or the total time a receiving station needs from the start of receiving a data frame until it has been completely received.
  • the transmission delay of the data block may be obtained from a load measurement tool or a performance monitoring tool installed in each storage node.
  • The average data block delay of the IO in the previous statistical period may also be calculated using an arithmetic mean algorithm, a geometric mean algorithm, or a root mean square algorithm. For example, assuming that the transmission delays of the ten IOs in the previous statistical period are 1s, 0.8s, 1.5s, 0.4s, 5s, 2s, 0.02s, 0.6s, 3s, and 4.5s, the average IO data block delay calculated with the arithmetic mean algorithm is $(1 + 0.8 + 1.5 + 0.4 + 5 + 2 + 0.02 + 0.6 + 3 + 4.5)/10 = 1.882$ s.
  • The same algorithm is used for both averages: if the average data block size of the IO in the previous statistical period is calculated using the arithmetic mean algorithm, the average data block delay is also calculated using the arithmetic mean algorithm; if the average data block size is calculated using the geometric mean algorithm, the average data block delay is also calculated using the geometric mean algorithm; and if the average data block size is calculated using the root mean square algorithm, the average data block delay is also calculated using the root mean square algorithm.
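  • The three averaging options can be compared on the sample values given above. The sketch below uses only the Python standard library; the function name `average` and its `method` argument are hypothetical, and applying one `method` value to both the sizes and the delays mirrors the requirement that the same algorithm be used for both.

```python
import math
import statistics
from typing import Sequence

# Sample values from the text: ten IOs in the previous statistical period.
SIZES_M = [2, 1, 3, 0.5, 10, 4, 0.1, 1.2, 5, 8]          # data block sizes (M)
DELAYS_S = [1, 0.8, 1.5, 0.4, 5, 2, 0.02, 0.6, 3, 4.5]   # transmission delays (s)


def average(values: Sequence[float], method: str = "arithmetic") -> float:
    """Average values with the arithmetic, geometric, or root-mean-square rule."""
    if method == "arithmetic":
        return statistics.fmean(values)
    if method == "geometric":
        return statistics.geometric_mean(values)  # requires positive values
    if method == "rms":
        return math.sqrt(statistics.fmean(v * v for v in values))
    raise ValueError(f"unknown averaging method: {method}")
```

  • With the arithmetic rule this gives 3.48 for the sizes and 1.882 for the delays; for positive inputs the geometric mean is never larger, and the root mean square never smaller, than the arithmetic mean, so the choice of rule changes how strongly large IOs such as the 10M block influence the result.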
  • The reference value of the IO data block size and the reference value of the corresponding data block delay may be preset by an administrator of the storage system according to experience. For example, if experience shows that the delay is smallest when a 4K data block is transmitted and can reach 50ms in the ideal state, the reference value of the IO data block size can be set to 4K and the reference value of the corresponding data block delay can be set to 50ms.
  • Assuming that the average data block size of the IO in the previous statistical period is X, the average data block delay is Y, the reference value of the data block size is M, and the reference value of the corresponding data block delay is N, the calculation formula of the IO load intensity in the previous statistical period is:
  • the IO load category includes: a high load category, a normal load category, and a low load category.
  • The load classification model includes, but is not limited to, a Support Vector Machine (SVM) model.
  • The average data block size of the IO in the previous statistical period, the average data block delay of the IO in the previous statistical period, and the IO load intensity in the previous statistical period are used as the input of the load classification model, which computes and outputs the IO load category in the previous statistical period.
  • The training process of the load classification model includes:
  • Training samples of the different load categories are distributed to different folders. For example, training samples of the high load category are distributed to the first folder, training samples of the normal load category to the second folder, and training samples of the low load category to the third folder.
  • A first preset ratio (for example, 70%) of the training samples is extracted from each folder to train the load classification model,
  • and the remaining second preset ratio (for example, 30%) of the samples in each folder is used as test samples to verify the accuracy of the trained load classification model.
  • If the accuracy rate is greater than or equal to a preset accuracy rate, training ends and the trained load classification model is used as the classifier to identify the IO load category in the current statistical period; if the accuracy rate is less than the preset accuracy rate, the number of positive samples and the number of negative samples are increased and the load classification model is retrained until the accuracy rate is greater than or equal to the preset accuracy rate.
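  • The split-and-verify part of the training procedure can be sketched without committing to a particular SVM library. Everything named here is an assumption for illustration: `split_by_category` and `accuracy` are hypothetical helpers, a sample is taken to be the triple (average block size, average delay, load intensity), and `classify` stands in for whatever trained model is being verified.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[float, float, float]   # (avg block size, avg delay, load intensity)
Labelled = Tuple[Sample, str]


def split_by_category(
    samples_by_category: Dict[str, Sequence[Sample]],
    train_ratio: float = 0.7,
    seed: int = 0,
) -> Tuple[List[Labelled], List[Labelled]]:
    """Take train_ratio of every category for training, the rest for testing."""
    rng = random.Random(seed)
    train: List[Labelled] = []
    test: List[Labelled] = []
    for label, samples in samples_by_category.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        train += [(s, label) for s in shuffled[:cut]]
        test += [(s, label) for s in shuffled[cut:]]
    return train, test


def accuracy(classify: Callable[[Sample], str], test: List[Labelled]) -> float:
    """Fraction of test samples whose predicted category matches the label."""
    if not test:
        return 0.0
    correct = sum(1 for sample, label in test if classify(sample) == label)
    return correct / len(test)
```

  • A training loop would fit the model on `train`, compute `accuracy` on `test`, and, while the result stays below the preset accuracy rate, add samples and retrain, as the text describes.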
  • Calculating the flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period may include:
  • When the IO load category in the previous statistical period is the high load category, the flow control threshold is lowered by the first preset amplitude, so that in the current statistical period the data block determined to be hot data is written into the cache under the lowered flow control threshold. Reducing the speed of data migration ensures efficient access for the user application.
  • The first preset amplitude may be 1/2 of the flow control threshold corresponding to the previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1/2 of the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1/2 of the flow control threshold corresponding to the current statistical period.
  • When the IO load category in the previous statistical period is the low load category, the flow control threshold is raised by the second preset amplitude, so that in the current statistical period the data block determined to be hot data is written into the cache under the raised flow control threshold.
  • This increases the speed of data migration while still ensuring the access quality of the user application.
  • The second preset amplitude may be 1.5 times the flow control threshold corresponding to the previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1.5 times the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1.5 times the flow control threshold corresponding to the current statistical period.
  • When the IO load category in the previous statistical period is the normal load category, the flow control threshold corresponding to the previous statistical period is used as the flow control threshold corresponding to the current statistical period.
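  • The adjustment rule can be sketched as a small function. The factors 1/2 and 1.5 are the example amplitudes from the text; the pairing of "lower" with a high-load previous period and "raise" with a low-load previous period is this sketch's reading of the passage, and the names `next_flow_control_threshold` and `thresholds_for_periods` are hypothetical.

```python
from typing import List, Sequence


def next_flow_control_threshold(
    previous_threshold: float,
    previous_load_category: str,
    decrease_factor: float = 0.5,   # example "first preset amplitude"
    increase_factor: float = 1.5,   # example "second preset amplitude"
) -> float:
    """Derive the current period's threshold from the previous period's load."""
    if previous_load_category == "high":
        return previous_threshold * decrease_factor   # slow migration down
    if previous_load_category == "low":
        return previous_threshold * increase_factor   # speed migration up
    if previous_load_category == "normal":
        return previous_threshold                     # keep the threshold
    raise ValueError(f"unknown load category: {previous_load_category}")


def thresholds_for_periods(
    preset_threshold: float, load_categories: Sequence[str]
) -> List[float]:
    """Thresholds for period 1 (the preset) and each following period.

    load_categories[k] is the IO load category observed in period k + 1,
    which determines the threshold of period k + 2.
    """
    thresholds = [preset_threshold]
    for category in load_categories:
        thresholds.append(next_flow_control_threshold(thresholds[-1], category))
    return thresholds
```

  • For example, starting from a preset threshold of 100 and observing high, low, and normal load in the first three statistical periods gives thresholds of 100, 50, 75, and 75 for the first four periods.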
  • The hot data migration flow control method described in this application records the data set accessed by the user every preset time period, divides the data set into multiple data blocks, and determines whether any data block is hot data. When a data block is determined to be hot data and has not been written into the cache, the method obtains the flow control thresholds corresponding to the different statistical periods in the migration period and writes the data block into the cache based on the flow control threshold corresponding to each statistical period.
  • This improves the efficiency of migrating user data to the cache and reduces the risk of data loss, while avoiding a significant impact on normal I/O service performance, and achieves a good flow control effect.
  • The flow control threshold corresponding to the current statistical period is adjusted automatically and dynamically according to the IO load of the user application in the previous statistical period, without manual adjustment by an administrator. This reduces the administrator's workload and avoids inaccurate adjustments caused by the administrator's subjective judgment.
  • FIG. 3 is a functional module diagram of a preferred embodiment of the hot data migration flow control device of the present application.
  • the hot data migration flow control device 30 runs in an electronic device.
  • the hot data migration flow control device 30 may include a plurality of function modules composed of program code segments.
  • The program code of each program segment in the hot data migration flow control device 30 may be stored in a memory and executed by at least one processor to perform the hot data migration flow control method (see FIGS. 1-2 and the related descriptions for details).
  • the hot data migration flow control device 30 may be divided into a plurality of functional modules according to functions performed by the hot data migration flow control device 30.
  • The functional modules may include a recording module 301, a dividing module 302, a judging module 303, an obtaining module 304, a migration module 305, a calculation module 306, a determination module 307, and a training module 308.
  • the module referred to in the present application refers to a series of computer-readable instruction segments capable of being executed by at least one processor and capable of performing fixed functions, which are stored in a memory. In some embodiments, functions of each module will be described in detail in subsequent embodiments.
  • the recording module 301 is configured to record a data set accessed by a user every preset time period.
  • The preset time period is a time period set in advance, for example, one week or 10 days. This application does not specifically limit the preset time period; it can be set according to the hardware or data access conditions of the electronic system.
  • the recording module 301 records a data set accessed by all users in the preset time period.
  • a dividing module 302 is configured to divide the data set into multiple data blocks.
  • the recorded data set accessed by the user is divided into a plurality of data blocks.
  • The dividing module 302 dividing the data set into multiple data blocks may include one or a combination of the following:
  • The data set is evenly divided into a preset number of data blocks.
  • The preset number is the number of data blocks set in advance.
  • For example, the data set is evenly divided into 10 data blocks, and each data block has the same size.
  • Alternatively, the data set is randomly divided into 10 data blocks, each of which has a different size.
  • The data set is divided into a plurality of data blocks according to a preset size.
  • The preset size is the data block size set in advance.
  • For example, the data set is divided into multiple data blocks, and each data block has a size of 1 Mb.
  • The preset size may also be 10 Mb or more.
  • The judging module 303 is configured to judge whether any data block in the multiple data blocks is hot data.
  • The judging module 303 may calculate the probability of each data block being accessed and determine, based on that probability, whether the data block is hot data.
  • The judging module 303 judging whether any of the plurality of data blocks is hot data may specifically include:
  • For example, the data set accessed by the user during one week is divided into 20 data blocks, numbered data block 1 through data block 20.
  • data block 1 is accessed 10 times in a week
  • data block 2 is accessed 5 times in a week
  • data block 3 is accessed 8 times in a week
  • data block 4 is accessed 20 times in a week
  • data block 5 is accessed 50 times in a week
  • data block 6 is accessed 3 times in a week
  • data block 7 is accessed 20 times in a week
  • data block 8 is accessed 40 times in a week
  • data block 9 is accessed once in a week
  • data block 10 is accessed 5 times in a week
  • data block 11 is accessed 9 times in a week
  • data block 12 is accessed 11 times in a week
  • data block 13 is accessed 10 times in a week
  • data block 14 is accessed 12 times in a week
  • data block 15 is accessed 20 times in a week
  • data block 16 is accessed 30 times in a week
  • data block 17 is accessed 14 times in a week
  • data block 18 is accessed 0 times in a week
  • data block 19 is accessed 2 times in a week
  • data block 20 is accessed 50 times in a week.
  • The formula for calculating the probability of each data block being accessed is $P_i = \frac{X_i}{\sum_{j=1}^{n} X_j}$, where $X_i$ is the number of times the i-th data block is accessed in one week, $n$ is the number of data blocks (here $n = 20$), and $P_i$ is the probability that the i-th data block is accessed in one week. With the access counts above, $\sum_{j=1}^{20} X_j = 320$, so the probability of data block 1 being accessed is $P_1 = 10/320 = 3.125\%$.
  • Similarly, the probability $P_2$ of data block 2 being accessed is $5/320 \approx 1.56\%$
  • and the probability $P_3$ of data block 3 being accessed is $8/320 = 2.5\%$.
  • The probabilities of the other data blocks being accessed are calculated in the same way and are not described in detail.
  • the preset threshold may be, for example, 20%, so data contained in a data block with a probability of being accessed greater than 20% may be regarded as hot data.
  • The judging module 303 is further configured to judge, when a data block is determined to be hot data, whether that data block has been written into the cache.
  • The obtaining module 304 is configured to obtain the flow control threshold corresponding to the current statistical period in the migration period when the judging module 303 determines that the data block determined to be hot data has not been written into the cache.
  • The entire process of writing the data blocks that are determined to be hot data and have not yet been written into the cache, from the start of writing to its completion, is called a migration period.
  • A migration period can be divided into multiple statistical periods, and a statistical period can be a preset length of time. For example, a statistical period is set to 1 second.
  • Flow control here means controlling traffic. There are two ways to implement it: one is to implement flow control based on source address, destination address, source port, destination port, and protocol type through the QoS module of routers and switches; the other is to implement application-based flow control through dedicated flow control equipment.
  • the obtaining module 304 obtaining the flow control threshold corresponding to the current statistical period within the migration period may specifically include:
  • the flow control threshold corresponding to the first statistical period in the migration period of the present application is a preset flow control threshold, which can be preset by a system administrator according to experience. That is, a preset flow control threshold is adopted as the flow control threshold of the first statistical period in the migration period.
  • each remaining statistical period after the first in the migration period may have its own flow control threshold.
  • the flow control threshold corresponding to each remaining statistical period is adjusted dynamically.
  • the flow control threshold corresponding to the current statistical period can be calculated from the IO load in the previous statistical period.
  • the flow control threshold corresponding to the next statistical period can be calculated from the IO load in the current statistical period.
  • specifically, the flow control threshold corresponding to the second statistical period is calculated from the IO load in the first statistical period, the flow control threshold corresponding to the third statistical period is calculated from the IO load in the second statistical period, and so on.
  • the migration module 305 is configured to write the data block determined as the hot data to a cache based on a flow control threshold corresponding to the current statistical period.
  • the data block determined to be hot data is written into the cache according to the flow control threshold corresponding to the current statistical period; that is, the write proceeds at the traffic rate permitted for the current statistical period, so that hot data is written into the cache neither too fast nor too slowly, and the hot data written into the cache can be accessed by users.
  • the obtaining module 304 is further configured to obtain the data block size of each IO of the user application in the previous statistical period, and calculate the average data block size of the IOs in the previous statistical period.
  • the average data block size of the IOs in the previous statistical period may be calculated as an arithmetic mean, a geometric mean, or a root mean square.
  • for example, suppose the user application issues ten IOs in the previous statistical period, with data block sizes of 2M, 1M, 3M, 0.5M, 10M, 4M, 0.1M, 1.2M, 5M, and 8M. Using the arithmetic mean, the average data block size is (2M+1M+3M+0.5M+10M+4M+0.1M+1.2M+5M+8M)/10 = 3.48M.
  • the obtaining module 304 is further configured to obtain the transmission delay of each data block in the previous statistical period, and calculate the average data block delay of the IOs in the previous statistical period.
  • the transmission delay is the time a node needs to push a data block from the node onto the transmission medium when sending data, that is, the total time a sending station needs from starting to send a data frame until the frame has been sent, or the total time a receiving station needs from starting to receive a data frame until the frame has been received.
  • the transmission delay of a data block may be obtained from a load measurement tool or a performance monitoring tool installed on each storage node.
  • the average data block delay of the IOs in the previous statistical period may also be calculated as an arithmetic mean, a geometric mean, or a root mean square. Suppose the transmission delays of the ten IOs in the previous statistical period are 1s, 0.8s, 1.5s, 0.4s, 5s, 2s, 0.02s, 0.6s, 3s, and 4.5s. Using the arithmetic mean, the average data block delay is (1s+0.8s+1.5s+0.4s+5s+2s+0.02s+0.6s+3s+4.5s)/10 = 1.882s ≈ 1.88s.
  • the same algorithm is used for both averages: if the average data block size of the IOs in the previous statistical period is calculated as an arithmetic mean, the average data block delay is also calculated as an arithmetic mean; if the average data block size is calculated as a geometric mean, the average data block delay is also calculated as a geometric mean; and if the average data block size is calculated as a root mean square, the average data block delay is also calculated as a root mean square.
  • the obtaining module 304 is further configured to obtain a preset reference value of the IO data block size and the corresponding reference value of the data block delay.
  • the reference value of the IO data block size and the corresponding reference value of the data block delay may be preset by the storage system administrator based on experience. For example, if experience shows that a 4K data block has the smallest transmission delay, reaching 50ms under ideal conditions, then the reference value of the IO data block size can be set to 4K and the corresponding reference value of the data block delay can be set to 50ms.
  • a calculation module 306 is configured to calculate the IO load intensity in the previous statistical period from the average data block size and average data block delay of the IOs in that period, the reference value of the data block size, and the corresponding reference value of the data block delay.
  • for example, suppose the average data block size of the IOs in the previous statistical period is X, the average data block delay is Y, the reference value of the data block size is M, and the reference value of the corresponding data block delay is N; the IO load intensity in the previous statistical period is then calculated from X, Y, M, and N by the formula given in figure PCTCN2018100168-appb-000006 (the formula image is not reproduced in this text).
  • a determining module 307 is configured to determine the IO load category in the previous statistical period from the IO load intensity in that period, using a pre-trained load classification model.
  • the IO load category includes: a high load category, a normal load category, and a low load category.
  • the load classification model includes, but is not limited to, a Support Vector Machine (SVM) model.
  • the average data block size of the IOs in the previous statistical period, the average data block delay of the IOs in the previous statistical period, and the IO load intensity in the previous statistical period are used as the inputs to the load classification model, which outputs the IO load category in the previous statistical period.
  • the training module 308 is configured to train a load classification model.
  • the process of the training module 308 training the load classification model includes:
  • the training samples of the different load categories are first distributed to different folders. For example, training samples of the high load category are distributed to the first folder, training samples of the normal load category to the second folder, and training samples of the low load category to the third folder.
  • a first preset proportion (for example, 70%) of the samples is then taken from each folder as the overall training samples to train the load classification model, and the remaining second preset proportion (for example, 30%) is taken from each folder as the overall test samples to verify the accuracy of the trained load classification model.
  • if the accuracy is greater than or equal to a preset accuracy, training ends and the trained load classification model is used as the classifier to identify the IO load category in the current statistical period; if the accuracy is less than the preset accuracy, the numbers of positive and negative samples are increased and the load classification model is retrained until the accuracy is greater than or equal to the preset accuracy.
  • the calculation module 306 is further configured to calculate a flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period.
  • the calculating module 306 calculating the flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period may include:
  • when the IO load in the previous statistical period is high, the flow control threshold is lowered by the first preset amplitude, so that in the current statistical period the data block determined to be hot data is written into the cache under a lower flow control threshold; slowing data migration keeps user application access efficient.
  • the first preset amplitude may be 1/2 of the flow control threshold corresponding to the previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1/2 of the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1/2 of the flow control threshold corresponding to the current statistical period.
  • when the IO load in the previous statistical period is low, the flow control threshold is raised by the second preset amplitude, so that in the current statistical period the data block determined to be hot data is written into the cache under a higher flow control threshold; this speeds up data migration while preserving the access quality of user applications.
  • the second preset amplitude may be 1.5 times the flow control threshold corresponding to the previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1.5 times the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1.5 times the flow control threshold corresponding to the current statistical period.
  • when the IO load category in the previous statistical period is the normal load category, the flow control threshold corresponding to the previous statistical period is used as the flow control threshold corresponding to the current statistical period.
  • the hot data migration flow control device described in this application records, at every preset time period, the data set accessed by users and divides the data set into multiple data blocks. When a data block is determined to be hot data and has not been written into the cache, the device obtains the flow control thresholds corresponding to the different statistical periods in the migration period and writes the data block into the cache based on the flow control threshold corresponding to each statistical period. This improves the efficiency of migrating user data to the cache and reduces the risk of data loss while avoiding a significant impact on normal I/O service performance, giving a good flow control effect.
  • the flow control threshold corresponding to the current statistical period is adjusted dynamically and automatically according to the IO load of the user application in the previous statistical period, without manual adjustment by an administrator. This reduces the administrator's workload and avoids inaccurate adjustments caused by the administrator's subjective judgment.
  • the above integrated unit implemented in the form of a software functional module may be stored in a non-volatile readable storage medium.
  • the above software functional module is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a dual-screen device, or a network device) or a processor to execute part of the methods described in the embodiments of this application.
  • FIG. 4 is a schematic diagram of an electronic device according to a fourth embodiment of the present application.
  • the electronic device 4 includes: a memory 41, at least one processor 42, computer-readable instructions 43 stored in the memory 41 and executable on the at least one processor 42, and at least one communication bus 44.
  • the computer-readable instructions 43 may be divided into one or more modules / units, and the one or more modules / units are stored in the memory 41 and executed by the at least one processor 42 to complete the steps in the above method embodiments of the present application.
  • the one or more modules / units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 43 in the electronic device 4.
  • the electronic device 4 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server.
  • FIG. 4 is only an example of the electronic device 4 and does not limit it. The electronic device 4 may include more or fewer components than shown in the figure, combine some components, or use different components.
  • for example, the electronic device 4 may further include an input / output device, a network access device, a bus, and the like.
  • the at least one processor 42 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the processor 42 may be a microprocessor, or the processor 42 may be any conventional processor, etc.
  • the processor 42 is the control center of the electronic device 4, and uses various interfaces and lines to connect the various parts of the entire electronic device 4.
  • the memory 41 may be configured to store the computer-readable instructions 43 and / or modules / units, and the processor 42 implements the various functions of the electronic device 4 by running or executing the computer-readable instructions and / or modules / units stored in the memory 41 and by calling the data stored in the memory 41.
  • the memory 41 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function (such as a sound playback function, an image playback function, etc.), and the data storage area may store data (such as audio data, a phonebook, etc.) created through the use of the electronic device 4.
  • the memory 41 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one disk storage device, a flash memory device, or other volatile solid-state storage device.
  • when the integrated module / unit of the electronic device 4 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a non-volatile readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments can also be completed by computer-readable instructions that instruct the related hardware.
  • the computer-readable instructions can be stored in a non-volatile readable storage medium, and when the computer-readable instructions are executed by a processor, the steps of the foregoing method embodiments can be implemented.
  • the computer-readable instructions include computer-readable instruction code, and the computer-readable instruction code may be in source code form, object code form, an executable file, or some intermediate form.
  • the non-volatile readable medium may include: any entity or device capable of carrying the computer-readable instruction code, a recording medium, a U disk, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), electric carrier signals, telecommunication signals, and software distribution media.
  • Each functional unit in each embodiment of the present application may be integrated in the same processing unit, or each unit may exist separately physically, or two or more units may be integrated in the same unit.
  • the integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

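A hot data migration flow control method, comprising: recording, at every preset time period, the data set accessed by users; dividing the data set into multiple data blocks; judging whether any of the multiple data blocks is hot data; when a data block is determined to be hot data, judging whether the data block determined to be hot data has been written into the cache; when the data block determined to be hot data has not been written into the cache, obtaining the flow control threshold corresponding to the current statistical period within the migration period; and writing the data block determined to be hot data into the cache based on the flow control threshold corresponding to the current statistical period. The present application also provides a hot data migration flow control device, an electronic device, and a storage medium. The present application can write hot data into the cache and save the time needed to read hot data while avoiding a significant impact on normal input/output service performance, and has a good flow control effect.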

Description

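Hot data migration flow control method, device, electronic device, and storage medium
This application claims priority to Chinese patent application No. 201810565747.X, filed with the Chinese Patent Office on June 4, 2018 and entitled "Hot data migration flow control method, device, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a hot data migration flow control method, device, electronic device, and storage medium.
Background
A cache is a buffer for data exchange. When a piece of hardware such as a CPU needs to read data, it first looks for the data in the cache; if the data is found, it is used directly, and if not, the hardware looks for it in memory. The cache runs much faster than memory, so its role is to help the hardware run faster.
However, the cache holds a copy of only a small portion of the data in memory, so the hardware may fail to find the data it needs in the cache (because that data has not been copied from memory into the cache). The hardware then has to fetch the data from memory, which slows down the whole system.
Hot data is data that the hardware needs frequently. Storing hot data in the cache in advance lets the hardware fetch it directly from the cache when it is needed, which saves data access time.
However, storing hot data into the cache generates a large amount of input/output (IO). If this coincides with an IO peak of the user application, it lengthens the application's response time and degrades the user experience.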
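Summary
In view of the above, it is necessary to provide a hot data migration flow control method, device, electronic device, and storage medium that can write hot data into the cache and save the time needed to read hot data while avoiding a significant impact on normal input/output service performance, with a good flow control effect.
A first aspect of the present application provides a hot data migration flow control method, the method comprising:
recording, at every preset time period, the data set accessed by users;
dividing the data set into multiple data blocks;
judging whether any of the multiple data blocks is hot data;
when it is determined that a data block is hot data, judging whether the data block determined to be hot data has been written into the cache;
when it is judged that the data block determined to be hot data has not been written into the cache, obtaining the flow control threshold corresponding to the current statistical period within the migration period;
writing the data block determined to be hot data into the cache based on the flow control threshold corresponding to the current statistical period.
A second aspect of the present application provides a hot data migration flow control device, the device comprising:
a recording module, configured to record, at every preset time period, the data set accessed by users; a dividing module, configured to divide the data set into multiple data blocks; a judging module, configured to judge whether any of the multiple data blocks is hot data; the judging module being further configured to judge, when a data block is determined to be hot data, whether the data block determined to be hot data has been written into the cache; an obtaining module, configured to obtain the flow control threshold corresponding to the current statistical period within the migration period when the judging module judges that the data block determined to be hot data has not been written into the cache; and a migration module, configured to write the data block determined to be hot data into the cache based on the flow control threshold corresponding to the current statistical period.
A third aspect of the present application provides an electronic device comprising a processor and a memory, the processor being configured to implement the hot data migration flow control method when executing computer-readable instructions stored in the memory.
A fourth aspect of the present application provides a non-volatile readable storage medium storing computer-readable instructions which, when executed by a processor, implement the hot data migration flow control method.
The hot data migration flow control method, device, electronic device, and storage medium described in this application record, at every preset time period, the data set accessed by users and divide the data set into multiple data blocks. When a data block is determined to be hot data and has not been written into the cache, the flow control thresholds corresponding to the different statistical periods within the migration period are obtained, and the data block is written into the cache based on the flow control threshold corresponding to each statistical period. This improves the efficiency of migrating user data to the cache and reduces the risk of data loss while avoiding a significant impact on normal input/output service performance, giving a good flow control effect.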
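Brief Description of the Drawings
FIG. 1 is a flowchart of the hot data migration flow control method provided in Embodiment 1 of the present application.
FIG. 2 is a flowchart of the method, provided in Embodiment 2 of the present application, for determining the flow control threshold corresponding to the current statistical period according to the IO load of the user application in the previous statistical period.
FIG. 3 is a functional module diagram of the hot data migration flow control device provided in Embodiment 3 of the present application.
FIG. 4 is a schematic diagram of the electronic device provided in Embodiment 4 of the present application.
The following detailed description further explains the present application with reference to the above drawings.
Detailed Description
The hot data migration flow control method of the embodiments of the present application is applied in one or more electronic devices. The method may also be applied in a hardware environment consisting of an electronic device and a server connected to the electronic device through a network. The network includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network. The method may be executed by the server, by the electronic device, or jointly by both.
For an electronic device that needs to perform the hot data migration flow control method, the function provided by the method can be integrated directly on the electronic device, or a client implementing the method can be installed. Alternatively, the method may run on a device such as a server in the form of a software development kit (SDK) that exposes an interface to the hot data migration flow control function; the electronic device or other devices can then implement the function through the provided interface.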
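Embodiment 1
FIG. 1 is a flowchart of the hot data migration flow control method provided in Embodiment 1 of the present application. Depending on requirements, the execution order in the flowchart may be changed and some steps may be omitted.
S11: Record, at every preset time period, the data set accessed by users.
The preset time period is a time period set in advance, for example one week or 10 days. This application does not specifically limit the preset time period; it can be set according to the hardware of the electronic system or the data access pattern.
When the electronic device detects a user's data access instruction, it responds to the instruction and returns the accessed data to the user. The data set accessed by all users within the preset time period is recorded.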
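S12: Divide the data set into multiple data blocks.
The recorded data set accessed by users is divided into multiple data blocks.
In a preferred embodiment of the present application, dividing the data set into multiple data blocks may include one or a combination of the following:
1) Divide the data set evenly into a preset number of data blocks.
The preset number is a number of data blocks set in advance. For example, the data set is divided evenly into 10 data blocks of the same size.
2) Divide the data set randomly into a preset number of data blocks.
For example, the data set is divided randomly into 10 data blocks, each of a different size.
3) Divide the data set into multiple data blocks of a preset size.
The preset size is a data block size set in advance. For example, the data set is divided into multiple data blocks of 1Mb each. The preset size may also be 10Mb or larger.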
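The three partitioning options in S12 can be sketched in Python as follows. This is a minimal illustration rather than the applicant's implementation: the function names `split_evenly`, `split_randomly`, and `split_by_size` are hypothetical, the data set is assumed to fit in memory as a `bytes` object, and random cut points do not by themselves guarantee that every block has a different size.

```python
import random


def split_evenly(data: bytes, count: int) -> list:
    """Split data into `count` blocks of equal (or near-equal) size."""
    base, extra = divmod(len(data), count)
    blocks, start = [], 0
    for i in range(count):
        # The first `extra` blocks absorb the remainder, one byte each.
        size = base + (1 if i < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks


def split_randomly(data: bytes, count: int, seed=None) -> list:
    """Split data into `count` non-empty blocks at random cut points.

    Requires len(data) >= count.
    """
    rng = random.Random(seed)
    cuts = sorted(rng.sample(range(1, len(data)), count - 1))
    bounds = [0, *cuts, len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]


def split_by_size(data: bytes, block_size: int) -> list:
    """Split data into blocks of `block_size` bytes; the last may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

In every variant the blocks concatenate back to the original data set, which is a convenient invariant to check when combining the methods.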
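S13: Judge whether any of the multiple data blocks is hot data.
In a preferred embodiment of the present application, whether a data block is hot data can be judged by calculating the probability that the data block is accessed and predicting from that probability whether the data block is hot data.
Specifically, judging whether any of the multiple data blocks is hot data may include:
1) counting the number of times each data block is accessed within the preset time period;
2) calculating, from that count, the probability that each data block is accessed within the preset time period;
3) judging whether the access probability of each data block is greater than a preset probability value;
4) when the access probability of a data block is greater than the preset probability value, determining that the data block is hot data; when the access probability of a data block is less than or equal to the preset probability value, determining that the data block is non-hot data.
For example, suppose the preset time period is one week and the data set accessed by users during that week is divided into 20 data blocks, data block 1 through data block 20. Within the week, data blocks 1 through 20 are accessed 10, 5, 8, 20, 50, 3, 20, 40, 1, 5, 9, 11, 10, 12, 20, 30, 14, 0, 2, and 50 times, respectively. The formula for calculating the probability that each data block is accessed is:
$$P_i = \frac{X_i}{\sum_{j=1}^{20} X_j}$$
where X_i is the number of times the i-th data block is accessed in one week and P_i is the probability that the i-th data block is accessed in one week. From this, the probability that data block 1 is accessed is:
$$P_1 = \frac{X_1}{\sum_{j=1}^{20} X_j} = \frac{10}{320} \approx 3.13\%$$
Similarly, the probability that data block 2 is accessed is P_2 = 1.56%, the probability that data block 3 is accessed is P_3 = 2.5%, and so on; the probabilities for the other data blocks are not listed here.
In a preferred embodiment of the present application, the preset threshold may be, for example, 20%, so the data contained in a data block whose access probability is greater than 20% can be regarded as hot data.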
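The hot data test in S13 can be sketched as below, using the weekly access counts from the worked example. The names `access_probabilities` and `hot_blocks` are hypothetical, and the probability is assumed to be each block's share of all accesses in the period, which reproduces the stated values P_2 = 1.56% and P_3 = 2.5%.

```python
def access_probabilities(counts):
    """Map each block id to its share of all accesses in the period."""
    total = sum(counts.values())
    if total == 0:
        # No accesses at all: every block has probability 0.
        return {block: 0.0 for block in counts}
    return {block: n / total for block, n in counts.items()}


def hot_blocks(counts, threshold=0.20):
    """Return ids of blocks whose access probability exceeds the threshold."""
    probabilities = access_probabilities(counts)
    return [block for block, p in probabilities.items() if p > threshold]


# Weekly access counts for data blocks 1..20 from the example above.
weekly_counts = dict(zip(
    range(1, 21),
    [10, 5, 8, 20, 50, 3, 20, 40, 1, 5, 9, 11, 10, 12, 20, 30, 14, 0, 2, 50],
))
probabilities = access_probabilities(weekly_counts)
```

With these example counts the largest share is 50/320 ≈ 15.6%, so no block exceeds a 20% threshold and `hot_blocks(weekly_counts)` returns an empty list; a lower threshold such as 15% would select data blocks 5 and 20.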
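When it is determined that a data block is hot data, step S14 is executed; when no data block is hot data, the flow may return to step S11.
S14: Judge whether the data block determined to be hot data has been written into the cache.
When a lookup in the cache hits the data block determined to be hot data, the data block has already been written into the cache; when the lookup misses, the data block has not been written into the cache.
When the data block determined to be hot data has been written into the cache, the flow can end directly; when it has not been written into the cache, step S15 is executed.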
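S15: Obtain the flow control threshold corresponding to the current statistical period within the migration period.
The entire process of writing a data block that has been determined to be hot data and is not yet in the cache, from the start of the write to its completion, is called a migration period. A migration period can be divided into multiple statistical periods, and a statistical period can be a preset length of time, for example 1 second.
Flow control here means traffic control. It can be implemented in two ways: one is traffic control through the QoS module of routers and switches based on source address, destination address, source port, destination port, and protocol type; the other is application-layer traffic control through dedicated flow control equipment.
In this preferred embodiment, obtaining the flow control threshold corresponding to the current statistical period within the migration period may specifically include:
1) Judge whether the current statistical period is the first statistical period.
Whether the current statistical period is the first statistical period can be judged by checking whether the current time is the first second of the migration period.
2) When the current statistical period is the first statistical period, use the preset flow control threshold as the flow control threshold corresponding to the current statistical period.
The flow control threshold corresponding to the first statistical period in the migration period is a preset value that the system administrator can set in advance based on experience. That is, a preset flow control threshold is used as the flow control threshold of the first statistical period in the migration period.
3) When the current statistical period is not the first statistical period, obtain the IO load of the user application in the previous statistical period and determine the flow control threshold corresponding to the current statistical period from that IO load.
Every statistical period after the first in the migration period can have its own flow control threshold. These thresholds are adjusted dynamically: the threshold for the current statistical period is calculated from the IO load in the previous statistical period, and the threshold for the next statistical period is calculated from the IO load in the current statistical period. Specifically, the threshold for the second statistical period is calculated from the IO load in the first statistical period, the threshold for the third statistical period is calculated from the IO load in the second statistical period, and so on.
For the specific process of determining the flow control threshold corresponding to the current statistical period from the IO load of the user application in the previous statistical period, see FIG. 2 and its description.
S16: Write the data block determined to be hot data into the cache based on the flow control threshold corresponding to the current statistical period.
The data block determined to be hot data is written into the cache according to the flow control threshold corresponding to the current statistical period; that is, the write proceeds at the traffic rate permitted for the current statistical period, so that hot data is written into the cache neither too fast nor too slowly. This avoids a significant impact on normal input/output service performance, and the hot data written into the cache is available for user access.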
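One way to picture steps S15 and S16 together is a loop that writes at most one threshold's worth of data per statistical period. The sketch below rests on assumptions not stated in the source: the flow control threshold is interpreted as the maximum amount of data written in one statistical period, and `next_threshold` is a hypothetical callback standing in for the threshold calculation of FIG. 2.

```python
def migrate(block_size, first_threshold, next_threshold):
    """Plan the migration of one hot data block into the cache.

    block_size      -- total amount of hot data to write
    first_threshold -- preset threshold for the first statistical period
    next_threshold  -- hypothetical callback: threshold of one period ->
                       threshold of the following period
    Returns a list of (period, threshold, amount_written) tuples.
    """
    schedule = []
    written, period, threshold = 0, 1, first_threshold
    while written < block_size:
        if threshold <= 0:
            raise ValueError("flow control threshold must stay positive")
        # Never write more than the current period's threshold allows.
        amount = min(threshold, block_size - written)
        schedule.append((period, threshold, amount))
        written += amount
        period += 1
        threshold = next_threshold(threshold)
    return schedule
```

For example, `migrate(10, 4, lambda t: t)` keeps the preset threshold throughout and finishes in three periods, while a callback that halves the threshold stretches the same migration over more periods.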
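Embodiment 2
FIG. 2 is a flowchart of the method, provided in Embodiment 2 of the present application, for determining the flow control threshold corresponding to the current statistical period according to the IO load of the user application in the previous statistical period.
S21: Obtain the data block size of each IO of the user application in the previous statistical period, and calculate the average data block size of the IOs in the previous statistical period.
The average data block size of the IOs in the previous statistical period can be calculated as an arithmetic mean, a geometric mean, or a root mean square.
For example, suppose ten IOs of the user application are detected in the previous statistical period, with data block sizes of 2M, 1M, 3M, 0.5M, 10M, 4M, 0.1M, 1.2M, 5M, and 8M. Using the arithmetic mean, the average data block size of the IOs in the previous statistical period is S = (2M+1M+3M+0.5M+10M+4M+0.1M+1.2M+5M+8M)/10 = 3.48M.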
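S22: Obtain the transmission delay of each data block in the previous statistical period, and calculate the average data block delay of the IOs in the previous statistical period.
The transmission delay (delay for short) is the time a node needs to push a data block from the node onto the transmission medium when sending data, that is, the total time a sending station needs from starting to send a data frame until the frame has been sent, or the total time a receiving station needs from starting to receive a data frame until the frame has been received.
In a preferred embodiment of the present application, the transmission delay of a data block can be obtained from a load measurement tool or a performance monitoring tool installed on each storage node.
As above, the average data block delay of the IOs in the previous statistical period can also be calculated as an arithmetic mean, a geometric mean, or a root mean square. Suppose the transmission delays of the ten IOs detected in the previous statistical period are 1s, 0.8s, 1.5s, 0.4s, 5s, 2s, 0.02s, 0.6s, 3s, and 4.5s. Using the arithmetic mean, the average data block delay of the IOs in the previous statistical period is:
(1s+0.8s+1.5s+0.4s+5s+2s+0.02s+0.6s+3s+4.5s)/10 = 1.882s ≈ 1.88s.
It should be understood that the same algorithm is used for both averages: if the average data block size of the IOs in the previous statistical period is calculated as an arithmetic mean, the average data block delay is also calculated as an arithmetic mean; if the average data block size is calculated as a geometric mean, the average data block delay is also calculated as a geometric mean; and if the average data block size is calculated as a root mean square, the average data block delay is also calculated as a root mean square.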
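The averaging in S21 and S22 can be sketched with Python's standard `statistics` module. The helper `period_averages` and the `AVERAGES` table are hypothetical names; passing a single `method` argument for both series is one simple way to honor the requirement that size and delay use the same algorithm.

```python
import math
import statistics

# The three averaging algorithms named in the text, keyed by a short label.
AVERAGES = {
    "arithmetic": statistics.fmean,
    "geometric": statistics.geometric_mean,
    "rms": lambda values: math.sqrt(statistics.fmean([v * v for v in values])),
}


def period_averages(sizes, delays, method="arithmetic"):
    """Average block size and delay for one period, using one algorithm."""
    average = AVERAGES[method]
    return average(sizes), average(delays)


# Values from the worked examples: ten IOs in the previous statistical period.
sizes_m = [2, 1, 3, 0.5, 10, 4, 0.1, 1.2, 5, 8]
delays_s = [1, 0.8, 1.5, 0.4, 5, 2, 0.02, 0.6, 3, 4.5]
avg_size, avg_delay = period_averages(sizes_m, delays_s)
```

With the arithmetic mean this gives 3.48 for the sizes and 1.882 for the delays, up to floating-point rounding. For positive inputs the geometric mean is never larger, and the root mean square never smaller, than the arithmetic mean, so the choice of algorithm shifts how strongly large IOs weigh on the result.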
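S23: Obtain the preset reference value of the IO data block size and the corresponding reference value of the data block delay.
In a preferred embodiment of the present application, the reference value of the IO data block size and the corresponding reference value of the data block delay can be preset by the storage system administrator based on experience. For example, if experience shows that a 4K data block has the smallest transmission delay, reaching 50ms under ideal conditions, then the reference value of the IO data block size can be set to 4K and the corresponding reference value of the data block delay can be set to 50ms.
S24: Calculate the IO load intensity in the previous statistical period from the average data block size and average data block delay of the IOs in the previous statistical period, the reference value of the data block size, and the corresponding reference value of the data block delay.
For example, suppose the average data block size of the IOs in the previous statistical period is X, the average data block delay is Y, the reference value of the data block size is M, and the corresponding reference value of the data block delay is N; the formula for the IO load intensity in the previous statistical period is then: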
Figure PCTCN2018100168-appb-000003
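S25: Determine the IO load category in the previous statistical period from the IO load intensity in that period, using a pre-trained load classification model.
In a preferred embodiment of the present application, the IO load categories include a high load category, a normal load category, and a low load category.
Preferably, the load classification model includes, but is not limited to, a support vector machine (SVM) model. The average data block size of the IOs in the previous statistical period, the average data block delay of the IOs in the previous statistical period, and the IO load intensity in the previous statistical period are used as the inputs to the load classification model, which outputs the IO load category in the previous statistical period.
In a preferred embodiment of the present application, the training process of the load classification model includes:
1) Obtain IO load data of positive samples and IO load data of negative samples, and label the positive-sample IO load data with load categories so that it carries IO load category labels.
For example, select 500 IO load data items for each of the high load, normal load, and low load categories and label each item with its category; "1" can be used as the label for high load IO data, "2" as the label for normal load IO data, and "3" as the label for low load IO data.
2) Randomly divide the positive-sample and negative-sample IO load data into a training set of a first preset proportion and a validation set of a second preset proportion, train the load classification model with the training set, and verify the accuracy of the trained load classification model with the validation set.
First distribute the training samples of the different load categories into different folders. For example, training samples of the high load category go to the first folder, training samples of the normal load category to the second folder, and training samples of the low load category to the third folder. Then take the first preset proportion (for example, 70%) of the samples from each folder as the overall training samples to train the load classification model, and take the remaining second preset proportion (for example, 30%) from each folder as the overall test samples to verify the accuracy of the trained load classification model.
3) If the accuracy is greater than or equal to a preset accuracy, end the training and use the trained load classification model as the classifier to identify the IO load category in the current statistical period; if the accuracy is less than the preset accuracy, increase the numbers of positive and negative samples and retrain the load classification model until the accuracy is greater than or equal to the preset accuracy.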
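The per-category 70%/30% split and the accuracy check in the training steps above can be sketched with the standard library alone. This is an illustration under assumptions: `stratified_split` and `accuracy` are hypothetical helper names, each sample stands for a feature record such as (average block size, average delay, load intensity), and the SVM training itself is left out because the source does not name a library.

```python
import random


def stratified_split(samples_by_class, train_ratio=0.7, seed=None):
    """Split each category separately so both sets keep the class balance.

    samples_by_class maps a label (1 = high, 2 = normal, 3 = low load)
    to the list of samples in that category's "folder".
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        train += [(sample, label) for sample in shuffled[:cut]]
        test += [(sample, label) for sample in shuffled[cut:]]
    return train, test


def accuracy(predict, labelled_samples):
    """Fraction of labelled samples for which predict(sample) is correct."""
    correct = sum(predict(sample) == label for sample, label in labelled_samples)
    return correct / len(labelled_samples)
```

A training loop would fit a classifier on `train`, compute `accuracy` on `test`, and, if the result is below the preset accuracy, gather more samples and repeat, as step 3) describes.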
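S26: Calculate the flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period.
Specifically, calculating the flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period may include:
1) When the IO load category in the previous statistical period is the high load category, reduce the flow control threshold corresponding to the previous statistical period by a first preset amplitude to obtain the flow control threshold corresponding to the current statistical period.
When the IO load in the previous statistical period is high, the flow control threshold is lowered by the first preset amplitude, so that in the current statistical period the data block determined to be hot data is written into the cache under a lower flow control threshold; slowing data migration keeps user application access efficient.
In a preferred embodiment of the present application, the first preset amplitude may be 1/2 of the flow control threshold corresponding to the previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1/2 of the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1/2 of the flow control threshold corresponding to the current statistical period.
2) When the IO load category in the previous statistical period is the low load category, raise the flow control threshold corresponding to the previous statistical period by a second preset amplitude to obtain the flow control threshold corresponding to the current statistical period.
When the IO load in the previous statistical period is low, the flow control threshold is raised by the second preset amplitude, so that in the current statistical period the data block determined to be hot data is written into the cache under a higher flow control threshold; this speeds up data migration while preserving the access quality of user applications.
In a preferred embodiment of the present application, the second preset amplitude may be 1.5 times the flow control threshold corresponding to the previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1.5 times the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1.5 times the flow control threshold corresponding to the current statistical period.
3) When the IO load category in the previous statistical period is the normal load category, use the flow control threshold corresponding to the previous statistical period as the flow control threshold corresponding to the current statistical period.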
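The three cases of S26 reduce to a small function. The sketch below uses the example factors from the preferred embodiment (1/2 for high load, 1.5 for low load); the function name `next_threshold`, the parameter names, and the reuse of the labels 1, 2, and 3 as category constants are assumptions made for illustration.

```python
# Category labels, reusing the example labels from the training step.
HIGH, NORMAL, LOW = 1, 2, 3


def next_threshold(previous, category, high_factor=0.5, low_factor=1.5):
    """Threshold for the current period from the previous period's category."""
    if category == HIGH:
        # High user IO load: slow migration down to protect the application.
        return previous * high_factor
    if category == LOW:
        # Low user IO load: there is headroom, so migrate faster.
        return previous * low_factor
    if category == NORMAL:
        # Normal load: keep the previous threshold unchanged.
        return previous
    raise ValueError(f"unknown IO load category: {category!r}")
```

Because each period's threshold is derived from the one before it, several consecutive high load periods halve the threshold repeatedly, and several low load periods grow it by 1.5 each time. A deployment would probably also want lower and upper bounds on the threshold, which the source does not discuss.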
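In summary, the hot data migration flow control method described in this application records, at every preset time period, the data set accessed by users and divides the data set into multiple data blocks. When a data block is determined to be hot data and has not been written into the cache, the flow control thresholds corresponding to the different statistical periods within the migration period are obtained, and the data block is written into the cache based on the flow control threshold corresponding to each statistical period. This improves the efficiency of migrating user data to the cache and reduces the risk of data loss while avoiding a significant impact on normal input/output service performance, giving a good flow control effect.
Furthermore, the flow control threshold corresponding to the current statistical period is adjusted dynamically and automatically according to the IO load of the user application in the previous statistical period, without manual adjustment by an administrator. This reduces the administrator's workload and avoids inaccurate adjustments caused by the administrator's subjective judgment.
The functional modules and hardware structure of the electronic device that implements the above hot data migration flow control method are described below with reference to FIGS. 3 and 4.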
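Embodiment 3
FIG. 3 is a functional module diagram of a preferred embodiment of the hot data migration flow control device of the present application.
In some embodiments, the hot data migration flow control device 30 runs in an electronic device. The device 30 may include multiple functional modules composed of program code segments. The program code of each segment can be stored in a memory and executed by at least one processor to perform the hot data migration flow control method (see FIGS. 1 and 2 and their descriptions).
In this embodiment, the hot data migration flow control device 30 can be divided into multiple functional modules according to the functions it performs. The functional modules may include: a recording module 301, a dividing module 302, a judging module 303, an obtaining module 304, a migration module 305, a calculation module 306, a determining module 307, and a training module 308. A module in this application is a series of computer-readable instruction segments that can be executed by at least one processor, that perform a fixed function, and that are stored in a memory. The functions of the modules are described in detail below.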
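The recording module 301 is configured to record, at every preset time period, the data set accessed by users.
The preset time period is a time period set in advance, for example one week or 10 days. This application does not specifically limit the preset time period; it can be set according to the hardware of the electronic system or the data access pattern.
When the electronic device detects a user's data access instruction, it responds to the instruction and returns the accessed data to the user. The recording module 301 records the data set accessed by all users within the preset time period.
The dividing module 302 is configured to divide the data set into multiple data blocks.
The recorded data set accessed by users is divided into multiple data blocks.
In a preferred embodiment of the present application, the dividing module 302 dividing the data set into multiple data blocks may include one or a combination of the following:
1) Divide the data set evenly into a preset number of data blocks.
The preset number is a number of data blocks set in advance. For example, the data set is divided evenly into 10 data blocks of the same size.
2) Divide the data set randomly into a preset number of data blocks.
For example, the data set is divided randomly into 10 data blocks, each of a different size.
3) Divide the data set into multiple data blocks of a preset size.
The preset size is a data block size set in advance. For example, the data set is divided into multiple data blocks of 1Mb each. The preset size may also be 10Mb or larger.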
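The judging module 303 is configured to judge whether any of the multiple data blocks is hot data.
In a preferred embodiment of the present application, the judging module 303 can judge whether a data block is hot data by calculating the probability that the data block is accessed and predicting from that probability whether the data block is hot data.
Specifically, the judging module 303 judging whether any of the multiple data blocks is hot data may include:
1) counting the number of times each data block is accessed within the preset time period;
2) calculating, from that count, the probability that each data block is accessed within the preset time period;
3) judging whether the access probability of each data block is greater than a preset probability value;
4) when the access probability of a data block is greater than the preset probability value, determining that the data block is hot data; when the access probability of a data block is less than or equal to the preset probability value, determining that the data block is non-hot data.
For example, suppose the preset time period is one week and the data set accessed by users during that week is divided into 20 data blocks, data block 1 through data block 20. Within the week, data blocks 1 through 20 are accessed 10, 5, 8, 20, 50, 3, 20, 40, 1, 5, 9, 11, 10, 12, 20, 30, 14, 0, 2, and 50 times, respectively. The formula for calculating the probability that each data block is accessed is:
$$P_i = \frac{X_i}{\sum_{j=1}^{20} X_j}$$
where X_i is the number of times the i-th data block is accessed in one week and P_i is the probability that the i-th data block is accessed in one week. From this, the probability that data block 1 is accessed is:
$$P_1 = \frac{X_1}{\sum_{j=1}^{20} X_j} = \frac{10}{320} \approx 3.13\%$$
Similarly, the probability that data block 2 is accessed is P_2 = 1.56%, the probability that data block 3 is accessed is P_3 = 2.5%, and so on; the probabilities for the other data blocks are not listed here.
In a preferred embodiment of the present application, the preset threshold may be, for example, 20%, so the data contained in a data block whose access probability is greater than 20% can be regarded as hot data.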
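The judging module 303 is further configured to judge, when a data block is determined to be hot data, whether the data block determined to be hot data has been written into the cache.
When a lookup in the cache hits the data block determined to be hot data, the data block has already been written into the cache; when the lookup misses, the data block has not been written into the cache.
The obtaining module 304 is configured to obtain the flow control threshold corresponding to the current statistical period within the migration period when the judging module 303 judges that the data block determined to be hot data has not been written into the cache.
The entire process of writing a data block that has been determined to be hot data and is not yet in the cache, from the start of the write to its completion, is called a migration period. A migration period can be divided into multiple statistical periods, and a statistical period can be a preset length of time, for example 1 second.
Flow control here means traffic control. It can be implemented in two ways: one is traffic control through the QoS module of routers and switches based on source address, destination address, source port, destination port, and protocol type; the other is application-layer traffic control through dedicated flow control equipment.
In this preferred embodiment, the obtaining module 304 obtaining the flow control threshold corresponding to the current statistical period within the migration period may specifically include:
1) Judge whether the current statistical period is the first statistical period.
Whether the current statistical period is the first statistical period can be judged by checking whether the current time is the first second of the migration period.
2) When the current statistical period is the first statistical period, use the preset flow control threshold as the flow control threshold corresponding to the current statistical period.
The flow control threshold corresponding to the first statistical period in the migration period is a preset value that the system administrator can set in advance based on experience. That is, a preset flow control threshold is used as the flow control threshold of the first statistical period in the migration period.
3) When the current statistical period is not the first statistical period, obtain the IO load of the user application in the previous statistical period and determine the flow control threshold corresponding to the current statistical period from that IO load.
Every statistical period after the first in the migration period can have its own flow control threshold. These thresholds are adjusted dynamically: the threshold for the current statistical period is calculated from the IO load in the previous statistical period, and the threshold for the next statistical period is calculated from the IO load in the current statistical period. Specifically, the threshold for the second statistical period is calculated from the IO load in the first statistical period, the threshold for the third statistical period is calculated from the IO load in the second statistical period, and so on.
The migration module 305 is configured to write the data block determined to be hot data into the cache based on the flow control threshold corresponding to the current statistical period.
The data block determined to be hot data is written into the cache according to the flow control threshold corresponding to the current statistical period; that is, the write proceeds at the traffic rate permitted for the current statistical period, so that hot data is written into the cache neither too fast nor too slowly, and the hot data written into the cache is available for user access.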
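The obtaining module 304 is further configured to obtain the data block size of each IO of the user application in the previous statistical period, and calculate the average data block size of the IOs in the previous statistical period.
The average data block size of the IOs in the previous statistical period can be calculated as an arithmetic mean, a geometric mean, or a root mean square.
For example, suppose ten IOs of the user application are detected in the previous statistical period, with data block sizes of 2M, 1M, 3M, 0.5M, 10M, 4M, 0.1M, 1.2M, 5M, and 8M. Using the arithmetic mean, the average data block size of the IOs in the previous statistical period is S = (2M+1M+3M+0.5M+10M+4M+0.1M+1.2M+5M+8M)/10 = 3.48M.
The obtaining module 304 is further configured to obtain the transmission delay of each data block in the previous statistical period, and calculate the average data block delay of the IOs in the previous statistical period.
The transmission delay (delay for short) is the time a node needs to push a data block from the node onto the transmission medium when sending data, that is, the total time a sending station needs from starting to send a data frame until the frame has been sent, or the total time a receiving station needs from starting to receive a data frame until the frame has been received.
In a preferred embodiment of the present application, the transmission delay of a data block can be obtained from a load measurement tool or a performance monitoring tool installed on each storage node.
As above, the average data block delay of the IOs in the previous statistical period can also be calculated as an arithmetic mean, a geometric mean, or a root mean square. Suppose the transmission delays of the ten IOs detected in the previous statistical period are 1s, 0.8s, 1.5s, 0.4s, 5s, 2s, 0.02s, 0.6s, 3s, and 4.5s. Using the arithmetic mean, the average data block delay of the IOs in the previous statistical period is:
(1s+0.8s+1.5s+0.4s+5s+2s+0.02s+0.6s+3s+4.5s)/10 = 1.882s ≈ 1.88s.
It should be understood that the same algorithm is used for both averages: if the average data block size of the IOs in the previous statistical period is calculated as an arithmetic mean, the average data block delay is also calculated as an arithmetic mean; if the average data block size is calculated as a geometric mean, the average data block delay is also calculated as a geometric mean; and if the average data block size is calculated as a root mean square, the average data block delay is also calculated as a root mean square.
The obtaining module 304 is further configured to obtain the preset reference value of the IO data block size and the corresponding reference value of the data block delay.
In a preferred embodiment of the present application, the reference value of the IO data block size and the corresponding reference value of the data block delay can be preset by the storage system administrator based on experience. For example, if experience shows that a 4K data block has the smallest transmission delay, reaching 50ms under ideal conditions, then the reference value of the IO data block size can be set to 4K and the corresponding reference value of the data block delay can be set to 50ms.
The calculation module 306 is configured to calculate the IO load intensity in the previous statistical period from the average data block size and average data block delay of the IOs in the previous statistical period, the reference value of the data block size, and the corresponding reference value of the data block delay.
For example, suppose the average data block size of the IOs in the previous statistical period is X, the average data block delay is Y, the reference value of the data block size is M, and the corresponding reference value of the data block delay is N; the formula for the IO load intensity in the previous statistical period is then: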
Figure PCTCN2018100168-appb-000006
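The determining module 307 is configured to determine the IO load category in the previous statistical period from the IO load intensity in that period, using a pre-trained load classification model.
In a preferred embodiment of the present application, the IO load categories include a high load category, a normal load category, and a low load category.
Preferably, the load classification model includes, but is not limited to, a support vector machine (SVM) model. The average data block size of the IOs in the previous statistical period, the average data block delay of the IOs in the previous statistical period, and the IO load intensity in the previous statistical period are used as the inputs to the load classification model, which outputs the IO load category in the previous statistical period.
The training module 308 is configured to train the load classification model.
The process by which the training module 308 trains the load classification model includes:
1) Obtain IO load data of positive samples and IO load data of negative samples, and label the positive-sample IO load data with load categories so that it carries IO load category labels.
For example, select 500 IO load data items for each of the high load, normal load, and low load categories and label each item with its category; "1" can be used as the label for high load IO data, "2" as the label for normal load IO data, and "3" as the label for low load IO data.
2) Randomly divide the positive-sample and negative-sample IO load data into a training set of a first preset proportion and a validation set of a second preset proportion, train the load classification model with the training set, and verify the accuracy of the trained load classification model with the validation set.
First distribute the training samples of the different load categories into different folders. For example, training samples of the high load category go to the first folder, training samples of the normal load category to the second folder, and training samples of the low load category to the third folder. Then take the first preset proportion (for example, 70%) of the samples from each folder as the overall training samples to train the load classification model, and take the remaining second preset proportion (for example, 30%) from each folder as the overall test samples to verify the accuracy of the trained load classification model.
3) If the accuracy is greater than or equal to a preset accuracy, end the training and use the trained load classification model as the classifier to identify the IO load category in the current statistical period; if the accuracy is less than the preset accuracy, increase the numbers of positive and negative samples and retrain the load classification model until the accuracy is greater than or equal to the preset accuracy.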
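The calculation module 306 is further configured to calculate the flow control threshold corresponding to the current statistical period according to the IO load category of the previous statistical period.
Specifically, the calculation by the calculation module 306 of the flow control threshold corresponding to the current statistical period according to the IO load category of the previous statistical period may include:
1) When the IO load category of the previous statistical period is the high load category, the flow control threshold corresponding to the previous statistical period is lowered by a first preset magnitude to obtain the flow control threshold corresponding to the current statistical period.
When the IO load in the previous statistical period is high, the flow control threshold is lowered by the first preset magnitude, so that in the current statistical period the data blocks determined to be hot data are written to the cache under a low flow control threshold; slowing down data migration in this way keeps access by user applications efficient.
In a preferred embodiment of this application, the first preset magnitude may be 1/2 of the flow control threshold corresponding to the previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1/2 of that of the previous statistical period, and, if the load remains high, the flow control threshold corresponding to the next statistical period is 1/2 of that of the current statistical period.
2) When the IO load category of the previous statistical period is the low load category, the flow control threshold corresponding to the previous statistical period is raised by a second preset magnitude to obtain the flow control threshold corresponding to the current statistical period.
When the IO load in the previous statistical period is low, the flow control threshold is raised by the second preset magnitude, so that in the current statistical period the data blocks determined to be hot data are written to the cache under a high flow control threshold; this speeds up data migration while still preserving the access quality of user applications.
In a preferred embodiment of this application, the second preset magnitude may be such that the threshold becomes 1.5 times the flow control threshold corresponding to the previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1.5 times that of the previous statistical period, and, if the load remains low, the flow control threshold corresponding to the next statistical period is 1.5 times that of the current statistical period.
3) When the IO load category of the previous statistical period is the normal load category, the flow control threshold corresponding to the previous statistical period is used as the flow control threshold corresponding to the current statistical period.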
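The three cases above amount to a simple per-period update rule, sketched below in Python. The factors 0.5 and 1.5 are the preferred-embodiment values from the text, and the labels 1, 2, 3 follow the labeling example given for training; the function names and the list-based interface are assumptions made for illustration. The first period uses a preset threshold, as recited in claim 4.

```python
HIGH, NORMAL, LOW = 1, 2, 3  # category labels from the training example


def next_threshold(previous_threshold, previous_load_class,
                   decrease_factor=0.5, increase_factor=1.5):
    """Flow control threshold for the current statistical period, derived
    from the previous period's threshold and IO load category."""
    if previous_load_class == HIGH:
        return previous_threshold * decrease_factor  # slow migration down
    if previous_load_class == LOW:
        return previous_threshold * increase_factor  # speed migration up
    if previous_load_class == NORMAL:
        return previous_threshold                    # keep unchanged
    raise ValueError(f"unknown load class: {previous_load_class!r}")


def thresholds_over_periods(preset_threshold, load_classes):
    """Thresholds for successive periods: the first period uses the preset
    threshold; each later period depends on the load class observed in the
    period before it."""
    thresholds = [preset_threshold]
    for load_class in load_classes:
        thresholds.append(next_threshold(thresholds[-1], load_class))
    return thresholds
```

For instance, starting from a preset threshold of 100 and observing high, then low, then normal load, the successive thresholds are 100, 50, 75, and 75.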
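In summary, the hot data migration flow control apparatus described in this application records the data set accessed by users at every preset time interval and divides the data set into a plurality of data blocks. When a data block is determined to be hot data and has not yet been written to the cache, the apparatus obtains the flow control thresholds corresponding to the different statistical periods within the migration period and, based on the flow control threshold of each statistical period, writes the data block determined to be hot data to the cache. This improves the efficiency of migrating user data to the cache and reduces the risk of data loss, while avoiding a noticeable impact on the performance of normal input/output services, and thus achieves a good flow control effect.
Furthermore, the flow control threshold corresponding to the current statistical period is adjusted automatically and dynamically according to the IO load of user applications in the previous statistical period. No manual tuning by an administrator is required, which reduces the administrator's workload and avoids imprecise adjustments caused by the administrator's subjective judgment.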
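The hot-data check that feeds this flow (count accesses per block, convert to a probability, compare against a preset probability, then skip blocks already cached) can be sketched in Python as follows. Computing the probability as a block's share of all accesses in the interval is an assumption, since the text only says the probability is based on the access count; the function and parameter names are likewise illustrative.

```python
from collections import Counter


def find_hot_blocks(access_log, preset_probability):
    """Return the set of block ids whose access probability within the
    recorded interval is strictly greater than the preset probability.
    `access_log` is a sequence of block ids, one entry per access."""
    counts = Counter(access_log)
    total = sum(counts.values())
    if total == 0:
        return set()
    # Assumption: probability = accesses to this block / all accesses.
    return {block for block, count in counts.items()
            if count / total > preset_probability}


def blocks_to_migrate(hot_blocks, cached_blocks):
    """Hot blocks that have not yet been written to the cache."""
    return set(hot_blocks) - set(cached_blocks)
```

Blocks whose probability is less than or equal to the preset value are treated as non-hot, matching the strict comparison recited in claim 3.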
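The integrated units described above, when implemented in the form of software function modules, may be stored in a non-volatile readable storage medium. The software function modules are stored in a storage medium and include a number of instructions that cause a computer device (which may be a personal computer, a dual-screen device, a network device, or the like) or a processor to execute part of the methods described in the embodiments of this application.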
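Embodiment 4
FIG. 4 is a schematic diagram of the electronic device provided in Embodiment 4 of this application.
The electronic device 4 includes a memory 41, at least one processor 42, computer-readable instructions 43 stored in the memory 41 and executable on the at least one processor 42, and at least one communication bus 44.
The at least one processor 42 implements the steps of the method embodiments described above when executing the computer-readable instructions 43.
By way of example, the computer-readable instructions 43 may be divided into one or more modules/units, which are stored in the memory 41 and executed by the at least one processor 42 to carry out the steps of the method embodiments of this application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and these instruction segments describe the execution of the computer-readable instructions 43 in the electronic device 4.
The electronic device 4 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. Those skilled in the art will understand that the schematic diagram in FIG. 4 is only an example of the electronic device 4 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the electronic device 4 may further include input/output devices, network access devices, buses, and the like.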
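The at least one processor 42 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 42 may be a microprocessor or any conventional processor. The processor 42 is the control center of the electronic device 4 and connects the various parts of the entire electronic device 4 through various interfaces and lines.
The memory 41 may be used to store the computer-readable instructions 43 and/or the modules/units. The processor 42 implements the various functions of the electronic device 4 by running or executing the computer-readable instructions and/or modules/units stored in the memory 41 and by calling data stored in the memory 41. The memory 41 may mainly include a program storage area and a data storage area: the program storage area may store the operating system and the application programs required for at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the electronic device 4 (such as audio data or a phone book). In addition, the memory 41 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
If the modules/units integrated in the electronic device 4 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a non-volatile readable storage medium. On this understanding, all or part of the processes in the methods of the above embodiments may also be carried out by computer-readable instructions that direct the relevant hardware. The computer-readable instructions may be stored in a non-volatile readable storage medium and, when executed by a processor, implement the steps of the method embodiments described above. The computer-readable instructions include computer-readable instruction code, which may be in source code form, object code form, an executable file, or some intermediate form. The non-volatile readable medium may include: any entity or apparatus capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and the like. It should be noted that the content covered by the non-volatile readable medium may be appropriately expanded or reduced according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, non-volatile readable media do not include electrical carrier signals and telecommunication signals.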
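The functional units in the embodiments of this application may be integrated in the same processing unit, may each exist physically on their own, or two or more units may be integrated in the same unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software function modules.
It is evident to those skilled in the art that this application is not limited to the details of the above exemplary embodiments and can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-limiting. The scope of this application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be included in this application. No reference sign in the claims should be regarded as limiting the claim concerned. Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or apparatuses recited in a system claim may also be implemented by a single unit or apparatus through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of this application and not to limit them. Although this application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of this application may be modified or equivalently replaced without departing from the spirit and scope of those technical solutions.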

Claims (20)

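  1. A hot data migration flow control method, characterized in that the method comprises:
    recording, at every preset time interval, the data set accessed by users;
    dividing the data set into a plurality of data blocks;
    determining whether any of the plurality of data blocks is hot data;
    when a data block is determined to be hot data, determining whether the data block determined to be hot data has been written to the cache;
    when the data block determined to be hot data has not been written to the cache, obtaining the flow control threshold corresponding to the current statistical period within the migration period;
    writing the data block determined to be hot data to the cache based on the flow control threshold corresponding to the current statistical period.
  2. The method according to claim 1, characterized in that dividing the data set into a plurality of data blocks comprises:
    dividing the data set evenly into a preset number of data blocks; or
    dividing the data set randomly into a preset number of data blocks; or
    dividing the data set into a plurality of data blocks according to a preset size.
  3. The method according to claim 1, characterized in that determining whether any of the plurality of data blocks is hot data is performed by calculating the probability that a data block is accessed and predicting, based on the probability, whether the data block is hot data, comprising:
    counting the number of times each data block is accessed within the preset time interval;
    calculating, based on the number of times each data block is accessed within the preset time interval, the probability that each data block is accessed within the preset time interval;
    determining whether the access probability of each data block is greater than a preset probability;
    when the access probability of a data block is greater than the preset probability, determining that the data block is hot data;
    when the access probability of a data block is less than or equal to the preset probability, determining that the data block is non-hot data.
  4. The method according to claim 1, characterized in that obtaining the flow control threshold corresponding to the current statistical period within the migration period comprises:
    determining whether the current statistical period is the first statistical period;
    when the current statistical period is the first statistical period, taking a preset flow control threshold as the flow control threshold corresponding to the current statistical period;
    when the current statistical period is not the first statistical period, obtaining the IO load of user applications in the previous statistical period, and determining the flow control threshold corresponding to the current statistical period according to the IO load of user applications in the previous statistical period.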
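  5. The method according to claim 4, characterized in that determining the flow control threshold corresponding to the current statistical period according to the IO load of user applications in the previous statistical period comprises:
    obtaining the data block size of each IO of user applications in the previous statistical period, and calculating the average data block size of the IOs in the previous statistical period;
    obtaining the transmission latency of each data block in the previous statistical period, and calculating the average data block latency of the IOs in the previous statistical period;
    obtaining a preset reference value of the IO data block size and the corresponding reference value of the data block latency;
    calculating the IO load intensity in the previous statistical period from the average data block size of the IOs in the previous statistical period, the average data block latency, the reference value of the data block size, and the corresponding reference value of the data block latency;
    determining the IO load category of the previous statistical period from the IO load intensity in the previous statistical period, using a pre-trained load classification model;
    calculating the flow control threshold corresponding to the current statistical period according to the IO load category of the previous statistical period.
  6. The method according to claim 5, characterized in that the formula for calculating the IO load intensity in the previous statistical period from the average data block size of the IOs in the previous statistical period, the average data block latency, the reference value of the data block size, and the corresponding reference value of the data block latency is: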
    Figure PCTCN2018100168-appb-100001
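    where X is the average data block size of the IOs in the previous statistical period, Y is the average data block latency, M is the reference value of the data block size, and N is the corresponding reference value of the data block latency.
  7. The method according to claim 5 or 6, characterized in that calculating the flow control threshold corresponding to the current statistical period according to the IO load category of the previous statistical period comprises:
    when the IO load category of the previous statistical period is the high load category, lowering the flow control threshold corresponding to the previous statistical period by a first preset magnitude to obtain the flow control threshold corresponding to the current statistical period;
    when the IO load category of the previous statistical period is the low load category, raising the flow control threshold corresponding to the previous statistical period by a second preset magnitude to obtain the flow control threshold corresponding to the current statistical period;
    when the IO load category of the previous statistical period is the normal load category, using the flow control threshold corresponding to the previous statistical period as the flow control threshold corresponding to the current statistical period.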
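  8. A hot data migration flow control apparatus, characterized in that the apparatus comprises:
    a recording module, configured to record, at every preset time interval, the data set accessed by users;
    a dividing module, configured to divide the data set into a plurality of data blocks;
    a judgment module, configured to determine whether any of the plurality of data blocks is hot data;
    the judgment module being further configured to determine, when a data block is determined to be hot data, whether the data block determined to be hot data has been written to the cache;
    an acquisition module, configured to obtain the flow control threshold corresponding to the current statistical period within the migration period when the judgment module determines that the data block determined to be hot data has not been written to the cache;
    a migration module, configured to write the data block determined to be hot data to the cache based on the flow control threshold corresponding to the current statistical period.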
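  9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the processor being configured to implement the following steps when executing computer-readable instructions stored in the memory:
    recording, at every preset time interval, the data set accessed by users;
    dividing the data set into a plurality of data blocks;
    determining whether any of the plurality of data blocks is hot data;
    when a data block is determined to be hot data, determining whether the data block determined to be hot data has been written to the cache;
    when the data block determined to be hot data has not been written to the cache, obtaining the flow control threshold corresponding to the current statistical period within the migration period;
    writing the data block determined to be hot data to the cache based on the flow control threshold corresponding to the current statistical period.
  10. The electronic device according to claim 9, characterized in that dividing the data set into a plurality of data blocks comprises:
    dividing the data set evenly into a preset number of data blocks; or
    dividing the data set randomly into a preset number of data blocks; or
    dividing the data set into a plurality of data blocks according to a preset size.
  11. The electronic device according to claim 9, characterized in that determining whether any of the plurality of data blocks is hot data is performed by calculating the probability that a data block is accessed and predicting, based on the probability, whether the data block is hot data, comprising:
    counting the number of times each data block is accessed within the preset time interval;
    calculating, based on the number of times each data block is accessed within the preset time interval, the probability that each data block is accessed within the preset time interval;
    determining whether the access probability of each data block is greater than a preset probability;
    when the access probability of a data block is greater than the preset probability, determining that the data block is hot data;
    when the access probability of a data block is less than or equal to the preset probability, determining that the data block is non-hot data.
  12. The electronic device according to claim 9, characterized in that obtaining the flow control threshold corresponding to the current statistical period within the migration period comprises:
    determining whether the current statistical period is the first statistical period;
    when the current statistical period is the first statistical period, taking a preset flow control threshold as the flow control threshold corresponding to the current statistical period;
    when the current statistical period is not the first statistical period, obtaining the IO load of user applications in the previous statistical period, and determining the flow control threshold corresponding to the current statistical period according to the IO load of user applications in the previous statistical period.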
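  13. The electronic device according to claim 12, characterized in that determining the flow control threshold corresponding to the current statistical period according to the IO load of user applications in the previous statistical period comprises:
    obtaining the data block size of each IO of user applications in the previous statistical period, and calculating the average data block size of the IOs in the previous statistical period;
    obtaining the transmission latency of each data block in the previous statistical period, and calculating the average data block latency of the IOs in the previous statistical period;
    obtaining a preset reference value of the IO data block size and the corresponding reference value of the data block latency;
    calculating the IO load intensity in the previous statistical period from the average data block size of the IOs in the previous statistical period, the average data block latency, the reference value of the data block size, and the corresponding reference value of the data block latency;
    determining the IO load category of the previous statistical period from the IO load intensity in the previous statistical period, using a pre-trained load classification model;
    calculating the flow control threshold corresponding to the current statistical period according to the IO load category of the previous statistical period.
  14. The electronic device according to claim 13, characterized in that calculating the flow control threshold corresponding to the current statistical period according to the IO load category of the previous statistical period comprises:
    when the IO load category of the previous statistical period is the high load category, lowering the flow control threshold corresponding to the previous statistical period by a first preset magnitude to obtain the flow control threshold corresponding to the current statistical period;
    when the IO load category of the previous statistical period is the low load category, raising the flow control threshold corresponding to the previous statistical period by a second preset magnitude to obtain the flow control threshold corresponding to the current statistical period;
    when the IO load category of the previous statistical period is the normal load category, using the flow control threshold corresponding to the previous statistical period as the flow control threshold corresponding to the current statistical period.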
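  15. A non-volatile readable storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by a processor, implement the following steps:
    recording, at every preset time interval, the data set accessed by users;
    dividing the data set into a plurality of data blocks;
    determining whether any of the plurality of data blocks is hot data;
    when a data block is determined to be hot data, determining whether the data block determined to be hot data has been written to the cache;
    when the data block determined to be hot data has not been written to the cache, obtaining the flow control threshold corresponding to the current statistical period within the migration period;
    writing the data block determined to be hot data to the cache based on the flow control threshold corresponding to the current statistical period.
  16. The storage medium according to claim 15, characterized in that dividing the data set into a plurality of data blocks comprises:
    dividing the data set evenly into a preset number of data blocks; or
    dividing the data set randomly into a preset number of data blocks; or
    dividing the data set into a plurality of data blocks according to a preset size.
  17. The storage medium according to claim 15, characterized in that determining whether any of the plurality of data blocks is hot data is performed by calculating the probability that a data block is accessed and predicting, based on the probability, whether the data block is hot data, comprising:
    counting the number of times each data block is accessed within the preset time interval;
    calculating, based on the number of times each data block is accessed within the preset time interval, the probability that each data block is accessed within the preset time interval;
    determining whether the access probability of each data block is greater than a preset probability;
    when the access probability of a data block is greater than the preset probability, determining that the data block is hot data;
    when the access probability of a data block is less than or equal to the preset probability, determining that the data block is non-hot data.
  18. The storage medium according to claim 15, characterized in that obtaining the flow control threshold corresponding to the current statistical period within the migration period comprises:
    determining whether the current statistical period is the first statistical period;
    when the current statistical period is the first statistical period, taking a preset flow control threshold as the flow control threshold corresponding to the current statistical period;
    when the current statistical period is not the first statistical period, obtaining the IO load of user applications in the previous statistical period, and determining the flow control threshold corresponding to the current statistical period according to the IO load of user applications in the previous statistical period.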
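  19. The storage medium according to claim 18, characterized in that determining the flow control threshold corresponding to the current statistical period according to the IO load of user applications in the previous statistical period comprises:
    obtaining the data block size of each IO of user applications in the previous statistical period, and calculating the average data block size of the IOs in the previous statistical period;
    obtaining the transmission latency of each data block in the previous statistical period, and calculating the average data block latency of the IOs in the previous statistical period;
    obtaining a preset reference value of the IO data block size and the corresponding reference value of the data block latency;
    calculating the IO load intensity in the previous statistical period from the average data block size of the IOs in the previous statistical period, the average data block latency, the reference value of the data block size, and the corresponding reference value of the data block latency;
    determining the IO load category of the previous statistical period from the IO load intensity in the previous statistical period, using a pre-trained load classification model;
    calculating the flow control threshold corresponding to the current statistical period according to the IO load category of the previous statistical period.
  20. The storage medium according to claim 19, characterized in that calculating the flow control threshold corresponding to the current statistical period according to the IO load category of the previous statistical period comprises:
    when the IO load category of the previous statistical period is the high load category, lowering the flow control threshold corresponding to the previous statistical period by a first preset magnitude to obtain the flow control threshold corresponding to the current statistical period;
    when the IO load category of the previous statistical period is the low load category, raising the flow control threshold corresponding to the previous statistical period by a second preset magnitude to obtain the flow control threshold corresponding to the current statistical period;
    when the IO load category of the previous statistical period is the normal load category, using the flow control threshold corresponding to the previous statistical period as the flow control threshold corresponding to the current statistical period.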
PCT/CN2018/100168 2018-06-04 2018-08-13 Hot data migration flow control method, apparatus, electronic device, and storage medium WO2019232925A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810565747.X 2018-06-04
CN201810565747.XA CN108762684B (zh) 2018-06-04 2018-06-04 Hot data migration flow control method, apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2019232925A1 true WO2019232925A1 (zh) 2019-12-12

Family

ID=64002688

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/100168 WO2019232925A1 (zh) 2018-06-04 2018-08-13 Hot data migration flow control method, apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN108762684B (zh)
WO (1) WO2019232925A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110120973A (zh) * 2019-04-28 2019-08-13 华为技术有限公司 一种请求控制方法、相关设备及计算机存储介质
CN113076339A (zh) * 2021-03-18 2021-07-06 北京沃东天骏信息技术有限公司 一种数据缓存方法、装置、设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103092526A (zh) * 2011-10-31 2013-05-08 国际商业机器公司 在存储设备间进行数据迁移的方法和装置
US20160004473A1 (en) * 2014-07-07 2016-01-07 International Business Machines Corporation Migration decision window selection based on hotspot characteristics
CN107222426A (zh) * 2016-03-21 2017-09-29 阿里巴巴集团控股有限公司 控流的方法、装置及系统
CN107341240A (zh) * 2017-07-05 2017-11-10 中国人民大学 一种应对倾斜数据流在线连接的处理方法

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9847941B2 (en) * 2015-06-04 2017-12-19 Quest Software Inc. Selectively suppress or throttle migration of data across WAN connections
CN107454004A (zh) * 2016-05-30 2017-12-08 阿里巴巴集团控股有限公司 一种流量控制方法和装置
CN106775461B (zh) * 2016-11-30 2020-01-21 华为技术有限公司 热点数据确定方法、设备及装置
CN106682705B (zh) * 2017-02-04 2019-12-24 武汉阿帕科技有限公司 负载特性的识别方法及装置
CN107463514B (zh) * 2017-08-16 2021-06-29 郑州云海信息技术有限公司 一种数据存储方法及装置

Also Published As

Publication number Publication date
CN108762684B (zh) 2021-03-05
CN108762684A (zh) 2018-11-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18921689

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18921689

Country of ref document: EP

Kind code of ref document: A1