WO2023030227A1 - Data processing method, device, and system - Google Patents

Data processing method, device, and system

Info

Publication number
WO2023030227A1
Authority
WO
WIPO (PCT)
Prior art keywords
data block
feature
access frequency
data
read
Prior art date
Application number
PCT/CN2022/115437
Other languages
English (en)
French (fr)
Inventor
张帆
胡刚
程楹楹
张弓
程卓
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Priority to EP22863363.2A (EP4383057A1)
Publication of WO2023030227A1
Priority to US18/588,775 (US20240192880A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • The present application relates to the technical field of communications, and in particular to a data processing method, device, and system.
  • Hot data refers to data with high real-time query requirements and high access frequency.
  • This type of data is stored in high-efficiency cloud disks to meet the needs of high-performance access.
  • Cold data refers to data with relatively low query frequency and low access frequency. This type of data uses low-cost cold data storage to meet cost-effective storage requirements. How to distinguish different types of data while ensuring system performance is a technical problem that needs to be solved urgently.
  • Embodiments of the present application provide a data processing method, device, and system, so as to distinguish different types of data and improve system performance.
  • A data processing method is provided. The method includes: a first device acquires a first feature of a first data block, where the first data block is any data block in a storage space, and the first feature includes a read/write feature related to the first data block and a read feature of a second data block adjacent to the first data block; the first device inputs the first feature into an access frequency prediction model and obtains the access frequency of the first data block output by the access frequency prediction model, where the access frequency of the first data block is used to determine the type of the first data block.
  • To determine the type of the first data block, the first device acquires the first feature of the first data block. The first feature includes not only the read/write feature of the first data block itself but also the read feature of the second data block adjacent to the first data block; that is, the access behavior of the first data block is reflected through multi-dimensional features.
  • The first device inputs the acquired first feature into a pre-trained access frequency prediction model, which determines the access frequency of the first data block at a future time, so that the type of the first data block can be determined according to that access frequency.
  • Acquiring the first feature of the first data block by the first device includes: the first device receives the first feature of the first data block sent by a second device.
  • The method further includes: the first device sends the access frequency of the first data block to the second device.
  • The type of the first data block is cold data or hot data.
  • The access frequency of the first data block can be used to determine whether the first data block is cold data or hot data; the storage location of the first data block is then determined according to its type, thereby improving storage performance. For example, when the first data block is hot data, it is accessed frequently, and it may be stored in the performance layer to increase its read speed. When the first data block is cold data, its access frequency is low; to keep it from affecting the read speed of hot data, it is stored in the capacity layer, which reduces resource occupation of the performance layer.
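  • The placement rule above (hot data in the performance layer, cold data in the capacity layer) can be sketched as a small planning function. This is a minimal illustration, not the method of the application: the function name `plan_migrations`, the tier labels, and the dictionary layout are assumptions introduced for the example.

```python
from typing import Dict, List, Tuple

# Illustrative labels for the "performance layer" and "capacity layer" (assumed names).
PERFORMANCE, CAPACITY = "performance", "capacity"


def plan_migrations(block_types: Dict[int, str],
                    current_tier: Dict[int, str]) -> List[Tuple[int, str, str]]:
    """Return (block_id, from_tier, to_tier) for every block stored in the wrong tier.

    block_types maps a block id to "hot" or "cold";
    current_tier maps a block id to the tier it is stored in now.
    """
    moves = []
    for block_id, block_type in sorted(block_types.items()):
        # Hot blocks belong in the performance layer, cold blocks in the capacity layer.
        target = PERFORMANCE if block_type == "hot" else CAPACITY
        source = current_tier[block_id]
        if source != target:
            moves.append((block_id, source, target))
    return moves


types = {0: "hot", 1: "cold", 2: "cold"}
tiers = {0: CAPACITY, 1: PERFORMANCE, 2: CAPACITY}
moves = plan_migrations(types, tiers)
```

  • In this example, block 0 is promoted to the performance layer, block 1 is demoted to the capacity layer, and block 2 stays where it is.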
  • The method further includes: when the feature length of the first feature changes, the first device updates the access frequency prediction model to obtain an updated access frequency prediction model.
  • In this implementation, when the feature length of the first feature changes, the access frequency prediction model needs to be updated so that the updated model can be used to predict the access frequency of the first data block.
  • The method further includes: the first device acquires a second feature of the first data block, where the feature length of the first feature is different from the feature length of the second feature; the first device inputs the second feature into the updated access frequency prediction model and obtains the access frequency of the first data block output by the updated access frequency prediction model.
  • In this implementation, the feature of the first data block is reacquired as the second feature, and the second feature and the updated access frequency prediction model are used to obtain the access frequency of the first data block.
  • The method further includes: when the feature length of the first feature changes, the first device receives an interrupt signal sent by the second device, where the interrupt signal instructs the first device to suspend using the access frequency prediction model to predict the access frequency of the first data block.
  • A data processing method is provided. The method includes: a second device acquires the access frequency of a first data block, where the access frequency of the first data block is obtained according to an access frequency prediction model and a first feature of the first data block, the first feature includes a read/write feature related to the first data block and a read feature of a second data block adjacent to the first data block, and the first data block is any data block in a storage space; the second device determines the type of each first data block in the storage space according to the access frequency of each first data block.
  • When the second device needs to determine the type of each data block in the storage space, it can obtain the future access frequency corresponding to each data block and determine the type of each data block according to that access frequency.
  • Determining, by the second device, the type of each first data block in the storage space according to the access frequency of each first data block includes: the second device sorts the first data blocks by access frequency to obtain a sorting result; the second device determines the type of each first data block in the storage space according to the sorting result.
  • The second device may sort the first data blocks in descending order of access frequency and then determine the type of each first data block according to the sorting result. For example, the first k first data blocks are determined to be hot data, and the remaining first data blocks are cold data.
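  • The sort-then-split rule described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `classify_top_k`, the use of block ids as dictionary keys, and the example frequencies are all hypothetical; the text only specifies a descending sort with the first k blocks treated as hot.

```python
from typing import Dict


def classify_top_k(predicted_frequency: Dict[int, float], k: int) -> Dict[int, str]:
    """Label the k blocks with the highest predicted access frequency as hot."""
    # Sort block ids by predicted access frequency, highest first.
    ranked = sorted(predicted_frequency, key=predicted_frequency.get, reverse=True)
    # The first k blocks in the ranking are hot; all remaining blocks are cold.
    return {block_id: ("hot" if rank < k else "cold")
            for rank, block_id in enumerate(ranked)}


labels = classify_top_k({10: 3.0, 11: 250.0, 12: 0.5, 13: 90.0}, k=2)
```

  • Because Python's sort is stable, blocks with equal predicted frequency keep their input order in this sketch; how ties should be broken is not stated in the text.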
  • The first feature includes the feature of the first data block at each of T historical time points; the read/write features of the first data block at the T historical time points are correlated in time and space, and T is a positive integer.
  • Because the first feature includes the features of the first data block at multiple historical time points, the influence of historical features is taken into account when predicting the access frequency of the first data block, which improves prediction accuracy.
  • The read/write feature related to the first data block includes at least one of the following: a frequency feature of reading/writing the first data block, a length feature of reading the first data block, and an arrangement feature of reading the first data block.
  • The read/write feature related to the first data block satisfies at least one of the following: the frequency feature of reading/writing the first data block includes at least one of the following: the frequency at which one or more access interfaces read the first data block, the frequency at which one or more access interfaces write the first data block, and the total frequency at which one or more access interfaces read and write the first data block; and/or the length feature of reading the first data block includes at least one of the following: the maximum read length, the minimum read length, or the average read length; and/or the arrangement feature of reading the first data block includes at least one of the following: the number of reads of a first length, the number of reads of a second length, the ratio of the number of reads of the first length to the total number of reads, or the ratio of the number of reads of the second length to the total number of reads, where the first length is a length equal to 2^n, the second length is a length not equal to 2^n, and n is a positive integer.
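  • As an illustration of the length and arrangement features, the sketch below computes them from a list of read lengths. It reads "a length of 2n" as 2^n (a power of two), which is an interpretation of the text; the function names, dictionary keys, and the unit of length are assumptions made for the example.

```python
from typing import Dict, List


def is_power_of_two(length: int) -> bool:
    # True for 2, 4, 8, ...: 2**n with n a positive integer, so 1 is excluded.
    return length >= 2 and (length & (length - 1)) == 0


def read_length_features(read_lengths: List[int]) -> Dict[str, float]:
    """Compute length and arrangement features from the lengths of reads of one block."""
    if not read_lengths:
        raise ValueError("at least one read is required")
    total = len(read_lengths)
    aligned = sum(1 for n in read_lengths if is_power_of_two(n))
    return {
        # Length features.
        "max_read_length": max(read_lengths),
        "min_read_length": min(read_lengths),
        "avg_read_length": sum(read_lengths) / total,
        # Arrangement features: counts and ratios of first-length (2**n) reads
        # and second-length (not 2**n) reads.
        "pow2_reads": aligned,
        "non_pow2_reads": total - aligned,
        "pow2_ratio": aligned / total,
        "non_pow2_ratio": (total - aligned) / total,
    }


features = read_length_features([4, 8, 6, 16])
```

  • For the four reads in the example, three have a power-of-two length and one does not, so the two ratios are 0.75 and 0.25.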
  • The read feature of the second data block adjacent to the first data block includes at least the read frequency feature corresponding to each of L second data blocks, where L is a positive integer.
  • The access frequency prediction model includes a first sub-model and a second sub-model; the first sub-model is used to extract the temporal characteristics of the first feature, and the second sub-model is used to extract the spatial characteristics of the first feature.
  • The access frequency prediction model further includes a third sub-model, and the third sub-model is used to obtain the feature scale of the first feature.
  • The access frequency prediction model is generated by training according to features to be trained and the labels corresponding to the features to be trained, where the label corresponding to a feature to be trained is an access frequency.
  • The method further includes: when the feature length of the first feature changes, the second device acquires a new data set to be trained, where the new data set to be trained includes a plurality of features to be trained and the labels corresponding to the plurality of features to be trained, and the new data set to be trained is used to train the access frequency prediction model to obtain an updated access frequency prediction model.
  • In this implementation, a change in the feature length of the first feature means that the feature on which access frequency prediction is based has changed. The access frequency prediction model therefore needs to be updated: the second device acquires a new data set to be trained, which is used to train the access frequency prediction model and obtain an updated access frequency prediction model, so that the updated model can be used to predict the access frequency of the first data block.
  • The method further includes: when the feature length of the first feature changes, the second device may also send an interrupt signal to the first device, where the interrupt signal instructs the first device to stop using the current access frequency prediction model to predict the access frequency of the first data block.
  • The method further includes: the second device sends the new data set to be trained to a third device, so that the third device trains the access frequency prediction model with the new data set to be trained to obtain the updated access frequency prediction model; the second device receives the updated access frequency prediction model sent by the third device.
  • In this implementation, after the second device acquires the new data set to be trained, it can send the new data set to the third device, so that the third device retrains the access frequency prediction model with the new data set to obtain a trained access frequency prediction model.
  • The method further includes: the second device sends the updated access frequency prediction model to the first device.
  • The change in the feature length of the first feature includes a change in the number T of historical time points and/or a change in the number L of second data blocks adjacent to the first data block, where T is a positive integer and L is a positive integer.
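  • The interplay between a feature length change, the interrupt signal, and the updated model can be sketched as a small state holder on the first device. Everything named here is hypothetical: the `Predictor` class, the `feature_length` formula (T time points, each with a fixed number of own features plus L neighbor read frequencies), and the stand-in lambda models are assumptions for illustration, not the layout or model used in the application.

```python
from typing import Callable, Optional, Sequence

Model = Callable[[Sequence[float]], float]


def feature_length(t: int, l: int, own_dims: int) -> int:
    # Assumed layout: T time points, each holding own_dims features of the block
    # itself plus one read frequency for each of its L adjacent blocks.
    return t * (own_dims + l)


class Predictor:
    """First-device side: holds the current model and honours the interrupt signal."""

    def __init__(self, model: Model, expected_length: int) -> None:
        self.model = model
        self.expected_length = expected_length
        self.paused = False

    def on_interrupt(self) -> None:
        # Interrupt signal received: stop predicting with the current model.
        self.paused = True

    def install(self, model: Model, expected_length: int) -> None:
        # Updated model received: resume prediction with the new feature length.
        self.model = model
        self.expected_length = expected_length
        self.paused = False

    def predict(self, feature: Sequence[float]) -> Optional[float]:
        # Refuse to predict while paused or when the feature length does not match.
        if self.paused or len(feature) != self.expected_length:
            return None
        return self.model(feature)


old_len = feature_length(t=4, l=2, own_dims=3)
new_len = feature_length(t=6, l=2, own_dims=3)  # T changed from 4 to 6

predictor = Predictor(lambda f: sum(f) / len(f), old_len)
before = predictor.predict([1.0] * old_len)

if new_len != old_len:
    predictor.on_interrupt()                    # second device signals the change
during = predictor.predict([1.0] * new_len)

predictor.install(lambda f: max(f), new_len)    # stand-in for the retrained model
after = predictor.predict([2.0] * new_len)
```

  • Returning `None` while paused is one possible design; a real system might instead queue requests until the updated model arrives.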
  • Acquiring the access frequency of the first data block by the second device includes: the second device acquires the first feature of the first data block and sends the first feature of the first data block to the first device; the second device receives the access frequency of the first data block sent by the first device, where the access frequency of the first data block is obtained by the first device by using the access frequency prediction model and the first feature.
  • A method for training and generating an access frequency prediction model is provided. The method includes: a third device acquires a data set to be trained, where the data set to be trained includes a plurality of features to be trained and the labels corresponding to the plurality of features to be trained, each label is an access frequency, each of the plurality of features to be trained includes a read/write feature related to a data block to be trained and a read feature of a third data block adjacent to the data block to be trained, and the data block to be trained is any data block in a storage space; the third device trains an initial network model with the data set to be trained to generate the access frequency prediction model.
  • The feature to be trained includes the feature of the data block to be trained at each of T historical time points; the read/write features of the data block to be trained at the T historical time points are correlated in time and space, and T is a positive integer.
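  • One way to assemble such a data set is a sliding window over per-period access counts. The sketch below is illustrative only and rests on several assumptions: a single count matrix stands in for both the block's own access frequency and its neighbors' read frequency, the L adjacent blocks are taken to be the nearest blocks by index, out-of-range neighbors are zero-padded, and the label is the block's count in the period immediately after the window. The function names are hypothetical.

```python
from typing import List, Tuple


def neighbor_offsets(l: int) -> List[int]:
    # -1, +1, -2, +2, ...: the L blocks closest to the target by index (assumed).
    return [(-1 if i % 2 == 0 else 1) * (i // 2 + 1) for i in range(l)]


def build_samples(counts: List[List[int]], t: int, l: int) -> List[Tuple[List[int], int]]:
    """Build (feature, label) pairs from counts[p][b] = accesses to block b in period p."""
    periods, blocks = len(counts), len(counts[0])
    samples = []
    for start in range(periods - t):
        for b in range(blocks):
            feature = []
            for p in range(start, start + t):
                # Own access count at this historical time point.
                feature.append(counts[p][b])
                # Counts of the L adjacent blocks, zero-padded at the edges.
                for off in neighbor_offsets(l):
                    nb = b + off
                    feature.append(counts[p][nb] if 0 <= nb < blocks else 0)
            # Label: access count of the same block in the next period.
            label = counts[start + t][b]
            samples.append((feature, label))
    return samples


counts = [
    [5, 1, 0],  # period 0
    [6, 2, 0],  # period 1
    [7, 3, 1],  # period 2
]
samples = build_samples(counts, t=2, l=2)
```

  • With T = 2 and L = 2, each feature has 2 × (1 + 2) = 6 entries, so changing either T or L changes the feature length that the model must accept.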
  • The access frequency prediction model includes a first sub-model and a second sub-model; the first sub-model is used to extract the temporal characteristics of the features to be trained, and the second sub-model is used to extract the spatial characteristics of the features to be trained.
  • The access frequency prediction model further includes a third sub-model, and the third sub-model is used to process a time series model.
  • The read/write feature related to the data block to be trained includes at least one of the following: a frequency feature of reading/writing the data block to be trained, a length feature of reading the data block to be trained, and an arrangement feature of reading the data block to be trained.
  • The read/write feature related to the data block to be trained satisfies at least one of the following: the frequency feature of reading/writing the data block to be trained includes at least one of the following: the frequency at which all access interfaces read the data block to be trained, the frequency at which all access interfaces write the data block to be trained, and the total frequency at which all access interfaces read and write the data block to be trained; and/or the length feature of reading the data block to be trained includes at least one of the following: the maximum read length, the minimum read length, or the average read length; and/or the arrangement feature of reading the data block to be trained includes at least one of the following: the number of reads of a first length, the number of reads of a second length, the ratio of the number of reads of the first length to the total number of reads, or the ratio of the number of reads of the second length to the total number of reads, where the first length is a length equal to 2^n, the second length is a length not equal to 2^n, and n is a positive integer.
  • The read feature of the third data block adjacent to the data block to be trained includes at least the read frequency feature corresponding to each of L third data blocks, where L is a positive integer.
  • Acquiring the data set to be trained by the third device includes: the third device receives the data set to be trained sent by the second device.
  • The method further includes: the third device sends the access frequency prediction model to the first device.
  • The method further includes: the third device sends the access frequency prediction model to the second device, so that the second device forwards the access frequency prediction model to the first device.
  • The method further includes: the third device receives the changed feature length and the new data set to be trained sent by the second device; the third device parses the new data set to be trained using the changed feature length to obtain a plurality of new features to be trained; the third device continues to train the access frequency prediction model with the plurality of new features to be trained and the labels corresponding to the plurality of new features to be trained, to obtain an updated access frequency prediction model.
  • A data processing device is provided, comprising: a first acquiring unit, configured to acquire a first feature of a first data block, where the first data block is any data block in a storage space, and the first feature includes a read/write feature related to the first data block and a read feature of a second data block adjacent to the first data block; and a processing unit, configured to input the first feature into an access frequency prediction model and obtain the access frequency of the first data block output by the access frequency prediction model, where the access frequency of the first data block is used to determine the type of the first data block.
  • The device further includes: a receiving unit, configured to receive the first feature of the first data block sent by the second device; the first acquiring unit is specifically configured to acquire the first feature received by the receiving unit.
  • The device further includes: a sending unit, configured to send the access frequency of the first data block to the second device.
  • The type of the first data block is cold data or hot data.
  • The processing unit is further configured to update the access frequency prediction model when the feature length of the first feature changes, to obtain an updated access frequency prediction model.
  • The first acquiring unit is further configured to acquire a second feature of the first data block, where the feature length of the first feature is different from the feature length of the second feature; the processing unit is further configured to input the second feature into the updated access frequency prediction model and obtain the access frequency of the first data block output by the updated access frequency prediction model.
  • A data processing device is provided, comprising: a second acquiring unit, configured to acquire the access frequency of a first data block, where the access frequency of the first data block is obtained according to an access frequency prediction model and a first feature of the first data block, the first feature includes a read/write feature related to the first data block and a read feature of a second data block adjacent to the first data block, and the first data block is any data block in a storage space; and a processing unit, configured to determine the type of each first data block in the storage space according to the access frequency of each first data block.
  • The processing unit is specifically configured to sort the first data blocks by access frequency to obtain a sorting result, and to determine the type of each first data block in the storage space according to the sorting result.
  • The first feature includes the feature of the first data block at each of T historical time points; the read/write features of the first data block at the T historical time points are correlated in time and space, and T is a positive integer.
  • The read/write feature related to the first data block includes at least one of the following: a frequency feature of reading/writing the first data block, a length feature of reading the first data block, and an arrangement feature of reading the first data block.
  • The read/write feature related to the first data block satisfies at least one of the following: the frequency feature of reading/writing the first data block includes at least one of the following: the frequency at which all access interfaces read the first data block, the frequency at which all access interfaces write the first data block, and the total frequency at which all access interfaces read and write the first data block; and/or the length feature of reading the first data block includes at least one of the following: the maximum read length, the minimum read length, or the average read length; and/or the arrangement feature of reading the first data block includes at least one of the following: the number of reads of a first length, the number of reads of a second length, the ratio of the number of reads of the first length to the total number of reads, or the ratio of the number of reads of the second length to the total number of reads, where the first length is a length equal to 2^n, the second length is a length not equal to 2^n, and n is a positive integer.
  • The read feature of the second data block adjacent to the first data block includes at least the read frequency feature corresponding to each of L second data blocks, where L is a positive integer.
  • The access frequency prediction model includes a first sub-model and a second sub-model; the first sub-model is used to extract the temporal characteristics of the first feature, and the second sub-model is used to extract the spatial characteristics of the first feature.
  • The access frequency prediction model further includes a third sub-model, and the third sub-model is used to obtain the feature scale of the first feature.
  • The access frequency prediction model is generated by training according to features to be trained and the labels corresponding to the features to be trained, where the label corresponding to a feature to be trained is an access frequency.
  • The device further includes: a third acquiring unit, configured to acquire a new data set to be trained when the feature length of the first feature changes, where the new data set to be trained includes a plurality of features to be trained and the labels corresponding to the plurality of features to be trained, and the new data set to be trained is used to train the access frequency prediction model to obtain an updated access frequency prediction model.
  • The device further includes: a sending unit, configured to send the new data set to be trained to the third device, so that the third device trains the access frequency prediction model with the new data set to be trained to obtain the updated access frequency prediction model; and a receiving unit, configured to receive the updated access frequency prediction model sent by the third device.
  • The device further includes: a sending unit, configured to send the updated access frequency prediction model to the first device.
  • The change in the feature length of the first feature includes a change in the number T of historical time points and/or a change in the number L of second data blocks adjacent to the first data block, where T is a positive integer and L is a positive integer.
  • The device further includes a receiving unit and a sending unit; the second acquiring unit is specifically configured to acquire the first feature of the first data block; the sending unit is configured to send the first feature of the first data block to the first device; the receiving unit is configured to receive the access frequency of the first data block sent by the first device, where the access frequency of the first data block is obtained by the first device by using the access frequency prediction model and the first feature.
  • an access frequency prediction model training generation device includes: a fourth acquisition unit, configured to acquire a data set to be trained, the data set to be trained includes a plurality of features to be trained And the label corresponding to the multiple features to be trained, the label is the access frequency, each feature to be trained in the multiple features to be trained includes read/write features related to the data block to be trained and the The read feature of the third data block adjacent to the data block, the data block to be trained is any data block in the storage space; the generation unit is used to use the data set to be trained to train the initial network model, and generate Access frequency prediction models.
  • the feature to be trained includes the feature of each historical time point of the data block to be trained at T historical time points, and the read data of the data block to be trained at T historical time points There is a time-space correlation between writing features, and the T is a positive integer.
  • the access frequency prediction model includes a first sub-model and a second sub-model, the first sub-model is used to extract the time characteristics of the features to be trained, and the second sub-model It is used to extract the spatial characteristics of the features to be trained.
  • the access frequency prediction model further includes a third sub-model, and the third sub-model is used to process a time series model.
  • the read/write feature related to the data block to be trained includes at least one of the following: read/write the frequency feature of the data block to be trained, read the data to be trained The length feature of the block, and the arrangement feature of reading the data block to be trained.
  • the read and write characteristics related to the data block to be trained satisfy at least one of the following: the frequency characteristics of reading/writing the data block to be trained include at least one of the following: one The frequency of reading the data block to be trained by one or more access interfaces, the frequency of writing the data block to be trained by one or more access interfaces, and the total number of reads and writes of the data block to be trained by one or more access interfaces Frequency; and/or, the length feature of reading the data block to be trained includes at least one of the following: maximum read length, minimum read length or average read length; and/or, the read The arrangement characteristics of the data blocks to be trained include at least one of the following: the number of times to read the first length, the number of times to read the second length, the ratio of the number of times to read the first length to the total number of times to read, or the number of times to read The ratio of the times of the second length to the total number of reads is taken, the first length means that the length is 2n, the second length means that
  • the reading characteristics of the third data blocks adjacent to the data block to be trained include at least reading frequency characteristics corresponding to each of the L third data blocks, the L is a positive integer.
  • the fourth acquiring unit is specifically configured to receive the data set to be trained sent by the second device.
  • the device further includes: a sending unit, configured to send the access frequency prediction model to the first device.
  • the device further includes: a sending unit, configured to send the access frequency prediction model to a second device, so that the access frequency prediction model is forwarded to the first device through the second device.
  • the device further includes: a receiving unit, configured to receive the changed feature length and the new data set to be trained sent by the second device; the third device uses the changed feature length to analyze the new data set to be trained to obtain a plurality of new features to be trained; the generating unit is further configured to continue training the access frequency prediction model by using the plurality of new features to be trained and the labels corresponding to the plurality of new features to be trained, to obtain an updated access frequency prediction model.
  • a data processing system comprising: a first device and a second device; the first device is configured to execute the data processing method described in the first aspect; the second device is configured to execute the data processing method described in the second aspect.
  • system further includes: a third device;
  • the third device is configured to execute the method described in the third aspect, and is used to train and generate an access frequency prediction model.
  • a data processing device including a processor coupled to a memory, where the processor is configured to execute computer instructions in the memory, so that the device performs any possible implementation of the first aspect, any possible implementation of the second aspect, or any possible implementation of the third aspect.
  • a computer-readable storage medium including instructions which, when run on a computer, cause the computer to execute the method described in the first aspect and any possible implementation thereof, the second aspect and any possible implementation thereof, or the third aspect and any possible implementation thereof.
  • a computer program product is provided.
  • the device executes the method described in the first aspect and any possible implementation thereof, the second aspect and any possible implementation thereof, or the third aspect and any possible implementation thereof.
  • a first device including a processor configured to execute a computer program (or computer-executable instructions) stored in a memory; when the computer program (or computer-executable instructions) is executed, the device is caused to execute the method in the first aspect and each possible implementation of the first aspect.
  • processor and memory are integrated;
  • the above-mentioned memory is located outside the first device.
  • the first device also includes a communication interface, which is used for the first device to communicate with other devices, such as sending or receiving data and/or signals.
  • the communication interface may be a transceiver, circuit, bus, module or other types of communication interface.
  • a second device including a processor configured to execute a computer program (or computer-executable instructions) stored in a memory; when the computer program (or computer-executable instructions) is executed, the device is caused to execute the method in the second aspect and each possible implementation of the second aspect.
  • processor and memory are integrated;
  • the memory is located outside the second device.
  • the second device also includes a communication interface, which is used for the second device to communicate with other devices, such as sending or receiving data and/or signals.
  • the communication interface may be a transceiver, circuit, bus, module or other types of communication interface.
  • a third device including a processor configured to execute a computer program (or computer-executable instructions) stored in a memory; when the computer program (or computer-executable instructions) is executed, the device is caused to execute the method in the third aspect and each possible implementation of the third aspect.
  • processor and memory are integrated;
  • the above-mentioned memory is located outside the third device.
  • the third device also includes a communication interface, which is used for the third device to communicate with other devices, such as sending or receiving data and/or signals.
  • the communication interface may be a transceiver, circuit, bus, module or other types of communication interface.
  • the chip system includes a processor, and may also include a memory, for implementing the method described in the first aspect and any possible implementation thereof, the second aspect and any possible implementation thereof, or the third aspect and any possible implementation thereof.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • the access frequency prediction model is generated through pre-training, and the first device acquires the first feature of each data block in the storage space, that is, the first feature of the first data block.
  • the first characteristic includes a read/write characteristic associated with the first data block itself and a read characteristic of a second data block adjacent to the first data block.
  • the first device inputs the first feature of the first data block into the access frequency prediction model to obtain the access frequency corresponding to the first data block at a future moment.
  • the access frequency corresponding to the first data block is used to determine the type of the first data block. For example, it is determined whether the first data block belongs to cold data or hot data according to the access frequency corresponding to the first data block.
  • the embodiment of the present application obtains the multi-dimensional features of the first data block (the read/write features of the first data block itself and the read features of adjacent data blocks) and uses the multi-dimensional features to predict the future access frequency of the first data block. This not only improves the prediction accuracy, but also allows the type of the first data block to be determined according to its access frequency, so that corresponding processing can be performed on the first data block to improve the performance of the storage system.
  • FIG. 1 is a storage hierarchical structure diagram provided by an embodiment of the present application.
  • FIG. 2 is a flow chart of a data processing method provided by an embodiment of the present application.
  • Fig. 3a is a schematic diagram of a data block storage provided by the embodiment of the present application.
  • FIG. 3b is a structure diagram of an access frequency prediction model provided by an embodiment of the present application.
  • FIG. 4 is a flow chart of another data processing method provided by the embodiment of the present application.
  • FIG. 5 is a flowchart of another data processing method provided by the embodiment of the present application.
  • FIG. 6 is a data processing framework diagram provided by an embodiment of the present application.
  • FIG. 7 is a comparison diagram of a simulation structure provided by the embodiment of the present application.
  • FIG. 8 is a structural diagram of a data processing device provided in an embodiment of the present application.
  • FIG. 9 is a structural diagram of another data processing device provided in the embodiment of the present application.
  • FIG. 10 is a structural diagram of a network device provided by an embodiment of the present application.
  • FIG. 11 is a structural diagram of a data processing device provided by an embodiment of the present application.
  • a storage mode is proposed, that is, intelligent multi-mode storage, which accurately stores data in suitable media according to the value of business data, life cycle and other factors to optimize data access efficiency.
  • data flow is the key technology of intelligent multi-mode storage, which supports the management and scheduling of data flow tasks in the same layer or across layers to achieve efficient data flow.
  • Hot and cold data identification is an important part of data flow. By identifying the stored data, the data flow plan can be determined according to the identification results, so that hot data is placed in the performance layer as much as possible and cold data is placed in the capacity layer, improving the utilization efficiency of the storage media.
  • the performance layer refers to the top storage layer in the storage device, and the storage layer has a higher transmission rate
  • the capacity layer refers to the lower storage layer in the memory, and the storage layer has a lower transmission rate.
  • the storage area close to the central processing unit (CPU) is used as the performance layer, and the storage area far away from the CPU is used as the capacity layer. Therefore, how to accurately identify the type of data to process different types of data and improve the performance of the storage system is the key to realizing intelligent multi-mode storage.
  • one scheme uses exponential smoothing to predict the access frequency of data, but this scheme can accurately predict only the short-term access frequency and cannot accurately predict the long-term access frequency.
  • Another scheme uses a Markov chain to predict the access frequency of data. This scheme can learn the static distribution of the states and uses a hierarchical structure to reduce the size of the transition probability matrix. However, due to the limitations of the Markov assumption, the state space explodes, the model is difficult to converge, and the complexity is high.
  • Another scheme uses a generative adversarial network to predict the access frequency of data.
  • This scheme can learn the distribution of features and predict the access frequency, but the model complexity is high, the model is difficult to converge, and the inference delay is high.
  • Yet another scheme uses clustering and neural network models to classify data. This scheme can cluster on the characteristics of the address space and model each cluster separately to reduce the size of the neural network model, but the address coverage is low and the prediction granularity is coarse. It can be seen that traditional data type identification methods have various problems, such as poor long-term prediction accuracy, high model complexity, and large prediction delay.
  • the embodiment of the present application provides a data processing method, which uses a neural network model to identify the type of data and then performs different processing according to the type of data, which improves both the accuracy of identification and the performance of the system in processing data.
  • the access frequency prediction model is pre-trained, and when the data in the storage space needs to be processed, the first feature of each data block (first data block) in the storage space is obtained, where the first feature includes the read/write features related to the first data block itself and the read features of a second data block adjacent to the first data block.
  • the obtained first feature is input into the access frequency prediction model to obtain the access frequency of the first data block, so as to determine the type of the first data block according to the access frequency of the first data block, and then perform matching processing. It can be seen that by acquiring the multi-dimensional features related to the first data block, the future access frequency of the first data block can be predicted through the multi-dimensional features, so as to improve the prediction accuracy. At the same time, the type of the first data block is determined according to the access frequency of the first data block, and then corresponding processing is performed on the first data block, thereby improving the processing performance of the system.
  • the value of data lies in the frequency with which it is queried or updated.
  • users have different usage requirements for different data.
  • some data blocks recorded by the system are crucial for analysis, and the system queries these data blocks frequently, while other data blocks are irrelevant for analysis, and the system queries them only rarely.
  • frequently queried data blocks need to be stored in the performance layer, and less frequently queried data blocks are stored in the capacity layer. Therefore, the method provided by the embodiment of the present application can be used to obtain the first feature of each data block stored in the system, input the first feature of the data block into the access frequency prediction model, and use the access frequency prediction model to obtain the access frequency of the data block. After the access frequency of each data block is obtained, the location where the data block is stored is determined based on the access frequency.
  • Referring to FIG. 2, this figure is a flow chart of a data processing method provided in the embodiment of the present application. As shown in FIG. 2, the method may include:
  • S201: The first device acquires a first feature of a first data block.
  • the first feature of the first data block is obtained.
  • the first characteristic includes a read/write characteristic associated with a first data block and a read characteristic of a second data block adjacent to the first data block. That is, the first feature includes not only features related to the first data block itself, but also features of neighboring data blocks, so as to reflect the characteristics of the first data block in multiple dimensions.
  • the read/write features related to the first data block include one or more of: the frequency feature of reading/writing the first data block, the length feature of reading the first data block, and the arrangement feature of reading the first data block.
  • the frequency feature of reading/writing the first data block may include, but is not limited to, the frequency at which one or more access interfaces read the first data block, the frequency at which one or more access interfaces write the first data block, and the total frequency at which one or more access interfaces read and write the first data block.
  • The frequency at which all access interfaces write the first data block refers to the frequency of adding, deleting, or modifying the data of the first data block. For example, if the frequency at which all access interfaces read the first data block is x, and the frequency at which all access interfaces write the first data block is y, then the total frequency at which all access interfaces read and write the first data block is x+y.
  • the length characteristic of reading the first data block includes but is not limited to a maximum read length, a minimum read length, and an average read length.
  • the arrangement feature of reading the first data block includes, but is not limited to, the number of reads of the first length, the number of reads of the second length, the ratio of the number of reads of the first length to the total number of reads, and the ratio of the number of reads of the second length to the total number of reads.
  • the first length refers to a length of 2^n, the second length refers to a length that is not 2^n, and n is a positive integer.
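  • As a minimal sketch of how the read/write features described above might be computed for one data block over one time window: the `Access` record layout, the function names, and the reading of the "first length" as a power of two are all assumptions made for illustration, not details confirmed by this application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Access:
    """One access to a data block (hypothetical record layout)."""
    op: str      # "read" or "write"
    length: int  # size of the access, in whatever unit the trace uses


def is_first_length(x: int) -> bool:
    # Assumption: the "first length" is a power of two, 2**n with n >= 1.
    return x >= 2 and (x & (x - 1)) == 0


def read_write_features(accesses: List[Access]) -> List[float]:
    """Return 10 features: 3 frequency, 3 length, 4 arrangement."""
    reads = [a.length for a in accesses if a.op == "read"]
    n_reads = len(reads)
    n_writes = sum(1 for a in accesses if a.op == "write")

    # Frequency features: read count, write count, total count (x, y, x + y).
    freq = [n_reads, n_writes, n_reads + n_writes]

    # Length features of reads: maximum, minimum, average.
    if reads:
        length = [max(reads), min(reads), sum(reads) / n_reads]
    else:
        length = [0, 0, 0.0]

    # Arrangement features: counts and ratios of first/second-length reads.
    first = sum(1 for r in reads if is_first_length(r))
    second = n_reads - first
    ratio_first = first / n_reads if n_reads else 0.0
    ratio_second = second / n_reads if n_reads else 0.0

    values = freq + length + [first, second, ratio_first, ratio_second]
    return [float(v) for v in values]


sample = [Access("read", 4096), Access("read", 3000),
          Access("write", 512), Access("read", 8)]
features = read_write_features(sample)
```

  • Counting occurrences inside a fixed window stands in for "frequency" here; a real collector would presumably normalize by the window duration.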
  • the reading characteristics of the second data block adjacent to the first data block include at least the reading frequency corresponding to each of the L second data blocks, where L is a positive integer.
  • the L second data blocks can be based on the first data block, and L/2 second data blocks can be taken forward and L/2 second data blocks can be taken backward. When L/2 is a non-integer, it can be rounded up.
  • the second data block adjacent to the first data block may include the second data block directly adjacent to the first data block, and also includes the second data block indirectly adjacent to the first data block.
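  • The selection of neighboring second data blocks can be sketched as follows. The function name, the use of integer block indices, and the clipping at the ends of the storage space are assumptions for illustration; the text only states that L/2 blocks are taken forward and backward, with rounding up.

```python
import math
from typing import List


def neighbor_indices(i: int, L: int, n_blocks: int) -> List[int]:
    """Indices of the second data blocks around first data block i."""
    half = math.ceil(L / 2)  # round up when L/2 is not an integer
    before = range(i - half, i)          # blocks taken forward
    after = range(i + 1, i + half + 1)   # blocks taken backward
    # Assumption: indices outside the storage space are simply dropped,
    # so blocks near either end have fewer neighbors.
    return [j for j in list(before) + list(after) if 0 <= j < n_blocks]


print(neighbor_indices(5, 4, 10))
```

  • Note that under this literal reading an odd L yields L+1 neighbors (for example, L = 3 gives two on each side); whether that is intended is not stated in the text.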
  • the data block may be data stored between two storage addresses in the storage space. Typically, storage space can be divided into multiple data blocks.
  • the storage device is divided into multiple storage spaces, wherein addresses 000000-011111 are used to store data block 1, addresses 100000-101111 are used to store data block 2, addresses 110000-111000 are used to store data block 3, and so on.
  • the first data block may be data block 2
  • the second data block may be data block 1 and data block 3 respectively.
  • the access situation of the first data block is time-correlated, so the first feature may include the feature of the first data block at each of T historical time points, and there is a temporal-spatial correlation between the read/write frequency features of the first data block at the T historical time points, where T is a positive integer.
  • the T historical time points are times before the current time point for prediction.
  • the time correlation between the read/write frequency characteristics of the first data block at T historical time points means that the read/write characteristics at different time points have a correlation.
  • the association relationship at different time points may be that the read/write frequencies at two time points change in the same direction.
  • the first data block is read at the time point t1, the first data block is also read at the time point t2, wherein the time point t2 is later than the time point t1.
  • the association relationship at different time points is that the read/write frequencies at two time points change in opposite directions.
  • the first data block is read at the time point t1, the first data block is not read at the time point t2, wherein the time point t2 is later than the time point t1.
  • the spatial correlation between the read/write frequency features of the first data block at the T historical time points means that the read/write frequency of the first data block and the read/write frequency of the second data block depend on each other at each historical time point.
  • For example, the second data block can be analyzed only after the first data block is obtained. In this case, the read frequency of the first data block is affected by the read frequency of the second data block.
  • 10 is the number of read/write features related to the first data block, for example 3 frequency features of reading/writing the first data block, 3 length features of reading the first data block, and 4 arrangement features of reading the first data block.
  • the number of relevant historical time points may change, that is, the value of T changes.
  • the number of correlated neighbor data blocks may also change, that is, the value of L also changes. Therefore, the length of the first feature corresponding to the first data block is also constantly changing.
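  • To make the dependence of the feature length on T and L concrete, the sketch below assembles a first feature under a hypothetical layout: one row per historical time point, each row holding the 10 features of the first data block followed by the read frequencies of the L neighboring second data blocks. This layout, and the resulting length T × (10 + L), are assumptions for illustration; the exact arrangement is not given in the text above.

```python
from typing import List, Sequence

OWN_FEATURES = 10  # 3 frequency + 3 length + 4 arrangement features


def build_first_feature(
    own: Sequence[Sequence[float]],
    neighbor_reads: Sequence[Sequence[float]],
) -> List[List[float]]:
    """own[t]: the 10 own features at time point t.
    neighbor_reads[t]: read frequencies of the L neighbors at time point t.
    Returns T rows of length 10 + L (hypothetical layout)."""
    if len(own) != len(neighbor_reads):
        raise ValueError("own and neighbor_reads must cover the same T points")
    rows = []
    for own_t, nb_t in zip(own, neighbor_reads):
        if len(own_t) != OWN_FEATURES:
            raise ValueError("expected 10 own features per time point")
        rows.append(list(own_t) + list(nb_t))
    return rows


def feature_length(T: int, L: int) -> int:
    # Total number of scalars under the assumed layout.
    return T * (OWN_FEATURES + L)


T, L = 3, 2
own = [[float(t)] * OWN_FEATURES for t in range(T)]
nbr = [[1.0, 2.0] for _ in range(T)]
first_feature = build_first_feature(own, nbr)
```

  • Under this layout, any change in T adds or removes rows and any change in L widens or narrows every row, which is why a model trained for one (T, L) pair cannot simply be reused for another.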
  • an autocorrelation function may be used to obtain a first correlation coefficient between read/write frequencies of the first data block at different time points, so as to determine T according to the first correlation coefficient.
  • a covariance function may be used to obtain a second correlation coefficient between the read/write frequency of the first data block and the read/write frequency of the second data block, and L is determined according to the second correlation coefficient.
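  • One way T and L could be derived from correlation coefficients is sketched below. The thresholds, the stopping rules, and the use of a normalized covariance (Pearson coefficient) are assumptions for illustration; the text only says that an autocorrelation function and a covariance function are used. Absolute values are compared because both same-direction and opposite-direction changes count as an association.

```python
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Normalized covariance of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (vx * vy) ** 0.5


def choose_T(freq: Sequence[float], max_lag: int, threshold: float) -> int:
    """Largest lag whose autocorrelation magnitude reaches the threshold."""
    T = 1  # T must be a positive integer
    for lag in range(1, max_lag + 1):
        if len(freq) - lag < 2:
            break
        r = pearson(freq[:-lag], freq[lag:])
        if abs(r) >= threshold:
            T = lag
    return T


def choose_L(block_freq: Sequence[float],
             neighbor_freqs: Sequence[Sequence[float]],
             threshold: float) -> int:
    """Count neighbors, nearest first, until one falls below the threshold."""
    L = 0
    for nb in neighbor_freqs:
        if abs(pearson(block_freq, nb)) < threshold:
            break
        L += 1
    return max(L, 1)  # L must be a positive integer


periodic = [1, 5, 1, 5, 1, 5, 1, 5, 1, 5]
T = choose_T(periodic, max_lag=4, threshold=0.9)
L = choose_L([1, 2, 3, 4],
             [[2, 4, 6, 8], [4, 3, 2, 1], [1, 1, 1, 1], [1, 2, 3, 4]],
             threshold=0.9)
```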
  • S202 The first device inputs the first feature into the access frequency prediction model, and obtains the access frequency of the first data block output by the access frequency prediction model.
  • the first feature is input into the access frequency prediction model, so as to obtain the access frequency of the first data block through the access frequency prediction model.
  • the access frequency of the first data block refers to the reading frequency of the first data block, which can be used to determine the type of the first data block.
  • the type of the first data block may be cold data or hot data, or the type of the first data block may be garbage data or non-junk data.
  • garbage data can refer to data that has no impact on the normal operation of the system in the current application scenario, and will be set according to the actual application scenario requirements.
  • For example, if the system only needs to query the data of the current year, the data of last year and earlier is called garbage data; or, if some functions are removed after the system is upgraded or transformed, some tables related to those functions may no longer be used, and the data in those tables is called garbage data.
  • the access frequency prediction model is generated according to training features to be trained and labels corresponding to the features to be trained, and the label corresponding to the features to be trained is the access frequency.
  • when the first device has a data processing function, after obtaining the access frequency of each first data block, the first device can determine the type of each first data block according to its access frequency, so as to perform matching processing operations on different types of data blocks. For example, a first data block belonging to hot data is moved to the performance layer of the storage, and a first data block belonging to cold data is moved to the capacity layer of the storage.
  • the first device determines the type of each first data block according to the access frequency of each data block. Specifically, the first device sorts the first data blocks in descending order of access frequency, determines the top K first data blocks as hot data, and determines the remaining first data blocks as cold data. Alternatively, the first device determines a first data block whose access frequency is greater than or equal to the access frequency threshold as hot data, and determines a first data block whose access frequency is less than the access frequency threshold as cold data.
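  • The two classification rules above (top K, or a fixed access frequency threshold) can be sketched as follows. The function names, the dictionary keyed by block identifier, and the layer names returned by `target_layer` are hypothetical choices for illustration.

```python
from typing import Dict


def classify_top_k(freqs: Dict[str, float], k: int) -> Dict[str, str]:
    """Mark the K blocks with the highest predicted frequency as hot."""
    ranked = sorted(freqs, key=freqs.get, reverse=True)  # descending order
    hot = set(ranked[:k])
    return {block: ("hot" if block in hot else "cold") for block in freqs}


def classify_by_threshold(freqs: Dict[str, float],
                          threshold: float) -> Dict[str, str]:
    """Mark blocks at or above the threshold as hot, the rest as cold."""
    return {block: ("hot" if f >= threshold else "cold")
            for block, f in freqs.items()}


def target_layer(kind: str) -> str:
    # Hot data goes to the performance layer, cold data to the capacity layer.
    return "performance" if kind == "hot" else "capacity"


predicted = {"b1": 120.0, "b2": 3.0, "b3": 45.0}
by_rank = classify_top_k(predicted, k=1)
by_threshold = classify_by_threshold(predicted, threshold=45.0)
```

  • The top-K rule bounds how much data lands in the performance layer regardless of workload, while the threshold rule lets the hot set grow or shrink with the predicted frequencies; which one suits a given system depends on the capacity of the performance layer.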
  • the first device may also determine whether the first data block is garbage data according to the access frequency of the first data block, and if so, reclaim the first data block .
  • the first device performs fragmentation processing according to the access frequency of each first data block.
  • fragmentation processing refers to reorganizing the fragmented data generated by the storage device during long-term use through system software or professional defragmentation software, so that relevant data exists in continuous sectors, improving storage performance and data reading speed.
  • the first device determines whether to prefetch the first data block according to the access frequency of the first data block.
  • prefetching refers to extracting the required data to the storage layer closer to the CPU before the CPU is used, so that it can be obtained in time when it is really needed, and the delay can be reduced.
  • the access frequency prediction model can extract the temporal and spatial correlations among the different features in the first feature.
  • the access frequency prediction model may include a first sub-model and a second sub-model, the first sub-model is used to extract the time characteristic of the first feature, and the second sub-model is used to extract the spatial characteristic of the first feature.
  • the first sub-model is a convolutional neural network model (convolutional neural networks, CNN)
  • the second sub-model is a long short-term memory network (long short-term memory, LSTM).
  • the access frequency prediction model may further include a third sub-model, and the third sub-model is used to obtain the feature scale of the first feature.
  • the third submodel is an autoregressive model.
  • the feature scale of the first feature refers to the dimension scale of each feature in the first feature.
  • the model includes a CNN layer, an LSTM layer, a fully connected layer, and an autoregressive layer. Wherein, the role of the fully connected layer is to combine the first features learned by the aforementioned layers to determine the access frequency of the first data block.
  • when the feature length of the first feature changes, the access frequency prediction model needs to be retrained to obtain an updated access frequency prediction model, so as to ensure prediction accuracy. That is, new features to be trained and the labels corresponding to the new features to be trained are used to train the access frequency prediction model, where the length of each new feature to be trained is the changed feature length.
  • After the updated access frequency prediction model is obtained, the first device acquires a second feature of the first data block, where the feature length of the second feature differs from the feature length of the first feature; the first device inputs the second feature into the updated access frequency prediction model to obtain the access frequency of the first data block output by the updated access frequency prediction model.
  • the feature types included in the second feature can be consistent with the feature types included in the first feature, but the feature lengths of the two are different.
  • the change of the feature length of the first feature may include the change of the number T of historical time points, and/or the change of the number L of adjacent second data blocks.
  • when the variation of T and/or the variation of L exceeds its corresponding preset threshold, the access frequency prediction model is retrained.
  • the preset threshold corresponding to the variation of T and the preset threshold corresponding to the variation of L may be set according to actual requirements.
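  • A retraining trigger based on these preset thresholds might look like the sketch below. The function name, the strict "greater than" comparison, and the use of absolute differences are assumptions; the text only states that each variation has its own preset threshold set according to actual requirements.

```python
def needs_retraining(old_T: int, new_T: int,
                     old_L: int, new_L: int,
                     t_threshold: int, l_threshold: int) -> bool:
    """Return True when the change in T or in L exceeds its preset threshold."""
    t_change = abs(new_T - old_T)  # variation of the number of time points
    l_change = abs(new_L - old_L)  # variation of the number of neighbors
    return t_change > t_threshold or l_change > l_threshold


small_drift = needs_retraining(8, 9, 4, 4, t_threshold=2, l_threshold=1)
large_drift = needs_retraining(8, 12, 4, 4, t_threshold=2, l_threshold=1)
```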
  • the future access frequency of the first data block is predicted according to the multi-dimensional feature, so as to improve the prediction accuracy.
  • the type of the first data block is determined according to the access frequency of the first data block, and then corresponding processing is performed on the first data block, thereby improving the processing performance of the system.
  • the first device when the first device is capable of acquiring the first feature of the first data block, it may collect the first feature of the first data block by itself. The first device may also receive the first characteristic of the first data block sent by the second device.
  • the second device is capable of acquiring the first characteristic of the first data block.
  • the second device is a host, and a large amount of data is stored in the host, and the first device is a device capable of training an access frequency prediction model and using the access frequency prediction model to predict the access frequency of the input first feature.
  • the first device after obtaining the access frequency of the first data block, the first device may send the access frequency of the first data block to the second device, and the second device performs related operations according to the access frequency of the first data block.
  • Referring to FIG. 4, this figure is a flow chart of data processing interaction provided by the embodiment of the present application. As shown in FIG. 4, the method may include:
  • S401 The second device acquires a first characteristic of a first data block.
  • the second device is responsible for collecting the first feature of each first data block in the storage space, where the first feature includes the read/write features related to the first data block itself and the read features of the second data block adjacent to the first data block.
  • S402: The second device sends the first feature of the first data block to the first device, and correspondingly, the first device receives the first feature of the first data block sent by the second device.
  • after the second device collects the first feature of the first data block, the second device sends the first feature of the first data block to the first device, so that the first device can predict the access frequency of the first data block.
  • S403 The first device inputs the first feature into the access frequency prediction model, and obtains the access frequency of the first data block output by the access frequency prediction model.
  • the first device After obtaining the first feature of the first data block, the first device inputs the first feature into the stored access frequency prediction model, so as to obtain the access frequency of the first data block at a future moment through the access frequency prediction model.
  • For the implementation of the first device obtaining the access frequency of the first data block by using the access frequency prediction model, reference may be made to the relevant description of S202, which will not be repeated in this embodiment.
  • the access frequency prediction model is generated by training with features to be trained and the labels corresponding to the features to be trained, where the label corresponding to a feature to be trained is the access frequency.
  • the access frequency prediction model in the first device may be generated by training of the first device, or may be generated by training of the second device, and sent to the first device.
  • the features to be trained and the labels corresponding to the features to be trained used by the first device to train and generate the access frequency prediction model are also sent by the second device.
  • the second device can monitor changes in the feature length of the first feature, and when it detects that the feature length of the first feature has changed, the second device acquires a new data set to be trained, where the new data set to be trained includes multiple features to be trained and the labels corresponding to the multiple features to be trained.
  • the new data set to be trained is used to train the access frequency prediction model to obtain an updated access frequency prediction model.
  • the second device may use the new data set to be trained to train the access frequency prediction model, obtain an updated access frequency prediction model, and send the updated access frequency prediction model to the first device; correspondingly, the first device receives the updated access frequency prediction model sent by the second device.
  • the second device sends a new data set to be trained to the first device, and correspondingly, the first device receives the new data set to be trained sent by the second device, so as to use the new data set to be trained to train the access frequency model , to obtain the updated access frequency prediction model.
  • the change of the feature length of the first feature includes the change of the number T of historical time points, and/or the change of the number L of the second data block adjacent to the first data block.
  • T and L are positive integers.
  • To avoid the resource overhead caused by frequent training, the access frequency prediction model is usually retrained only when the change in T or L is greater than a preset threshold.
  • The preset threshold for the change in T and the preset threshold for the change in L may be set according to actual requirements.
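The retraining condition described above can be sketched as follows. This is only an illustration: the function name `should_retrain` and the default threshold values are assumptions, since the text leaves the thresholds to actual requirements.

```python
def should_retrain(old_t, new_t, old_l, new_l, t_threshold=1, l_threshold=2):
    """Return True when the change in T or in L exceeds its preset threshold.

    T is the number of historical time points and L is the number of
    adjacent data blocks; both must be positive integers.
    The default thresholds are placeholders, not values from the source.
    """
    for value in (old_t, new_t, old_l, new_l):
        if value < 1:
            raise ValueError("T and L must be positive integers")
    t_changed = abs(new_t - old_t) > t_threshold
    l_changed = abs(new_l - old_l) > l_threshold
    return t_changed or l_changed


# A change of 1 in T does not exceed the threshold of 1, so no retraining.
print(should_retrain(8, 9, 4, 4))
# A change of 4 in L exceeds the threshold of 2, so the model is retrained.
print(should_retrain(8, 8, 4, 8))
```

Using a strict "greater than" comparison follows the wording of the text; small fluctuations in T or L therefore do not trigger training.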
  • When the second device detects that the feature length of the first feature has changed, continuing to predict the access frequency with the current version of the access frequency prediction model may reduce prediction accuracy.
  • Therefore, when the second device detects that the feature length of the first feature has changed, the second device sends an interrupt signal to the first device; correspondingly, the first device receives the interrupt signal sent by the second device. After receiving the interrupt signal, the first device no longer uses the current access frequency prediction model to predict the access frequency of the first data block. After obtaining the updated access frequency prediction model, the first device obtains the second feature of the first data block and inputs the second feature into the updated access frequency prediction model to predict the access frequency of the first data block, so as to ensure the accuracy of the prediction.
  • The length of the second feature is the changed feature length, which is different from the length of the first feature.
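A minimal sketch of how the first device might handle the interrupt signal is given below. The class `InferenceDevice` and its method names are hypothetical, and the model is represented by a plain callable; the source does not specify any of these details.

```python
class InferenceDevice:
    """Hypothetical first device that pauses prediction after an interrupt."""

    def __init__(self, model, feature_length):
        self.model = model                  # callable: feature list -> access frequency
        self.feature_length = feature_length
        self.interrupted = False

    def on_interrupt(self):
        # The second device detected a change in feature length:
        # stop using the current model.
        self.interrupted = True

    def load_updated_model(self, model, feature_length):
        # Install the updated model and resume prediction.
        self.model = model
        self.feature_length = feature_length
        self.interrupted = False

    def predict(self, feature):
        if self.interrupted:
            return None                     # no prediction with the stale model
        if len(feature) != self.feature_length:
            raise ValueError("feature length does not match the loaded model")
        return self.model(feature)


# Stand-in models (assumptions): simple functions instead of trained networks.
device = InferenceDevice(lambda f: sum(f) / len(f), feature_length=4)
print(device.predict([1, 2, 3, 4]))         # uses the current model
device.on_interrupt()
print(device.predict([1, 2, 3, 4]))         # None while waiting for the update
device.load_updated_model(lambda f: max(f), feature_length=6)
print(device.predict([1, 2, 3, 4, 5, 6]))   # second feature, updated model
```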
  • the first device sends the access frequency of the first data block to the second device, and correspondingly, the second device receives the access frequency of the first data block sent by the first device.
  • S405 The second device determines the type of each first data block in the storage space according to the access frequency of each first data block.
  • For each first data block in the storage space, after obtaining the access frequency of the first data block, the first device sends it to the second device; correspondingly, the second device receives the access frequency of the first data block sent by the first device. After obtaining the access frequency of each first data block in the storage space, the second device may determine the type of each first data block according to its access frequency. The type of the first data block may be hot data or cold data, or may be garbage data or non-garbage data.
  • The second device may also perform other processing operations according to the access frequency of the first data block, such as determining whether to perform a prefetch operation on the data block or whether to perform a defragmentation operation.
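The type determination in S405 could be sketched as below, assuming that blocks are sorted by predicted access frequency and a fixed fraction of the highest-ranked blocks is treated as hot data. The function name `classify_blocks` and the `hot_ratio` parameter are assumptions for illustration; the source does not fix the decision rule.

```python
def classify_blocks(access_freq, hot_ratio=0.1):
    """Label each block 'hot' or 'cold' from its predicted access frequency.

    access_freq: dict mapping block id -> predicted access frequency.
    hot_ratio:   fraction of blocks treated as hot (an assumed parameter).
    """
    if not 0.0 <= hot_ratio <= 1.0:
        raise ValueError("hot_ratio must be between 0 and 1")
    # Sort block ids from highest to lowest predicted access frequency.
    ranked = sorted(access_freq, key=access_freq.get, reverse=True)
    hot_count = int(len(ranked) * hot_ratio)
    hot = set(ranked[:hot_count])
    return {block: ("hot" if block in hot else "cold") for block in ranked}


predicted = {"a": 5.0, "b": 50.0, "c": 1.0, "d": 20.0, "e": 0.5}
print(classify_blocks(predicted, hot_ratio=0.4))
```

With five blocks and a ratio of 0.4, the two blocks with the highest predicted frequency ("b" and "d") are labeled hot and the rest cold.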
  • Training and generation of the access frequency prediction model may also be performed by a third device, which sends the access frequency prediction model to the first device after training.
  • The first device, the second device, and the third device may be independent devices, or may be different functional modules on the same device; this embodiment does not limit the specific forms of the three devices.
  • FIG. 5 is an interaction diagram of a data processing method provided in an embodiment of the present application. As shown in FIG. 5, the method may include:
  • the second device acquires a data set to be trained.
  • the second device is responsible for constructing a data set to be trained.
  • the data set to be trained may include multiple features to be trained and labels corresponding to the multiple features to be trained.
  • the label may be an access frequency.
  • each of the multiple features to be trained includes a read/write feature related to the data block to be trained and a read feature of a third data block adjacent to the data block to be trained.
  • the data block to be trained can be any data block in the storage space.
  • The read/write features related to the data block to be trained include, but are not limited to, frequency features of reading/writing the data block to be trained, length features of reading the data block to be trained, and arrangement features of reading the data block to be trained.
  • The frequency features of reading/writing the data block to be trained include, but are not limited to, the frequency at which one or more access interfaces read the data block to be trained, the frequency at which one or more access interfaces write the data block to be trained, and the total frequency at which one or more access interfaces read and write the data block to be trained.
  • The length features of reading the data block to be trained include, but are not limited to, the maximum read length, the minimum read length, or the average read length.
  • The arrangement features of reading the data block to be trained include, but are not limited to, the number of reads of the first length, the number of reads of the second length, the ratio of the number of reads of the first length to the total number of reads, and the ratio of the number of reads of the second length to the total number of reads.
  • The first length is a length equal to 2^n, the second length is a length not equal to 2^n, and n is a positive integer.
  • the reading characteristics of the third data block adjacent to the data block to be trained include at least reading frequency characteristics corresponding to each of the L third data blocks, and L is a positive integer.
  • The L third data blocks may be selected around the data block to be trained: L/2 third data blocks before it and L/2 third data blocks after it.
  • the features to be trained include features of each historical time point of the data block to be trained at T historical time points, and the data block to be trained has a spatiotemporal correlation between read/write frequency features at the T historical time points.
  • T is a positive integer.
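The feature layout described above can be sketched as follows. The record format (a list of `(kind, length)` tuples per block per time point), the function names, the ordering of values within the feature, and the zero padding at the edges of the storage space are all assumptions made for illustration.

```python
def is_power_of_two(length):
    # "First length": 2^n with n a positive integer, so 1 (= 2^0) is excluded.
    return length >= 2 and (length & (length - 1)) == 0


def block_features(ops):
    """Read/write features of one block at one time point.

    ops: list of (kind, length) tuples, kind being 'r' (read) or 'w' (write).
    """
    read_lengths = [length for kind, length in ops if kind == "r"]
    reads = len(read_lengths)
    writes = sum(1 for kind, _ in ops if kind == "w")
    pow2 = sum(1 for n in read_lengths if is_power_of_two(n))
    non_pow2 = reads - pow2
    return [
        reads, writes, reads + writes,                 # frequency features
        max(read_lengths, default=0),                  # length features
        min(read_lengths, default=0),
        sum(read_lengths) / reads if reads else 0.0,
        pow2, non_pow2,                                # arrangement features
        pow2 / reads if reads else 0.0,
        non_pow2 / reads if reads else 0.0,
    ]


def neighbor_read_freqs(block_id, l, read_freq):
    """Read frequency of L/2 blocks before and L/2 blocks after block_id."""
    half = l // 2
    offsets = list(range(-half, 0)) + list(range(1, half + 1))
    # Neighbors outside the storage space are padded with 0 (an assumption).
    return [read_freq[block_id + d] if 0 <= block_id + d < len(read_freq) else 0
            for d in offsets]


def first_feature(windows, block_id, l):
    """Concatenate per-time-point features over T historical time points.

    windows: list of T items; each item is a list indexed by block id
             holding that block's ops at that time point.
    """
    feature = []
    for per_block_ops in windows:
        read_freq = [sum(1 for kind, _ in ops if kind == "r")
                     for ops in per_block_ops]
        feature += block_features(per_block_ops[block_id])
        feature += neighbor_read_freqs(block_id, l, read_freq)
    return feature


window = [[("r", 8)], [("r", 4096), ("r", 3000), ("w", 512)], []]
print(first_feature([window, window], block_id=1, l=2))
```

Under these assumptions the feature length is T × (10 + L) for even L, which makes it clear why a change in T or L changes the feature length.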
  • the second device sends the data set to be trained to the third device, and correspondingly, the third device receives the data set to be trained sent by the second device.
  • the third device uses the data set to be trained to train the initial network model to generate an access frequency prediction model.
  • the third device uses the data set to be trained sent by the second device to train the initial network model, so as to train and generate the access frequency prediction model.
  • the access frequency prediction model may include a first sub-model and a second sub-model, the first sub-model is used to extract the temporal characteristics of the features to be trained, and the second sub-model is used to extract the spatial characteristics of the features to be trained.
  • the first sub-model is a CNN network
  • the second sub-model is an LSTM network.
  • The access frequency prediction model may also include a third sub-model, which is used to obtain the feature scale of the feature to be trained. The feature scale refers to the size of the dimension of each feature in the feature to be trained.
  • the third submodel is an autoregressive model.
  • the third device sends the access frequency prediction model to the second device, and correspondingly, the second device receives the access frequency prediction model sent by the third device.
  • When the second device detects that the feature length has changed, the second device acquires a new data set to be trained, which includes multiple features to be trained and the labels corresponding to those features.
  • The second device sends the new data set to be trained to the third device; correspondingly, the third device receives the new data set to be trained sent by the second device and uses it to train the access frequency prediction model, obtaining the updated access frequency prediction model.
  • the third device may send the updated access frequency prediction model to the second device, so as to forward the updated access frequency prediction model to the first device through the second device.
  • Alternatively, the third device sends the updated access frequency prediction model directly to the first device; correspondingly, the first device receives the updated access frequency prediction model sent by the third device.
  • To avoid the resource overhead caused by frequent training, the access frequency prediction model is usually retrained only when the variation of T or L is greater than a preset threshold.
  • When the second device detects that the feature length has changed, it can also send the changed feature length to the third device.
  • Correspondingly, the third device receives the changed feature length sent by the second device and uses it to divide the features in the data set to be trained, obtaining multiple features to be trained.
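One way to picture dividing the data set by the changed feature length is a sliding window over each block's history. The sketch below assumes that a feature is the concatenation of T consecutive time points and that its label is the access frequency observed at the following time point; both choices, and the name `split_into_samples`, are assumptions rather than details given in the source.

```python
def split_into_samples(per_point_features, access_freq, t):
    """Divide one block's history into (feature, label) training pairs.

    per_point_features: chronological list of per-time-point feature lists.
    access_freq:        access frequency observed at each time point.
    t:                  number of historical time points per feature.
    """
    if t < 1:
        raise ValueError("T must be a positive integer")
    if len(per_point_features) != len(access_freq):
        raise ValueError("features and access frequencies must align")
    samples = []
    for start in range(len(per_point_features) - t):
        window = per_point_features[start:start + t]
        feature = [value for point in window for value in point]
        label = access_freq[start + t]      # frequency at the next time point
        samples.append((feature, label))
    return samples


points = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
freqs = [11, 22, 33, 44, 55]
for feature, label in split_into_samples(points, freqs, t=2):
    print(feature, label)
```

Calling the same function with a different `t` re-divides the same history into features of the new length, which is the situation that arises after T changes.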
  • the access frequency prediction model may also be lightweighted, so as to reduce the volume of the access frequency prediction model and reduce memory usage. For example, pruning and compression are performed on the access frequency prediction model.
  • the third device may also convert the file of the access frequency prediction model into a format supported by the interface of the first device.
  • S505 The second device forwards the access frequency prediction model to the first device, and correspondingly, the first device receives the access frequency prediction model sent by the second device.
  • S506 The second device acquires the first feature of the first data block.
  • the second device sends the first characteristic of the first data block to the first device, and correspondingly, the first device receives the first characteristic of the first data block sent by the second device.
  • S508 The first device inputs the first feature of the first data block into the access frequency prediction model, so as to obtain the access frequency of the first data block output by the access frequency prediction model.
  • the first device sends the access frequency of the first data block to the second device, and correspondingly, the second device receives the access frequency of the first data block sent by the first device.
  • The host can implement the function of the second device in the embodiments of the present application.
  • The inference card can implement the function of the first device in the embodiments of the present application.
  • The training card can implement the function of the third device in the embodiments of the present application.
  • The host can collect the data set to be trained and send it to the training card.
  • The training card uses the data set to be trained to train the model and obtain the access frequency prediction model, which can be sent to the inference card through the host.
  • the host After the host obtains the first feature of a data block in the storage space, the host sends the first feature of the data block to the inference card, and the inference card inputs the first feature into the access frequency prediction model to obtain the access frequency of the data block .
  • the inference card sends the access frequency of the data block to the host, so that the host determines the type of the data block according to the access frequency of the data block.
  • The host may include a data collection module, a feature extraction module, a spatiotemporal characteristic processing module, and a data type identification module.
  • the data collection module is used for collecting data blocks in the storage space.
  • the feature extraction module is used to extract the features of the data block. If it is a training process, it is also necessary to obtain the label corresponding to the feature to construct a labeled data set to be trained; if it is a prediction process, it only needs to extract the features of the data block .
  • the spatio-temporal characteristics processing module is used to calculate the temporal correlation of the data blocks according to the autocorrelation function to determine the temporal correlation parameter T, and to calculate the spatial correlation of the data blocks according to the covariance function to determine the spatial correlation parameter L.
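The text does not say how T and L are derived from the autocorrelation and covariance values. The sketch below shows one plausible selection rule, with assumed thresholds and hypothetical function names: T is the largest lag up to which the autocorrelation stays above a threshold, and L counts neighbors on both sides whose covariance with the block stays above a threshold.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of an access-frequency series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


def choose_t(series, max_lag, threshold=0.5):
    """Largest lag T such that every lag 1..T meets the threshold (min 1)."""
    t = 1
    for lag in range(1, max_lag + 1):
        if autocorrelation(series, lag) < threshold:
            break
        t = lag
    return t


def covariance(a, b):
    """Population covariance of two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def choose_l(freq_by_block, block_id, max_distance, threshold):
    """L = 2 * d, d being the largest distance whose neighbors all covary."""
    d_ok = 0
    for d in range(1, max_distance + 1):
        neighbors = [j for j in (block_id - d, block_id + d)
                     if 0 <= j < len(freq_by_block)]
        if not neighbors:
            break
        if any(covariance(freq_by_block[block_id], freq_by_block[j]) < threshold
               for j in neighbors):
            break
        d_ok = d
    # L must be a positive integer; fall back to one neighbor on each side.
    return max(2, 2 * d_ok)


series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(choose_t(series, max_lag=5, threshold=0.4))
blocks = [[1, 2, 3, 4]] * 5
print(choose_l(blocks, block_id=2, max_distance=3, threshold=1.0))
```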
  • the data type identification module is used to determine the type of the data block according to the future access frequency of the data block.
  • the training card may include a training module, a lightweight processing module and a format matching module.
  • the training module is used to use the data set to be trained to train the initial network model to obtain the access frequency prediction model.
  • the training module can be a CPU or a graphics processing unit (graphics processing unit, GPU).
  • the lightweight processing module is used to perform lightweight processing on the access frequency prediction model, such as pruning, compression, etc., to reduce the volume of the access frequency prediction model and improve the prediction speed.
  • the format matching module is used to convert the file format of the access frequency prediction model into a format supported by the inference card interface.
  • The inference card may include a model iteration module, a model loading module, an inference module, and a return module.
  • the model iteration module is used for updating the access frequency prediction model.
  • the model loading module is used to read in the access frequency prediction model file.
  • The inference module is configured to input the received first feature into the access frequency prediction model to obtain a prediction result. The return module is used to send the prediction result to the host.
  • FIG. 7 shows the comparison of the accuracy of access frequency prediction of data blocks using different models.
  • The abscissa in the figure represents the ratio of the number of hot data blocks to be predicted to the total number of data blocks. For example, with a total of 1000 data blocks, 0.1 on the abscissa means that the data blocks are sorted by access frequency in descending order and the first 100 data blocks are taken as hot data blocks.
  • the vertical axis represents the prediction accuracy.
  • From bottom to top, the simulation lines in FIG. 7 are those obtained based on the exponential moving average (EMA) model, the counting Bloom filter (CBF) model, the least recently used (LRU) model, the linear regression (LR) model, the LSTM model, and the recurrent neural network (RNN)
  • an embodiment of the present application provides a data processing device, which will be described below with reference to the accompanying drawings.
  • FIG. 8 is a structural diagram of a data processing device provided by an embodiment of the present application.
  • The device 800 can implement the functions of the first device in the above embodiment.
  • The device 800 can include modules or units in one-to-one correspondence with the methods/operations/steps/actions performed by the first device in the above method embodiment. Each unit may be a hardware circuit, software, or a combination of a hardware circuit and software.
  • the apparatus may include: a first acquiring unit 801 and a processing unit 802 .
  • The first acquiring unit 801 is configured to acquire a first feature of a first data block, where the first data block is any data block in the storage space, and the first feature includes the read/write features related to the first data block and the read features of the second data blocks adjacent to the first data block.
  • For the specific implementation of the first acquiring unit 801, reference may be made to the relevant descriptions of S201, S402, and S507, which are not repeated in this embodiment.
  • The processing unit 802 is configured to input the first feature into the access frequency prediction model and obtain the access frequency of the first data block output by the access frequency prediction model, where the access frequency of the first data block is used to determine the type of the first data block.
  • For the specific implementation of the processing unit 802, reference may be made to the relevant descriptions of S202, S403, and S508, which are not repeated in this embodiment.
  • The device further includes a receiving unit, configured to receive the first feature of the first data block sent by the second device. The first acquiring unit 801 is specifically configured to acquire the first feature received by the receiving unit.
  • the device further includes: a sending unit (not shown in the figure);
  • a sending unit configured to send the access frequency of the first data block to the second device.
  • For the specific implementation of the sending unit, reference may be made to the related descriptions of S404 and S509, which are not repeated in this embodiment.
  • the type of the first data block is cold data or hot data.
  • the processing unit 802 is further configured to update the access frequency prediction model when the feature length of the first feature changes, to obtain an updated access frequency prediction model.
  • For the specific implementation of the update, reference may be made to the relevant descriptions of S202, S403, and S504, which are not repeated in this embodiment.
  • The first acquiring unit 801 is further configured to acquire a second feature of the first data block, where the length of the first feature is different from the length of the second feature. The processing unit 802 is further configured to input the second feature into the updated access frequency prediction model and obtain the access frequency of the first data block output by the updated access frequency prediction model.
  • For the specific implementation of the first acquiring unit 801, reference may be made to the relevant descriptions of S202 and S403, which are not repeated in this embodiment.
  • FIG. 9 is a structural diagram of another data processing device provided by the embodiment of the present application.
  • the device 900 can implement the functions of the second device in the above-mentioned method embodiment.
  • The device 900 can include modules or units in one-to-one correspondence with the methods/operations/steps/actions performed by the second device in the above method embodiment. Each unit may be a hardware circuit, software, or a combination of a hardware circuit and software.
  • The apparatus may include a second acquiring unit 901 and a processing unit 902.
  • The second acquiring unit 901 is configured to acquire the access frequency of the first data block, where the access frequency of the first data block is obtained based on the access frequency prediction model and the first feature of the first data block, the first feature includes the read/write features related to the first data block and the read features of the second data blocks adjacent to the first data block, and the first data block is any data block in the storage space.
  • For the specific implementation of the second acquiring unit 901, reference may be made to the related descriptions of S404 and S509, which are not repeated in this embodiment.
  • the processing unit 902 is configured to determine the type of each of the first data blocks in the storage space according to the access frequency of each of the first data blocks. For the specific implementation of the processing unit 902, reference may be made to relevant descriptions of S202 and S405, which will not be repeated in this embodiment.
  • the processing unit 902 is specifically configured to perform sorting according to the access frequency of each of the first data blocks to obtain a sorting result; determine each of the data blocks in the storage space according to the sorting result The type of the first data block.
  • For the specific implementation of the processing unit 902, reference may be made to the relevant descriptions of S202 and S405, which are not repeated in this embodiment.
  • The first feature includes the features of the first data block at each of T historical time points, the read/write features of the first data block at the T historical time points are spatiotemporally correlated, and T is a positive integer.
  • The read/write features related to the first data block include at least one of the following: frequency features of reading/writing the first data block, length features of reading the first data block, and arrangement features of reading the first data block.
  • The frequency features of reading/writing the first data block include the following: the frequency at which one or more access interfaces read the first data block, the frequency at which one or more access interfaces write the first data block, and the total frequency at which one or more access interfaces read and write the first data block.
  • the read length characteristic of the first data block includes at least one of the following: a maximum read length, a minimum read length, or an average read length.
  • The arrangement features of reading the first data block include at least one of the following: the number of reads of the first length, the number of reads of the second length, the ratio of the number of reads of the first length to the total number of reads, or the ratio of the number of reads of the second length to the total number of reads. The first length is a length equal to 2^n, the second length is a length not equal to 2^n, and n is a positive integer.
  • the read characteristics of the second data blocks adjacent to the first data block include at least read frequency characteristics corresponding to each of the L second data blocks, the L is a positive integer.
  • The access frequency prediction model includes a first sub-model and a second sub-model, where the first sub-model is used to extract the temporal characteristics of the first feature, and the second sub-model is used to extract the spatial characteristics of the first feature.
  • the access frequency prediction model further includes a third sub-model, and the third sub-model is used to acquire a feature scale of the first feature.
  • the access frequency prediction model is generated according to training features to be trained and labels corresponding to the features to be trained, where the label corresponding to the features to be trained is access frequency.
  • the device further includes: a third acquisition unit (not shown in the figure);
  • The third acquiring unit is configured to acquire a new data set to be trained when the feature length of the first feature changes, where the new data set to be trained includes a plurality of features to be trained and the labels corresponding to the plurality of features to be trained, and the new data set to be trained is used to train the access frequency prediction model to obtain an updated access frequency prediction model.
  • For the specific implementation of the third acquiring unit, reference may be made to the relevant descriptions of S403 and S504, which are not repeated in this embodiment.
  • the device further includes: a sending unit and a receiving unit (not shown in the figure);
  • a sending unit, configured to send the new data set to be trained to the third device, so that the third device uses the new data set to be trained to train the access frequency prediction model to obtain the updated access frequency prediction model;
  • a receiving unit configured to receive the updated access frequency prediction model sent by the third device.
  • the device further includes: a sending unit (not shown in the figure);
  • a sending unit configured to send the updated access frequency prediction model to the first device.
  • For the specific implementation of the sending unit, reference may be made to the relevant description of S505, which is not repeated in this embodiment.
  • The change in the feature length of the first feature includes a change in the number T of historical time points and/or a change in the number L of second data blocks adjacent to the first data block, where T is a positive integer and L is a positive integer.
  • the device further includes: a receiving unit and a sending unit (not shown in the figure);
  • The second acquiring unit 901 is specifically configured to acquire the first feature of the first data block; the sending unit is configured to send the first feature of the first data block to the first device; the receiving unit is configured to receive the access frequency of the first data block sent by the first device, where the access frequency of the first data block is obtained by the first device using the access frequency prediction model and the first feature.
  • FIG. 10 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • The network device may be, for example, the first device, the second device, or the third device in the embodiments shown in FIGS. 2-5, or may be a device implementation of the data processing apparatus 800 in the embodiment shown in FIG. 8 or of the data processing apparatus 900 in the embodiment shown in FIG. 9.
  • the network device 1000 includes at least a processor 1010 .
  • the network device 1000 may also include a communication interface 1020 and a memory 1030 .
  • the number of processors 1010 in the network device 1000 may be one or more, and one processor is taken as an example in FIG. 10 .
  • the processor 1010, the communication interface 1020, and the memory 1030 may be connected through a bus system or other methods, wherein, in FIG. 10 , connection through a bus system 1040 is taken as an example.
  • the processor 1010 may be a CPU, a network processor (network processor, NP), or a combination of a CPU and an NP.
  • the processor 1010 may further include a hardware chip.
  • the aforementioned hardware chip may be an application-specific integrated circuit (application-specific integrated circuit, ASIC), a programmable logic device (programmable logic device, PLD) or a combination thereof.
  • the aforementioned PLD may be a complex programmable logic device (complex programmable logic device, CPLD), a field-programmable gate array (field-programmable gate array, FPGA), a general array logic (generic array logic, GAL) or any combination thereof.
  • the processor 1010 may perform related functions such as inputting the first feature into the access frequency prediction model in the above method embodiment, and obtaining the access frequency of the first data block output by the access frequency prediction model.
  • the processor 1010 may perform related functions such as determining the type of each first data block in the storage space according to the access frequency of each first data block in the above method embodiment.
  • the processor 1010 may perform relevant functions such as training and generating an access frequency prediction model in the foregoing method embodiments.
  • the communication interface 1020 is used to receive and send the first feature, specifically, the communication interface 1020 may include a receiving interface and a sending interface. Wherein, the receiving interface may be used to receive the first feature, and the sending interface may be used to send the first feature.
  • the number of communication interfaces 1020 may be one or more.
  • The memory 1030 may include a volatile memory, such as a random-access memory (RAM); the memory 1030 may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 1030 may also include a combination of the above types of memory.
  • The memory 1030 may store, for example, an access frequency prediction model or a first feature of the first data block.
  • the memory 1030 stores operating systems and programs, executable modules or data structures, or their subsets, or their extended sets, where the programs may include various operating instructions for implementing various operations.
  • the operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
  • the processor 1010 can read the program in the memory 1030 to implement the data processing method provided in the embodiment of the present application.
  • the memory 1030 may be a storage device in the network device 1000 , or may be a storage device independent of the network device 1000 .
  • The bus system 1040 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus system 1040 can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in FIG. 10 , but it does not mean that there is only one bus or one type of bus.
  • the embodiment of the present application also provides a data processing device 1100, which can be used to realize the functions of the first device, the second device or the third device in the above method, and the device 1100 can be a device or a chip in the device.
  • The data processing device 1100 includes an input-output interface 1110 and a logic circuit 1120.
  • the input-output interface 1110 may be an input-output circuit.
  • the logic circuit 1120 may be a signal processor, a chip, or other integrated circuits that can implement the method of the present application.
  • At least one input and output interface 1110 is used for input or output of signals or data.
  • the I/O interface 1110 is used to receive the first characteristic of the first data block.
  • the I/O interface 1110 is used to output the first feature of the first data block.
  • The logic circuit 1120 is configured to execute some or all steps of any method provided in the embodiments of the present application. For example, when the device is the first device, the logic circuit 1120 performs the steps performed by the first device in the various possible implementations of the above method embodiments, for example, obtaining the access frequency of the first data block. When the device is the second device, the logic circuit 1120 performs the steps performed by the second device in the various possible implementations of the above method embodiments, for example, obtaining the data type of the first data block.
  • When the foregoing apparatus is a chip applied to a terminal, the terminal chip implements the functions of the terminal in the above method embodiments.
  • the terminal chip receives information from other modules in the terminal (such as radio frequency modules or antennas), where the information is sent to the terminal by other terminals or network devices; or the terminal chip outputs information to other modules in the terminal (such as radio frequency modules or antennas), where the information is sent by the terminal to other terminals or network devices.
  • When the foregoing apparatus is a chip applied to a network device, the network device chip implements the functions of the network device in the foregoing method embodiments.
  • the network device chip receives information from other modules in the network device (such as radio frequency modules or antennas), where the information is sent to the network device by terminals or other network devices; or the network device chip outputs information to other modules in the network device (such as radio frequency modules or antennas), where the information is sent by the network device to terminals or other network devices.
  • the present application also provides a chip or a chip system, and the chip may include a processor.
  • the chip may also include a memory (or storage module) and/or a transceiver (or communication module), or the chip may be coupled with a memory (or storage module) and/or a transceiver (or communication module). The transceiver (or communication module) can be used to support the chip in wired and/or wireless communication, and the memory (or storage module) can be used to store a program or a set of instructions. By calling the program or the set of instructions, the processor can implement the operations performed by a terminal or a network device in the above method embodiments or in any possible implementation of the method embodiments.
  • the system-on-a-chip may include the above-mentioned chips, and may also include the above-mentioned chips and other discrete devices, such as memory (or storage module) and/or transceiver (or communication module).
  • An embodiment of the present application provides a computer-readable storage medium, including an instruction or a computer program, which, when run on a computer, causes the computer to execute the data processing method provided in the above embodiments.
  • the embodiment of the present application also provides a computer program product including an instruction or a computer program, which, when run on a computer, causes the computer to execute the data processing method provided in the above embodiment.
  • the present application further provides a data processing system, and the data processing system may include the above first device and the second device.
  • the data processing system may be used to implement the operations performed by the first device or the second device in the foregoing method embodiments and any possible implementation manners of the method embodiments.
  • the disclosed system, device and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of units is only a logical business division. In actual implementation, there may be other division methods.
  • multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each business unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software business units.
  • If the integrated unit is realized in the form of a software business unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage media include a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
  • the services described in this application may be implemented by hardware, software, firmware or any combination thereof.
  • When implemented in software, the services may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code.
  • Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another.
  • A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.


Abstract

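This application discloses a data processing method in which a first device obtains a first feature of a first data block in a storage space. The first feature includes read/write features related to the first data block itself and read features of second data blocks adjacent to the first data block. The first device inputs the first feature into an access frequency prediction model to predict the access frequency of the first data block, and that access frequency is used to determine the type of the first data block. In other words, the embodiments of this application predict the future access frequency of the first data block from its multi-dimensional features (its own read/write features and the read features of adjacent data blocks), which improves prediction accuracy. The type of the first data block can then be determined from its access frequency so that the block is processed accordingly, improving the performance of the storage system.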

Description

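Data processing method, device, and system
This application claims priority to Chinese patent application No. 202111018067.4, filed on August 31, 2021 and entitled "Data processing method, device, and system", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communication technologies, and in particular to a data processing method, device, and system.
Background
As big data, cloud computing, and artificial intelligence continue to develop, the requirements on database performance and capacity keep rising. Existing databases mostly use a hybrid storage architecture in which each tier of storage media has a different access speed and storage cost. It is therefore important to place data on the appropriate medium accurately, according to factors such as the value and life cycle of the service data, in order to optimize access efficiency.
Data stored in a database is generally either hot or cold. Hot data is data with demanding real-time query requirements and a high access frequency; it is stored on high-efficiency cloud disks to meet the need for high-performance access. Cold data is data that is queried and accessed relatively rarely; it is kept in low-cost cold storage to meet the need for cost-effective storage. How to distinguish different types of data while preserving system performance is a technical problem that urgently needs to be solved.
Summary
Embodiments of this application provide a data processing method, device, and system that distinguish different types of data and improve system performance.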
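To achieve the above objective, a first aspect of the embodiments of this application provides a data processing method. The method includes: a first device obtains a first feature of a first data block, where the first data block is any data block in a storage space and the first feature includes read/write features related to the first data block and read features of second data blocks adjacent to the first data block; and the first device inputs the first feature into an access frequency prediction model and obtains the access frequency of the first data block output by the model, where the access frequency of the first data block is used to determine the type of the first data block. In this implementation, to determine the type of the first data block, the first device obtains its first feature, which includes not only the block's own read/write features but also the read features of adjacent second data blocks; that is, multi-dimensional features reflect how the first data block is accessed. The first device inputs the first feature into a pre-trained access frequency prediction model, which determines the access frequency of the first data block at a future time, and the type of the first data block is determined from that access frequency.
In a possible implementation, the first device obtaining the first feature of the first data block includes: the first device receives the first feature of the first data block sent by a second device.
In a specific implementation, the method further includes: the first device sends the access frequency of the first data block to the second device.
In a possible implementation, the type of the first data block is cold data or hot data. In this implementation, the access frequency of the first data block determines whether it is cold or hot data, and its type in turn determines where it is stored, which improves storage performance. For example, if the first data block is hot data, it is accessed frequently, so it can be stored in the performance tier to speed up reads. If the first data block is cold data, it is accessed rarely, so it is stored in the capacity tier to avoid slowing reads of other hot data and to reduce its use of performance-tier resources.
In a possible implementation, the method further includes: when the feature length of the first feature changes, the first device updates the access frequency prediction model to obtain an updated access frequency prediction model. In this implementation, the model is updated when the feature length changes so that predictions remain accurate, and the updated model is then used to predict the access frequency of the first data block.
In a possible implementation, the method further includes: the first device obtains a second feature of the first data block, where the feature length of the first feature differs from that of the second feature; and the first device inputs the second feature into the updated access frequency prediction model and obtains the access frequency of the first data block output by the updated model. In this implementation, after the model is updated, the feature of the first data block is obtained again (the second feature), so that the second feature and the updated model together yield the access frequency of the first data block with better accuracy.
In a possible implementation, the method further includes: when the feature length of the first feature changes, the first device receives an interrupt signal sent by the second device, where the interrupt signal instructs the first device to suspend using the access frequency prediction model to predict the access frequency of the first data block.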
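A second aspect of the embodiments of this application provides a data processing method. The method includes: a second device obtains the access frequency of a first data block, where the access frequency is obtained based on an access frequency prediction model and a first feature of the first data block, the first feature includes read/write features related to the first data block and read features of second data blocks adjacent to the first data block, and the first data block is any data block in a storage space; and the second device determines the type of each first data block in the storage space according to the access frequency of each first data block. In this implementation, when the second device needs to determine the type of each data block in the storage space, it can obtain the future access frequency of each data block and determine each block's type from its access frequency.
In a possible implementation, the second device determining the type of each first data block in the storage space according to the access frequency of each first data block includes: the second device sorts the first data blocks by access frequency to obtain a sorting result, and determines the type of each first data block in the storage space according to the sorting result. In this implementation, after obtaining the access frequency of each first data block, the second device can sort the first data blocks in descending order of access frequency and then determine their types from the sorting result. For example, the first k first data blocks are determined to be hot data and the remaining first data blocks are cold data.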
在本申请实施例第一方面或第二方面所述的数据处理方法的一种可能的实现方式中,所述第一特征包括所述第一数据块在T个历史时间点中每个历史时间点的特征,所述第一数据块在T个历史时间点的读/写特征之间具有时空相关性,所述T为正整数。在该实现方式中,第一特征包括了第一数据块在多个历史时间点的特征,从而在对第一数据块的访问频率进行预测时,考虑了历史特征的影响,提高预测的准确性。
在本申请实施例第一方面或第二方面所述的数据处理方法的一种可能的实现方式中,所述与所述第一数据块相关的读/写特征包括以下至少一种:读/写所述第一数据块的频率特征、读取所述第一数据块的长度特征、读取所述第一数据块的排布特征。
在本申请实施例第一方面或第二方面所述的数据处理方法的一种可能的实现方式中,所述与所述第一数据块相关的读/写特征满足以下至少一种:
所述读/写所述第一数据块的频率特征包括以下至少一种:一个或多个访问接口对所述第一数据块的读频率、一个或多个访问接口对所述第一数据块的写频率以及一个或多个访问接口对所述第一数据块的读写总频率;和/或,所述读取所述第一数据块的长度特征包括以下至少一种:最大读取长度、最小读取长度或平均读取长度;和/或,所述读取所述第一数据块的排布特征包括以下至少一种:读取第一长度的次数、读取第二长度的次数、所述读取第一长度的次数占总读取次数的比例或所述读取第二长度的次数占总读取次数的比例中的一种,所述第一长度是指长度为2n,所述第二长度是指长度非2n,n为正整数。
在本申请实施例第一方面或第二方面所述的数据处理方法的一种可能的实现方式中,所述与所述第一数据块的相邻的第二数据块的读取特征至少包括L个所述第二数据块各自对应的读取频率特征,所述L为正整数。
在本申请实施例第一方面或第二方面所述的数据处理方法的一种可能的实现方式中,所述访问频率预测模型包括第一子模型和第二子模型,所述第一子模型用于提取所述第一特征的时间特性,第二子模型用于提取所述第一特征的空间特性。
在本申请实施例第一方面或第二方面所述的数据处理方法的一种可能的实现方式中,所述访问频率预测模型还包括第三子模型,所述第三子模型用于获取所述第一特征的特征尺度。
在本申请实施例第一方面或第二方面所述的数据处理方法的一种可能的实现方式中,所述访问频率预测模型是根据待训练特征以及所述待训练特征对应的标签训练生成的,所述待训练特征对应的标签为访问频率。
在一种可能的实现方式中,所述方法还包括:在所述第一特征的特征长度发生变化时,所述第二装置获取新的待训练数据集合,所述新的待训练数据集合包括多个待训练特征以及所述多个待训练特征对应的标签,所述新的待训练数据集合用于对所述访问频率预测模型进行训练,获得更新后的访问频率预测模型。在该实现方式中,当第一特征的特征长度发生变化时,说明进行访问频率预测所依据的特征发生了变化,为保证预测的准确性,需要对访问频率预测模型进行更新,则第二装置获得新的待训练数据集合,以利用该新的待训练数据集合对访问频率预测模型进行训练,获得更新后的访问频率预测模型,以便利用更新后的访问频率预测模型对第一数据的访问频率进行预测。
在一种可能的实现方式中,所述方法还包括:在所述第一特征的特征长度发生变化时,第二装置还可以向第一装置发送中断信号,该中断信号用于指示第一装置停止使用当前的访问频率预测模型对所述第一数据块的访问频率进行预测。
在一种可能的实现方式中,所述方法还包括:所述第二装置向所述第三装置发送所述新的待训练数据集合,以使得所述第三装置利用所述新的待训练数据集合对所述访问频率预测模型进行训练,获得所述更新后的访问频率预测模型;所述第二装置接收所述第三装置发送的所述更新后的访问频率预测模型。在该实现方式中,第二装置在获取到新的待训练数据集合后,可以将该新的待训练数据集合发送给第三装置,以由第三装置利用新的待 训练数据集合对访问频率预测模型进行重新训练,以得到训练后的访问频率预测模型。
在一种可能的实现方式中,所述方法还包括:所述第二装置向所述第一装置发送所述更新后的访问频率预测模型。
在一种可能的实现方式中,所述第一特征的特征长度发生变化包括历史时间点个数T发生变化和/或与所述第一数据块的相邻的第二数据块的个数L发生变化,所述T为正整数,所述L为正整数。
在一种可能的实现方式中,所述第二装置获取所述第一数据块的访问频率,包括:所述第二装置获取第一数据块的所述第一特征,并向所述第一装置发送所述第一数据块的第一特征;所述第二装置接收所述第一装置发送的所述第一数据块的访问频率,所述第一数据块的访问频率由所述第一装置利用所述访问频率预测模型以及所述第一特征获取的。
在本申请实施例第三方面,提供了一种访问频率预测模型训练生成方法,该方法包括:第三装置获取待训练数据集合,所述待训练数据集合包括多个待训练特征以及所述多个待训练特征对应的标签,所述标签为访问频率,所述多个待训练特征中每个待训练特征包括与待训练数据块相关的读/写特征以及与所述待训练数据块相邻的第三数据块的读取特征,所述待训练数据块为存储空间中的任一数据块;所述第三装置利用所述待训练数据集合对初始网络模型进行训练,生成访问频率预测模型。
在一种可能的实现方式中,所述待训练特征包括所述待训练数据块在T个历史时间点中每个历史时间点的特征,所述待训练数据块在T个历史时间点的读/写特征之间具有时空相关性,所述T为正整数。
在一种可能的实现方式中,所述访问频率预测模型包括第一子模型和第二子模型,所述第一子模型用于提取所述待训练特征的时间特性,所述第二子模型用于提取所述待训练特征的空间特性。
在一种可能的实现方式中,所述访问频率预测模型还包括第三子模型,所述第三子模型用于处理时间序列的模型。
在一种可能的实现方式中,所述与所述待训练数据块相关的读/写特征包括以下至少一种:读/写所述待训练数据块的频率特征、读取所述待训练数据块的长度特征、读取所述待训练数据块的排布特征。
在一种可能的实现方式中,所述与所述待训练数据块相关的读/写特征满足以下至少一种:所述读/写所述待训练数据块的频率特征包括以下至少一种:所有访问接口对所述待训练数据块的读频率、所有访问接口对所述待训练数据块的写频率以及所有访问接口对所述待训练数据块的读写总频率;和/或,所述读取所述待训练数据块的长度特征包括以下至少一种:最大读取长度、最小读取长度或平均读取长度;和/或,所述读取所述待训练数据块的排布特征包括以下至少一种:读取第一长度的次数、读取第二长度的次数、所述读取第一长度的次数占总读取次数的比例或所述读取第二长度的次数占总读取次数的比例,所述第一长度是指长度为2n,所述第二长度是指长度非2n,n为正整数。
在一种可能的实现方式中,所述与所述待训练数据块的相邻的第三数据块的读取特征至少包括L个所述第三数据块各自对应的读取频率特征,所述L为正整数。
在一种可能的实现方式中,所述第三装置获取待训练数据集合,包括:所述第三装置接收第二装置发送的待训练数据集合。
在一种可能的实现方式中,所述方法还包括:所述第三装置向第一装置发送所述访问频率预测模型。
在一种可能的实现方式中,所述方法还包括:所述第三装置向第二装置发送所述访问频率预测模型,以通过所述第二装置将所述访问频率预测模型转发给第一装置。
在一种可能的实现方式中,所述方法还包括:所述第三装置接收所述第二装置发送的变化后的特征长度以及新的待训练数据集合;所述第三装置利用所述变化后的特征长度对所述新的待训练数据集合进行解析,获得多个新的待训练特征;所述第三装置利用所述多个新的待训求特征以及所述多个新的待训练特征对应的标签继续对所述访问频率预测模型进行训练,获得更新后的访问频率预测模型。
在本申请实施例第四方面,提供了一种数据处理装置,所述装置包括:第一获取单元,用于获取第一数据块的第一特征,所述第一数据块为存储空间中的任一数据块,所述第一特征包括与所述第一数据块相关的读/写特征以及与所述第一数据块相邻的第二数据块的读取特征;处理单元,用于将所述第一特征输入访问频率预测模型,获得所述访问频率预测模型输出的所述第一数据块的访问频率,所述第一数据块的访问频率用于确定所述第一数据块的类型。
在一种可能的实现方式中,所述装置还包括:接收单元,用于接收第二装置发送的所述第一数据块的第一特征;所述第一获取单元,具体用于获取所述接收单元接收的所述第一特征。
在一种可能的实现方式中,所述装置还包括:发送单元,用于向所述第二装置发送所述第一数据块的访问频率。
在一种可能的实现方式中,所述第一数据块的类型为冷数据或热数据。
在一种可能的实现方式中,所述处理单元,还用于在所述第一特征的特征长度发生变化时,更新所述访问频率预测模型,获得更新后的访问频率预测模型。
在一种可能的实现方式中,所述第一获取单元,还用于获取第一数据块的第二特征,所述第一特征的长度与所述第二特征的长度不同;所述处理单元,还用于将所述第二特征输入所述更新后的访问频率预测模型,获得所述更新后的访问频率预测模型输出的所述第一数据块的访问频率。
在本申请实施例第五方面,提供了一种数据处理装置,所述装置还包括:第二获取单元,用于获取第一数据块的访问频率,所述第一数据块的访问频率是基于访问频率预测模型以及所述第一数据块的第一特征获取的,所述第一特征包括与所述第一数据块相关的读/写特征以及与所述第一数据块相邻的第二数据块的读取特征,所述第一数据块为存储空间中任一数据块;处理单元,用于根据各所述第一数据块的访问频率确定所述存储空间中各所述第一数据块的类型。
在一种可能的实现方式中,所述处理单元,具体用于根据各所述第一数据块的访问频率进行排序,获得排序结果;根据所述排序结果确定所述存储空间中各所述第一数据块的 类型。
在本申请实施例第四方面或第五方面所述的数据处理装置的一种可能的实现方式中,所述第一特征包括所述第一数据块在T个历史时间点中每个历史时间点的特征,所述第一数据块在T个历史时间点的读/写特征之间具有时空相关性,所述T为正整数。
在本申请实施例第四方面或第五方面所述的数据处理装置的一种可能的实现方式中,所述与所述第一数据块相关的读/写特征包括以下至少一种:读/写所述第一数据块的频率特征、读取所述第一数据块的长度特征、读取所述第一数据块的排布特征。
在本申请实施例第四方面或第五方面所述的数据处理装置的一种可能的实现方式中,所述与所述第一数据块相关的读/写特征满足以下至少一种:所述读/写所述第一数据块的频率特征包括以下至少一种:所有访问接口对所述第一数据块的读频率、所有访问接口对所述第一数据块的写频率以及所有访问接口对所述第一数据块的读写总频率;和/或,所述读取所述第一数据块的长度特征包括以下至少一种:最大读取长度、最小读取长度或平均读取长度;和/或,所述读取所述第一数据块的排布特征包括以下至少一种:读取第一长度的次数、读取第二长度的次数、所述读取第一长度的次数占总读取次数的比例或所述读取第二长度的次数占总读取次数的比例,所述第一长度是指长度为2n,所述第二长度是指长度非2n,n为正整数。
在本申请实施例第四方面或第五方面所述的数据处理装置的一种可能的实现方式中,所述与所述第一数据块相邻的第二数据块的读取特征至少包括L个所述第二数据块各自对应的读取频率特征,所述L为正整数。
在本申请实施例第四方面或第五方面所述的数据处理装置的一种可能的实现方式中,所述访问频率预测模型包括第一子模型和第二子模型,所述第一子模型用于提取所述第一特征的时间特性,第二子模型用于提取所述第一特征的空间特性。
在本申请实施例第四方面或第五方面所述的数据处理装置的一种可能的实现方式中,所述访问频率预测模型还包括第三子模型,所述第三子模型用于获取所述第一特征的特征尺度。
在本申请实施例第四方面或第五方面所述的数据处理装置的一种可能的实现方式中,所述访问频率预测模型是根据待训练特征以及所述待训练特征对应的标签训练生成的,所述待训练特征对应的标签为访问频率。
在一种可能的实现方式中,所述装置还包括:第三获取单元,用于在所述第一特征的特征长度发生变化时,获取新的待训练数据集合,所述新的待训练数据集合包括多个待训练特征以及所述多个待训练特征对应的标签,所述新的待训练数据集合用于对所述访问频率预测模型进行训练,获得更新后的访问频率预测模型。
在一种可能的实现方式中,所述装置还包括:发送单元,用于向所述第三装置发送所述新的待训练数据集合,以使得所述第三装置利用所述新的待训练数据集合对所述访问频率预测模型进行训练,获得所述更新后的访问频率预测模型;接收单元,用于接收所述第三装置发送的所述更新后的访问频率预测模型。
在一种可能的实现方式中,所述装置还包括:发送单元,用于向所述第一装置发送所 述更新后的访问频率预测模型。
在一种可能的实现方式中,所述第一特征的特征长度发生变化包括历史时间点个数T发生变化和/或与所述第一数据块的相邻的第二数据块的个数L发生变化,所述T为正整数,所述L为正整数。
在一种可能的实现方式中,所述装置还包括:接收单元和发送单元,所述第二获取单元,具体用于获取第一数据块的所述第一特征;所述发送单元,用于向所述第一装置发送所述第一数据块的第一特征;所述接收单元,用于接收所述第一装置发送的所述第一数据块的访问频率,所述第一数据块的访问频率由所述第一装置利用所述访问频率预测模型以及所述第一特征获取的。
在本申请实施例第六方面,提供了一种访问频率预测模型训练生成装置,该装置包括:第四获取单元,用于获取待训练数据集合,所述待训练数据集合包括多个待训练特征以及所述多个待训练特征对应的标签,所述标签为访问频率,所述多个待训练特征中每个待训练特征包括与待训练数据块相关的读/写特征以及与所述待训练数据块相邻的第三数据块的读取特征,所述待训练数据块为存储空间中的任一数据块;生成单元,用于利用所述待训练数据集合对初始网络模型进行训练,生成访问频率预测模型。
在一种可能的实现方式中,所述待训练特征包括所述待训练数据块在T个历史时间点中每个历史时间点的特征,所述待训练数据块在T个历史时间点的读/写特征之间具有时空相关性,所述T为正整数。
在一种可能的实现方式中,所述访问频率预测模型包括第一子模型和第二子模型,所述第一子模型用于提取所述待训练特征的时间特性,所述第二子模型用于提取所述待训练特征的空间特性。
在一种可能的实现方式中,所述访问频率预测模型还包括第三子模型,所述第三子模型用于处理时间序列的模型。
在一种可能的实现方式中,所述与所述待训练数据块相关的读/写特征包括至少以下一种:读/写所述待训练数据块的频率特征、读取所述待训练数据块的长度特征、读取所述待训练数据块的排布特征。
在一种可能的实现方式中,所述与所述待训练数据块相关的读写特征满足以下至少一种:所述读/写所述待训练数据块的频率特征包括以下至少一种:一个或多个访问接口对所述待训练数据块的读频率、一个或多个访问接口对所述待训练数据块的写频率以及一个或多个访问接口对所述待训练数据块的读写总频率;和/或,所述读取所述待训练数据块的长度特征包括以下至少一种:最大读取长度、最小读取长度或平均读取长度;和/或,所述读取所述待训练数据块的排布特征包括以下至少一种:读取第一长度的次数、读取第二长度的次数、所述读取第一长度的次数占总读取次数的比例或所述读取第二长度的次数占总读取次数的比例,所述第一长度是指长度为2n,所述第二长度是指长度非2n,n为正整数。
在一种可能的实现方式中,所述与所述待训练数据块的相邻的第三数据块的读取特征至少包括L个所述第三数据块各自对应的读取频率特征,所述L为正整数。
在一种可能的实现方式中,所述第四获取单元,具体用于接收第二装置发送的待训练 数据集合。
在一种可能的实现方式中,所述装置还包括:发送单元,用于向第一装置发送访问频率预测模型。
在一种可能的实现方式中,所述装置还包括:发送单元,用于向第二装置发送所述访问频率预测模型,以通过所述第二装置将所述访问频率预测模型转发给第一装置。
在一种可能的实现方式中,所述装置还包括:接收单元,用于接收所述第二装置发送的变化后的特征长度以及新的待训练数据集合;所述第三装置利用所述变化后的特征长度对所述新的待训练数据集合进行解析,获得多个新的待训练特征;所述生成单元,还用于利用所述多个新的待训求特征以及所述多个新的待训练特征对应的标签继续对所述访问频率预测模型进行训练,获得更新后的访问频率预测模型。
在本申请实施例第七方面,提供一种数据处理系统,所述系统包括:第一装置、第二装置;所述第一装置,用于执行第一方面所述的数据处理方法;所述第二装置,用于执行第二方面所述的数据处理方法.
在一种可能的实现方式中,所述系统还包括:第三装置;
所述第三装置,用于执行第三方面所述的方法,用于训练生成访问频率预测模型。
在本申请实施例第八方面,提供了一种数据处理装置,包括与存储器耦合的处理器,所述处理器用于执行所述存储器中的计算机指令,使得所述装置第一方面其任一种可能的实现、第二方面其任一种可能的实现或第三方面其任一种可能的实现所述的方法。
在本申请实施例第九方面,提供了一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行以上第一方面及其任一种可能的实现、第二方面及其任一种可能的实现或第三方面及其任一种可能的实现所述的方法。
在本申请实施例第十方面,提供了一种计算机程序产品,所述计算机程序产品在设备上运行时,使得所述设备执行第一方面及其任一种可能的实现、第二方面及其任一种可能的实现或第三方面及其任一种可能的实现方法。
在本申请实施例第十一方面,提供了一种第一装置,包括处理器,用于执行存储器中存储的计算机程序(或计算机可执行指令),当计算机程序(或计算机可执行指令)被执行时,使得该装置执行如第一方面及第一方面各个可能的实现中的方法。
在一种可能的实现中,处理器和存储器集成在一起;
在另一种可能的实现中,上述存储器位于该第一装置之外。
该第一装置还包括通信接口,该通信接口用于该第一装置与其他设备进行通信,例如数据和/或信号的发送或接收。示例性的,通信接口可以是收发器、电路、总线、模块或其它类型的通信接口。
在本申请实施例第十二方面,还提供了一种第二装置,包括处理器,用于执行存储器中存储的计算机程序(或计算机可执行指令),当计算机程序(或计算机可执行指令)被执行时,使得该装置执行如第二方面及第二方面各个可能的实现中的方法。
在一种可能的实现中,处理器和存储器集成在一起;
在另一种可能的实现中,存储器位于该第二装置之外。
该第二装置还包括通信接口,该通信接口用于该第二装置与其他设备进行通信,例如数据和/或信号的发送或接收。示例性的,通信接口可以是收发器、电路、总线、模块或其它类型的通信接口。
在本申请实施例第十三方面,提供了一种第三装置,包括处理器,用于执行存储器中存储的计算机程序(或计算机可执行指令),当计算机程序(或计算机可执行指令)被执行时,使得该装置执行如第三方面及第三方面各个可能的实现中的方法。
在一种可能的实现中,处理器和存储器集成在一起;
在另一种可能的实现中,上述存储器位于该第三装置之外。
该第三装置还包括通信接口,该通信接口用于该第三装置与其他设备进行通信,例如数据和/或信号的发送或接收。示例性的,通信接口可以是收发器、电路、总线、模块或其它类型的通信接口。
在本申请实施例第十四方面,还提供一种芯片系统,该芯片系统包括处理器,还可以包括存储器,用于实现上述第一方面及其任一种可能的实现、第二方面及其任一种可能的实现或第三方面及其任一种可能的实现中所述的方法。该芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
通过本申请实施例提供的技术方案,通过预先训练生成访问频率预测模型,第一装置获取存储空间中每个数据块的第一特征,即第一数据块的第一特征。该第一特征包括与第一数据块自身相关的读/写特征以及与第一数据块相邻的第二数据块的读取特征。第一装置将第一数据块的第一特征输入访问频率预测模型,获得对于第一数据块在未来时刻所对应的访问频率。其中,第一数据块对应的访问频率用于确定第一数据块的类型。例如,根据第一数据块对应的访问频率确定第一数据块属于冷数据还是热数据。即,本申请实施例通过获取第一数据块的多维特征(第一数据块自身的读/写特征,以及相邻数据块的读取特征),并利用该多维特征对第一数据块的未来访问频率进行预测,不仅提高预测准确性,还可以根据第一数据块的访问频率确定第一数据块的类型,进而对该第一数据块进行相应的处理,提高存储系统的使用性能。
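Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. The drawings described below are merely some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a diagram of a tiered storage structure according to an embodiment of this application;
FIG. 2 is a flowchart of a data processing method according to an embodiment of this application;
FIG. 3a is a schematic diagram of data block storage according to an embodiment of this application;
FIG. 3b is a structural diagram of an access frequency prediction model according to an embodiment of this application;
FIG. 4 is a flowchart of another data processing method according to an embodiment of this application;
FIG. 5 is a flowchart of yet another data processing method according to an embodiment of this application;
FIG. 6 is a diagram of a data processing framework according to an embodiment of this application;
FIG. 7 is a simulation comparison diagram according to an embodiment of this application;
FIG. 8 is a structural diagram of a data processing device according to an embodiment of this application;
FIG. 9 is a structural diagram of another data processing device according to an embodiment of this application;
FIG. 10 is a structural diagram of a network device according to an embodiment of this application;
FIG. 11 is a structural diagram of a data processing device according to an embodiment of this application.
Detailed Description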
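To help a person skilled in the art better understand the solutions in this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings.
With the development of technologies such as big data and cloud computing, the traditional way to store, use, compute on, and analyze massive data has been to provide a dozen or more different database products, each serving its own need. As data volumes grow and data types diversify, however, the cost of overall maintenance and of managing data consistency rises sharply, which in turn affects the use of the whole system.
To address the data storage problem, a storage mode called intelligent multi-mode storage has been proposed. It places data on the appropriate medium according to factors such as the value and life cycle of the service data, in order to optimize data access efficiency. Data movement is the key technology of intelligent multi-mode storage: it supports the management and scheduling of data movement tasks within a tier or across tiers so that data moves efficiently. Hot/cold data identification is an important part of data movement. The stored data is identified, and the identification result determines the data movement plan, so that hot data is placed in the performance tier and cold data in the capacity tier as far as possible, which improves the utilization of the storage media. The performance tier is a storage tier nearer the top of the storage hierarchy and has a higher transfer rate; the capacity tier is a storage tier nearer the bottom of the storage hierarchy and has a lower transfer rate. In the tiered storage structure shown in FIG. 1, the storage region close to the central processing unit (CPU) serves as the performance tier, and the storage region far from the CPU serves as the capacity tier. Accurately identifying the type of data, so that different types of data can be handled differently and the performance of the storage system improved, is therefore the key to intelligent multi-mode storage. Several schemes currently exist for identifying data types. One scheme predicts access frequency by exponential smoothing, but it is accurate only for short-term prediction and cannot predict long-term access frequency accurately. Another scheme predicts access frequency with a Markov chain; it can learn the stationary distribution of states and uses a hierarchical structure to shrink the transition probability matrix, but because of the limitations of the Markov assumption the state space explodes, the model is hard to converge, and complexity is high. A third scheme predicts access frequency with a generative adversarial network; it can learn the distribution of features, but the model is complex and hard to converge, and inference latency is high. A fourth scheme classifies data with clustering and a neural network model; it clusters on address-space features and builds a model per cluster, which reduces the size of the neural network model, but address coverage is low and the prediction granularity is coarse. Traditional data type identification methods thus suffer from various problems, such as poor long-term prediction accuracy, high model complexity, and long prediction latency.
Accordingly, embodiments of this application provide a data processing method that uses a neural network model to identify the type of data and then processes the data according to its type, which improves both identification accuracy and the performance with which the system processes data. Specifically, an access frequency prediction model is trained in advance. When data in the storage space needs to be processed, a first feature is obtained for each data block (first data block) in the storage space. The first feature includes read/write features related to the first data block and read features of second data blocks adjacent to the first data block. The first feature is input into the access frequency prediction model to obtain the access frequency of the first data block, the type of the first data block is determined from that access frequency, and matching processing is then performed. Predicting the future access frequency of the first data block from multi-dimensional features related to it improves prediction accuracy, and determining the block's type from its access frequency and processing it accordingly improves the processing performance of the system.
The value of data lies in how often it is queried or updated, and in different service systems users have different usage needs for different data. For example, in a traffic popularity analysis system for an Internet database, some of the recorded data blocks are critical to the analysis and are queried frequently, while others are irrelevant to the analysis and are queried only sparsely. For fast queries, frequently queried data blocks should be stored in the performance tier and rarely queried data blocks in the capacity tier. The method provided in the embodiments of this application can therefore be used: the first feature of each stored data block is obtained and input into the access frequency prediction model, which yields the access frequency of that data block. Once the access frequency of every data block has been obtained, the storage location of each data block is determined from its access frequency.
For ease of understanding, the solutions provided in the embodiments of this application are described below with reference to the accompanying drawings.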
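FIG. 2 is a flowchart of a data processing method according to an embodiment of this application. As shown in FIG. 2, the method may include the following steps.
S201: The first device obtains a first feature of a first data block.
In this embodiment, for any data block in the storage space (the first data block), the first feature of the block is obtained in order to predict its access frequency in a future period. The first feature includes read/write features related to the first data block and read features of second data blocks adjacent to the first data block. That is, the first feature covers not only the first data block's own features but also those of its neighboring data blocks, and so reflects the characteristics of the first data block in multiple dimensions.
The read/write features related to the first data block include one or more of: frequency features of reading/writing the first data block, length features of reads of the first data block, and arrangement features of reads of the first data block. The frequency features may include, but are not limited to, the read frequency of the first data block across one or more access interfaces, the write frequency of the first data block across one or more access interfaces, and the total read/write frequency of the first data block across one or more access interfaces.
In a possible implementation, the write frequency of all access interfaces on the first data block refers to the frequency of operations that add, delete, or modify data in the first data block. For example, if all access interfaces read the first data block with frequency x and write it with frequency y, the total read/write frequency of all access interfaces on the first data block is x+y.
In a possible implementation, the length features of reads of the first data block include, but are not limited to, the maximum read length, the minimum read length, and the average read length.
In a possible implementation, the arrangement features of reads of the first data block include, but are not limited to, the number of reads of a first length, the number of reads of a second length, the proportion of reads of the first length among all reads, and the proportion of reads of the second length among all reads. The first length is a length equal to 2^n, the second length is a length not equal to 2^n, and n is a positive integer.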
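The ten per-block features listed above (three frequency features, three length features, four arrangement features) could be computed from an access log roughly as follows. This is a minimal sketch, not the applicant's implementation: the `Access` record, the `own_features` function, and the `window` parameter are hypothetical names, and the "first length" is read as a power of two (2, 4, 8, ...), on the assumption that a superscript was lost in the source text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Access:
    is_read: bool  # True for a read; False for a write (add/delete/modify)
    length: int    # length of the access, in whatever unit the system uses


def is_power_of_two(length: int) -> bool:
    # Assumption: "first length" means 2**n with n >= 1, i.e. 2, 4, 8, ...
    return length >= 2 and (length & (length - 1)) == 0


def own_features(accesses: List[Access], window: float) -> List[float]:
    """Return the 10 per-block features for one time window."""
    reads = [a.length for a in accesses if a.is_read]
    n_reads = len(reads)
    n_writes = len(accesses) - n_reads

    # 3 frequency features: read (x), write (y), total (x + y)
    read_freq = n_reads / window
    write_freq = n_writes / window
    total_freq = read_freq + write_freq

    # 3 length features over reads only (0.0 when the block was never read)
    max_len = float(max(reads)) if reads else 0.0
    min_len = float(min(reads)) if reads else 0.0
    avg_len = sum(reads) / n_reads if reads else 0.0

    # 4 arrangement features: counts and ratios of 2**n vs. non-2**n reads
    pow2 = sum(1 for r in reads if is_power_of_two(r))
    non_pow2 = n_reads - pow2
    pow2_ratio = pow2 / n_reads if reads else 0.0
    non_pow2_ratio = non_pow2 / n_reads if reads else 0.0

    return [read_freq, write_freq, total_freq, max_len, min_len, avg_len,
            float(pow2), float(non_pow2), pow2_ratio, non_pow2_ratio]
```

Returning zeros for a block with no reads is a choice made here for illustration; the source does not say how unread blocks are encoded.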
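The read features of the second data blocks adjacent to the first data block include at least the read frequency of each of L second data blocks, where L is a positive integer. The L second data blocks may be selected relative to the first data block by taking L/2 second data blocks before it and L/2 second data blocks after it; when L/2 is not an integer, it may be rounded up. The second data blocks adjacent to the first data block may include blocks that are directly adjacent to it as well as blocks that are indirectly adjacent to it. A data block may be the data stored between two storage addresses in the storage space, and the storage space can generally be divided into multiple data blocks. For example, as shown in FIG. 3a, the storage device is divided into multiple storage spaces: addresses 000000-011111 store data block 1, addresses 100000-101111 store data block 2, addresses 110000-111000 store data block 3, and so on. Here the first data block may be data block 2, and the second data blocks are data block 1 and data block 3.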
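The neighbor selection rule described here (L/2 blocks before and L/2 blocks after, rounding up) can be sketched as below. The function name `neighbor_indices` and the 0-based block indexing are assumptions made for illustration; the source does not say what happens at the ends of the storage space, so this sketch simply drops neighbors that fall outside it.

```python
import math
from typing import List


def neighbor_indices(i: int, L: int, num_blocks: int) -> List[int]:
    """Indices of the second data blocks around block i (0-based).

    Takes ceil(L / 2) blocks before i and ceil(L / 2) blocks after i.
    Neighbors that would fall outside [0, num_blocks) are dropped, which
    is an assumption; note that an odd L yields L + 1 candidates.
    """
    half = math.ceil(L / 2)
    before = [j for j in range(i - half, i) if j >= 0]
    after = [j for j in range(i + 1, i + half + 1) if j < num_blocks]
    return before + after
```

With three blocks and L = 2, the middle block (index 1) gets neighbors 0 and 2, which mirrors the FIG. 3a example in which data block 2 has data blocks 1 and 3 as its second data blocks.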
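Note that accesses to the first data block are correlated in time. To improve prediction accuracy, the first feature may include the features of the first data block at each of T historical time points, where the read/write frequency features of the first data block at the T historical time points are spatiotemporally correlated and T is a positive integer. The T historical time points precede the current time point at which the prediction is made. Temporal correlation among the read/write frequency features at the T historical time points means that the read/write features at different time points are related. The relationship may be that the read/write frequencies at two time points change in the same direction: for example, if the first data block is read at time t1, it is also read at time t2, where t2 is later than t1. Alternatively, the relationship may be that the read/write frequencies at two time points change in opposite directions: for example, if the first data block is read at time t1, it is not read at time t2, where t2 is later than t1. Spatial correlation among the read/write frequency features at the T historical time points means that the read/write frequency of the first data block and that of the second data blocks depend on each other at each historical time point. For example, if reading and analyzing the second data block requires first obtaining the first data block, the read frequency of the first data block is affected by the read frequency of the second data block.
It follows that, at a given time point, the feature length of the first data block may be (10+L), so the feature length over T time points is N=T*(10+L). Here 10 is the number of read/write features related to the first data block: 3 frequency features of reading/writing the first data block, 3 length features of reads of the first data block, and 4 arrangement features of reads of the first data block.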
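As a worked illustration of N = T*(10+L), the sketch below flattens T per-time-point rows (10 own features followed by L neighbor read frequencies) into one vector and checks its length. The names `feature_length` and `build_first_feature` and the row-by-row layout are assumptions; the source only fixes the total length.

```python
from typing import List, Sequence

OWN_FEATURES = 10  # 3 frequency + 3 length + 4 arrangement features


def feature_length(T: int, L: int) -> int:
    # N = T * (10 + L), as stated in the text
    return T * (OWN_FEATURES + L)


def build_first_feature(own: Sequence[Sequence[float]],
                        neighbor_reads: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate, per time point, the 10 own features and L neighbor reads.

    own[t] holds the 10 own features at time point t; neighbor_reads[t]
    holds the read frequencies of the L second data blocks at time point t.
    The time-major layout is an assumption made for this sketch.
    """
    if len(own) != len(neighbor_reads):
        raise ValueError("own and neighbor_reads must cover the same T time points")
    flat: List[float] = []
    for own_t, nb_t in zip(own, neighbor_reads):
        if len(own_t) != OWN_FEATURES:
            raise ValueError("each time point needs exactly 10 own features")
        flat.extend(own_t)
        flat.extend(nb_t)
    return flat
```

For example, T = 4 and L = 6 give N = 4 * 16 = 64.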
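Note that the number of correlated historical time points may differ from one time point to another, that is, the value of T varies. Likewise, the number of correlated neighboring data blocks may differ from one time point to another, that is, the value of L also varies. The length of the first feature of the first data block therefore changes continually. Specifically, an autocorrelation function may be used to obtain a first correlation coefficient between the read/write frequencies of the first data block at different time points, and T is determined from the first correlation coefficient. Alternatively or additionally, a covariance function may be used to obtain a second correlation coefficient between the read/write frequency of the first data block and that of the second data blocks, and L is determined from the second correlation coefficient.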
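One way to turn these correlation coefficients into values of T and L is sketched below. The source names the autocorrelation and covariance functions but does not give a selection rule, so the thresholding used here (`choose_T` keeps extending the lag while the autocorrelation stays above a threshold, `choose_L` counts neighbors whose normalized covariance exceeds a threshold) is purely an assumption, as are all function names.

```python
import math
from typing import Sequence


def autocorr(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of a frequency series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


def choose_T(series: Sequence[float], max_lag: int, threshold: float) -> int:
    """Largest consecutive lag whose |autocorrelation| stays >= threshold (min 1)."""
    T = 1
    for lag in range(1, max_lag + 1):
        if abs(autocorr(series, lag)) < threshold:
            break
        T = lag
    return T


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Covariance of a and b normalized to a correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def choose_L(target: Sequence[float],
             neighbors: Sequence[Sequence[float]],
             threshold: float) -> int:
    """Number of neighbor blocks whose |correlation| with the target >= threshold (min 1)."""
    count = sum(1 for nb in neighbors if abs(pearson(target, nb)) >= threshold)
    return max(1, count)
```

Both functions return at least 1 because the text requires T and L to be positive integers.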
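S202: The first device inputs the first feature into the access frequency prediction model and obtains the access frequency of the first data block output by the model.
After obtaining the first feature of the first data block, the first device inputs it into the access frequency prediction model to obtain the access frequency of the first data block. The access frequency of the first data block refers to its read frequency and can be used to determine the type of the first data block. The type may be cold data or hot data, or it may be garbage data or non-garbage data. Garbage data is data that does not affect the normal operation of the system in the current application scenario and is defined according to the needs of that scenario. For example, if the system only needs to query data from the current year, data from the previous year and earlier is garbage data; or, if some functions are removed when the system is upgraded or reworked, tables related to those functions may no longer be used, and the data in those tables is garbage data. The access frequency prediction model is generated by training on training features and their labels, where the label of a training feature is an access frequency.
In a possible implementation, when the first device has data processing capability, it may, after obtaining the access frequency of each first data block, determine the type of each first data block from its access frequency and perform a matching operation for each type. For example, first data blocks that are hot data are moved to the performance tier of the storage, and first data blocks that are cold data are moved to the capacity tier.
In a possible implementation, the first device determines the type of each first data block from the access frequencies as follows: it sorts the first data blocks in descending order of access frequency, determines the first K as hot data, and treats the rest as cold data. Alternatively, the first device determines first data blocks whose access frequency is greater than or equal to an access frequency threshold as hot data and those below the threshold as cold data.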
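The two classification rules just described, top-K after sorting and a fixed frequency threshold, can be sketched as follows. The function names and the use of a dictionary keyed by block identifier are illustrative assumptions; how ties at the K-th position are broken is not specified in the source, and this sketch keeps the original dictionary order for ties.

```python
from typing import Dict


def classify_top_k(freqs: Dict[str, float], k: int) -> Dict[str, str]:
    """Sort blocks by predicted access frequency (descending); first k are hot."""
    ranked = sorted(freqs, key=lambda block: freqs[block], reverse=True)
    hot = set(ranked[:k])
    return {block: ("hot" if block in hot else "cold") for block in freqs}


def classify_threshold(freqs: Dict[str, float], threshold: float) -> Dict[str, str]:
    """Blocks with frequency >= threshold are hot; all others are cold."""
    return {block: ("hot" if f >= threshold else "cold") for block, f in freqs.items()}
```

The top-K rule fixes the size of the hot set, which suits a performance tier of known capacity, while the threshold rule lets the hot set grow or shrink with the workload.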
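Optionally, after obtaining the access frequency of the first data block, the first device may determine from it whether the first data block is garbage data and, if so, reclaim the block. Alternatively, the first device performs defragmentation according to the access frequencies of the first data blocks. Defragmentation means using system software or dedicated defragmentation software to reorganize the fragmented data that a storage device accumulates over long-term use, so that related data occupies contiguous sectors, which improves storage performance and read speed. Alternatively, the first device determines from the access frequency of the first data block whether to prefetch it. Prefetching means fetching the required data into a storage tier closer to the CPU before the CPU uses it, so that the data is available promptly when actually needed, which reduces latency.
As described above, the first feature may include not only the read/write features of the first data block at different time points but also the read features of the second data blocks, so the access frequency prediction model can extract the temporal/spatial correlations among the different features in the first feature. Specifically, the access frequency prediction model may include a first sub-model and a second sub-model, where the first sub-model extracts the temporal characteristics of the first feature and the second sub-model extracts its spatial characteristics. For example, the first sub-model is a convolutional neural network (CNN) and the second sub-model is a long short-term memory (LSTM) network. In addition, because a neural network is insensitive to the scale of its input, the access frequency prediction model may further include a third sub-model that captures the feature scale of the first feature; for example, the third sub-model is an autoregressive model. The feature scale of the first feature refers to the magnitude of the units of each feature in the first feature. For the structure of the access frequency prediction model, see FIG. 3b: the model includes a CNN layer, an LSTM layer, a fully connected layer, and an autoregressive layer. The fully connected layer combines what the preceding layers have learned from the first feature to determine the access frequency of the first data block.
In a possible implementation, when the feature length of the first feature changes, the access frequency prediction model needs to be retrained to keep predictions accurate, yielding an updated access frequency prediction model. That is, the model is trained with new training features and the labels corresponding to those new training features, where the length of the new training features is the changed feature length. After obtaining the updated model, the first device obtains a second feature of the first data block whose feature length differs from that of the first feature, inputs the second feature into the updated model, and obtains the access frequency of the first data block output by the updated model. The second feature may include the same kinds of features as the first feature, but the two differ in feature length. A change in the feature length of the first feature may include a change in the number T of historical time points and/or a change in the number L of adjacent second data blocks. In practice, to avoid the resource overhead of frequent training, the model is usually retrained only when the change in T or L exceeds a preset threshold. The preset thresholds for the change in T and for the change in L can be set as required.
Predicting the future access frequency of the first data block from multi-dimensional features related to it improves prediction accuracy. Determining the type of the first data block from its access frequency and then processing the block accordingly improves the processing performance of the system.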
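In a possible implementation, when the first device is able to obtain the first feature of the first data block, it may collect the first feature itself. The first device may also receive the first feature of the first data block from a second device that is able to obtain it. For example, the second device is a host that stores a large amount of data, and the first device is a component capable of training the access frequency prediction model and of using the model to predict access frequency from an input first feature. Further, after obtaining the access frequency of the first data block, the first device may send it to the second device, which performs the relevant operations according to that access frequency.
For ease of understanding, the interaction between the first device and the second device is described below with reference to the accompanying drawings.
FIG. 4 is a flowchart of a data processing interaction according to an embodiment of this application. As shown in FIG. 4, the method may include the following steps.
S401: The second device obtains a first feature of a first data block.
In this embodiment, the second device is responsible for collecting the first feature of each first data block in the storage space. The first feature includes read/write features related to the first data block itself and read features of second data blocks adjacent to it. For details of the first feature, see the description of the first feature in S201; they are not repeated here.
S402: The second device sends the first feature of the first data block to the first device, and correspondingly the first device receives the first feature of the first data block sent by the second device.
In this embodiment, after collecting the first feature of the first data block, the second device sends it to the first device so that the first device can predict the access frequency of the first data block.
S403: The first device inputs the first feature into the access frequency prediction model and obtains the access frequency of the first data block output by the model.
After obtaining the first feature of the first data block, the first device inputs it into its stored access frequency prediction model to obtain the access frequency of the first data block at a future time. For how the first device uses the model to obtain the access frequency, see the description of S202; details are not repeated here.
The access frequency prediction model is generated by training on training features and their labels, where the label of a training feature is an access frequency. The model in the first device may be trained by the first device itself, or trained by the second device and sent to the first device. When the first device has no data collection capability, the training features and labels that it uses to train the model are also sent by the second device.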
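In a possible implementation, the second device may monitor changes in the feature length of the first feature. When it detects a change, the second device obtains a new training data set that includes multiple training features and their corresponding labels. The new training data set is used to train the access frequency prediction model to obtain an updated model. Specifically, the second device may train the access frequency prediction model with the new training data set, obtain the updated model, and send it to the first device, which correspondingly receives it. Alternatively, the second device sends the new training data set to the first device, which correspondingly receives it and uses it to train the access frequency prediction model to obtain the updated model. A change in the feature length of the first feature includes a change in the number T of historical time points and/or a change in the number L of second data blocks adjacent to the first data block, where T and L are positive integers. In practice, to avoid the resource overhead of frequent training, the model is usually retrained only when the change in T or L exceeds a preset threshold. The preset thresholds for the change in T and for the change in L can be set as required.
Further, when the second device detects that the feature length of the first feature has changed, continuing to predict access frequency with the current version of the model may reduce prediction accuracy. In a possible implementation, when the second device detects such a change, it sends an interrupt signal to the first device, and correspondingly the first device receives it. After receiving the interrupt signal, the first device no longer uses the current access frequency prediction model to predict the access frequency of the first data block. Once the first device has obtained the updated model, it obtains a second feature of the first data block and inputs it into the updated model to predict the access frequency of the first data block, which keeps predictions accurate. The length of the second feature is the changed length and differs from that of the first feature.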
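The retraining trigger and the interrupt handling described above can be sketched as a small state machine on the first device. Everything named here (`needs_retraining`, `InferenceDevice`, its methods, and modeling the prediction model as a plain callable) is hypothetical and only illustrates the control flow; the source does not define message formats or how the first device reports that it is paused.

```python
from typing import Callable, Optional, Sequence

Model = Callable[[Sequence[float]], float]


def needs_retraining(old_T: int, new_T: int, old_L: int, new_L: int,
                     t_threshold: int, l_threshold: int) -> bool:
    """Retrain only when the change in T or in L exceeds its preset threshold."""
    return abs(new_T - old_T) > t_threshold or abs(new_L - old_L) > l_threshold


class InferenceDevice:
    """First device: predicts until interrupted, resumes once a new model is loaded."""

    def __init__(self, model: Model, feature_length: int) -> None:
        self.model = model
        self.feature_length = feature_length
        self.paused = False

    def on_interrupt(self) -> None:
        # Interrupt signal from the second device: stop using the current model.
        self.paused = True

    def load_model(self, model: Model, feature_length: int) -> None:
        # Updated model expects features of the changed length (second feature).
        self.model = model
        self.feature_length = feature_length
        self.paused = False

    def predict(self, feature: Sequence[float]) -> Optional[float]:
        if self.paused:
            return None  # assumption: no prediction is produced while paused
        if len(feature) != self.feature_length:
            raise ValueError("feature length does not match the loaded model")
        return self.model(feature)
```

Rejecting features of the wrong length makes the mismatch between an old model and a new feature layout visible instead of silently producing a prediction.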
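S404: The first device sends the access frequency of the first data block to the second device, and correspondingly the second device receives the access frequency of the first data block sent by the first device.
S405: The second device determines the type of each first data block in the storage space according to the access frequency of each first data block.
For a first data block in the storage space, after the first device obtains its access frequency, the first device sends the access frequency to the second device, which correspondingly receives it. After obtaining the access frequency of every first data block in the storage space, the second device can determine the type of each first data block from those access frequencies. The type of a first data block may be hot data or cold data, or garbage data or non-garbage data.
In addition, after obtaining the access frequency of the first data block, the second device may perform other processing, for example determining from the access frequency whether to prefetch the data block, or performing defragmentation according to the access frequencies of the first data blocks.
For the implementation of S405 in this embodiment, see the related description in the method embodiment shown in FIG. 2; details are not repeated here.
In some application scenarios, training of the access frequency prediction model may instead be performed by a third device, which sends the access frequency prediction model to the first device after training. Note that the first device, the second device, and the third device may be independent devices or different functional modules on the same device; this embodiment does not limit their specific form. For ease of understanding, a description is given below with reference to the accompanying drawings.
FIG. 5 is an interaction diagram of a data processing method according to an embodiment of this application. As shown in FIG. 5, the method may include the following steps.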
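S501: The second device obtains a training data set.
In this embodiment, the second device is responsible for constructing the training data set, which may include multiple training features and their corresponding labels; a label may be an access frequency. Each training feature includes read/write features related to a training data block and read features of third data blocks adjacent to the training data block. The training data block may be any data block in the storage space.
The read/write features related to the training data block include, but are not limited to, frequency features of reading/writing the training data block, length features of reads of the training data block, and arrangement features of reads of the training data block. The frequency features include, but are not limited to, the read frequency, the write frequency, and the total read/write frequency of the training data block across one or more access interfaces. The length features include, but are not limited to, the maximum, minimum, or average read length. The arrangement features include, but are not limited to, the number of reads of a first length, the number of reads of a second length, the proportion of reads of the first length among all reads, and the proportion of reads of the second length among all reads. The first length is a length equal to 2^n, the second length is a length not equal to 2^n, and n is a positive integer.
The read features of the third data blocks adjacent to the training data block include at least the read frequency feature of each of L third data blocks, where L is a positive integer. The L third data blocks may be selected relative to the training data block by taking L/2 third data blocks before it and L/2 third data blocks after it.
Specifically, a training feature includes the features of the training data block at each of T historical time points, and the read/write frequency features of the training data block at the T historical time points are spatiotemporally correlated, where T is a positive integer.
S502: The second device sends the training data set to the third device, and correspondingly the third device receives the training data set sent by the second device.
S503: The third device trains an initial network model with the training data set to generate the access frequency prediction model.
In this embodiment, the third device trains the initial network model with the training data set sent by the second device to generate the access frequency prediction model. The model may include a first sub-model that extracts the temporal characteristics of the training features and a second sub-model that extracts their spatial characteristics; for example, the first sub-model is a CNN and the second sub-model is an LSTM network. Further, to address the scale insensitivity of neural network models, the access frequency prediction model may also include a third sub-model that captures the feature scale of the training features, that is, the magnitude of the units of each feature in a training feature. For example, the third sub-model is an autoregressive model.
S504: The third device sends the access frequency prediction model to the second device, and correspondingly the second device receives the access frequency prediction model sent by the third device.
In practice, when the second device detects that the feature length has changed, it obtains a new training data set that includes multiple training features and their corresponding labels. The second device sends the new training data set to the third device, which correspondingly receives it and uses it to train the access frequency prediction model, obtaining an updated model. The third device may send the updated model to the second device, which forwards it to the first device. Alternatively, the third device sends the updated model directly to the first device, which correspondingly receives it. To avoid the resource overhead of frequent training, the model is usually retrained only when the change in T or L exceeds a preset threshold.
Specifically, when the second device detects that the feature length has changed, it may also send the changed feature length to the third device. The third device correspondingly receives the changed feature length and uses it to partition the training data set into multiple training features.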
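Partitioning a received training data set by the changed feature length might look like the sketch below. The wire format is not described in the source; this example assumes, purely for illustration, a flat sequence of numbers in which each sample is `feature_length` feature values followed by one label (the access frequency). The function name `split_samples` is hypothetical.

```python
from typing import List, Sequence, Tuple


def split_samples(flat: Sequence[float],
                  feature_length: int) -> Tuple[List[List[float]], List[float]]:
    """Split a flat data set into (features, labels) using the feature length.

    Assumed layout: [f_1 ... f_N, label, f_1 ... f_N, label, ...], where N is
    the (possibly changed) feature length sent by the second device.
    """
    record = feature_length + 1  # N feature values plus one label
    if feature_length <= 0 or len(flat) % record != 0:
        raise ValueError("data set size does not match the given feature length")
    features: List[List[float]] = []
    labels: List[float] = []
    for start in range(0, len(flat), record):
        features.append(list(flat[start:start + feature_length]))
        labels.append(flat[start + feature_length])
    return features, labels
```

The length check shows why the third device needs the changed feature length: parsing the same data with the old N would either fail or misalign every feature and label.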
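In addition, after training the access frequency prediction model, the third device may make it lightweight to shrink the model and reduce its memory footprint, for example by pruning or compressing it. Further, so that the first device can parse and use the model correctly, the third device may convert the model file into a format supported by the interface of the first device.
S505: The second device forwards the access frequency prediction model to the first device, and correspondingly the first device receives the access frequency prediction model sent by the second device.
S506: The second device obtains a first feature of a first data block.
S507: The second device sends the first feature of the first data block to the first device, and correspondingly the first device receives the first feature of the first data block sent by the second device.
S508: The first device inputs the first feature of the first data block into the access frequency prediction model to obtain the access frequency of the first data block output by the model.
S509: The first device sends the access frequency of the first data block to the second device, and correspondingly the second device receives the access frequency of the first data block sent by the first device.
Note that for the implementation of S506-S509 in this embodiment, see the description of S401-S404 in the embodiment shown in FIG. 4; details are not repeated here.
For the operations that the second device performs after obtaining the access frequency of the first data block, see the description of S405 in the embodiment shown in FIG. 4; details are not repeated here.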
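To aid understanding, FIG. 6 shows a data processing framework that includes a host, a training card, and an inference card. The host can implement the functions of the second device, the inference card those of the first device, and the training card those of the third device. The host collects a training data set and sends it to the training card. The training card trains a model with the training data set to obtain the access frequency prediction model, which is sent to the inference card through the host. After obtaining the first feature of a data block in the storage space, the host sends it to the inference card, which inputs it into the access frequency prediction model to obtain the access frequency of the data block. The inference card sends the access frequency back to the host so that the host can determine the type of the data block from it.
In a possible implementation, the host may include a data collection module, a feature extraction module, a spatiotemporal characteristic processing module, and a data type identification module. The data collection module collects data blocks in the storage space. The feature extraction module extracts the features of a data block; during training it also obtains the labels corresponding to the features in order to build a labeled training data set, while during prediction it only extracts the features. The spatiotemporal characteristic processing module computes the temporal correlation of a data block with an autocorrelation function to determine the time-related parameter T, and computes the spatial correlation of a data block with a covariance function to determine the space-related parameter L. The data type identification module determines the type of a data block from its future access frequency.
In a possible implementation, the training card may include a training module, a lightweight processing module, and a format matching module. The training module trains the initial network model with the training data set to obtain the access frequency prediction model; it may be a CPU or a graphics processing unit (GPU). The lightweight processing module makes the access frequency prediction model lightweight, for example by pruning or compression, which shrinks the model and speeds up prediction. The format matching module converts the file format of the access frequency prediction model into a format supported by the inference card interface.
In a possible implementation, the inference card may include a model iteration module, a model loading module, an inference module, and a return module. The model iteration module updates the access frequency prediction model. The model loading module reads in the access frequency prediction model file. The inference module inputs the received first feature into the access frequency prediction model to obtain a prediction result. The return module sends the prediction result to the host.
To further illustrate the accuracy of the access frequency prediction model provided in the embodiments of this application, FIG. 7 compares the accuracy with which different models predict the access frequency of data blocks. The horizontal axis is the ratio of the number of hot data blocks to be predicted to the total number of data blocks; for example, with 1000 data blocks in total, a value of 0.1 means that the data blocks are sorted in descending order of access frequency and the first 100 are treated as hot data blocks. The vertical axis is the prediction accuracy. From bottom to top, the curves in FIG. 7 are those obtained with an exponential moving average (EMA) model, a counting Bloom filter (CBF) model, a least recently used (LRU) model, a linear regression (LR) model, an LSTM model, a recurrent neural network (RNN) model, and the access frequency prediction model provided in the embodiments of this application (our model). FIG. 7 shows that, at the same horizontal-axis value, the prediction accuracy of the access frequency prediction model provided in the embodiments of this application is higher than that of the other models.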
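The source does not state how the prediction accuracy in FIG. 7 is computed. One plausible reading, offered here only as an assumption, is the overlap between the predicted hot set and the actual hot set at a given hot-block ratio. The sketch below implements that reading; `hot_set` and `hot_accuracy` are hypothetical names.

```python
from typing import Dict, Set


def hot_set(freqs: Dict[str, float], ratio: float) -> Set[str]:
    """Blocks ranked in the top `ratio` fraction by access frequency (at least one)."""
    k = max(1, round(len(freqs) * ratio))
    ranked = sorted(freqs, key=lambda block: freqs[block], reverse=True)
    return set(ranked[:k])


def hot_accuracy(predicted: Dict[str, float],
                 actual: Dict[str, float],
                 ratio: float) -> float:
    """Fraction of predicted-hot blocks that are also hot by actual frequency."""
    pred_hot = hot_set(predicted, ratio)
    true_hot = hot_set(actual, ratio)
    return len(pred_hot & true_hot) / len(pred_hot)
```

With 1000 blocks and a ratio of 0.1, both hot sets contain 100 blocks, matching the example given for the horizontal axis.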
基于上述方法实施例,本申请实施例提供了一种数据处理装置,下面将结合附图进行说明。
参见图8,该图为本申请实施例提供的一种数据处理装置结构图,该装置800可以实现上述实施例中第一装置的功能,一种可能的实现中,该装置800可以包括执行上述方法 实施例中第一装置执行的方法/操作/步骤/动作所一一对应的模块或单元,该单元可以是硬件电路,也可是软件,也可以是硬件电路结合软件实现。一种可能的实现中,该装置可以包括:第一获取单元801、处理单元802。
其中,第一获取单元801,用于获取第一数据块的第一特征,所述第一数据块为存储空间中的任一数据块,所述第一特征包括与所述第一数据块相关的读/写特征以及与所述第一数据块相邻的第二数据块的读取特征。其中,关于第一获取单元801的具体实现可以参见S201、S402以及S507的相关描述,本实施例在此不再赘述。
处理单元802,用于将所述第一特征输入访问频率预测模型,获得所述访问频率预测模型输出的所述第一数据块的访问频率,所述第一数据块的访问频率用于确定所述第一数据块的类型。其中,关于处理单元802的具体实现可以参见S202、S403以及S508的相关描述,本实施例在此不再赘述。
在一种可能的实现方式中,所述装置还包括:接收单元,用于具体用于接收第二装置发送的所述第一数据块的第一特征;所述第一获取单元801,具体用于获取所述接收单元接收的所述第一特征。其中,关于接收单元和第一获取单元801的具体实现可以参见S402和S507的相关描述,本实施例在此不再赘述。
在一种可能的实现方式中,所述装置还包括:发送单元(图中未示出);
发送单元,用于向所述第二装置发送所述第一数据块的访问频率。关于发送单元的具体实现可以参见S404和S509的相关描述,本实施例在此不再赘述。
在一种可能的实现方式中,所述第一数据块的类型为冷数据或热数据。
在一种可能的实现方式中,处理单元802,还用于在所述第一特征的特征长度发生变化时,更新所述访问频率预测模型,获得更新后的访问频率预测模型。关于更新单元的具体实现可以参见S202、S403和S504的相关描述,本实施例在此不再赘述。
在一种具体的实现方式中,所述第一获取单元801,还用于获取第一数据块的第二特征,所述第一特征的长度与所述第二特征的长度不同;所述处理单元802,还用于将所述第二特征输入所述更新后的访问频率预测模型,获得所述更新后的访问频率预测模型输出的所述第一数据块的访问频率。关于第一获取单元801的具体实现可以参见S202和S403中的相关描述,本实施例在此不再赘述。
需要说明的是,本实施例中各个单元的具体实现可以参见图2、图4以及图5所述方法实施例中的相关描述,本实施例在此不再赘述。
参见图9,该图为本申请实施例提供的另一种数据处理装置结构图,该装置900可以实现上述方法实施例中第二装置的功能,一种可能的实现中,该装置900可以包括执行上述方法实施例中第二装置执行的方法/操作/步骤/动作所一一对应的模块或单元,该单元可以是硬件电路,也可是软件,也可以是硬件电路结合软件实现。一种可能的实现中,所述装置还包括:第二获取单元901和处理单元902。
其中,第二获取单元901,用于获取第一数据块的访问频率,所述第一数据块的访问频率是基于访问频率预测模型以及所述第一数据块的第一特征获取的,所述第一特征包括 与所述第一数据块相关的读/写特征以及与所述第一数据块相邻的第二数据块的读取特征,所述第一数据块为存储空间中任一数据块。其中,关于第二获取单元901的具体实现可以参见S404以及S509的相关描述,本实施例在此不再赘述。
处理单元902,用于根据各所述第一数据块的访问频率确定所述存储空间中各所述第一数据块的类型。其中,关于处理单元902的具体实现可以参见S202和S405的相关描述,本实施例在此不再赘述。
在一种可能的实现方式中,所述处理单元902,具体用于根据各所述第一数据块的访问频率进行排序,获得排序结果;根据所述排序结果确定所述存储空间中各所述第一数据块的类型。其中,关于处理单元902的具体实现可以参见S202和S405的相关描述,本实施例在此不再赘述。
在一种可能的实现方式中,所述第一特征包括所述第一数据块在T个历史时间点中每个历史时间点的特征,所述第一数据块在T个历史时间点的读/写特征之间具有时空相关性,所述T为正整数。
在一种可能的实现方式中,所述与所述第一数据块相关的读/写特征包括以下至少一种:读/写所述第一数据块的频率特征、读取所述第一数据块的长度特征、读取所述第一数据块的排布特征。
在一种可能的实现方式中,所述读/写所述第一数据块的频率特征包括以下:一个或多个访问接口对所述第一数据块的读频率、一个或多个访问接口对所述第一数据块的写频率以及一个或多个访问接口对所述第一数据块的读写总频率。
在一种可能的实现方式中,所述读取所述第一数据块的长度特征包括以下至少一种:最大读取长度、最小读取长度或平均读取长度。
在一种可能的实现方式中,所述读取所述第一数据块的排布特征包括以下至少一种:读取第一长度的次数、读取第二长度的次数、所述读取第一长度的次数占总读取次数的比例或所述读取第二长度的次数占总读取次数的比例,所述第一长度是指长度为2n,所述第二长度是指长度非2n,n为正整数。
在一种可能的实现方式中,所述与所述第一数据块的相邻的第二数据块的读取特征至少包括L个所述第二数据块各自对应的读取频率特征,所述L为正整数。
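In a possible implementation, the access frequency prediction model includes a first sub-model and a second sub-model, where the first sub-model is used to extract the temporal characteristics of the first feature, and the second sub-model is used to extract the spatial characteristics of the first feature.
In a possible implementation, the access frequency prediction model further includes a third sub-model, where the third sub-model is used to obtain the feature scale of the first feature.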
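The division of labor between a temporal sub-model and a spatial sub-model can be illustrated with a deliberately simple stand-in. The Python sketch below is not the model of this application: it replaces the learned sub-models with fixed weighting rules, and the names `temporal_summary`, `spatial_summary`, `decay`, and `own_weight` are all assumptions. Each input row holds the block's own read frequency followed by the read frequencies of its neighbors at one time point.

```python
from typing import List


def temporal_summary(rows: List[List[float]], decay: float = 0.5) -> List[float]:
    """Collapse T time points into one vector, weighting recent points more."""
    count = len(rows)
    # The newest row gets weight 1; each older row is scaled down by `decay`.
    weights = [decay ** (count - 1 - t) for t in range(count)]
    total = sum(weights)
    width = len(rows[0])
    return [
        sum(w * row[col] for w, row in zip(weights, rows)) / total
        for col in range(width)
    ]


def spatial_summary(vec: List[float], own_weight: float = 0.7) -> float:
    """Blend the block's own value (vec[0]) with the mean of its neighbors."""
    neighbors = vec[1:]
    if not neighbors:
        return vec[0]
    return own_weight * vec[0] + (1.0 - own_weight) * sum(neighbors) / len(neighbors)


def predict_access_frequency(rows: List[List[float]]) -> float:
    # Temporal stage first, then spatial stage.
    return spatial_summary(temporal_summary(rows))


# Hypothetical history: T = 2 time points, own frequency plus 2 neighbors.
history = [[4.0, 2.0, 0.0], [8.0, 2.0, 4.0]]
prediction = predict_access_frequency(history)
```

A trained model would learn both stages from data instead of using fixed weights; the sketch only shows how temporal and spatial information can be combined in sequence.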
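In a possible implementation, the access frequency prediction model is generated through training based on to-be-trained features and the labels corresponding to the to-be-trained features, where the label corresponding to a to-be-trained feature is an access frequency.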
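One way to obtain such feature/label pairs is a sliding window over a block's recorded access history, sketched below in Python. This construction is an assumption made for illustration: the embodiment states only that the label is an access frequency, not how samples are cut from the history. The function name `make_training_pairs` and the sample series are hypothetical.

```python
from typing import List, Tuple


def make_training_pairs(
    freq_series: List[float], T: int
) -> List[Tuple[List[float], float]]:
    """Pair each window of T consecutive frequencies with the frequency that follows it."""
    if T < 1:
        raise ValueError("T must be a positive integer")
    pairs = []
    for end in range(T, len(freq_series)):
        feature = freq_series[end - T:end]  # the T historical time points
        label = freq_series[end]            # access frequency to be predicted
        pairs.append((feature, label))
    return pairs


# Hypothetical per-window access counts for one block.
series = [3.0, 5.0, 2.0, 8.0, 1.0]
pairs = make_training_pairs(series, T=3)
```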
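In a possible implementation, the apparatus further includes a third obtaining unit (not shown in the figure).
The third obtaining unit is configured to obtain a new to-be-trained data set when the feature length of the first feature changes, where the new to-be-trained data set includes a plurality of to-be-trained features and the labels corresponding to the plurality of to-be-trained features, and the new to-be-trained data set is used to train the access frequency prediction model to obtain an updated access frequency prediction model. For a specific implementation of the third obtaining unit, refer to the related descriptions of S403 and S504; details are not repeated here in this embodiment.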
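In a possible implementation, the apparatus further includes a sending unit and a receiving unit (not shown in the figure).
The sending unit is configured to send the new to-be-trained data set to the third apparatus, so that the third apparatus trains the access frequency prediction model with the new to-be-trained data set to obtain the updated access frequency prediction model; the receiving unit is configured to receive the updated access frequency prediction model sent by the third apparatus. For specific implementations of the sending unit and the receiving unit, refer to the related descriptions of S502 and S504; details are not repeated here in this embodiment.
In a specific implementation, the apparatus further includes a sending unit (not shown in the figure).
The sending unit is configured to send the updated access frequency prediction model to the first apparatus. For a specific implementation of the sending unit, refer to the related description of S505; details are not repeated here in this embodiment.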
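In a specific implementation, a change in the feature length of the first feature includes a change in the number T of historical time points and/or a change in the number L of second data blocks adjacent to the first data block, where T is a positive integer and L is a positive integer.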
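The update trigger can be sketched as a comparison between the T and L values the current model was trained with and the values now in use. In the Python sketch below, `ModelSpec`, `feature_length_changed`, `maybe_update`, and the `retrain` callback are hypothetical names; the callback stands in for obtaining a new to-be-trained data set and retraining, which the sketch does not implement.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelSpec:
    T: int  # number of historical time points the model was trained with
    L: int  # number of adjacent data blocks the model was trained with


def feature_length_changed(spec: ModelSpec, T: int, L: int) -> bool:
    # The feature length changes when T changes and/or L changes.
    return T != spec.T or L != spec.L


def maybe_update(
    spec: ModelSpec, T: int, L: int, retrain: Callable[[int, int], None]
) -> ModelSpec:
    """Invoke `retrain` only when the feature length changed; return the active spec."""
    if T < 1 or L < 1:
        raise ValueError("T and L must be positive integers")
    if feature_length_changed(spec, T, L):
        retrain(T, L)
        return ModelSpec(T, L)
    return spec


calls = []
spec = ModelSpec(T=8, L=4)
spec = maybe_update(spec, 8, 4, lambda T, L: calls.append((T, L)))   # no change
spec = maybe_update(spec, 12, 4, lambda T, L: calls.append((T, L)))  # T changed
```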
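In a possible implementation, the apparatus further includes a receiving unit and a sending unit (not shown in the figure).
The second obtaining unit 901 is specifically configured to obtain the first feature of the first data block; the sending unit is configured to send the first feature of the first data block to the first apparatus; the receiving unit is configured to receive the access frequency of the first data block sent by the first apparatus, where the access frequency of the first data block is obtained by the first apparatus by using the access frequency prediction model and the first feature.
It should be noted that, for implementations of the units in this embodiment, refer to the related descriptions in the foregoing method embodiments; details are not repeated here in this embodiment.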
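FIG. 10 is a schematic structural diagram of a network device according to an embodiment of this application. The network device may be, for example, the first apparatus, the second apparatus, or the third apparatus in the embodiments shown in FIG. 2 to FIG. 5, or may be a device implementation of the data processing apparatus 800 in the embodiment shown in FIG. 8 or of the data processing apparatus 900 in the embodiment shown in FIG. 9.
As shown in FIG. 10, the network device 1000 includes at least a processor 1010. The network device 1000 may further include a communication interface 1020 and a memory 1030. The network device 1000 may have one or more processors 1010; FIG. 10 uses one processor as an example. In this embodiment of this application, the processor 1010, the communication interface 1020, and the memory 1030 may be connected through a bus system or in another manner; FIG. 10 uses a connection through a bus system 1040 as an example.
The processor 1010 may be a CPU, a network processor (NP), or a combination of a CPU and an NP. The processor 1010 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.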
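When the network device is the first apparatus, the processor 1010 may perform related functions in the foregoing method embodiments, such as inputting the first feature into the access frequency prediction model and obtaining the access frequency of the first data block output by the access frequency prediction model.
When the network device is the second apparatus, the processor 1010 may perform related functions in the foregoing method embodiments, such as determining the type of each first data block in the storage space based on the access frequency of each first data block.
When the network device is the third apparatus, the processor 1010 may perform related functions in the foregoing method embodiments, such as training to generate the access frequency prediction model.
The communication interface 1020 is configured to receive and send the first feature. Specifically, the communication interface 1020 may include a receiving interface and a sending interface, where the receiving interface may be used to receive the first feature and the sending interface may be used to send the first feature. There may be one or more communication interfaces 1020.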
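The memory 1030 may include a volatile memory, for example, a random-access memory (RAM); the memory 1030 may also include a non-volatile memory, for example, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 1030 may further include a combination of the foregoing types of memory. The memory 1030 may store, for example, the access frequency prediction model or the first feature of the first data block.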
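Optionally, the memory 1030 stores an operating system and programs, executable modules, or data structures, or a subset or an extended set thereof, where the programs may include various operation instructions for implementing various operations. The operating system may include various system programs for implementing basic services and processing hardware-based tasks. The processor 1010 may read the programs in the memory 1030 to implement the data processing method provided in the embodiments of this application.
The memory 1030 may be a storage component inside the network device 1000, or may be a storage apparatus independent of the network device 1000.
The bus system 1040 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus system 1040 may be classified into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 10, but this does not mean that there is only one bus or only one type of bus.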
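Referring to FIG. 11, an embodiment of this application further provides a data processing apparatus 1100, which may be used to implement the functions of the first apparatus, the second apparatus, or the third apparatus in the foregoing methods. The apparatus 1100 may be an apparatus or a chip in an apparatus. The data processing apparatus includes: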
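at least one input/output interface 1110 and a logic circuit 1120. The input/output interface 1110 may be an input/output circuit. The logic circuit 1120 may be a signal processor, a chip, or another integrated circuit that can implement the methods of this application.
The at least one input/output interface 1110 is used for the input or output of signals or data. For example, when the apparatus is the first apparatus, the input/output interface 1110 is used to receive the first feature of the first data block. When the apparatus is the second apparatus, the input/output interface 1110 is used to output the first feature of the first data block.
The logic circuit 1120 is configured to perform some or all of the steps of any method provided in the embodiments of this application. For example, when the apparatus is the first apparatus, it is used to perform the steps performed by the first apparatus in the possible implementations of the foregoing method embodiments; for instance, the logic circuit 1120 is configured to obtain the access frequency of the first data block based on the first feature of the first data block. When the apparatus is the second apparatus, it is used to perform the steps performed by the second apparatus in the possible implementations of the foregoing method embodiments; for instance, the logic circuit 1120 is configured to obtain the data type of the first data block.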
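When the apparatus is a chip applied to a terminal, the terminal chip implements the functions of the terminal in the foregoing method embodiments. The terminal chip receives information from another module in the terminal (such as a radio frequency module or an antenna), where the information is sent to the terminal by another terminal or a network device; or the terminal chip outputs information to another module in the terminal (such as a radio frequency module or an antenna), where the information is sent by the terminal to another terminal or a network device.
When the apparatus is a chip applied to a network device, the network device chip implements the functions of the network device in the foregoing method embodiments. The network device chip receives information from another module in the network device (such as a radio frequency module or an antenna), where the information is sent to the network device by a terminal or another network device; or the network device chip outputs information to another module in the network device (such as a radio frequency module or an antenna), where the information is sent by the network device to a terminal or another network device.
This application further provides a chip or a chip system, where the chip may include a processor. The chip may further include a memory (or storage module) and/or a transceiver (or communication module), or the chip is coupled to a memory (or storage module) and/or a transceiver (or communication module). The transceiver (or communication module) may be used to support wired and/or wireless communication by the chip, and the memory (or storage module) may be used to store a program or a set of instructions; the processor may invoke the program or the set of instructions to implement the operations performed by the terminal or the network device in the foregoing method embodiments or in any possible implementation of the method embodiments. The chip system may include the foregoing chip, or may include the foregoing chip and other discrete components, such as a memory (or storage module) and/or a transceiver (or communication module).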
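An embodiment of this application provides a computer-readable storage medium that includes instructions or a computer program which, when run on a computer, cause the computer to perform the data processing method provided in the foregoing embodiments.
An embodiment of this application further provides a computer program product that contains instructions or a computer program which, when run on a computer, cause the computer to perform the data processing method provided in the foregoing embodiments.
Based on the same concept as the foregoing method embodiments, this application further provides a data processing system, which may include the foregoing first apparatus and second apparatus. The data processing system may be used to implement the operations performed by the first apparatus or the second apparatus in the foregoing method embodiments or in any possible implementation of the method embodiments.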
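The terms "first", "second", "third", "fourth", and the like (if any) in the specification, the claims, and the accompanying drawings of this application are used to distinguish between similar objects, and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.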
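In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical service, and other divisions are possible in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the service units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of a software service unit.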
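If the integrated unit is implemented in the form of a software service unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of this application essentially, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods in the embodiments of this application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
A person skilled in the art should be aware that, in one or more of the foregoing examples, the services described in this application may be implemented by hardware, software, firmware, or any combination thereof. When implemented in software, these services may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium may be any available medium accessible to a general-purpose or special-purpose computer.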
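The foregoing specific implementations further describe in detail the objectives, technical solutions, and beneficial effects of this application. It should be understood that the foregoing are merely specific implementations of this application.
The foregoing embodiments are intended only to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions recorded in the foregoing embodiments, or equivalent replacements may be made to some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.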

Claims (30)

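  1. A data processing method, wherein the method comprises:
    obtaining, by a first apparatus, a first feature of a first data block, wherein the first data block is any data block in a storage space, and the first feature comprises a read/write feature related to the first data block and a read feature of a second data block adjacent to the first data block; and
    inputting, by the first apparatus, the first feature into an access frequency prediction model, and obtaining an access frequency of the first data block output by the access frequency prediction model, wherein the access frequency of the first data block is used to determine a type of the first data block.
  2. The method according to claim 1, wherein the obtaining, by a first apparatus, a first feature of a first data block comprises:
    receiving, by the first apparatus, the first feature of the first data block sent by a second apparatus.
  3. The method according to claim 2, wherein the method further comprises:
    sending, by the first apparatus, the access frequency of the first data block to the second apparatus.
  4. The method according to any one of claims 1 to 3, wherein the method further comprises:
    when a feature length of the first feature changes, updating, by the first apparatus, the access frequency prediction model to obtain an updated access frequency prediction model.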
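  5. A data processing method, wherein the method comprises:
    obtaining, by a second apparatus, an access frequency of a first data block, wherein the access frequency of the first data block is obtained based on an access frequency prediction model and a first feature of the first data block, the first feature comprises a read/write feature related to the first data block and a read feature of a second data block adjacent to the first data block, and the first data block is any data block in a storage space; and
    determining, by the second apparatus, a type of each first data block in the storage space based on the access frequency of each first data block.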
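  6. The method according to any one of claims 1 to 5, wherein the first feature comprises a feature of the first data block at each of T historical time points, the read/write features of the first data block at the T historical time points are spatio-temporally correlated, and T is a positive integer.
  7. The method according to any one of claims 1 to 6, wherein the read/write feature related to the first data block comprises at least one of the following: a frequency feature of reading/writing the first data block, a length feature of reading the first data block, or an arrangement feature of reading the first data block.
  8. The method according to claim 7, wherein the read/write feature related to the first data block satisfies at least one of the following:
    the frequency feature of reading/writing the first data block comprises at least one of the following: a read frequency of the first data block by an access interface, a write frequency of the first data block by an access interface, and a total read/write frequency of the first data block by an access interface; and/or
    the length feature of reading the first data block comprises at least one of the following: a maximum read length, a minimum read length, or an average read length; and/or
    the arrangement feature of reading the first data block comprises at least one of the following: a number of reads of a first length, a number of reads of a second length, a ratio of the number of reads of the first length to a total number of reads, or a ratio of the number of reads of the second length to the total number of reads, wherein the first length is a length equal to 2^n, the second length is a length not equal to 2^n, and n is a positive integer.
  9. The method according to any one of claims 1 to 8, wherein the read feature of the second data block adjacent to the first data block comprises at least read frequency features respectively corresponding to L second data blocks, and L is a positive integer.
  10. The method according to any one of claims 1 to 9, wherein the access frequency prediction model comprises a first sub-model and a second sub-model, the first sub-model is used to extract temporal characteristics of the first feature, and the second sub-model is used to extract spatial characteristics of the first feature.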
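  11. The method according to any one of claims 5 to 10, wherein the method further comprises:
    when a feature length of the first feature changes, obtaining, by the second apparatus, a new to-be-trained data set, wherein the new to-be-trained data set comprises a plurality of to-be-trained features and labels corresponding to the plurality of to-be-trained features, and the new to-be-trained data set is used to train the access frequency prediction model to obtain an updated access frequency prediction model.
  12. The method according to claim 11, wherein the change in the feature length of the first feature comprises a change in the number T of historical time points and/or a change in the number L of second data blocks adjacent to the first data block, T is a positive integer, and L is a positive integer.
  13. The method according to any one of claims 5 to 12, wherein the obtaining, by a second apparatus, an access frequency of a first data block comprises:
    obtaining, by the second apparatus, the first feature of the first data block, and sending the first feature of the first data block to the first apparatus; and
    receiving, by the second apparatus, the access frequency of the first data block sent by the first apparatus, wherein the access frequency of the first data block is obtained by the first apparatus by using the access frequency prediction model and the first feature.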
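  14. A data processing apparatus, wherein the apparatus comprises:
    a first obtaining unit, configured to obtain a first feature of a first data block, wherein the first data block is any data block in a storage space, and the first feature comprises a read/write feature related to the first data block and a read feature of a second data block adjacent to the first data block; and
    a processing unit, configured to input the first feature into an access frequency prediction model and obtain an access frequency of the first data block output by the access frequency prediction model, wherein the access frequency of the first data block is used to determine a type of the first data block.
  15. The apparatus according to claim 14, further comprising:
    a receiving unit, configured to receive the first feature of the first data block sent by a second apparatus, wherein
    the first obtaining unit is specifically configured to obtain the first feature received by the receiving unit.
  16. The apparatus according to claim 15, wherein the apparatus further comprises:
    a sending unit, configured to send the access frequency of the first data block to the second apparatus.
  17. The apparatus according to any one of claims 14 to 16, wherein
    the processing unit is further configured to update the access frequency prediction model when a feature length of the first feature changes, to obtain an updated access frequency prediction model.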
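  18. A data processing apparatus, wherein the apparatus comprises:
    a second obtaining unit, configured to obtain an access frequency of a first data block, wherein the access frequency of the first data block is obtained based on an access frequency prediction model and a first feature of the first data block, the first feature comprises a read/write feature related to the first data block and a read feature of a second data block adjacent to the first data block, and the first data block is any data block in a storage space; and
    a processing unit, configured to determine a type of each first data block in the storage space based on the access frequency of each first data block.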
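  19. The apparatus according to any one of claims 14 to 18, wherein the first feature comprises a feature of the first data block at each of T historical time points, the read/write features of the first data block at the T historical time points are spatio-temporally correlated, and T is a positive integer.
  20. The apparatus according to any one of claims 14 to 19, wherein the read/write feature related to the first data block comprises at least one of the following: a frequency feature of reading/writing the first data block, a length feature of reading the first data block, or an arrangement feature of reading the first data block.
  21. The apparatus according to claim 20, wherein the read/write feature related to the first data block satisfies at least one of the following:
    the frequency feature of reading/writing the first data block comprises at least one of the following: a read frequency of the first data block by all access interfaces, a write frequency of the first data block by all access interfaces, and a total read/write frequency of the first data block by all access interfaces; and/or
    the length feature of reading the first data block comprises at least one of the following: a maximum read length, a minimum read length, or an average read length; and/or
    the arrangement feature of reading the first data block comprises at least one of the following: a number of reads of a first length, a number of reads of a second length, a ratio of the number of reads of the first length to a total number of reads, or a ratio of the number of reads of the second length to the total number of reads, wherein the first length is a length equal to 2^n, the second length is a length not equal to 2^n, and n is a positive integer.
  22. The apparatus according to any one of claims 14 to 21, wherein the read feature of the second data block adjacent to the first data block comprises at least read frequency features respectively corresponding to L second data blocks, and L is a positive integer.
  23. The apparatus according to any one of claims 14 to 22, wherein the access frequency prediction model comprises a first sub-model and a second sub-model, the first sub-model is used to extract temporal characteristics of the first feature, and the second sub-model is used to extract spatial characteristics of the first feature.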
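  24. The apparatus according to any one of claims 18 to 23, wherein the apparatus further comprises:
    a third obtaining unit, configured to obtain a new to-be-trained data set when a feature length of the first feature changes, wherein the new to-be-trained data set comprises a plurality of to-be-trained features and labels corresponding to the plurality of to-be-trained features, and the new to-be-trained data set is used to train the access frequency prediction model to obtain an updated access frequency prediction model.
  25. The apparatus according to claim 24, wherein the change in the feature length of the first feature comprises a change in the number T of historical time points and/or a change in the number L of second data blocks adjacent to the first data block, T is a positive integer, and L is a positive integer.
  26. The apparatus according to any one of claims 18 to 25, wherein the apparatus further comprises a receiving unit and a sending unit; the second obtaining unit is specifically configured to obtain the first feature of the first data block; the sending unit is configured to send the first feature of the first data block to the first apparatus; and the receiving unit is configured to receive the access frequency of the first data block sent by the first apparatus, wherein the access frequency of the first data block is obtained by the first apparatus by using the access frequency prediction model and the first feature.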
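  27. A data processing system, wherein the system comprises a first apparatus and a second apparatus;
    the first apparatus is configured to perform the data processing method according to any one of claims 1 to 4; and
    the second apparatus is configured to perform the data processing method according to any one of claims 5 to 13.
  28. A data processing apparatus, comprising a processor coupled to a memory, wherein the processor is configured to execute computer instructions in the memory, so that the apparatus performs the method according to any one of claims 1 to 13.
  29. A computer-readable storage medium, comprising instructions which, when run on a computer, cause the computer to perform the data processing method according to any one of claims 1 to 13.
  30. A computer program product, wherein when the computer program product runs on a device, the device is caused to perform the data processing method according to any one of claims 1 to 13.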
PCT/CN2022/115437 2021-08-31 2022-08-29 Data processing method, apparatus and system WO2023030227A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22863363.2A EP4383057A1 (en) 2021-08-31 2022-08-29 Data processing method, apparatus and system
US18/588,775 US20240192880A1 (en) 2021-08-31 2024-02-27 Data processing method, apparatus, and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111018067.4 2021-08-31
CN202111018067.4A CN115730210A (zh) 2021-08-31 2021-08-31 Data processing method, apparatus and system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/588,775 Continuation US20240192880A1 (en) 2021-08-31 2024-02-27 Data processing method, apparatus, and system

Publications (1)

Publication Number Publication Date
WO2023030227A1 true WO2023030227A1 (zh) 2023-03-09

Family

ID=85291849

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/115437 WO2023030227A1 (zh) 2021-08-31 2022-08-29 Data processing method, apparatus and system

Country Status (4)

Country Link
US (1) US20240192880A1 (zh)
EP (1) EP4383057A1 (zh)
CN (1) CN115730210A (zh)
WO (1) WO2023030227A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011186554A (ja) * 2010-03-04 2011-09-22 Toshiba Corp メモリ管理装置及び方法
CN110362277A (zh) * 2019-07-19 2019-10-22 重庆大学 基于混合存储系统的数据分类存储方法
CN111078126A (zh) * 2018-10-19 2020-04-28 阿里巴巴集团控股有限公司 分布式存储系统及其存储方法
CN111124295A (zh) * 2019-12-11 2020-05-08 成都信息工程大学 一种基于三元影响因子的农业数据存储处理系统及方法
CN112379842A (zh) * 2020-11-18 2021-02-19 深圳安捷丽新技术有限公司 一种预测数据冷热属性的方法和装置

Also Published As

Publication number Publication date
EP4383057A1 (en) 2024-06-12
CN115730210A (zh) 2023-03-03
US20240192880A1 (en) 2024-06-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22863363

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022863363

Country of ref document: EP

Effective date: 20240307

NENP Non-entry into the national phase

Ref country code: DE