WO2020037511A1 - Data storage and retrieval method and apparatus - Google Patents

Data storage and retrieval method and apparatus

Info

Publication number
WO2020037511A1
WO2020037511A1 PCT/CN2018/101597
Authority
WO
WIPO (PCT)
Prior art keywords
data
difference
current data
mapping relationship
current
Prior art date
Application number
PCT/CN2018/101597
Other languages
English (en)
French (fr)
Inventor
陈明
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to EP18931202.8A priority Critical patent/EP3822795B1/en
Priority to CN201880013245.4A priority patent/CN111083933B/zh
Priority to PCT/CN2018/101597 priority patent/WO2020037511A1/zh
Priority to JP2021509809A priority patent/JP7108784B2/ja
Publication of WO2020037511A1 publication Critical patent/WO2020037511A1/zh
Priority to US17/179,591 priority patent/US11960467B2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3002 Conversion to or from differential modulation
    • H03M7/3044 Conversion to or from differential modulation with several bits only, i.e. the difference between successive samples being coded by more than one bit, e.g. differential pulse code modulation [DPCM]
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3068 Precoding preceding compression, e.g. Burrows-Wheeler transformation
    • H03M7/3071 Prediction
    • H03M7/3073 Time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks

Definitions

  • the embodiments of the present application relate to the field of data processing technologies, and in particular, to methods and devices for data storage and acquisition.
  • With the development of artificial intelligence (AI), big data, and the Internet of Things, the amount of data that needs to be stored has increased dramatically. If this sharply increased data is stored only by increasing the capacity of storage devices, purchase and management costs rise, and the storage devices occupy more space and consume more energy, imposing a heavy cost burden on businesses. Therefore, an effective data storage solution is needed.
  • the embodiments of the present application provide a method and a device for storing and retrieving data, which help to save storage overhead.
  • the embodiments of the present application also provide data compression and decompression methods and devices, which are helpful for saving compression or decompression time.
  • an embodiment of the present application provides a data storage method applied to a storage device.
  • the method may include: obtaining current data and historical data of the current data; predicting the current data using the historical data to obtain first prediction data, where the first prediction data is the result of predicting the current data based on the change rule of the historical data; obtaining a first difference between the current data and the first prediction data; and, when the storage space occupied by the first difference is less than the storage space occupied by the current data, storing information used to recover the current data, where that information includes the first difference or a value obtained by compressing the first difference.
  • the difference is a parameter used to characterize the difference between the current data and the prediction data of the current data (such as the first prediction data or the second prediction data described below).
  • the difference may be an arithmetic difference, a ratio, a multiple, or a percentage.
  • the historical data may be one or more data before the current data in a sequence formed by at least two data to be stored.
  • the data to be stored refers to the original data that needs to be stored.
  • the current data is current data to be stored among at least two data to be stored.
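As a rough illustration of this storage flow, the sketch below uses a trivial "repeat the last value" stand-in predictor and an integer byte length as the size metric. The names `store_record`, `predict_last`, and `byte_len` are hypothetical; a real implementation would use an AI neural predictor and a real compressor.

```python
def byte_len(value):
    """Bytes needed for a signed integer (illustrative size metric)."""
    return max(1, (value.bit_length() + 8) // 8)

def predict_last(history):
    """Trivial predictor stand-in: assume the next value repeats the last."""
    return history[-1]

def store_record(current, history):
    """Return the information stored to recover `current` (hedged sketch)."""
    predicted = predict_last(history)   # first prediction data
    diff = current - predicted          # first difference
    if byte_len(diff) < byte_len(current):
        return ("diff", diff)           # store the (small) difference
    return ("raw", current)             # fall back to storing current data
```

For a slowly drifting series such as `[1000000, 1000001, 1000002]` followed by `1000003`, the stored record is `("diff", 1)`: one byte instead of three.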
  • the algorithm used to perform the prediction includes an AI neural algorithm.
  • AI neural algorithms include any of the following types: normalized least mean square adaptive filtering (NLMS), single-layer perceptron (SLP), multi-layer perceptron (MLP), or recurrent neural network (RNN).
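Of the listed predictor types, NLMS is the simplest to sketch. Below is an illustrative one-step NLMS predictor; the function name, tap count, and step size `mu` are assumptions for illustration, not values taken from the application.

```python
import numpy as np

def nlms_predict(series, taps=4, mu=1.0, eps=1e-8):
    """One-step NLMS prediction: estimate series[n] from the previous
    `taps` samples, updating the filter weights after every sample.
    mu is the normalized step size (stable for 0 < mu < 2)."""
    w = np.zeros(taps)
    preds = []
    for n in range(taps, len(series)):
        x = np.asarray(series[n - taps:n], dtype=float)
        y_hat = float(w @ x)               # prediction of series[n]
        e = series[n] - y_hat              # prediction error (the "difference")
        w += mu * e * x / (eps + x @ x)    # normalized LMS weight update
        preds.append(y_hat)
    return preds
```

On a regular series (for example a linear ramp) the prediction error shrinks as the filter adapts, so the differences to be stored become small.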
  • the storage device does not store the prediction data of the current data, which saves storage overhead. Instead, the storage device predicts the current data from the historical data to obtain the first prediction data, and then restores the current data from the first prediction data and the stored information used to restore the current data; for details, refer to the technical solution provided in the second aspect below.
  • the algorithm used to perform the compression includes a dictionary compression algorithm and/or a deduplication algorithm.
  • the algorithm used to perform the compression includes a dictionary compression algorithm.
  • the dictionary of the dictionary compression algorithm includes at least two sets, and each set includes one or more mapping relationships.
  • Each mapping relationship is a mapping between a first data and a second data, where the storage space occupied by the first data is larger than the storage space occupied by the second data; each set corresponds to a hit-rate range, and different sets have different hit-rate ranges.
  • the method further includes: obtaining a hit rate of the first difference; and determining a target set among the at least two sets according to the hit rate of the first difference, where the hit rate of the first difference is used to determine the hit rate of the target mapping relationship in which the first difference is located,
  • and the determined hit rate of the target mapping relationship belongs to the hit-rate range corresponding to the target set; the first data of the target set is searched for the first difference to determine the second data corresponding to the first difference, and that second data is the value obtained by compressing the first difference.
  • The mapping relationships in the storage device are classified into different sets, so that the set containing the data to be compressed (that is, the first difference) can be located directly from its hit rate. This narrows the search range for the data to be compressed and saves time when performing compression.
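The set-selection idea can be sketched as follows. The set boundaries, dictionary contents, and the function name `compress` are illustrative placeholders, not values from the application.

```python
# Hypothetical dictionary partitioned by mapping hit rate: first data
# (long strings) maps to second data (short codes). Boundaries are invented.
HOT  = {"ABABABAB": "a1"}   # mappings whose hit rate lies in [0.5, 1.0]
WARM = {"ABCDABCD": "b7"}   # mappings whose hit rate lies in [0.1, 0.5)
COLD = {"XQZPWMVK": "c9"}   # mappings whose hit rate lies in [0.0, 0.1)
SETS = [((0.5, 1.0), HOT), ((0.1, 0.5), WARM), ((0.0, 0.1), COLD)]

def compress(data, hit_rate):
    """Search only the set whose hit-rate range contains `hit_rate`,
    instead of scanning the whole dictionary."""
    for (lo, hi), mapping in SETS:
        if lo <= hit_rate <= hi:
            return mapping.get(data)   # second data, or None if not mapped
    return None
```

Only one of the three dictionaries is searched per lookup, which is the claimed time saving.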
  • the storage medium of the storage device includes a cache, a memory, and a hard disk;
  • the algorithm used to perform the compression includes a dictionary compression algorithm, and the dictionary of the dictionary compression algorithm includes one or more mapping relationships; each mapping relationship is a mapping between a first data and a second data, where the storage space occupied by the first data is greater than the storage space occupied by the second data;
  • the hit rate of any mapping relationship in the cache is greater than or equal to the hit rate of any mapping relationship in memory,
  • and the hit rate of any mapping relationship in memory is greater than or equal to the hit rate of any mapping relationship in the hard disk.
  • the method further includes: obtaining a hit rate of the first difference; and determining a target storage medium according to the hit rate of the first difference, where the hit rate of the first difference is used to determine the hit rate of the target mapping relationship in which the first difference is located.
  • When the determined hit rate of the target mapping relationship belongs to the hit-rate range of the mapping relationships in the cache, the target storage medium is the cache; when it does not belong to the cache's range but belongs to the hit-rate range of the mapping relationships in memory, the target storage medium is the memory;
  • when it does not belong to memory's range either, the target storage medium is the hard disk. The first data of the target storage medium is then searched for the first difference to determine the corresponding second data, and that second data is the value obtained by compressing the first difference.
  • In this way, the storage medium with the highest read-write performance that contains the data to be compressed is located directly.
  • Cache read-write performance is higher than memory read-write performance, which in turn is higher than hard-disk read-write performance. This narrows the search range for the data to be compressed, saving time when performing compression.
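A minimal sketch of choosing the target storage medium by hit rate; the boundary values 0.5 and 0.1, the dictionary contents, and the function names are assumed for illustration.

```python
# Illustrative tiering: the hottest mappings live in the cache, cooler ones
# in memory, the coldest on the hard disk. Boundary values are assumptions.
CACHE  = {"AAAA": "k1"}   # mapping hit rate >= 0.5
MEMORY = {"ABAB": "k2"}   # 0.1 <= mapping hit rate < 0.5
DISK   = {"XYZQ": "k3"}   # mapping hit rate < 0.1

def pick_medium(hit_rate):
    """Map the hit rate of the target mapping relationship to the single
    storage medium whose hit-rate range contains it."""
    if hit_rate >= 0.5:
        return CACHE
    if hit_rate >= 0.1:
        return MEMORY
    return DISK

def compress_tiered(data, hit_rate):
    # Search only the chosen medium's first data for `data`.
    return pick_medium(hit_rate).get(data)
```

Frequently hit mappings are thus always served from the fastest medium.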
  • the method further includes: when the storage space occupied by the first difference is greater than or equal to the storage space occupied by the current data, storing the current data or storing a value obtained by compressing the current data.
  • the method further includes: when the storage space occupied by the first difference is greater than or equal to the storage space occupied by the current data, storing the identification information.
  • the identification information is used to indicate whether the information used to recover the current data is the first difference or a value obtained by compressing the first difference.
  • the identification information may be used as identification information of information for recovering current data, or may be information carried by the information for recovering current data. This technical solution helps the storage device identify the stored information used to recover the current data.
  • an embodiment of the present application provides a data acquisition method applied to a storage device.
  • the method may include: reading information used to recover current data, where that information includes a difference or a value obtained by compressing the difference; the difference is the difference between the current data and the prediction data of the current data, and the prediction data is the result of predicting the current data based on the change rule of the historical data; predicting the current data using the historical data to obtain the prediction data; and determining the current data based on the information used to recover the current data and the prediction data.
  • historical data is one or more pieces of data that have been acquired.
  • the algorithm used to perform the decompression includes a dictionary-type decompression algorithm and/or a deduplication algorithm.
  • the information used to recover the current data includes the value obtained by compressing the difference; determining the current data according to this information and the prediction data of the current data includes: decompressing the value obtained by compressing the difference to obtain the difference, and determining the current data according to the difference and the prediction data of the current data.
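The recovery path mirrors the storage path: the reader regenerates the same prediction and adds back the stored difference. A hedged sketch, assuming a stored record of the form `("diff", value)` or `("raw", value)` and the same "repeat last value" stand-in predictor used at write time:

```python
def predict_last(history):
    """Stand-in predictor; recovery must use the same predictor (and the
    same parameters) that were used when the data was stored."""
    return history[-1]

def recover(stored, history):
    """Rebuild the current data from the stored recovery information,
    assumed to be a ('diff', value) or ('raw', value) pair."""
    kind, payload = stored
    if kind == "raw":
        return payload                      # current data was stored directly
    return predict_last(history) + payload  # prediction + stored difference
```

Because the prediction is regenerated rather than stored, only the small difference (or raw fallback) occupies storage.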
  • the algorithm used to perform the decompression includes a dictionary-type decompression algorithm.
  • the dictionary of the dictionary-type decompression algorithm includes at least two sets, each set includes one or more mapping relationships, and each mapping relationship is a mapping between a first data and a second data.
  • the storage space occupied by the first data is larger than the storage space occupied by the second data.
  • Each set corresponds to a hit-rate range, and different sets correspond to different hit-rate ranges.
  • Decompressing the value obtained by compressing the difference to obtain the difference may include: obtaining a hit rate of the value obtained by compressing the difference; and determining a target set among the at least two sets according to that hit rate.
  • the hit rate of the value obtained by compressing the difference is used to determine the hit rate of the target mapping relationship in which that value is located, and the determined hit rate of the target mapping relationship belongs to the hit-rate range corresponding to the target set;
  • the second data of the target set is then searched for the value obtained by compressing the difference to determine the corresponding first data; that first data is the difference.
  • the mapping relationships in the storage device are classified into different sets. In this way, the set containing the data to be decompressed (specifically, the value obtained by compressing the difference) can be located directly from its hit rate, which narrows the search range for the data to be decompressed and saves time when performing decompression.
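Decompression searches the second data instead of the first. A sketch under the same kind of illustrative assumptions (set boundaries, dictionary contents, and the name `decompress` are invented):

```python
# Invented partition of the dictionary by hit rate; for decompression each
# set is indexed by second data (short codes) to recover the first data.
SETS = [
    ((0.5, 1.0), {"a1": "ABABABAB"}),   # hot mappings
    ((0.0, 0.5), {"c9": "XQZPWMVK"}),   # colder mappings
]

def decompress(value, hit_rate):
    """Search only the set whose hit-rate range contains `hit_rate` for the
    compressed value; the matching first data is the original difference."""
    for (lo, hi), mapping in SETS:
        if lo <= hit_rate <= hi:
            return mapping.get(value)
    return None
```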
  • the storage medium of the storage device includes a cache, a memory, and a hard disk;
  • the algorithm used to perform the decompression includes a dictionary-type decompression algorithm, and the dictionary of the dictionary-type decompression algorithm includes one or more mapping relationships.
  • a mapping relationship is a mapping between a first data and a second data, where the storage space occupied by the first data is larger than the storage space occupied by the second data; the hit rate of any mapping relationship in the cache is greater than or equal to the hit rate of any mapping relationship in memory,
  • and the hit rate of any mapping relationship in memory is greater than or equal to the hit rate of any mapping relationship in the hard disk.
  • decompressing the value obtained by compressing the difference to obtain the difference may include: obtaining a hit rate of the value obtained by compressing the difference; and determining a target storage medium according to that hit rate, which is used to determine the hit rate of the target mapping relationship in which the value is located.
  • When the determined hit rate of the target mapping relationship belongs to the hit-rate range of the mapping relationships in the cache, the target storage medium is the cache; when it does not belong to the cache's range but belongs to the hit-rate range of the mapping relationships in memory, the target storage medium is the memory; when it does not belong to memory's range either, the target storage medium is the hard disk. The second data of the target storage medium is then searched for the value obtained by compressing the difference to determine
  • the corresponding first data; that first data is the difference.
  • In this way, the storage medium with the highest read-write performance that contains the data to be decompressed is located directly. This narrows the search range for the data to be decompressed, saving time when performing decompression.
  • In one case, the information used to restore the current data read by the storage device does not carry identification information;
  • the information used to restore the current data includes a difference or a value obtained by compressing the difference.
  • In another case, the information used to restore the current data read by the storage device carries identification information;
  • the information used to restore the current data includes the current data or a value obtained by compressing the current data.
  • Solution 1: read the information used to restore the current data; the information carries identification information and includes the current data itself.
  • Solution 2: read the information used to restore the current data; the information carries identification information and includes the value obtained by compressing the current data; that value is then decompressed to obtain the current data.
  • the technical solution provided in the second aspect can be combined with Solution 1 or Solution 2 to form a new technical solution.
  • the second aspect and its alternatives correspond to the technical solution provided by the first aspect and its design solutions, so their specific implementations and beneficial effects can refer to the description of the first aspect.
  • an embodiment of the present application provides a data storage method applied to a storage device.
  • the method may include: obtaining current data and historical data of the current data; predicting the current data using the historical data to obtain prediction data,
  • where the prediction data is the result of predicting the current data based on the change rule of the historical data; obtaining the difference between the current data and the prediction data; and, when the absolute value of the difference is less than or equal to a preset threshold, storing preset data.
  • the storage space occupied by the preset data is smaller than the storage space occupied by the current data.
  • the preset data is predefined by the storage device.
  • the preset data may be an identifier indicating that the prediction data of the current data can be used as (or approximately as) the current data.
  • the storage space occupied by the preset data is less than the storage space occupied by most or all of the data to be stored.
  • In one possible design, the compression process in this technical solution is specifically a lossless compression process.
  • In another possible design, the compression process in this technical solution is specifically a lossy compression process.
  • the preset threshold can be set based on actual needs (such as an acceptable lossy compression rate). This technical solution can be applied to scenarios that tolerate a certain amount of data loss, such as video playback.
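A lossy sketch of this third-aspect flow: when the prediction is within a tolerance of the actual data, only a short preset marker is stored. The marker value, threshold, and function names below are assumptions for illustration.

```python
PRESET = b"\x00"   # assumed one-byte marker: "use the prediction as-is"
THRESHOLD = 2      # acceptable loss bound; application-specific

def store_lossy(current, predicted):
    """Store only the preset marker when the prediction is within
    THRESHOLD of the actual data (lossy); otherwise store the data."""
    if abs(current - predicted) <= THRESHOLD:
        return PRESET
    return current   # could also be a compressed form of current

def read_lossy(stored, predicted):
    """Reader side: substitute the prediction when the marker is seen."""
    return predicted if stored == PRESET else stored
```

With THRESHOLD set to 0, the same flow degenerates to the lossless case: the marker is stored only when prediction and data match exactly.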
  • the method further includes: when the absolute value of the difference is greater than a preset threshold, storing current data or a value obtained by compressing the current data.
  • the algorithm used to perform the compression may be, for example but not limited to, a dictionary compression algorithm and/or a deduplication algorithm.
  • the method further includes: when the absolute value of the difference is greater than the preset threshold, storing identification information. When the value obtained by compressing the current data is stored, the identification information indicates that the stored information used to recover the current data is that compressed value; when the current data itself is stored, the identification information indicates that the stored information used to restore the current data is the current data.
  • the identification information may serve as identification of the information used to recover the current data, or may be information carried by that information. This technical solution helps the storage device identify the type of the stored information used to recover the current data, which may be the "preset data" type or the "current data or a value obtained by compressing the current data" type, thereby facilitating the data acquisition process.
  • the method further includes: storing a correspondence between the information used to recover the current data and the parameters of the AI neural algorithm used to perform the prediction. This will help restore the current data properly. For example, after each update of the parameters of the AI neural algorithm, the storage device performs a snapshot operation to record the correspondence between the information used to restore the current data and the parameters of the AI neural algorithm used to perform the prediction.
  • the method further includes: updating the parameters of the AI neural algorithm through adaptive learning, and updating the information used to restore the current data according to the updated parameters of the AI neural algorithm. This helps restore the current data correctly.
  • the parameters of the AI neural algorithm used to perform the prediction above are denoted the first parameters of the AI neural algorithm, and the parameters obtained by updating the first parameters are denoted the second parameters of the AI neural algorithm.
  • updating the information used to restore the current data according to the updated parameters includes: reading the stored information used to restore the current data;
  • restoring the current data according to the first parameters of the AI neural algorithm (that is, the parameters before the update),
  • the read information used to restore the current data, and the historical data of the current data;
  • predicting the current data from its historical data according to the second parameters of the AI neural algorithm (that is, the updated parameters)
  • to obtain second prediction data,
  • where the second prediction data is the result of predicting the current data based on the change rule of the historical data and the second parameters of the AI neural algorithm; obtaining a second difference between the current data and the second prediction data;
  • and, when the storage space occupied by the second difference is less than the storage space occupied by the current data, updating the stored information used to restore the current data to the second difference or to a value obtained by compressing the second difference.
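The re-encoding step after a parameter update can be sketched as below; the two toy predictors stand in for the AI neural algorithm before and after adaptive learning, and all names are hypothetical.

```python
def reencode(stored_diff, history, predict_old, predict_new):
    """After adaptive learning updates the model: recover the data with the
    first (old) parameters, then re-express it as a second difference
    against the prediction made with the second (new) parameters."""
    current = predict_old(history) + stored_diff   # restore current data
    return current - predict_new(history)          # second difference

# Toy predictors standing in for the model before/after the update:
predict_v1 = lambda h: h[-1]               # repeat the last value
predict_v2 = lambda h: 2 * h[-1] - h[-2]   # linear extrapolation
```

If the updated model predicts better, the second difference is smaller than the stored first difference and replaces it, shrinking the stored footprint.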
  • the storage device includes an AI calculation card.
  • predicting the current data using the historical data to obtain the first prediction data includes: using the AI calculation card to predict the current data from the historical data to obtain the first prediction data.
  • the storage device includes memory. Obtaining the current data and the historical data of the current data among the at least two data to be stored includes: obtaining them from the memory.
  • After processing, the data to be stored is deleted from the memory to save the storage overhead of the memory.
  • an embodiment of the present application provides a data acquisition method, which is applied to a storage device.
  • the method may include: reading information used to recover current data; and, when that information includes preset data, predicting the current data using historical data to obtain the prediction data of the current data,
  • where the prediction data is the result of predicting the current data based on the change rule of the historical data; the prediction data is then used as the current data.
  • historical data is one or more pieces of data that have been acquired.
  • In one case, the information used to restore the current data read by the storage device does not carry identification information;
  • the information used to restore the current data includes preset data.
  • In another case, the information used to restore the current data read by the storage device carries identification information;
  • the information used to restore the current data includes the current data or a value obtained by compressing the current data.
  • the fourth aspect may be replaced with Solution 1 or Solution 2 described above.
  • the technical solution provided in the fourth aspect can be combined with Solution 1 or Solution 2 to form a new technical solution.
  • the fourth aspect and its alternatives correspond to the technical solution provided by the third aspect and its design solutions, so their implementations and beneficial effects can refer to the third aspect.
  • the storage device includes a memory, and before the current data is predicted using historical data to obtain the predicted data of the current data, the method further includes: obtaining historical data from the memory.
  • the method further includes: the storage device stores the current data in the memory as historical data of other data to be obtained.
  • the method further includes: when the acquired data is no longer used as historical data of the data to be acquired, the storage device may delete the acquired data from the memory to save memory storage overhead.
  • the method further includes: obtaining parameters of an AI neural algorithm used to predict the current data according to a correspondence between the information for recovering the current data and the parameters of the AI neural algorithm;
  • the method of using the historical data to predict the current data and obtaining the predicted data includes: using the historical data to predict the current data according to the parameters of the obtained AI neural algorithm to obtain the predicted data.
  • the storage device includes an AI calculation card
  • the above-mentioned using the historical data to predict the current data includes: using the AI calculation card to use the historical data to predict the current data.
  • an embodiment of the present application provides a data compression method that is applied to a storage device.
  • the storage device stores at least two sets, and each set includes one or more mapping relationships.
  • Each mapping relationship is a mapping between a first data and a second data.
  • the storage space occupied by the first data is larger than the storage space occupied by the second data.
  • Each set corresponds to a hit-rate range, and different sets have different hit-rate ranges.
  • the method may include: obtaining a hit rate of the data to be compressed; determining a target set among the at least two sets according to that hit rate, where the hit rate of the data to be compressed is used to determine the hit rate of the target mapping relationship in which the data to be compressed is located, and the determined hit rate belongs to the hit-rate range corresponding to the target set; searching the first data of the target set for the data to be compressed to determine the corresponding second data;
  • and using that second data as the value obtained by compressing the data to be compressed.
  • an embodiment of the present application provides a data decompression method, which is applied to a storage device.
  • the storage device stores at least two sets, each set includes one or more mapping relationships, and each mapping relationship is a mapping between a first data and a second data.
  • the storage space occupied by the first data is larger than the storage space occupied by the second data.
  • Each set corresponds to a hit-rate range, and different sets have different hit-rate ranges.
  • the method may include: obtaining a hit rate of the data to be decompressed; determining a target set among the at least two sets according to that hit rate, where the hit rate of the data to be decompressed is used to determine the hit rate of the target mapping relationship in which the data to be decompressed is located,
  • and the determined hit rate of the target mapping relationship belongs to the hit-rate range corresponding to the target set.
  • the second data of the target set is searched for the data to be decompressed to determine the corresponding first data, and that first data is used as the value obtained by decompressing the data to be decompressed.
  • an embodiment of the present application provides a data compression method that is applied to a storage device.
  • the storage medium of the storage device includes a cache, a memory, and a hard disk.
  • the hit rate of any mapping relationship in the cache is greater than or equal to the hit rate of any mapping relationship in memory,
  • and the hit rate of any mapping relationship in memory is greater than or equal to the hit rate of any mapping relationship in the hard disk; each mapping relationship is a mapping between a first data and a second data, where the storage space occupied by the first data is larger than the storage space occupied by the second data.
  • the method includes: obtaining a hit rate of the data to be compressed; determining a target storage medium according to that hit rate, which is used to determine the hit rate of the target mapping relationship in which the data to be compressed is located; when the determined hit rate belongs to the hit-rate range of the mapping relationships in the cache, the target storage medium is the cache; when it does not belong to the cache's range but belongs to the hit-rate range of the mapping relationships in memory, the target storage medium is the memory; when it does not belong to memory's range either, the target storage medium is the hard disk; searching the first data of the target storage medium for the data to be compressed to determine the corresponding second data; and using that second data as the value obtained by compressing the data to be compressed.
  • an embodiment of the present application provides a data decompression method, which is applied to a storage device.
  • the storage medium of the storage device includes a cache, a memory, and a hard disk.
  • the hit ratio of the mapping relationships in the cache is greater than or equal to the hit ratio of the mapping relationships in the memory, and the hit ratio of the mapping relationships in the memory is greater than or equal to the hit ratio of the mapping relationships in the hard disk; each mapping relationship refers to a mapping relationship between a first data and a second data, where the storage space occupied by the first data is larger than the storage space occupied by the second data.
  • the method may include: obtaining a hit ratio of the data to be decompressed; and determining a target storage medium according to the hit ratio of the data to be decompressed, where the hit ratio of the data to be decompressed is used to determine the hit ratio of the target mapping relationship in which the data to be decompressed is located. When the hit ratio of the determined target mapping relationship falls within the hit ratio range of the mapping relationships in the cache, the target storage medium is the cache; when it does not fall within the hit ratio range of the mapping relationships in the cache but falls within the hit ratio range of the mapping relationships in the memory, the target storage medium is the memory; when it does not fall within the hit ratio range of the mapping relationships in the memory, the target storage medium is the hard disk. The data to be decompressed is then searched for among the second data of the target storage medium to determine the first data corresponding to the data to be decompressed, and the first data corresponding to the data to be decompressed is used as the value obtained by decompressing the data to be decompressed.
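  • The tiered lookup described in these aspects can be sketched as follows. This is a minimal illustration under assumptions: the hit-ratio ranges, function names, and mapping entries below are all hypothetical and do not come from the embodiment.

```python
# Hypothetical sketch: mapping relationships are placed in the cache, the
# memory, or the hard disk according to hit-ratio ranges, and a lookup
# first selects the target storage medium, then searches within it.

def select_target_medium(hit_ratio, cache_range, memory_range):
    # The hit ratio of the target mapping relationship decides the medium.
    if cache_range[0] <= hit_ratio <= cache_range[1]:
        return "cache"
    if memory_range[0] <= hit_ratio <= memory_range[1]:
        return "memory"
    return "hard_disk"

def lookup_first_data(second, media, hit_ratio, cache_range, memory_range):
    # Decompression direction: find the first data mapped to `second`
    # among the mapping relationships of the target storage medium.
    medium = select_target_medium(hit_ratio, cache_range, memory_range)
    return media[medium].get(second, second)

# Illustrative mapping relationships, stored as {second_data: first_data}.
media = {
    "cache":     {"00": "I"},      # hottest mapping relationships
    "memory":    {"01": "am"},
    "hard_disk": {"02": "from"},   # coldest mapping relationships
}

print(select_target_medium(0.9, (0.8, 1.0), (0.5, 0.8)))            # cache
print(lookup_first_data("01", media, 0.6, (0.8, 1.0), (0.5, 0.8)))  # am
```

The compression direction is symmetric: search the first data of the target storage medium and store the corresponding second data.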
  • For the beneficial effects that can be achieved by the data compression method provided in the fifth or seventh aspect, refer to the description of the first aspect; for the beneficial effects that can be achieved by the data decompression method provided in the sixth or eighth aspect, refer to the description of the second aspect.
  • the mapping relationship described in the fifth and seventh aspects may be a mapping relationship contained in a dictionary of a dictionary-type compression algorithm.
  • an embodiment of the present application provides a storage device, and the storage device may be configured to execute any one of the methods provided in the first to eighth aspects.
  • the storage device may be divided into functional modules according to any one of the methods provided in the first aspect to the eighth aspect; for example, each functional module may be divided so as to correspond to a function, or two or more functions may be integrated into one processing module.
  • the storage device includes a memory and a processor.
  • the memory is used to store program code
  • the processor is used to call the program code to perform any of the methods provided in the first aspect to the eighth aspect.
  • the memory and the processor described in this application may be integrated on a single chip, or may be separately provided on different chips. This application does not limit the type of the memory and the manner of setting the memory and the processor.
  • An embodiment of the present application further provides a computer-readable storage medium, including program code, where the program code includes instructions for performing part or all of the steps of any of the methods provided by the first aspect to the eighth aspect.
  • An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run on a computer, the computer is caused to execute any one of the methods provided in the first aspect to the eighth aspect above.
  • An embodiment of the present application further provides a computer program product, which, when run on a computer, causes any of the methods provided by the first aspect to the eighth aspect to be executed.
  • any of the storage devices, computer-readable storage media, or computer program products provided above is used to execute the corresponding methods provided above; therefore, for the beneficial effects that can be achieved, refer to the beneficial effects described for the corresponding methods, which are not repeated here.
  • FIG. 1 is a schematic diagram of a system architecture applicable to an embodiment of the present application
  • FIG. 2 is a schematic diagram of a hardware structure of a storage device applicable to an embodiment of the present application
  • FIG. 3 is a schematic diagram of a hardware structure of a storage device applicable to another embodiment of the present application.
  • FIG. 4 is a schematic diagram of an AI neural algorithm applicable to an embodiment of the present application.
  • FIG. 5 is a first schematic diagram of a data storage method according to an embodiment of the present application.
  • FIG. 5A is a schematic diagram of data to be stored and information actually stored according to an embodiment of the present application.
  • FIG. 6 is a second schematic diagram of a data storage method according to an embodiment of the present application.
  • FIG. 6A is a schematic diagram of information stored in a memory and a hard disk at a certain moment according to an embodiment of the present application.
  • FIG. 7 is a first schematic diagram of a data acquisition method according to an embodiment of the present application.
  • FIG. 7A is a schematic diagram of data to be obtained and information actually stored according to an embodiment of the present application.
  • FIG. 8 is a second schematic diagram of a data acquisition method according to an embodiment of the present application.
  • FIG. 9 is a first schematic diagram of a data compression method according to an embodiment of the present application.
  • FIG. 10 is a first schematic diagram of a data decompression method according to an embodiment of the present application.
  • FIG. 11 is a second schematic diagram of a data compression method according to an embodiment of the present application.
  • FIG. 12 is a second schematic diagram of a data decompression method according to an embodiment of the present application.
  • FIG. 13 is a third schematic diagram of a data storage method according to an embodiment of the present application.
  • FIG. 14 is a third schematic diagram of a data acquisition method according to an embodiment of the present application.
  • FIG. 15 is a first schematic diagram of a storage device according to an embodiment of the present application.
  • FIG. 16 is a second schematic diagram of a storage device according to an embodiment of the present application.
  • FIG. 17 is a third schematic diagram of a storage device according to an embodiment of the present application.
  • FIG. 18 is a fourth schematic diagram of a storage device according to an embodiment of the present application.
  • FIG. 19 is a fifth schematic diagram of a storage device according to an embodiment of the present application.
  • FIG. 20 is a sixth schematic diagram of a storage device according to an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a system architecture applicable to an embodiment of the present application.
  • the system architecture shown in FIG. 1 includes a client 100 and a storage device 200.
  • the client 100 is configured to send a write request to the storage device 200.
  • the write request includes one or more data to be written, and address information of each data to be written.
  • After receiving the write request, the storage device 200 stores each to-be-written data, in order, in the storage space indicated by the address information of that to-be-written data; or it processes the to-be-written data (for example, by one or more of the prediction, difference calculation, and compression described below) and stores the resulting data in the storage space indicated by the address information of the to-be-written data.
  • the client 100 is configured to send a read request to the storage device, where the read request includes address information of one or more data to be read.
  • After the storage device 200 receives the read request, it sequentially reads data from the storage space indicated by the address information of each data to be read, and then either feeds the read data back to the client 100 directly, or processes the read data (for example, by one or more of the prediction, difference calculation, and decompression described below) and feeds the resulting data back to the client 100.
  • system architecture shown in FIG. 1 is only an example of a system architecture applicable to the embodiment of the present application, and it does not limit the system architecture applicable to the embodiment of the present application.
  • system architecture applicable to the embodiments of the present application may include one storage device 200 and multiple clients 100; or, it may include one client and multiple storage devices 200, and so on.
  • FIG. 1 illustrates that the client 100 is independent of the storage device 200 as an example.
  • the client 100 may be integrated on a device independent of the storage device 200 in terms of hardware implementation.
  • the client 100 may serve as a logical function module in the storage device 200.
  • the client 100 may be jointly implemented by a storage medium (such as a memory) and a processor (such as a central processing unit (CPU)) in the storage device 200.
  • a program instruction is stored in the storage medium, and when the program instruction is called by the processor, the processor is caused to execute a function implementable by the client 100.
  • the client 100 may also be implemented by a storage medium (such as a memory), a processor, and other hardware in the storage device 200, which is not limited in the embodiments of the present application. Unless otherwise stated, the following description assumes that the technical solutions provided in the embodiments of the present application are applied to the system architecture shown in FIG. 1.
  • FIG. 2 is a schematic diagram of a hardware structure of a storage device 200 applicable to an embodiment of the present application.
  • the storage device shown in FIG. 2 includes an interface card 201, a processor 202, a main memory 203 (such as a memory), an auxiliary memory 204 (such as a hard disk), a protocol conversion module 205, and a bus 206. See FIG. 2 for the connection relationship between these devices.
  • the hard disk includes, but is not limited to, a storage medium such as a hard disk drive (HDD) or a solid-state disk (SSD).
  • In the following, the main memory 203 is specifically the memory (labeled as the memory 203), the auxiliary memory 204 is specifically the hard disk (labeled as the hard disk 204), and the protocol conversion module 205 is specifically the hard disk protocol conversion module (labeled as the hard disk protocol conversion module 205); this example is stated uniformly here and is not described again below.
  • the interface card 201, the processor 202, the memory 203, the hard disk 204, and the hard disk protocol conversion module 205 can be connected to each other through a bus 206.
  • the bus 206 may include at least one of the following: a peripheral component interconnect (PCI) bus, a PCI express (PCIE) bus, a serial attached SCSI (SAS) bus, a serial advanced technology attachment (SATA) bus, an extended industry standard architecture (EISA) bus, and the like.
  • SCSI is the English abbreviation of small computer system interface.
  • the bus 206 may include one or more of an address bus, a data bus, a control bus, and the like. For ease of representation, the arrowed lines are used to represent the bus 206 in FIG. 2, but it does not mean that there is only one bus or one type of bus.
  • the interface card 201 may also be referred to as a front-end protocol conversion module, and is configured to perform transmission protocol conversion on the received information. For example, the information received from the client 100 using the optical network communication protocol or the Ethernet communication protocol is converted into information using the PCIE protocol. As another example, the information received from the processor 202 by using the PCIE protocol is converted into information using an optical network communication protocol or an Ethernet communication protocol.
  • the interface card 201 may include at least one of the following: a Fibre Channel (FC) interface card, a Gigabit Ethernet (GE) interface card, an interface bus (IB) interface card, and the like.
  • the processor 202 is a control center of the storage device 200, and can be used to control other devices in the storage device 200, such as the memory 203, the hard disk 204, and the hard disk protocol conversion module 205, to implement the technical solutions provided by the embodiments of the present application. For specific examples, see below.
  • the processor 202 may include a CPU, and specifically may include one or more CPUs.
  • the processor 202 may include a CPU and a cache (ie, a CPU cache).
  • the cache is a high-speed memory between the CPU and the memory 203, which is mainly used to improve the read and write performance of the storage device 200.
  • the data stored in the cache may be a part of the data stored in the memory 203. If the cache includes the data to be accessed (such as the data to be read, or data obtained after processing the data to be read), the CPU can obtain the data to be accessed from the cache instead of from the memory 203, thereby speeding up data reads.
  • the memory 203 can be used to buffer information (such as information carried by a write request or a read request) from the interface card 201, so that the processor 202 can call the information buffered in the memory 203, thereby implementing the technical solutions provided by the embodiments of the present application; or it can be used to buffer information (such as data to be read) from the processor 202, so that the processor 202 can call the information buffered in the memory 203 and send it to the interface card 201 for transmission protocol conversion.
  • the memory 203 is a memory between the processor 202 and the hard disk 204, and is used to improve the read and write performance of the storage device 200.
  • the data stored in the memory 203 may be a part of the data stored in the hard disk 204. If the memory 203 includes the data to be accessed, the CPU can obtain the data to be accessed from the memory 203 without acquiring it from the hard disk 204, thereby speeding up the data reading rate.
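  • The read path through this cache/memory/hard-disk hierarchy can be sketched as follows; the dict-based tiers, addresses, and values are illustrative assumptions only, not part of the embodiment.

```python
# Minimal sketch of the tiered read path: check the fastest tier first
# (the cache), then the memory 203, then the hard disk 204.

def read(address, cache, memory, hard_disk):
    for tier in (cache, memory, hard_disk):  # fastest to slowest
        if address in tier:
            return tier[address]
    raise KeyError(address)

cache = {0x10: "hot"}
memory = {0x10: "hot", 0x20: "warm"}
hard_disk = {0x10: "hot", 0x20: "warm", 0x30: "cold"}

print(read(0x10, cache, memory, hard_disk))  # hot  (served from the cache)
print(read(0x30, cache, memory, hard_disk))  # cold (only on the hard disk)
```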
  • the hard disk 204 is used for storing data.
  • the hard disk 204 may include at least one of the following: a SAS hard disk (or a SAS cascading frame), a PCIE hard disk, a SATA hard disk, and the like.
  • the hard disk protocol conversion module 205 which may also be referred to as a back-end protocol conversion module, is located between the processor 202 and the hard disk 204, and is used to perform transmission protocol conversion on the received information.
  • the information received from the processor 202 using the PCIE protocol is converted into information applicable to a hard disk 204 protocol such as the SAS protocol or the SATA protocol.
  • the information received from the hard disk 204 using the SAS protocol or the SATA protocol is converted into information using a protocol applicable to the processor 202 such as the PCIE protocol.
  • the hard disk protocol conversion module 205 may specifically be a SAS protocol conversion chip, or a SAS interface card.
  • the processor 202 may be configured to perform steps such as prediction, difference calculation, compression and decompression described below. For specific examples, refer to the following. In this case, it can be considered that the processor 202 executes steps such as prediction, difference calculation, compression and decompression by calling a program.
  • the storage device 200 shown in FIG. 2 is only an example of a storage device applicable to the embodiment of the present application, and it does not limit the storage device applicable to the embodiment of the present application.
  • the storage device applicable to the embodiment of the present application may further include more or fewer devices than the storage device 200.
  • the storage device 200 may not include the hard disk protocol conversion module 205.
  • the storage device 200 may further include an AI computing card 207, which is used to implement AI calculation functions under the control of the processor 202, such as performing the prediction and difference calculation steps described below; for specific examples, see below.
  • the AI computing card may be, for example, an AI computing chip, of course, the embodiment of the present application is not limited thereto.
  • In this case, the processor 202 may not need to perform steps such as prediction and difference calculation.
  • the storage device 200 may further include a compression and decompression module, which is configured to perform steps such as compression and decompression under the control of the processor 202.
  • the processor 202 may not need to perform steps such as compression and decompression.
  • the compression and decompression module described here may be a hardware such as a chip.
  • the storage device 200 may include both an AI computing card 207 and a compression and decompression module.
  • the hardware structure of the storage device 200 described above is described based on the system architecture shown in FIG. 1 as an example.
  • the hardware structure of any of the storage devices 200 provided above may not include the interface card 201 and the bus 206 between the interface card 201 and the processor 202.
  • the processor used to implement the function of the client 100 may be the same processor as the processor 202 described above, or may be a different processor.
  • the AI neural algorithm may include an input layer 31, a hidden layer 32, and an output layer 33, where:
  • the input layer 31 is used to receive the value of the input variable, and send the value of the received input variable to the hidden layer 32 directly or after processing.
  • the role of processing is to obtain information that can be identified by the hidden layer 32.
  • the input variable is one or more data before the data to be predicted.
  • the number of input variables of the input layer 31, and which specific data before the data to be predicted serve as the input variables, can be flexibly adjusted according to the prediction accuracy requirements. For example, if the data to be predicted is the n-th data, labeled X(n), the input variables can be any one or more of the n-1 data before X(n) (labeled X(1), X(2), ..., X(n-1)), where n ≥ 1 and n is an integer.
  • the hidden layer 32 is configured to predict the prediction data according to the value of the input variable received from the input layer 31, and send the prediction result to the output layer 33.
  • the hidden layer 32 is composed of a y-layer neural network, where y ≥ 1 and y is an integer. The value of y can be adjusted according to the prediction accuracy requirements.
  • Each layer of the neural network includes one or more neurons, and the number of neurons included in different layers of the neural network may be the same or different.
  • the neurons included in the first layer of the neural network can be expressed as S11, S12, S13, ...; the neurons included in the second layer of the neural network can be expressed as S21, S22, S23, ...; and the neurons included in the y-th layer of the neural network can be expressed as Sy1, Sy2, Sy3, .... Any two neurons included in the hidden layer may or may not be connected. Each connection has a weight, and the weight of the i-th connection can be expressed as wi, where i ≥ 1 and i is an integer.
  • parameters such as y, wi, and the number of neurons included in each layer of the neural network may be initialized and assigned when the storage device is started. The value assigned during initialization can be obtained by training and verifying the stored data (such as a large amount of stored data) by an offline machine under a certain prediction accuracy requirement.
  • online learning can be selectively enabled to adjust the value of one or more of y, wi, and the number of neurons included in each layer of the neural network, thereby improving prediction accuracy.
  • the output layer 33 is used to output the prediction result of the hidden layer 32 directly or after processing.
  • the role of processing is to obtain information that can be identified by the device / module that receives the prediction result.
  • the prediction result includes prediction data obtained by performing prediction on the data to be predicted.
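  • The layered structure above can be illustrated with a tiny hand-written network. This is a sketch under assumptions: the weights, layer sizes, and activation below are illustrative stand-ins (in the embodiment, parameter values such as y and wi come from training), not values from the application.

```python
# Sketch of the predictor structure: an input layer taking earlier data,
# a hidden layer of neurons with connection weights wi, and an output
# layer emitting the prediction of the data to be predicted.

def neuron(inputs, weights, bias=0.0):
    # Weighted sum over the incoming connections, followed by a ReLU.
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, s)

def predict(history, hidden_weights, output_weights):
    # `history` holds the input variables, e.g. X(n-1), X(n-2), X(n-3).
    hidden = [neuron(history, w) for w in hidden_weights]      # hidden layer
    return sum(h * w for h, w in zip(hidden, output_weights))  # output layer

# One hidden neuron computing 2*X(n-1) - X(n-2): a linear extrapolation,
# so the sequence 1, 2, 3 is predicted to continue with 4.
hidden_w = [[2.0, -1.0, 0.0]]
output_w = [1.0]
print(predict([3.0, 2.0, 1.0], hidden_w, output_w))  # 4.0
```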
  • the type of the AI neural algorithm may include any one of the following: an NLMS type, an SLP type, an MLP type, or an RNN type.
  • the RNN-type AI neural algorithm can include Google's rapid and accurate image super-resolution (RAISR) algorithm, or object motion trajectory prediction technologies and algorithms in intelligent driving, such as Baidu's Apollo intelligent driving algorithm.
  • Google's RAISR algorithm can be described as follows: through machine learning on pictures, obtain the internal law of picture changes, where the internal law can be characterized by one or more parameters in the algorithm (such as the above y, wi, and the number of neurons included in each layer of the neural network); then, based on the obtained parameter values and the known pixel values in the picture, predict the pixel value of each missing pixel relative to the original high-resolution picture, so as to restore a low-resolution picture to a high-resolution picture.
  • the role of Google's RAISR algorithm is to predict the missing part through machine learning.
  • Baidu's Apollo intelligent driving algorithm can be described as follows: learn the motion parameters of an object to obtain the internal law of those motion parameters, where the internal law can be characterized by one or more parameters in the algorithm (such as the above y, wi, and the number of neurons included in each layer of the neural network); then, predict future motion parameters of the object from the obtained parameter values and the current and/or historical motion parameters of the object.
  • For a computer, this amounts to predicting, from a set of known binary data, the positions or specific values that will change in future binary data.
  • the AI neural algorithm used in the embodiments of the present application can be described as follows: perform machine learning on stored data to obtain the internal law of how the stored data changes, where the internal law can be characterized by one or more parameters in the algorithm (such as the above y, wi, and the number of neurons included in each layer of the neural network); then, based on the obtained parameter values and the known stored data, predict the unknown stored data.
  • Dictionary compression technology is currently recognized in the industry as an efficient storage technology. Its basic principle is as follows: a dictionary is pre-stored in the storage device. The dictionary includes at least two mapping relationships, each of which is a mapping relationship between a first data and a second data, where the storage space occupied by the first data of each mapping relationship is larger than the storage space occupied by the second data of that mapping relationship.
  • each mapping relationship is a mapping relationship between a complex symbol (or complex data) and a simple symbol (or simple data).
  • any two first data in the dictionary are different, and any two second data are different.
  • During compression, the storage device may compare the data to be compressed with the first data in the dictionary. If the first data in the dictionary includes the data to be compressed, the second data corresponding to the data to be compressed is stored; if the first data in the dictionary does not include the data to be compressed, the data to be compressed itself is stored.
  • For example, the information stored in the storage device (that is, the information used to restore the data to be compressed) can be: Iam, 00, 01, Iam from 02.
  • Correspondingly, the basic principle of dictionary-type decompression technology is as follows: the storage device compares the data to be decompressed (such as data read from the storage space) with the second data in the dictionary. If the second data in the dictionary includes the data to be decompressed, the first data corresponding to the data to be decompressed is used as the decompressed data; if the second data in the dictionary does not include the data to be decompressed, the data to be decompressed itself is used as the decompressed data.
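  • The dictionary compression and decompression principle above can be sketched as follows. The dictionary entries and data here are illustrative assumptions, not the example values used in the application.

```python
# Each mapping relationship maps a first data (the longer symbol) to a
# second data (the shorter symbol); compression and decompression are
# then simple dictionary lookups in each direction.

DICTIONARY = {"I": "00", "am": "01", "from": "02"}   # first data -> second data
REVERSE = {v: k for k, v in DICTIONARY.items()}      # second data -> first data

def compress(token):
    # If the first data includes the token, store the corresponding
    # second data; otherwise store the token itself.
    return DICTIONARY.get(token, token)

def decompress(token):
    # If the second data includes the token, restore the corresponding
    # first data; otherwise the token itself is the decompressed data.
    return REVERSE.get(token, token)

data = ["I", "am", "from", "Shanghai"]
stored = [compress(t) for t in data]
print(stored)                                    # ['00', '01', '02', 'Shanghai']
print([decompress(t) for t in stored] == data)   # True
```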
  • the term "plurality” in this application means two or more.
  • the term “and/or” in this application merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean three cases: A exists alone, both A and B exist, and B exists alone.
  • the character "/" in this article generally indicates that the related objects are an "or” relationship. When the character "/” is used in a formula, it generally indicates that the related objects are a "divide” relationship. For example, the formula A / B represents A divided by B.
  • the terms “first” and “second” in the present application are used to distinguish different objects, and do not limit the order of the different objects.
  • FIG. 5 is a schematic diagram of a data storage method according to an embodiment of the present application.
  • the method shown in FIG. 5 may include the following steps:
  • S100: The storage device obtains the current data and the historical data of the current data.
  • Specifically, the storage device obtains the current data (that is, the data to be stored) and the historical data of the current data; the historical data is one or more data before the current data in a sequence formed by at least two pieces of data to be stored.
  • S101: The storage device uses the historical data to predict the current data to obtain the predicted data of the current data.
  • the prediction data of the current data is data obtained by predicting the current data based on a change rule of the historical data.
  • the change law of historical data is specifically the change law of content or value of historical data.
  • For example, each data in the sequence (that is, the data to be stored) is: X(1), X(2), X(3), ..., X(n), ..., X(N), where 1 ≤ n ≤ N, N ≥ 2, and both n and N are integers.
  • the historical data can be any one or more data before X (n).
  • the historical data is discontinuous data starting from X (n-1) and before X (n-1).
  • the historical data of the current data is specifically which data or data before the current data may be related to the algorithm used to perform the prediction in S101.
  • the embodiment of the present application does not limit the algorithm used for performing prediction.
  • the algorithm may include an AI neural algorithm.
  • Assume that the input variables of the AI neural algorithm shown in FIG. 4 are X(n-2), X(n-4), X(n-6), X(n-8), and X(n-10). Then, if the current data is X(50), the historical data is X(48), X(46), X(44), X(42), and X(40).
  • the storage device has obtained the values of the parameters of the AI neural algorithm (such as y, wi and the number of neurons included in each layer of the neural network, etc.).
  • the values of the parameters of the AI neural algorithm can be obtained by storing the data through offline and / or online training.
  • the storage device may predict the current data according to the values of the parameters of the obtained AI neural algorithm and the historical data to obtain the predicted data of the current data.
  • the storage device may obtain at least two pieces of data to be stored according to the data to be written carried by one or more write requests, where the data to be written carried by the one or more write requests is data for the same subject or the same type of subject.
  • the subject can be the same article, or the same picture, or multiple pictures of the same type.
  • Then, sort the at least two pieces of to-be-stored data to obtain a sequence formed by the at least two pieces of to-be-stored data, sequentially use each piece among some or all of the to-be-stored data in the sequence as the current data, and execute S100 to S105.
  • Obtaining at least two pieces of data to be stored according to the data to be written carried by the one or more write requests may include: using each piece of data to be written carried by the one or more write requests as one piece of data to be stored, or recombining and/or dividing the data to be written carried by the one or more write requests into at least two pieces of data to be stored. That is, the granularity of the data to be written received by the storage device and the granularity at which the storage device processes data (including one or more of prediction, difference calculation, storage, etc.) may be the same or different.
  • For example, if each data to be written included in the one or more write requests is 8 bits: when each data to be stored is 8 bits, each data to be written is one data to be stored; when each data to be stored is 16 bits, each data to be stored can be obtained by combining two data to be written; and when each data to be stored is 4 bits, every two data to be stored can be obtained by dividing one data to be written.
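  • The granularity conversion in the example above can be sketched as follows. The bit strings and helper names are illustrative assumptions; the embodiment does not prescribe them.

```python
# Recombine or divide written units so that the write granularity and the
# processing/storage granularity can differ, as in the 8/16/4-bit example.

def combine(writes, factor):
    # Join `factor` consecutive data to be written into one data to be stored.
    return ["".join(writes[i:i + factor]) for i in range(0, len(writes), factor)]

def divide(writes, parts):
    # Split each data to be written into `parts` equal data to be stored.
    out = []
    for w in writes:
        step = len(w) // parts
        out.extend(w[i:i + step] for i in range(0, len(w), step))
    return out

writes = ["10101010", "11110000"]   # two 8-bit data to be written
print(combine(writes, 2))           # ['1010101011110000']          one 16-bit unit
print(divide(writes, 2))            # ['1010', '1010', '1111', '0000']  4-bit units
```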
  • the following description uses each to-be-written data as an example of to-be-stored data.
  • The sorting rule on which the sorting is performed in this example is related to the prediction algorithm (such as an AI neural algorithm) used to perform the prediction. Specifically, it is the same as the sorting rule by which the stored data participating in training were sorted when the storage device obtained the values of the parameters of the AI neural algorithm (the above y, wi, the number of neurons included in each layer of the neural network, etc.).
  • For an article, the sorting rule may be the order of the characters in the article, or the reverse of that order.
  • For a picture, the sorting rule may be a rule of sorting each pixel in the picture row by row or column by column, or a rule of dividing the picture into multiple parts, combining similar parts into a new picture, and then sorting each pixel in the new picture row by row or column by column.
  • In another example, the storage device may obtain at least two pieces of data to be stored from the data to be written carried by the write requests, and regard the sequence formed by the order of the at least two pieces of data to be written as the sequence formed by the at least two pieces of data to be stored; then, each piece of to-be-stored data among part or all of the to-be-stored data in the sequence is sequentially used as the current data, and S101 to S105 are performed.
  • the storage device may not perform the sorting step of the data to be written.
• An application scenario of this example may be: in the process of obtaining the values of the parameters of the AI neural algorithm (such as the above y, wi and the number of neurons included in each layer of the neural network), the order of the data stored in the storage device is the order in which the storage device received the data sent by the client.
  • the parameters of the AI neural algorithm such as the above y, wi and the number of neurons included in each layer of the neural network, etc.
• The prediction step may be omitted by default. For example, if the historical data is the 10 consecutive pieces of data starting from X(n-1) and going backwards (that is, X(n-10) to X(n-1)), then for the 1st to 10th pieces of data to be stored there is not yet enough historical data, and the prediction step may be omitted by default.
• In this case, the storage device may store the data to be stored according to technical solutions provided in the prior art, for example by storing it directly, or by compressing it according to algorithms such as a dictionary compression algorithm and/or a deduplication algorithm before storing it. Understandably, in this case, S102 to S104 may also be omitted by default.
• Solution 2: When the storage device predicts different pieces of data to be stored, the parameters of the prediction algorithm used may be the same or different.
• For example, for one piece of data to be stored, the input variables of the AI neural algorithm may be the 5 consecutive pieces of data starting from X(n-1) and going backwards, that is, the number of input variables is 5; for the 10th and subsequent pieces of data to be stored, the input variables of the AI neural algorithm may be the 10 consecutive pieces of data starting from X(n-1) and going backwards, that is, the number of input variables is 10.
• S102: The storage device obtains a difference between the current data and the predicted data of the current data.
  • the difference is a parameter used to characterize the difference between the current data and the predicted data of the current data.
  • the difference may be a difference, a ratio, a multiple, or a percentage, and the embodiments of the present application are not limited thereto.
  • the difference may be a difference obtained by subtracting the predicted data of the current data from the current data or a difference obtained by subtracting the current data from the predicted data of the current data.
• Which kind of difference is used may be predefined; this is not limited in the embodiment of the present application. Understandably, since the predicted data of the current data may be greater than, equal to, or less than the current data, the difference may be a value greater than, equal to, or less than 0.
• When the difference is a ratio, a multiple, or a percentage, the specific implementation and the possible values of the difference are similar, and they are not listed here one by one.
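As a hedged illustration, the kinds of difference named above (difference, ratio, percentage) might be computed as follows; the function name and the `kind` parameter are assumptions for illustration, not part of the embodiment.

```python
# Minimal sketch of S102: the "difference" between the current data and its
# predicted data may be defined in several predefined ways.

def difference(current, predicted, kind="subtract"):
    if kind == "subtract":   # may be greater than, equal to, or less than 0
        return current - predicted
    if kind == "ratio":      # current divided by predicted
        return current / predicted
    if kind == "percent":    # relative deviation in percent
        return 100.0 * (current - predicted) / predicted
    raise ValueError(kind)

print(difference(171, 170))            # 1
print(difference(171, 170, "ratio"))   # ≈ 1.00588
```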
• S103: The storage device determines whether the storage space occupied by the difference is smaller than the storage space occupied by the current data. If yes, S104 is executed; if not, S105 is executed.
  • S103 can be implemented in one of the following ways:
• Method 1: The storage device determines whether the number of bits of the difference is smaller than the number of bits of the current data.
• Method 2: The storage device compresses the difference and the current data separately (for example, using a dictionary compression algorithm or a deduplication algorithm), and determines whether the number of bits of the value obtained by compressing the difference is smaller than the number of bits of the value obtained by compressing the current data.
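The two ways of implementing S103 can be sketched as follows. zlib here is only a stand-in: the embodiment names dictionary compression and deduplication algorithms without fixing a particular one.

```python
import zlib

# Sketch of S103. Method 1 compares raw bit widths; Method 2 compresses
# both values first and compares the compressed sizes.

def bits(value: int) -> int:
    return max(1, value.bit_length())

def method1_store_difference(current: int, diff: int) -> bool:
    # True -> execute S104 (store the difference); False -> execute S105
    return bits(abs(diff)) < bits(current)

def method2_store_difference(current: bytes, diff: bytes) -> bool:
    return len(zlib.compress(diff)) < len(zlib.compress(current))

print(method1_store_difference(171, 1))  # True: 1 bit < 8 bits
```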
• S104: The storage device stores the difference or a value obtained by compressing the difference.
• Whether to store the difference itself or the value obtained by compressing the difference may be predefined; of course, the embodiment of the present application is not limited thereto.
  • the embodiment of the present application does not limit the compression algorithm used for performing the compression, for example, it may include at least one of a dictionary compression algorithm and a deduplication algorithm.
  • the specific algorithm or algorithms to be used may be predefined.
  • the embodiments of the present application are not limited thereto.
• S105: The storage device stores the current data or a value obtained by compressing the current data. Whether to store the current data or the value obtained by compressing it may be predefined, and of course, the embodiment of the present application is not limited thereto.
• Optionally, the compression algorithm used in S105 is consistent with the compression algorithm used for the compression in S104; of course, the embodiment of the present application is not limited thereto.
  • the data to be read may be determined.
  • the method may further include the following S105A:
• S105A: The storage device stores first identification information, where the first identification information is used to indicate that the stored information for restoring the current data is the information stored in S105 (that is, the current data or a value obtained by compressing the current data).
  • the first identification information may be used as identification information of information used to restore current data, or may be information carried by the information used to restore current data.
• Instead of executing S105A, the following S104A may be executed after S104; alternatively, in the case of executing S105A, the following S104A may also be executed after S104:
• S104A: The storage device stores second identification information, where the second identification information is used to indicate that the stored information used to recover the current data is the information stored in S104 (that is, the difference or the value obtained by compressing the difference).
• In general, the predicted data of a piece of data to be stored approximates that data; this helps ensure that, for most data to be stored, the storage space occupied by the difference between the data and its predicted data is less than the storage space occupied by the data itself. Therefore, in a specific implementation, executing S105A without executing S104A (that is, storing the first identification information and not storing the second identification information) enables the storage device to distinguish whether the stored information used to recover the current data is "the difference or a value obtained by compressing the difference" or "the current data or a value obtained by compressing the current data", while saving storage overhead.
  • FIG. 6 to FIG. 8 are described below based on the example of executing S105A and not executing S104A in the data storage method.
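Under the "execute S105A and not S104A" variant, the write-side decision S101 to S105A can be sketched as follows. predict() is a trivial stand-in (rounded mean of the history) for the AI prediction algorithm, and the record tags are assumptions for illustration.

```python
# Write-side sketch: store the difference when it occupies less space than
# the current data (S103/S104); otherwise store the data itself together
# with the first identification information "A" (S105/S105A).

def predict(history):
    return round(sum(history) / len(history)) if history else None

def store_one(history, current):
    pred = predict(history)
    if pred is not None:
        diff = current - pred
        if abs(diff).bit_length() < current.bit_length():
            return ("diff", diff)        # S104 (S104A omitted: no flag)
    return ("A", current)                # S105 + S105A: flag + data

print(store_one([170, 170, 170], 171))  # ('diff', 1)
print(store_one([], 171))               # ('A', 171)
```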
• In the data storage method provided in this embodiment, historical data is used to predict the current data, and when the storage space occupied by the difference between the current data and the predicted data of the current data is less than the storage space occupied by the current data, the difference or a value obtained by compressing the difference is stored. Because the storage space occupied by the difference is smaller than the storage space occupied by the current data, the process of predicting and calculating the difference can be regarded as a data compression process. In this way, compared with the prior art, storage overhead can be saved regardless of whether the difference is stored directly or its compressed value is stored.
  • the technical solution of storing the compressed value of the difference can further save the storage overhead.
• The technical solution for storing the compressed value of the difference shown in FIG. 5 can be understood as follows: before a traditional compression algorithm is used to process the data, a prediction algorithm is introduced, where the prediction algorithm is based on the content rules, development trends, internal relationships, and the like of the data. The prediction algorithm and the data already input to the storage device are used to predict the content of the data about to be input to the storage device (or earlier data in a sequence of multiple pieces of data already input to the storage device is used to predict the later data). Then, for content that is predicted accurately or approximately, only the difference between the true value and the predicted value is compressed by the traditional data compression algorithm, without storing the accurately or approximately predicted content itself. This actively reduces the fluctuation range of the input values of the traditional compression algorithm, thereby optimizing the existing compression algorithm and achieving breakthroughs in compression ratio and compression/decompression speed.
• For the storage device, the storage object is a binary sequence; for example, the actual stored sequence may be {10, 101, 1010, 0, 0, 0, 0, 0, 0, 0, 0, 0, 01}.
  • the prediction step is the default.
  • the storage device may further store the first identification information.
• The stored "01" is the difference between the data to be stored, 171, and its predicted data, 170. It can be seen that the range of the data to be compressed and stored is significantly reduced, and the probability of data duplication is significantly increased; therefore, the data compression ratio and compression efficiency can be significantly improved.
• Each piece of data to be stored is compared with its predicted data, and one of the following situations may occur: identical, partially identical, or completely different. For identical or partially identical cases, storage space may be saved; for completely different cases, the effect is equivalent to the corresponding method used in the prior art. Therefore, overall, storage space can be saved. Moreover, for the technical solution of storing the compressed value, the data compression ratio and compression efficiency can be significantly improved.
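The 171/170 example above can be checked numerically; this sketch only reproduces the arithmetic of the example.

```python
# Numeric check: the data to be stored is 171 (binary 10101011), its
# predicted data is 170, and only the two-bit difference "01" is stored.

current, predicted = 171, 170
diff = current - predicted
print(format(current, "08b"))  # 10101011  (8 bits stored in the prior art)
print(format(diff, "02b"))     # 01        (2 bits stored instead)
```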
• FIG. 5A is a schematic diagram of data to be stored and the information actually stored (that is, the information for recovering the data to be stored) according to an embodiment of the present application.
• As an example, the historical data is the five consecutive pieces of data to be stored immediately preceding the current data. Therefore, for the first five pieces of data to be stored in the sequence, the corresponding actually stored information is the data to be stored (or a value obtained by compressing the data to be stored) and the first identification information.
  • Each shaded small square represents the actual stored information corresponding to one piece of data to be stored, and the corresponding relationship is shown by the dotted arrow. "A" indicates the first identification information.
• An example of the data storage method shown in FIG. 5 may be shown in FIG. 6.
  • the method shown in FIG. 6 may include the following steps:
  • the storage device receives the write request sent by the client through the interface card, and the write request includes address information of at least two pieces of data to be written and each piece of data to be written.
  • the interface card will perform transmission protocol conversion on the at least two data to be written and the address information of each data to be written. For example, the information using the Ethernet communication protocol is converted into information using the PCIE protocol.
  • the interface card sends at least two data to be written and address information of each data to be written to the processor after the transmission protocol is converted.
  • S204 The processor treats each of the at least two pieces of data to be written received from the interface card as one piece of data to be stored, and sorts the obtained at least two pieces of data to be stored.
  • S205 The processor stores the sorted sequence and the address information of each data to be stored (that is, each data to be written) into the memory. Subsequently, the processor may sequentially use each piece of the to-be-stored data in the sequence as the current data, and execute the following S206 to S219. It can be understood that, for any two pieces of data to be stored in the sequence, the to-be-stored data located earlier in the sequence can be used as historical data of the to-be-stored data located later.
• S204 and S205 can be replaced by: the processor treats each of the at least two pieces of data to be written received from the interface card as one piece of data to be stored, and writes the obtained data to be stored and its address information into the memory. Then, the processor may sort the at least two pieces of data to be stored written into the memory; or, the processor may regard the sequence formed by the at least two pieces of data to be written, in the order in which they were received by the interface card, as the sequence formed by the at least two pieces of data to be stored, use each piece of data to be stored in the sequence in turn as the current data, and perform the following S206 to S219.
• Optionally, the processor may delete a piece of data to be stored from the memory when that data is no longer used as historical data of other data to be stored, so as to save memory storage overhead.
• FIG. 6A is a schematic diagram of the information stored in the memory and the hard disk at a certain moment.
  • FIG. 6A is drawn based on FIG. 5A. Therefore, for explanation of various figures or arrows in FIG. 6A, refer to FIG. 5A.
  • the sequence formed by the data to be stored in the memory may only include the historical data of the current data, the current data, and the data to be stored after the current data, which can save the storage overhead of the memory.
  • S206 The processor obtains current data and historical data of the current data from the memory.
  • S207 The processor uses the historical data to predict the current data to obtain the predicted data of the current data.
  • S208 The processor obtains a difference between the current data and the predicted data of the current data.
  • S209 The processor determines whether the storage space occupied by the difference is smaller than the storage space occupied by the current data.
• S210: The processor compresses the difference.
• S211: The processor sends the value obtained by compressing the difference and the address information of the current data obtained from the memory to the hard disk protocol conversion module.
• S212: The hard disk protocol conversion module performs transmission protocol conversion on the received compressed value and the address information of the current data, for example, converting from the PCIE protocol to the SAS protocol.
• S213: The hard disk protocol conversion module sends the compressed value of the difference converted by the transmission protocol and the address information of the current data to a hard disk, such as a SAS hard disk.
• S214: The hard disk stores the compressed value in the storage space indicated by the address information of the current data. After S214 is executed, the storage process for the current data ends.
  • S215 The processor compresses the current data.
• S216: The processor sends the first identification information, the value obtained by compressing the current data, and the address information of the current data obtained from the memory to the hard disk protocol conversion module.
• The first identification information is used to indicate that the stored information used to restore the current data is a value obtained by compressing the current data.
• S217: The hard disk protocol conversion module performs transmission protocol conversion on the received first identification information, the value obtained by compressing the current data, and the address information of the current data, for example, converting from the PCIE protocol to the SAS protocol.
• S218: The hard disk protocol conversion module sends the first identification information converted by the transmission protocol, the value obtained by compressing the current data, and the address information of the current data to the hard disk (such as a SAS hard disk).
• S219: The hard disk stores the first identification information and the value obtained by compressing the current data in the storage space indicated by the address information of the current data. After S219 is executed, the storage process for the current data ends.
• Optionally, an example of the data storage method shown in FIG. 5 may be an embodiment obtained by modifying the embodiment shown in FIG. 6 as follows: First, the above S207 to S209 are executed by the AI calculation card. Second, after executing S206 and before executing S207, the method further includes: the processor sends the historical data and the current data obtained from the memory to the AI calculation card. Third, after executing S209 and before executing S210, the method further includes: the AI calculation card sends the difference to the processor. Fourth, after executing S209 and before executing S215, the method further includes: the AI calculation card sends the current data to the processor.
• FIG. 7 is a schematic diagram of a data acquisition method according to an embodiment of the present application.
  • This embodiment corresponds to the data storage method shown in FIG. 5. Therefore, for explanation of related content in this embodiment, reference may be made to the embodiment shown in FIG. 5.
  • the method shown in FIG. 7 may include the following steps:
• S301: The storage device reads the information for recovering the current data (that is, the data currently to be obtained).
  • the information used to recover the current data includes "the difference or the value obtained by compressing the difference” or "the current data or the value obtained by compressing the current data”.
  • the difference is the difference between the current data and the predicted data of the current data.
  • the predicted data of the current data is the data after the current data is predicted based on the change rule of the historical data.
  • the historical data is one or more data that have been acquired.
• The storage device may obtain address information of at least two pieces of data to be obtained according to the data to be read requested by the one or more read requests, and then read, based on the address information of the at least two pieces of data to be obtained, the information for recovering the at least two pieces of data to be obtained.
  • the data requested by the one or more read requests is data for the same subject.
  • the granularity of the data to be read may be the same as or different from that of the data to be obtained. For example, if one piece of data to be read is 8 bits, one piece of data to be obtained may be 4 bits, 8 bits, or 16 bits.
• For example, each piece of data to be read is one piece of data to be obtained.
  • the storage device may use each of the at least two to-be-obtained data as current data, thereby performing S301 to S306.
• If the information used to recover the current data includes "the difference or the value obtained by compressing the difference", then whether the difference itself or its compressed value is stored may be predefined; of course, this application is not limited thereto. Likewise, if the information used to recover the current data includes "the current data or the value obtained by compressing the current data", then whether the current data itself or its compressed value is stored may be predefined; of course, this application is not limited to this.
• S302: The storage device determines whether the information used to recover the current data carries the first identification information.
• If the determination result of S302 is no, it means that the information used to restore the current data includes the difference or a value obtained by compressing the difference. Based on this, when the information used to restore the current data is a value obtained by compressing the difference, S303 is performed; when the information used to restore the current data is the difference, S304 is performed.
• If the determination result of S302 is yes, it means that the information used to restore the current data includes the current data or a value obtained by compressing the current data. Based on this, when the information used to restore the current data is a value obtained by compressing the current data, S306 is performed; when the information used to restore the current data is the current data, the acquisition process for the current data ends.
  • S303 The storage device decompresses the value obtained by compressing the difference to obtain the difference.
  • the decompression algorithm used for performing the decompression in S303 corresponds to the compression algorithm used for performing the compression in S104.
• For example, if the dictionary compression algorithm is used to perform the compression in S104, the dictionary decompression algorithm is used to perform the decompression in S303; similarly, if the deduplication algorithm is used to perform the compression in S104, the corresponding restoration algorithm is used to perform the decompression in S303.
  • S304 The storage device uses the historical data to predict the current data, and obtains the predicted data of the current data.
• The historical data is one or more pieces of data that the storage device has already obtained. Whether the historical data is one piece or multiple pieces of obtained data, and which piece or pieces, is related to the prediction algorithm. For a specific implementation, refer to the embodiment shown in FIG. 5; details are not described herein again.
• FIG. 7A is a schematic diagram of data to be acquired and the information actually stored (that is, the information for recovering the data to be acquired) according to an embodiment of the present application.
• The actually stored information in FIG. 7A is the same as that shown in FIG. 5A, so for the interpretation of the related graphics or arrows, refer to FIG. 5A.
  • S303 can be executed before S304, or S304 can be executed before S303, or S303 and S304 can be executed simultaneously.
  • S305 The storage device determines the current data according to the difference and the predicted data of the current data.
• For example, if the difference is the difference between the current data and the predicted data of the current data, the sum of the difference and the predicted data of the current data is used as the current data; if the difference is the ratio of the current data divided by the predicted data of the current data, the product of the difference and the predicted data of the current data is used as the current data. Other examples are not listed one by one.
  • S306 The storage device decompresses the value obtained by compressing the current data to obtain the current data.
• The data acquisition method provided in this embodiment corresponds to the data storage method shown in FIG. 5; therefore, for the beneficial effects of this embodiment, refer to those described in the embodiment shown in FIG. 5, which are not repeated here.
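The read-side flow S301 to S306 (for the flag-only storage variant) can be sketched as follows. predict() is the same illustrative stand-in used on the write side, and the record format is an assumption for illustration.

```python
# Read-side sketch: a record tagged "A" (the first identification
# information) carries the current data itself; any other record carries
# the difference, and the current data is rebuilt as
# predicted data + difference (S304, S305).

def predict(history):
    return round(sum(history) / len(history)) if history else None

def recover_one(history, record):
    tag, payload = record
    if tag == "A":                       # S302: flag present -> raw data
        return payload
    return predict(history) + payload    # S304/S305: predict, then add

print(recover_one([170, 170, 170], ("diff", 1)))  # 171
print(recover_one([], ("A", 171)))                # 171
```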
• The method shown in FIG. 8 may include the following steps:
  • the storage device receives the read request sent by the client through the interface card, and the read request includes address information of one or more data to be read.
  • the interface card performs transmission protocol conversion on the address information of the one or more data to be read, for example, converts the address information of the one or more data to be read using the Ethernet communication protocol into information using the PCIE protocol.
  • the interface card sends address information of one or more data to be read converted by the transmission protocol to the processor.
• S404: The processor uses each piece of the received address information of the one or more pieces of data to be read as the address information of one piece of data to be obtained.
  • S405 The processor stores the address information of each of the data to be acquired into a memory.
• Subsequently, the processor may use each piece of some or all of the data to be obtained in turn as the current data, and execute S406 to S415.
• Optionally, the current data may be stored in the memory, so that it can subsequently be used as historical data of other data to be obtained.
• S406: The processor reads the information for recovering the current data from the storage space of the hard disk indicated by the address information of the current data, and sends the read information to the hard disk protocol conversion module for transmission protocol conversion, for example, converting information that uses the SAS protocol into information that uses the PCIE protocol.
  • S407 The processor determines whether the information used to recover the current data carries the first identification information.
• If the determination result of S407 is no, it means that the information used to restore the current data includes the difference or the value obtained by compressing the difference. Based on this, when the information used to restore the current data is the value obtained by compressing the difference, S408 is executed; when the information used to restore the current data is the difference, S409 is executed.
• If the determination result of S407 is yes, then when the information used to restore the current data is the value obtained by compressing the current data, S412 is performed; when the information used to restore the current data is the current data, S413 is performed.
  • S408 The processor decompresses the value obtained by compressing the difference to obtain the difference.
  • S409 The processor obtains historical data from the memory.
  • S410 The processor uses the historical data to predict the current data to obtain the predicted data of the current data.
  • S411 The processor determines the current data according to the difference and the predicted data of the current data.
  • S412 The processor decompresses the value obtained by compressing the current data to obtain the current data.
  • S413 The processor sends the current data to the interface card.
• Optionally, the processor may also store the current data in the memory as historical data of other data to be obtained. Further optionally, when a piece of acquired data is no longer used as historical data of data to be obtained, the processor may delete it from the memory to save memory storage overhead. For example, suppose the current data is X(n) and the historical data is the 10 consecutive pieces of data starting from X(n-1) and going backwards (that is, X(n-10) to X(n-1)); then X(n-11) and earlier data are no longer used as historical data of the data to be obtained, so the processor can delete them from the memory.
  • the interface card performs transmission protocol conversion on the current data, for example, converts the PCIE protocol to an Ethernet communication protocol.
  • S415 The interface card feeds back the current data using the Ethernet communication protocol to the client.
• Optionally, an example of the data acquisition method shown in FIG. 7 may be an embodiment obtained by modifying the embodiment shown in FIG. 8 as follows: First, the above S410 to S411 are executed by the AI calculation card. Second, after executing S409 and before executing S410, the method further includes: the processor sends the historical data obtained from the memory to the AI calculation card. Third, after executing S411 and before executing S413, the method further includes: the AI calculation card sends the current data to the processor.
• The mapping relationships in the dictionary of a dictionary compression (or decompression) algorithm are arranged according to hit rate from high to low.
• During compression, the first data in the dictionary is searched for the data to be compressed in order of the hit rate of the mapping relationships from high to low, and the second data corresponding to the data to be compressed is used as the value obtained by compressing the data to be compressed.
• During decompression, the second data in the dictionary is searched for the data to be decompressed in order of the hit rate of the mapping relationships from high to low, and the first data corresponding to the data to be decompressed is used as the value obtained by decompressing the data to be decompressed. In this way, when the hit rate of the mapping relationship in which the data to be compressed or decompressed is located is low, it takes a long time to perform the compression or decompression.
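The hit-rate-ordered dictionary search described above can be sketched as follows; the entries and hit rates are invented for illustration. The comparison count returned by compress() shows why low-hit-rate entries take longer to find.

```python
# Sketch of a dictionary whose mapping relationships (first data ->
# second data) are kept sorted by hit rate from high to low; compression
# and decompression scan the list in that order.

dictionary = [                   # (first data, second data), hit rate high -> low
    ("10101011", "0"),           # hit rate 0.80
    ("11110000", "10"),          # hit rate 0.15
    ("00001111", "11"),          # hit rate 0.05
]

def compress(first):
    for i, (f, s) in enumerate(dictionary):
        if f == first:
            return s, i + 1      # compressed value, comparisons made
    return first, len(dictionary)

def decompress(second):
    for f, s in dictionary:
        if s == second:
            return f
    return second

print(compress("10101011"))  # ('0', 1)  found on the first comparison
print(compress("00001111"))  # ('11', 3) whole list scanned
```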
• Each mapping relationship refers to a mapping between one piece of first data and one piece of second data, where the storage space occupied by the first data of each mapping relationship is greater than the storage space occupied by the second data of that mapping relationship. Each set corresponds to a hit-rate range, different sets correspond to different hit-rate ranges, and the hit rate of each mapping relationship in a set belongs to the hit-rate range corresponding to that set.
  • the number of collections included in the storage device and the hit ratio range corresponding to each collection may be predefined or updated according to the stored data.
  • the mapping relationship can also be updated.
  • each mapping relationship may be a mapping relationship in a dictionary of a dictionary compression algorithm.
  • the mapping relationships included in the at least two sets may be some or all mapping relationships in a dictionary.
  • the mapping relationship contained in the at least two sets may be a mapping relationship stored in any storage medium (such as a cache, a memory, or a hard disk) in the storage device. If the storage medium is a cache or a memory, the mapping relationship included in the at least two sets may be a partial mapping relationship in a dictionary; if the storage medium is a hard disk, the mapping relationship included in the at least two sets may be a dictionary All mappings in.
  • each set stored in the dictionary and its corresponding hit ratio range can be shown in Table 2:
• In a data compression scenario, the hit rate of each mapping relationship may be the hit rate of the first data of the mapping relationship; for example, the hit rate of the first data may be obtained by dividing the number of times the first data has been compressed within a preset time period by the total number of times compression has been performed.
• In a decompression scenario, the hit rate of each mapping relationship may be the hit rate of the second data of the mapping relationship; for example, the hit rate of the second data may be obtained by dividing the number of times the second data has been decompressed within a preset time period by the total number of times decompression has been performed.
• Because the mechanism for obtaining the hit rate of the same mapping relationship may differ between the data compression scenario and the decompression scenario, the sets included in the storage device may be the same or different in the two scenarios, and the hit-rate ranges corresponding to the same set may be the same or different.
• For example, suppose a storage device includes 100 mapping relationships. When applied to a data compression scenario, each of the 100 mapping relationships may belong to one of the sets A1 and A2; when applied to a decompression scenario, each of the 100 mapping relationships may belong to one of the sets B1, B2, and B3.
• In a scenario involving both compression and decompression, the hit rate of each mapping relationship may be obtained according to the hit rates of the first data and the second data of the mapping relationship. For example, assuming that the ratio of data written to data read by a storage device is 3:7, and that for a certain mapping relationship the hit rate of its first data during writing is 10% and the hit rate of its second data during reading is 50%, then the hit rate of the mapping relationship can be obtained as 0.3 * 10% + 0.7 * 50%.
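The weighted combination in the example above works out as follows; this is a pure arithmetic check using the figures from the example (write:read ratio 3:7, hit rates 10% and 50%).

```python
# Combined hit rate of a mapping relationship when both compression
# (writes) and decompression (reads) are weighted by the traffic ratio.

write_weight, read_weight = 0.3, 0.7
hit = round(write_weight * 0.10 + read_weight * 0.50, 2)
print(hit)  # 0.38
```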
  • the embodiments of the present application are not limited to this.
• In this case, the mechanism for obtaining the hit rate of the same mapping relationship is the same when applied to a data compression scenario and a decompression scenario. Therefore, when design solution 1 is applied to data compression and decompression scenarios, the storage device includes the same sets, and the hit-rate range corresponding to the same set is the same in both scenarios. For example, suppose the storage device includes 100 mapping relationships; when applied to data compression scenarios and decompression scenarios, each of the 100 mapping relationships may belong to one of the sets A1 and A2.
  • FIG. 9 is a schematic diagram of a data compression method according to an embodiment of the present application.
  • the method shown in FIG. 9 may include the following steps:
  • S501 The storage device obtains a hit ratio of the data to be compressed.
  • a manner of acquiring the hit ratio of the data to be compressed refer to the foregoing manner of acquiring the hit ratio of the first data, and of course, the embodiment of the present application is not limited thereto.
  • the data to be compressed may be the above-mentioned difference or current data, of course, the embodiment of the present application is not limited thereto.
  • S502 The storage device determines a target set among the at least two sets according to the hit ratio of the data to be compressed.
  • the hit ratio of the data to be compressed is used to determine the hit ratio of the mapping relationship (hereinafter referred to as the target mapping relationship) in which the data to be compressed is located.
  • the determined hit ratio of the target mapping relationship belongs to the hit ratio range corresponding to the target set.
  • the hit ratio of a mapping relationship may be the hit ratio of the first data of the mapping relationship, or may be obtained according to the hit ratios of both the first data and the second data of the mapping relationship.
  • the hit ratio of the mapping relationship may be the hit ratio of the first data of the mapping relationship.
  • For example, if the hit ratio of the data to be compressed is 75%, the hit ratio of the target mapping relationship in which the data to be compressed is located is 75%; assuming 75% falls within the hit ratio range corresponding to set 2, the target set is set 2.
  • S503 The storage device searches for the data to be compressed among the first data of the target set, so as to find the mapping relationship in which the data to be compressed is located, determines the second data corresponding to the data to be compressed according to that mapping relationship, and uses the second data as the value obtained by compressing the data to be compressed.
  • In this way, the storage device can find the data to be compressed directly among the first data in set 2, instead of searching through the first data in descending order of mapping-relationship hit ratio as in the prior art; this saves time when performing compression.
  • the second data corresponding to the data to be compressed may be a value obtained by compressing the difference described above. If the data to be compressed is the current data described above, the second data corresponding to the data to be compressed may be a value obtained by compressing the current data described above.
  • the mapping relationships included in the storage device are classified into different sets. In this way, according to the hit ratio of the data to be compressed, the set in which the data to be compressed is located can be locked directly; compared with the prior art, this narrows the range in which the data to be compressed is searched, so compression time can be saved.
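  • The set-lookup procedure of S501 to S503 can be sketched as follows; the set names, hit-ratio ranges, and dictionary entries are illustrative assumptions rather than values from the embodiment:

```python
# Each set holds mapping relationships (first data -> second data) whose
# hit ratio falls within that set's range.
SETS = [
    ("set1", 0.8, 1.0, {"10101010": "A"}),   # hit ratio in [0.8, 1.0]
    ("set2", 0.5, 0.8, {"11110000": "B"}),   # hit ratio in [0.5, 0.8)
    ("set3", 0.0, 0.5, {"00001111": "C"}),   # hit ratio in [0.0, 0.5)
]

def compress(data, hit_ratio):
    # The hit ratio of the data to be compressed gives the hit ratio of
    # its target mapping relationship; only the matching set is searched.
    for _, low, high, first_to_second in SETS:
        if low <= hit_ratio < high or hit_ratio == high == 1.0:
            return first_to_second.get(data)  # second data = compressed value
    return None

print(compress("11110000", 0.75))  # found directly in set2 -> B
```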
  • FIG. 10 is a schematic diagram of a data decompression method provided by an embodiment of the present application.
  • the method shown in FIG. 10 may include the following steps:
  • S601 The storage device obtains a hit ratio of the data to be decompressed.
  • a method for acquiring the hit ratio of the data to be decompressed refer to the method for acquiring the hit ratio of the second data above.
  • the embodiment of the present application is not limited to this.
  • the data to be decompressed may be a value obtained by compressing the difference described above, or a value obtained by compressing the current data.
  • the embodiment of the present application is not limited thereto.
  • S602 The storage device determines the target set among the at least two sets according to the hit ratio of the data to be decompressed; the hit ratio of the data to be decompressed is used to determine the hit ratio of the target mapping relationship in which the data to be decompressed is located, and the determined hit ratio of the target mapping relationship belongs to the hit ratio range corresponding to the target set.
  • when applied to a data decompression scenario, the hit ratio of a mapping relationship may be the hit ratio of the second data of the mapping relationship, or may be obtained according to the hit ratios of both the first data and the second data of the mapping relationship.
  • the hit ratio of the mapping relationship may be the hit ratio of the second data of the mapping relationship.
  • For example, if the hit ratio of the data to be decompressed is 75%, the hit ratio of the target mapping relationship in which the data to be decompressed is located is 75%; assuming 75% falls within the hit ratio range corresponding to set 2, the target set is set 2.
  • S603 The storage device searches for the data to be decompressed among the second data of the target set, so as to find the mapping relationship in which the data to be decompressed is located, determines the first data corresponding to the data to be decompressed according to that mapping relationship, and uses the first data as the value obtained by decompressing the data to be decompressed.
  • the first data corresponding to the data to be decompressed may be the difference described above. If the data to be decompressed is a value obtained by compressing the current data above, the first data corresponding to the data to be decompressed may be the current data described above.
  • the mapping relationships included in the storage device are classified into different sets. In this way, according to the hit ratio of the data to be decompressed, the set in which the data to be decompressed is located can be locked directly; compared with the prior art, this narrows the range in which the data to be decompressed is searched, so decompression time can be saved.
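  • The decompression lookup of S601 to S603 can be sketched as follows, mirroring the compression sketch but searching the second data; the set names, ranges, and entries are again illustrative assumptions:

```python
# The same kind of sets as on the compression side, but the lookup runs
# over the second data of each mapping relationship.
SETS = [
    ("set1", 0.8, 1.0, [("10101010", "A")]),
    ("set2", 0.5, 0.8, [("11110000", "B")]),
]

def decompress(value, hit_ratio):
    # Lock the set whose hit-ratio range covers the hit ratio of the
    # data to be decompressed, then search only that set's second data.
    for _, low, high, pairs in SETS:
        if low <= hit_ratio < high or hit_ratio == high == 1.0:
            for first, second in pairs:
                if second == value:
                    return first   # first data = decompressed value
            return None            # not present in the locked set
    return None

print(decompress("B", 0.75))  # 11110000
```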
  • the storage medium of the storage device may include a cache, a memory, and a hard disk.
  • the data stored in the cache is a part of the data stored in the memory
  • the data stored in the memory is a part of the data stored in the hard disk.
  • the process for the CPU to read data is as follows: the CPU first looks for the data to be accessed in the cache; if found, it reads the data directly; if not found, it looks for the data to be accessed in memory. Further, if the data is found in memory, it is read directly; if not, the data to be accessed is looked for on the hard disk.
  • taking dictionary compression technology as an example of the storage technology, the data stored in the cache, memory, and hard disk can be the mapping relationships in the dictionary.
  • because the mapping relationships contained in a storage medium such as the cache or memory are limited, the mapping relationship in which the data to be compressed/decompressed is located may not be in that storage medium. In view of this, an embodiment of the present application provides design solution 2:
  • the storage medium of the storage device includes a cache, a memory, and a hard disk;
  • the hit ratio of a mapping relationship in the cache is greater than or equal to the hit ratio of a mapping relationship in memory, and the hit ratio of a mapping relationship in memory is greater than or equal to the hit ratio of a mapping relationship on the hard disk;
  • each mapping relationship refers to a mapping relationship between a first data and a second data, and the storage space occupied by the first data is larger than the storage space occupied by the second data.
  • the range of the hit ratio of the mapping relationship in each storage medium may be preset, or may be updated according to the stored data.
  • each mapping relationship may be a mapping relationship in a dictionary of a dictionary compression algorithm.
  • each storage medium of the storage device and its corresponding hit ratio range can be shown in Table 3.
  • FIG. 11 is a schematic diagram of a data compression method according to an embodiment of the present application.
  • the method shown in FIG. 11 may include the following steps:
  • S701 The storage device obtains a hit rate of the data to be compressed.
  • S702 The storage device determines a target storage medium according to the hit ratio of the data to be compressed.
  • the hit ratio of the data to be compressed is used to determine the hit ratio of the target mapping relationship where the data to be compressed is located.
  • when the determined hit ratio of the target mapping relationship belongs to the hit ratio range of the mapping relationships in the cache, the target storage medium is the cache; when it does not belong to the cache's range but belongs to the hit ratio range of the mapping relationships in memory, the target storage medium is memory; when it does not belong to the memory's range either, the target storage medium is the hard disk.
  • For example, if the hit ratio of the data to be compressed is 90%, the hit ratio of the target mapping relationship in which the data to be compressed is located is 90%; assuming 90% falls within the cache's hit ratio range, the target storage medium is the cache.
  • If the hit ratio of the data to be compressed instead falls within the memory's hit ratio range, the target storage medium may be memory; if the hit ratio of the data to be compressed is 30% and falls within the hard disk's range, the target storage medium may be the hard disk.
  • S703 The storage device searches for the data to be compressed among the first data of the target storage medium, so as to find the mapping relationship in which the data to be compressed is located, determines the second data corresponding to the data to be compressed according to that mapping relationship, and uses the second data as the value obtained by compressing the data to be compressed.
  • if the mapping relationships included in the target storage medium are organized as in design solution 1 above, the specific implementation process of S703 may refer to S501 to S503 above.
  • S703 can also be implemented according to the methods in the prior art.
  • if the storage device does not find the data to be compressed among the first data of the target storage medium, then: when a next-level storage medium of the target storage medium exists in the storage device, the search continues in that next-level storage medium; when no next-level storage medium of the target storage medium exists in the storage device, the data to be compressed itself is used as the value obtained by compressing the data to be compressed.
  • the next-level storage medium of the cache is memory
  • the next-level storage medium of memory is a hard disk.
  • in this way, the storage medium with the highest read-write performance in which the data to be compressed may be located can be locked directly (the read-write performance of the cache is higher than that of memory, and the read-write performance of memory is higher than that of the hard disk). The range in which the data to be compressed is searched is thereby narrowed, so data compression time can be saved.
  • FIG. 12 is a schematic diagram of a data decompression method provided by an embodiment of the present application.
  • the method shown in FIG. 12 may include the following steps:
  • S801 The storage device obtains a hit ratio of the data to be decompressed.
  • S802 The storage device determines a target storage medium according to the hit ratio of the data to be decompressed.
  • the hit ratio of the data to be decompressed is used to determine the hit ratio of the target mapping relationship where the data to be decompressed is located.
  • when the hit ratio of the target mapping relationship belongs to the hit ratio range of the mapping relationships in the cache, the target storage medium is the cache; when it does not belong to the cache's range but belongs to the hit ratio range of the mapping relationships in memory, the target storage medium is memory; when it does not belong to the memory's range either, the target storage medium is the hard disk.
  • For example, if the hit ratio of the data to be decompressed is 90%, the hit ratio of the target mapping relationship in which the data to be decompressed is located is 90%; assuming 90% falls within the cache's hit ratio range, the target storage medium is the cache.
  • If the hit ratio of the data to be decompressed instead falls within the memory's hit ratio range, the target storage medium may be memory; if the hit ratio of the data to be decompressed is 30% and falls within the hard disk's range, the target storage medium may be the hard disk.
  • S803 The storage device searches for the data to be decompressed among the second data of the target storage medium, so as to find the mapping relationship in which the data to be decompressed is located, determines the first data corresponding to the data to be decompressed according to that mapping relationship, and uses the first data as the value obtained by decompressing the data to be decompressed.
  • if the mapping relationships included in the target storage medium are organized as in design solution 1 above, the specific implementation process of S803 may refer to S601 to S603 above.
  • S803 can also be implemented according to the method in the prior art.
  • in this way, the storage medium with the highest read-write performance in which the data to be decompressed may be located can be locked directly (the read-write performance of the cache is higher than that of memory, and the read-write performance of memory is higher than that of the hard disk).
  • FIG. 13 is a schematic diagram of a data storage method according to an embodiment of the present application.
  • the method shown in FIG. 13 may include the following steps:
  • the storage device obtains the current data (that is, the data currently to be stored) and the historical data of the current data (that is, the historical data to be stored) among the at least two data to be stored; the historical data is one or more data that precede the current data in a sequence composed of the at least two data to be stored.
  • the storage device uses the historical data to predict the current data, and obtains the predicted data of the current data.
  • the predicted data of the current data is the data obtained by predicting the current data based on the change rule of the historical data.
  • S902 The storage device obtains a difference between the current data and the predicted data of the current data.
  • the storage device determines whether the absolute value of the difference is less than or equal to a preset threshold. For example, assuming the difference is a, the absolute value of the difference can be expressed as |a|.
  • when the absolute value of the difference is less than or equal to the preset threshold, the storage device stores preset data.
  • the storage space occupied by the preset data is smaller than the storage space occupied by the current data.
  • the preset data is predefined by the storage device.
  • the preset data may be an identifier, and the identifier is used to indicate that the predicted data of the current data can be (or approximately be) the current data.
  • for example, the preset data is a binary number "0" or "1", or the like.
  • the storage space occupied by the preset data is less than the storage space occupied by most or all of the data to be stored.
  • the storage device may not need to judge the size relationship between the storage space occupied by the preset data and the storage space occupied by the current data. Instead, when defining the preset data in advance, the storage device can choose an identifier with a small footprint, based on the principle that the storage space occupied by the preset data should be smaller than the storage space occupied by most or all of the data to be stored. In this way, even if, for some particular current data, the condition "the storage space occupied by the preset data is smaller than the storage space occupied by the current data" is not satisfied, the overall storage overhead is still reduced.
  • the storage device may predefine preset data based on factors such as storage overhead.
  • when the preset threshold is 0, the compression process in this technical solution is specifically a lossless compression process; when the preset threshold is greater than 0, the compression process in this technical solution is specifically a lossy compression process.
  • when the absolute value of the difference is greater than the preset threshold, the storage device stores the current data or a value obtained by compressing the current data.
  • the method may further include the following S905A:
  • S905A: When the absolute value of the difference is greater than the preset threshold, identification information is stored. When the value obtained by compressing the current data is stored, the identification information is used to indicate that the stored information used to restore the current data is a value obtained by compressing the current data; when the current data is stored, the identification information is used to indicate that the stored information used to restore the current data is the current data.
  • the identification information may serve as an identifier of the information used to recover the current data, or as information carried by the information used to recover the current data.
  • historical data is used to predict the current data, and when the absolute value of the difference between the current data and the predicted data of the current data is less than or equal to the preset threshold, the preset data is stored. Since the storage space occupied by the preset data is smaller than the storage space occupied by the current data, storage overhead can be saved compared with the prior-art technical solution of directly storing the current data. This technical solution can be applied to scenarios that tolerate a certain amount of data loss, such as video playback.
  • the sequence of actually stored data may be {10, 101, 1010, △, △, △, △, △, △, △, △, △, △}, where △ denotes the preset data and the prediction step uses its default value.
  • the storage device may further store the first identification information. It can be seen that the scale of the data that needs to be compressed and stored is significantly reduced, and the probability of data duplication is significantly increased. Therefore, the data compression ratio and compression efficiency can be significantly improved.
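  • The storage flow of FIG. 13 can be sketched as follows; the linear-extrapolation predictor is a toy stand-in for the embodiment's prediction algorithm, and the PRESET marker and THRESHOLD value are assumptions:

```python
PRESET = "P"       # preset data: a tiny marker, not the real value
THRESHOLD = 2      # preset threshold; a value > 0 makes the scheme lossy

def predict(history):
    # Toy predictor: continue the trend of the last two historical values.
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])

def store(current, history):
    # If |current - predicted| <= threshold, store only the preset
    # data; otherwise store the current data itself.
    predicted = predict(history)
    if abs(current - predicted) <= THRESHOLD:
        return PRESET
    return current   # (or a value obtained by compressing it)

print(store(31, [10, 20]))  # predicted 30, |31-30| <= 2 -> P
print(store(50, [10, 20]))  # predicted 30, |50-30| > 2 -> 50
```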
  • FIG. 14 is a schematic diagram of a data acquisition method according to an embodiment of the present application.
  • the method shown in FIG. 14 may include the following steps:
  • S1001 The storage device reads information for recovering the current data.
  • the information for recovering the current data includes “preset data” or “current data or a value obtained by compressing the current data”.
  • the predicted data of the current data is data obtained by predicting the current data based on a change rule of the historical data; the historical data is one or more data that have been acquired.
  • S1002 The storage device determines whether the information used to recover the current data carries identification information.
  • if the determination result of S1002 is no, the information for restoring the current data includes the preset data, and S1003 is executed. If the determination result of S1002 is yes, the information used to restore the current data includes the current data or the value obtained by compressing the current data; when that information is the value obtained by compressing the current data, S1004 is executed; when that information is the current data, the acquisition process for the current data ends.
  • S1003 The storage device uses the historical data to predict the current data, obtains the predicted data of the current data, and uses the predicted data of the current data as the current data; the historical data is one or more data that have been obtained.
  • S1004 The storage device decompresses the value obtained by compressing the current data to obtain the current data.
  • the data acquisition method provided in this embodiment corresponds to the data storage method shown in FIG. 13. Therefore, for the beneficial effects of this embodiment, reference may be made to the beneficial effects described in the embodiment shown in FIG. 13; details are not described herein again.
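  • The acquisition flow of FIG. 14 can be sketched as follows, mirroring the storage side; the predictor and PRESET marker are the same assumed stand-ins, since both sides must use the same prediction algorithm and parameters:

```python
PRESET = "P"

def predict(history):
    # Same toy linear extrapolation as on the storage side.
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])

def recover(stored, history):
    # If the stored information is the preset data, the predicted data
    # is used as the current data; otherwise the stored value is the
    # current data (decompression omitted for brevity).
    if stored == PRESET:
        return predict(history)
    return stored

print(recover("P", [10, 20]))  # predicted 30 stands in for the data
print(recover(50, [10, 20]))   # stored directly -> 50
```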
  • during data storage and data acquisition, the same prediction algorithm is used for prediction (that is, the values of the parameters of the prediction algorithm are the same).
  • considering that the prediction algorithm may be an AI neural algorithm whose parameters can be updated, the embodiments of the present application provide the following optional implementations:
  • Optional implementation manner 1: the storage device may also store the correspondence between the values of the parameters of the AI neural algorithm and the information used to recover the data.
  • the storage device may perform a snapshot operation after each update of the parameters of the AI neural algorithm to record the correspondence between the information used to restore the current data and the parameters of the AI neural algorithm used to perform prediction.
  • the embodiments of the present application are not limited to this.
  • For example, suppose the parameters of the AI neural algorithm at time t1 are the first parameters, and at time t2 the parameters are updated from the first parameters to the second parameters; the information used to recover current data stored between time t1 and time t2 is information 1 to 100, and the information stored in the period after time t2 is information 101 to 500. The storage device can then store the correspondence between information 1 to 100 and the first parameters, and the correspondence between information 101 to 500 and the second parameters.
  • Optional implementation manner 2: after storing the information used to restore the current data, update the parameters of the AI neural algorithm through adaptive learning, and update the information used to restore the current data according to the updated parameters of the AI neural algorithm.
  • updating the information used to restore the current data includes: reading the information used to restore the current data; restoring the current data according to the first parameters of the AI neural algorithm (that is, the parameters of the AI neural algorithm before the update), the read information used to restore the current data, and the historical data of the current data; predicting the current data according to the second parameters of the AI neural algorithm (that is, the parameters of the updated AI neural algorithm) and the historical data of the current data, to obtain second predicted data, where the second predicted data is the data obtained by predicting the current data based on the change rule of the historical data and the second parameters of the AI neural algorithm; obtaining a second difference between the current data and the second predicted data; and, when the storage space occupied by the second difference is smaller than the storage space occupied by the current data, updating the information used to restore the current data to the second difference or a value obtained by compressing the second difference.
  • in this way, the parameters of the AI neural algorithm before the update are used to perform the data acquisition process, and after the current data is obtained, the parameters of the updated AI neural algorithm are used to perform the data storage process. This helps ensure that the parameters of the AI neural algorithm used in the data storage process are the latest parameters.
  • the storage device may also store the second parameter of the AI neural algorithm.
  • the first parameter of the stored AI neural algorithm is updated to the second parameter, that is, the latest parameter of the AI neural algorithm is stored in the storage device.
  • optional implementation manner 1 can be applied to scenarios where the amount of data stored in the storage device is large; optional implementation manner 2 can be applied to scenarios where the amount of data stored in the storage device is small.
  • the optional implementation method 2 can further improve the data compression efficiency.
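  • Optional implementation manner 2 can be sketched as follows; both predictors are illustrative stand-ins for the AI neural algorithm before and after the parameter update, and the storage-space comparison is omitted for brevity:

```python
def reencode(stored_diff, old_predict, new_predict, history):
    # Restore the current data using the pre-update parameters...
    current = old_predict(history) + stored_diff
    # ...then re-predict with the updated parameters and keep the
    # (usually smaller) new difference as the stored information.
    return current - new_predict(history)

old = lambda h: h[-1]                    # "first parameters": repeat last value
new = lambda h: h[-1] + (h[-1] - h[-2])  # "second parameters": linear trend

print(reencode(12, old, new, [10, 20]))  # current = 32, new difference = 2
```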
  • the above-mentioned optional implementation manners 1 and 2 can be used in combination to form a new technical solution. For example, for a part of the stored data, the storage device may perform the above-mentioned optional implementation method 1; for another part of the stored data, the storage device may perform the implementation method 2 described above.
  • the storage device can obtain the parameters of the AI neural algorithm used to predict the current data according to the correspondence between the information used to recover the current data and the parameters of the AI neural algorithm. Accordingly, the prediction may include: using the historical data to predict the current data according to the obtained parameters of the AI neural algorithm, to obtain the predicted data of the current data.
  • for any corresponding relationship, the “parameters of the AI neural algorithm” in the corresponding relationship refers to the parameters of the AI neural algorithm used in the process of storing the “information for recovering current data” in that corresponding relationship. The obtained “parameters of the AI neural algorithm used to predict the current data” are therefore the “parameters of the AI neural algorithm” in the corresponding relationship.
  • For example, if the information used to restore the current data is information 99, the "parameters of the AI neural algorithm" in the corresponding relationship are the first parameters; if the information is information 200, the "parameters of the AI neural algorithm" in the corresponding relationship are the second parameters.
  • the storage device may obtain current data according to the embodiment shown in FIG. 7 above. If the foregoing optional implementation manner 1 is applied to the data storage process shown in FIG. 13, in the data acquisition process, the storage device may obtain current data according to the embodiment shown in FIG. 14 provided above.
  • the storage device may execute the embodiment shown in FIG. 7 or FIG. 14 according to the latest parameters of the stored AI neural algorithm.
  • the storage device may be divided into functional modules according to the foregoing method example.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above integrated modules may be implemented in the form of hardware or software functional modules. It should be noted that the division of the modules in the embodiments of the present application is schematic, and is only a logical function division. In actual implementation, there may be another division manner.
  • FIG. 15 is a schematic structural diagram of a storage device according to an embodiment of the present application.
  • the storage device 150 shown in FIG. 15 may be used to execute the data storage method shown in FIG. 5 or FIG. 6.
  • the storage device 150 may include a first acquisition unit 1500, a prediction unit 1501, a second acquisition unit 1502, and a storage unit 1503.
  • the first obtaining unit 1500 is configured to obtain current data and historical data of the current data.
  • the prediction unit 1501 is configured to use the historical data to predict the current data to obtain a first prediction data of the current data.
  • the second obtaining unit 1502 is configured to obtain a first difference between the current data and the first prediction data.
  • the storage unit 1503 is configured to store the first difference or the value obtained by compressing the first difference when the storage space occupied by the first difference is smaller than the storage space occupied by the current data.
  • the first obtaining unit 1500 may be used to execute S100
  • the prediction unit 1501 may be used to execute S101.
  • the second obtaining unit 1502 may be configured to execute S102.
  • the storage unit 1503 may be used to execute S104.
  • the first obtaining unit 1500 is specifically configured to obtain the current data and the historical data from the memory of the storage device 150.
  • the storage unit 1503 is further configured to store a correspondence between the information used to recover current data and parameters of an AI neural algorithm used to perform prediction.
  • the storage device 150 further includes an update unit 1504 for updating the parameters of the AI neural algorithm through adaptive learning; and updating the information for restoring the current data according to the parameters of the updated AI neural algorithm.
  • the updating unit 1504 is specifically configured to: read the information for restoring the current data; restore the current data according to the parameters of the AI neural algorithm used to perform the prediction, the information for restoring the current data, and the historical data of the current data; predict the current data according to the parameters of the updated AI neural algorithm and the historical data of the current data, to obtain second predicted data, where the second predicted data is the data obtained by predicting the current data based on the change rule of the historical data and the parameters of the updated AI neural algorithm; obtain a second difference between the current data and the second predicted data; and, when the storage space occupied by the second difference is smaller than the storage space occupied by the current data, update the information for restoring the current data to the second difference or a value obtained by compressing the second difference.
  • the storage device 150 includes an AI calculation card, and the prediction unit 1501 is specifically configured to use the historical data to predict the current data through the AI calculation card, to obtain the first prediction data.
  • the algorithm used to perform the compression includes a dictionary compression algorithm.
  • the dictionary of the dictionary compression algorithm includes at least two sets, and each set includes one or more mapping relationships.
  • each mapping relationship refers to a mapping relationship between a first data and a second data, and the storage space occupied by the first data is greater than the storage space occupied by the second data; each set corresponds to a hit ratio range, and different sets correspond to different hit ratio ranges. The storage device 150 further includes: a third obtaining unit 1505, configured to obtain the hit ratio of the first difference; a determining unit 1506, configured to determine a target set among the at least two sets according to the hit ratio of the first difference, where the hit ratio of the first difference is used to determine the hit ratio of the target mapping relationship in which the first difference is located, and the determined hit ratio of the target mapping relationship belongs to the hit ratio range corresponding to the target set; and a compression unit 1507, configured to search for the first difference among the first data of the target set, so as to determine the second data corresponding to the first difference, where the second data is the value obtained by compressing the first difference.
  • the storage medium of the storage device 150 includes a cache, a memory, and a hard disk;
  • the algorithm used to perform the compression includes a dictionary compression algorithm, and the dictionary of the dictionary compression algorithm includes one or more mapping relationships; each mapping relationship refers to a mapping relationship between a first data and a second data, and the storage space occupied by the first data is larger than the storage space occupied by the second data;
  • the hit ratio of a mapping relationship in the cache is greater than or equal to the hit ratio of a mapping relationship in memory, and the hit ratio of a mapping relationship in memory is greater than or equal to the hit ratio of a mapping relationship on the hard disk;
  • the storage device 150 further includes: a third obtaining unit 1505, configured to obtain the hit ratio of the first difference;
  • the determining unit 1506 is configured to determine the target storage medium according to the hit ratio of the first difference;
  • the hit ratio of the first difference is used to determine the hit ratio of the target mapping relationship where the first difference is located; when the determined hit ratio of the target mapping relationship belongs to the hit ratio range of the mapping relationship in the cache, the target storage medium is the cache; when it belongs to the hit ratio range of the mapping relationship in the memory but not the cache, the target storage medium is the memory; otherwise, the target storage medium is the hard disk;
  • the storage unit 1503 is further configured to store the current data or a value obtained by compressing the current data when the storage space occupied by the first difference is greater than or equal to the storage space occupied by the current data.
  • the storage unit 1503 may be used to execute S105.
  • the storage unit 1503 is further configured to store identification information when the storage space occupied by the first difference is greater than or equal to the storage space occupied by the current data; wherein, when the value obtained by compressing the current data is stored, the identification information is used to indicate that the stored information used to restore the current data is a value obtained by compressing the current data; when the current data is stored, the identification information is used to indicate that the stored information used to restore the current data is the current data.
  • the storage unit 1503 may be used to execute S105A.
  • the first acquisition unit 1500, the prediction unit 1501, the second acquisition unit 1502, the update unit 1504, the third acquisition unit 1505, the determination unit 1506, and the compression unit 1507 may all be implemented by the processor 202.
  • the storage unit 1503 may be implemented by the hard disk 204.
  • the prediction unit 1501 may be implemented by an AI calculation card 207.
  • the first obtaining unit 1500, the second obtaining unit 1502, the updating unit 1504, the third obtaining unit 1505, the determining unit 1506, and the compression unit 1507 can all be implemented by the processor 202.
  • the storage unit 1503 may be implemented by the hard disk 204.
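The write path implemented by the units of storage device 150 — predict the current data from its historical data, compute the first difference, and store whichever representation is smaller, together with identification information — can be sketched as follows. This is a minimal illustration, not the patented implementation: the linear extrapolation stands in for the AI neural algorithm, and decimal string length stands in for occupied storage space.

```python
def predict(history):
    """Illustrative predictor: linear extrapolation from the last two
    historical values (the application uses an AI neural algorithm)."""
    if len(history) < 2:
        return history[-1] if history else 0
    return 2 * history[-1] - history[-2]

def store(current, history):
    """Return the record stored to recover `current`.

    The tag plays the role of the identification information: it tells
    the reader whether a difference or the raw datum was stored.
    """
    diff = current - predict(history)
    # Decimal string length stands in for occupied storage space here.
    if len(str(diff)) < len(str(current)):
        return ("diff", diff)   # store the first difference
    return ("raw", current)     # difference is not smaller: store raw
```

For a slowly varying sequence such as 1000001, 1000002, 1000003, the stored record for the third value is ("diff", 0), far smaller than the raw datum.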
  • FIG. 16 is a schematic structural diagram of a storage device 160 according to an embodiment of the present application.
  • the storage device 160 shown in FIG. 16 may be used to execute the data acquisition method shown in FIG. 7 or FIG. 8.
  • the storage device 160 may include a reading unit 1601, a prediction unit 1602, and a determination unit 1603.
  • the reading unit 1601 is configured to read information used to recover the current data; the information used to recover the current data includes the difference or the value obtained by compressing the difference; the difference is the difference between the current data and the predicted data of the current data;
  • the predicted data of the current data is data obtained by predicting the current data based on the change law of the historical data.
  • the prediction unit 1602 is configured to predict the current data using historical data to obtain predicted data of the current data.
  • a determining unit 1603 is configured to determine the current data according to the information used to recover the current data and the predicted data of the current data. For example, in conjunction with FIG. 7, the reading unit 1601 may be used to execute S301. The prediction unit 1602 may be used to perform S304. The determining unit 1603 may be configured to execute S305.
  • the storage device 160 further includes an obtaining unit 1604, configured to obtain historical data from the memory of the storage device 150.
  • the storage device 160 further includes an obtaining unit 1604, configured to obtain the parameters of the AI neural algorithm used to predict the current data according to the correspondence between the information used to recover the current data and the parameters of the AI neural algorithm.
  • the prediction unit 1602 is specifically configured to: according to the obtained parameters of the AI neural algorithm, use the historical data to predict the current data to obtain predicted data of the current data.
  • the storage device 160 includes an AI calculation card
  • the prediction unit 1602 is specifically configured to use the AI calculation card to predict the current data using the historical data to obtain prediction data of the current data.
  • the information used to recover the current data includes a value obtained by compressing the difference.
  • the determining unit 1603 includes a decompression module 1603-1, configured to decompress the compressed value of the difference to obtain the difference; and a determination module 1603-2, configured to determine the current data based on the difference and the predicted data of the current data.
  • the decompression module 1603-1 may be used to execute S303.
  • the determination module 1603-2 may be used to execute S304.
  • the algorithm used to perform the decompression includes a dictionary-type decompression algorithm.
  • the dictionary of the dictionary-type decompression algorithm includes at least two sets, and each set includes one or more mapping relationships.
  • Each mapping relationship refers to the mapping relationship between a piece of first data and a piece of second data.
  • the storage space occupied by the first data is larger than the storage space occupied by the second data.
  • Each set corresponds to a range of hit rates, and different sets have different ranges of hit rates.
  • the decompression module 1603-1 is specifically configured to: obtain the hit ratio of the value obtained by compressing the difference; and determine the target set among the at least two sets according to that hit ratio.
  • the hit ratio of the value obtained by compressing the difference is used to determine the hit ratio of the target mapping relationship where that value is located.
  • the determined hit ratio of the target mapping relationship belongs to the range of hit ratios corresponding to the target set.
  • in the second data of the target set, the value obtained by compressing the difference is searched for to determine the first data corresponding to that value; the first data corresponding to the value obtained by compressing the difference is the difference.
  • the storage medium of the storage device 160 includes a cache, a memory, and a hard disk;
  • the algorithm used to perform the decompression includes a dictionary-type decompression algorithm, and the dictionary of the dictionary-type decompression algorithm includes one or more mapping relationships; each mapping relationship refers to the mapping relationship between a first data and a second data, and the storage space occupied by the first data is greater than the storage space occupied by the second data; the hit ratio of the mapping relationship in the cache is greater than or equal to the hit ratio of the mapping relationship in the memory.
  • the hit ratio of the mapping relationship in the memory is greater than or equal to the hit ratio of the mapping relationship in the hard disk.
  • the decompression module 1603-1 is specifically configured to: obtain the hit ratio of the value obtained by compressing the difference; and determine the target storage medium according to that hit ratio.
  • the hit ratio of the value obtained by compressing the difference is used to determine the hit ratio of the target mapping relationship where that value is located.
  • when the determined hit ratio of the target mapping relationship belongs to the hit ratio range of the mapping relationship in the cache, the target storage medium is the cache.
  • when the determined hit ratio of the target mapping relationship belongs to the hit ratio range of the mapping relationship in the memory but not the cache, the target storage medium is the memory; when it does not belong to the hit ratio range of the mapping relationship in the memory, the target storage medium is the hard disk; in the second data of the target storage medium, the value obtained by compressing the difference is searched for to determine the first data corresponding to that value; the first data corresponding to the value obtained by compressing the difference is the difference.
  • the reading unit 1601, the prediction unit 1602, and the determination unit 1603 may all be implemented by the processor 202.
  • the prediction unit 1602 may be implemented by an AI calculation card 207; both the reading unit 1601 and the determining unit 1603 may be implemented by the processor 202.
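The read path of storage device 160 mirrors the write path: the reading unit fetches the recovery information, the prediction unit re-runs the predictor over the historical data, and the determining unit combines the two. A minimal sketch, with an illustrative linear predictor and tagged recovery records as assumptions (the patent requires only that reader and writer share one deterministic predictor):

```python
def predict(history):
    """Illustrative linear extrapolation; stands in for the AI neural
    algorithm of the application."""
    if len(history) < 2:
        return history[-1] if history else 0
    return 2 * history[-1] - history[-2]

def recover(record, history):
    """Reconstruct the current data from a stored recovery record."""
    kind, value = record
    if kind == "diff":
        return predict(history) + value   # prediction plus difference
    return value                          # raw datum was stored
```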
  • FIG. 17 is a schematic structural diagram of a storage device 170 according to an embodiment of the present application.
  • the storage device 170 shown in FIG. 17 may be used to execute the data compression method shown in FIG. 9 or FIG. 11.
  • the storage device 170 may include an obtaining unit 1701, a determining unit 1702, and a compression unit 1703.
  • the storage device 170 stores at least two sets, and each set includes one or more mapping relationships.
  • Each mapping relationship refers to a mapping relationship between a first data and a second data.
  • the storage space occupied by the first data is larger than the storage space occupied by the second data, and each set corresponds to a range of hit rates, and different sets have different ranges of hit rates.
  • the obtaining unit 1701 is configured to obtain a hit ratio of data to be compressed.
  • the determining unit 1702 is configured to determine a target set among at least two sets according to the hit rate of the data to be compressed; the hit rate of the data to be compressed is used to determine the hit rate of the target mapping relationship where the data to be compressed is located, and the determined target mapping relationship The hit rate belongs to the hit rate range corresponding to the target set.
  • the compression unit 1703 is configured to find the data to be compressed in the first data of the target set to determine the second data corresponding to the data to be compressed, and use the second data corresponding to the data to be compressed as the value obtained by compressing the data to be compressed.
  • the obtaining unit 1701 may be configured to perform S501 and / or other steps provided in the embodiments of the present application.
  • the determining unit 1702 may be configured to execute S502 and / or other steps provided in the embodiments of the present application.
  • the compression unit 1703 may be configured to perform S503 and / or other steps provided in the embodiments of the present application.
  • the storage medium of the storage device 170 includes a cache, a memory, and a hard disk; the hit ratio of the mapping relationship in the cache is greater than or equal to the hit ratio of the mapping relationship in the memory, and the hit ratio of the mapping relationship in the memory is greater than or equal to the hit ratio of the mapping relationship in the hard disk; each mapping relationship refers to the mapping relationship between a first data and a second data, and the storage space occupied by the first data is larger than the storage space occupied by the second data.
  • the obtaining unit 1701 is configured to obtain a hit ratio of data to be compressed.
  • the determining unit 1702 is configured to determine the target storage medium according to the hit rate of the data to be compressed; the hit rate of the data to be compressed is used to determine the hit rate of the target mapping relationship where the data to be compressed is located; when the determined hit ratio of the target mapping relationship belongs to the hit ratio range of the mapping relationship in the cache, the target storage medium is the cache; when the determined hit ratio of the target mapping relationship does not belong to the hit ratio range of the mapping relationship in the cache but belongs to the hit ratio range of the mapping relationship in the memory, the target storage medium is the memory; when the determined hit ratio of the target mapping relationship does not belong to the hit ratio range of the mapping relationship in the memory, the target storage medium is the hard disk.
  • the compression unit 1703 is configured to find the data to be compressed in the first data of the target storage medium to determine the second data corresponding to the data to be compressed, and use the second data corresponding to the data to be compressed as the value obtained by compressing the data to be compressed.
  • the obtaining unit 1701 may be configured to perform S701 and / or other steps provided in the embodiments of the present application.
  • the determining unit 1702 may be configured to perform S702 and / or other steps provided in the embodiments of the present application.
  • the compression unit 1703 may be configured to perform S703 and / or other steps provided in the embodiments of the present application.
  • the obtaining unit 1701, the determining unit 1702, and the compression unit 1703 may all be implemented by the processor 202.
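The hit-rate-to-set lookup performed by units 1701 through 1703 can be sketched as follows. The set boundaries and dictionary contents are illustrative assumptions, not values from the application; the point is that only the one set whose hit-ratio range matches is ever searched.

```python
# Dictionary partitioned into sets, each tied to one hit-rate range.
# Keys are first data (long), values are second data (short codes).
SETS = [
    ((0.5, 1.0), {"ABCDABCD": "a"}),   # frequently hit mappings
    ((0.0, 0.5), {"XYZXYZXY": "x"}),   # rarely hit mappings
]

def compress(data, hit_rate):
    """Search only the set whose hit-rate range contains `hit_rate`."""
    for (low, high), mapping in SETS:
        if low <= hit_rate <= high:         # determine the target set
            return mapping.get(data, data)  # dictionary lookup in that set
    return data
```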
  • FIG. 18 is a schematic structural diagram of a storage device 180 according to an embodiment of the present application.
  • the storage device 180 shown in FIG. 18 may be used to execute the data decompression method shown in FIG. 10 or FIG. 12.
  • the storage device 180 may include an obtaining unit 1801, a determining unit 1802, and a decompressing unit 1803.
  • the storage device 180 stores at least two sets, and each set includes one or more mapping relationships.
  • Each mapping relationship refers to a mapping relationship between a first data and a second data.
  • the storage space occupied by the first data is larger than the storage space occupied by the second data, and each set corresponds to a range of hit rates, and different sets have different ranges of hit rates.
  • the obtaining unit 1801 is configured to obtain a hit ratio of the data to be decompressed.
  • the determining unit 1802 is configured to determine a target set from the at least two sets according to the hit rate of the data to be decompressed; the hit rate of the data to be decompressed is used to determine the hit rate of the target mapping relationship where the data to be decompressed is located.
  • the determined hit ratio of the target mapping relationship belongs to the hit ratio range corresponding to the target set.
  • the decompression unit 1803 is configured to search the second data of the target set for the data to be decompressed to determine the first data corresponding to the data to be decompressed, and use the first data corresponding to the data to be decompressed as the value obtained by decompressing the data to be decompressed.
  • the obtaining unit 1801 may be configured to execute S601 and / or other steps provided in the embodiments of the present application.
  • the determining unit 1802 may be configured to execute S602 and / or other steps provided in the embodiments of the present application.
  • the decompression unit 1803 may be configured to perform S603 and / or other steps provided in the embodiments of the present application.
  • the storage medium of the storage device 180 includes a cache, a memory, and a hard disk; the hit ratio of the mapping relationship in the cache is greater than or equal to the hit ratio of the mapping relationship in the memory, and the hit ratio of the mapping relationship in the memory is greater than or equal to the hit ratio of the mapping relationship in the hard disk; each mapping relationship refers to a mapping relationship between a first data and a second data, and the storage space occupied by the first data is larger than the storage space occupied by the second data.
  • the obtaining unit 1801 is configured to obtain a hit ratio of the data to be decompressed.
  • the determining unit 1802 is configured to determine the target storage medium according to the hit ratio of the data to be decompressed; the hit ratio of the data to be decompressed is used to determine the hit ratio of the target mapping relationship where the data to be decompressed is located; when the determined hit ratio of the target mapping relationship belongs to the hit ratio range of the mapping relationship in the cache, the target storage medium is the cache; when the determined hit ratio of the target mapping relationship does not belong to the hit ratio range of the mapping relationship in the cache but belongs to the hit ratio range of the mapping relationship in the memory, the target storage medium is the memory; when the determined hit ratio of the target mapping relationship does not belong to the hit ratio range of the mapping relationship in the memory, the target storage medium is the hard disk.
  • the decompression unit 1803 is configured to find the first data corresponding to the data to be decompressed from the second data of the target storage medium, and use the first data corresponding to the data to be decompressed as a value obtained by decompressing the data to be decompressed.
  • the obtaining unit 1801 may be configured to perform S801 and / or other steps provided in the embodiments of the present application.
  • the determining unit 1802 may be configured to perform S802 and / or other steps provided in the embodiments of the present application.
  • the decompression unit 1803 may be configured to perform S803 and / or other steps provided in the embodiments of the present application.
  • the obtaining unit 1801, the determining unit 1802, and the decompressing unit 1803 may all be implemented by the processor 202.
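Decompression by units 1801 through 1803 is the reverse lookup: pick the target set from the hit rate, then search only that set's second data for the code. A sketch under illustrative assumptions (set boundaries and mappings are invented for the example):

```python
SETS = [
    ((0.5, 1.0), {"ABCDABCD": "a"}),   # frequently hit mappings
    ((0.0, 0.5), {"XYZXYZXY": "x"}),   # rarely hit mappings
]

def decompress(code, hit_rate):
    """Search the second data of the matching set for `code` and return
    the first data of that mapping relationship."""
    for (low, high), mapping in SETS:
        if low <= hit_rate <= high:
            for first, second in mapping.items():
                if second == code:
                    return first
    return code   # no mapping found: treat the input as uncompressed
```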
  • FIG. 19 is a schematic structural diagram of a storage device 190 according to an embodiment of the present application.
  • the storage device 190 shown in FIG. 19 may be used to execute the data storage method shown in FIG. 13.
  • the storage device 190 may include a prediction unit 1901, an acquisition unit 1902, and a storage unit 1903.
  • the obtaining unit 1902 is configured to obtain current data and historical data of the current data.
  • the prediction unit 1901 is configured to use the historical data to predict the current data to obtain a first prediction data of the current data.
  • the obtaining unit 1902 is further configured to obtain a first difference between the current data and the first prediction data of the current data.
  • the storage unit 1903 is configured to store preset data when an absolute value of the first difference is less than or equal to a preset threshold.
  • the storage space occupied by the preset data is smaller than the storage space occupied by the current data.
  • the prediction unit 1901 may be used to execute S901.
  • the obtaining unit 1902 may be used to execute S901 and S902.
  • the storage unit 1903 may be used to execute S904.
  • the storage unit 1903 is further configured to store a correspondence between the information used to recover current data and parameters of an AI neural algorithm used to perform prediction.
  • the storage device 190 further includes an update unit 1904 for updating parameters of the AI neural algorithm through adaptive learning; and updating information for restoring current data according to the parameters of the updated AI neural algorithm.
  • the updating unit 1904 is specifically configured to: read the information used to restore the current data; restore the current data according to the parameters of the AI neural algorithm used to perform the prediction, the information used to restore the current data, and the historical data of the current data; predict the current data according to the updated parameters of the AI neural algorithm and the historical data of the current data to obtain second predicted data, where the second predicted data is the data obtained by predicting the current data based on the change law of the historical data and the updated parameters of the AI neural algorithm; obtain a second difference between the current data and the second predicted data; and, when the storage space occupied by the second difference is less than the storage space occupied by the current data, update the information used to restore the current data to the second difference or a value obtained by compressing the second difference.
  • the storage device 190 includes an AI calculation card, and the prediction unit 1901 is specifically configured to predict the current data using the historical data through the AI calculation card to obtain the first predicted data.
  • the storage unit 1903 is further configured to store the current data or a value obtained by compressing the current data when the absolute value of the first difference is greater than a preset threshold.
  • the storage unit 1903 may be used to execute S905.
  • the storage unit 1903 is further configured to store identification information when the absolute value of the first difference is greater than the preset threshold; wherein, when the value obtained by compressing the current data is stored, the identification information is used to indicate that the stored information used to recover the current data is the value obtained by compressing the current data; when the current data is stored, the identification information is used to indicate that the stored information used to recover the current data is the current data.
  • the storage unit 1903 may be used to execute S905A.
  • both the prediction unit 1901 and the acquisition unit 1902 may be implemented by the processor 202, and the storage unit 1903 may be implemented by the hard disk 204.
  • the prediction unit 1901 may be implemented by an AI calculation card 207.
  • the obtaining unit 1902 may be implemented by the processor 202.
  • the storage unit 1903 may be implemented by the hard disk 204.
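The thresholded (lossy) write path of storage device 190 can be sketched as follows. The preset marker, the threshold value, and the linear predictor are all illustrative assumptions; the application only requires that the preset data occupy less space than the current data.

```python
PRESET = "*"      # illustrative preset datum, smaller than any stored value
THRESHOLD = 2     # illustrative acceptable-loss bound

def predict(history):
    """Illustrative linear extrapolation standing in for the AI predictor."""
    return 2 * history[-1] - history[-2] if len(history) >= 2 else 0

def store_lossy(current, history):
    """Store the preset datum when the prediction is close enough;
    otherwise fall back to storing the real value."""
    if abs(current - predict(history)) <= THRESHOLD:
        return PRESET    # the reader will substitute its own prediction
    return current
```

When the difference is exactly zero the scheme is lossless; a nonzero threshold trades bounded loss for smaller storage, e.g. in video playback.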
  • the storage device 210 shown in FIG. 20 may be used to execute the data acquisition method shown in FIG. 14.
  • the storage device 210 may include a reading unit 2101, a prediction unit 2102, and a determination unit 2103.
  • the reading unit 2101 is configured to read information for recovering current data.
  • the predicted data of the current data is data obtained by predicting the current data based on a change rule of the historical data.
  • the prediction unit 2102 is configured to: when the information for restoring the current data includes preset data, use the historical data to predict the current data to obtain predicted data of the current data.
  • the determining unit 2103 is configured to use the predicted data of the current data as the current data.
  • the reading unit 2101 may be used to execute S1001.
  • the prediction unit 2102 may be configured to perform a prediction step in S1003.
  • the determining unit 2103 may be configured to perform the step of determining current data in S1003.
  • the storage device 210 further includes an obtaining unit 2104, configured to obtain the parameters of the AI neural algorithm used to predict the current data according to the correspondence between the information used to recover the current data and the parameters of the AI neural algorithm.
  • the prediction unit 2102 is specifically configured to use the historical data to predict the current data according to the obtained parameters of the AI neural algorithm to obtain the predicted data of the current data.
  • the storage device 210 includes an AI calculation card
  • the prediction unit 2102 is specifically configured to use the AI calculation card to predict the current data using the historical data to obtain prediction data of the current data.
  • the reading unit 2101, the prediction unit 2102, and the determination unit 2103 may all be implemented by the processor 202.
  • both the reading unit 2101 and the determining unit 2103 may be implemented by the processor 202, and the prediction unit 2102 may be implemented by the AI calculation card 207.
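The matching read path of storage device 210: when the preset data was stored, the prediction itself serves as the (approximate) current data. A sketch with an illustrative marker and linear predictor, both assumptions of this example:

```python
PRESET = "*"   # must match the marker the writer stored

def predict(history):
    """Illustrative linear extrapolation standing in for the AI predictor."""
    return 2 * history[-1] - history[-2] if len(history) >= 2 else 0

def recover_lossy(stored, history):
    """When the preset datum was stored, the prediction is returned as
    the (approximate) current data; otherwise the stored value is real."""
    if stored == PRESET:
        return predict(history)
    return stored
```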
  • all or part of the above embodiments may be implemented by software, hardware, firmware, or any combination thereof.
  • when implemented by a software program, the embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • when the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are generated in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)), or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data storage and acquisition method and apparatus, which help to save storage overhead. The data storage method includes: a storage device obtains current data and historical data of the current data (S100); the storage device predicts the current data using the historical data to obtain predicted data of the current data, where the predicted data of the current data is data obtained by predicting the current data based on the change pattern of the historical data (S101); the storage device obtains the difference between the current data and the predicted data of the current data (S102); and when the storage space occupied by the difference is smaller than the storage space occupied by the current data, information used to recover the current data is stored, where the information used to recover the current data includes the difference or a value obtained by compressing the difference.

Description

Data storage and acquisition method and apparatus

Technical Field

The embodiments of this application relate to the field of data processing technologies, and in particular, to a data storage and acquisition method and apparatus.

Background

With the maturation of emerging applications such as artificial intelligence (AI), big data, and the Internet of Things, the amount of data that needs to be stored is increasing sharply. If the sharply increasing data is stored merely by increasing the capacity of storage devices, the procurement and management costs of the storage devices become high, and the storage devices occupy considerable space and consume considerable energy, which imposes a heavy cost burden on enterprises. Therefore, an effective data storage solution needs to be provided.
Summary

The embodiments of this application provide a data storage and acquisition method and apparatus, which help to save storage overhead. In addition, the embodiments of this application further provide a data compression and decompression method and apparatus, which help to save compression or decompression time.
According to a first aspect, an embodiment of this application provides a data storage method applied to a storage device. The method may include: obtaining current data and historical data of the current data; predicting the current data using the historical data to obtain first predicted data, where the first predicted data is data obtained by predicting the current data based on the change pattern of the historical data; obtaining a first difference between the current data and the first predicted data; and when the storage space occupied by the first difference is smaller than the storage space occupied by the current data, storing information used to recover the current data, where the information used to recover the current data includes the first difference or a value obtained by compressing the first difference.

The difference is a parameter characterizing the deviation between the current data and its predicted data (such as the first predicted data or the second predicted data described below). For example, the difference may be a subtraction result, a ratio, a multiple, or a percentage. The historical data may be one or more data items preceding the current data in a sequence of at least two data items to be stored. Data to be stored refers to the original data that needs to be stored. The current data is the data item currently to be stored among the at least two data items to be stored.

In this technical solution, for the current data, what is actually stored is the information used to recover the current data, such as the difference between the current data and its predicted data or a value obtained by compressing that difference. Because the stored information used to recover the current data occupies less storage space than the current data, storage overhead can be saved compared with directly storing the current data as in the prior art.
In a possible design, the algorithm used to perform the prediction includes an AI neural algorithm. For example, the type of the AI neural algorithm includes any one of the following: a normalized least mean square (NLMS) adaptive filtering type, a single layer perceptron (SLP) type, a multi-layer perceptron (MLP) type, or a recurrent neural network (RNN) type.
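Of the algorithm types listed, NLMS is simple enough to sketch directly. The filter order, step size, and use of NumPy below are illustrative assumptions; the application does not fix these choices.

```python
import numpy as np

def nlms_predict(series, order=2, mu=0.5, eps=1e-8):
    """One-step-ahead prediction of a 1-D series with an NLMS filter.

    The weight vector is adapted sample by sample with the normalized
    LMS rule: w <- w + mu * e * x / (eps + x.x).
    """
    w = np.zeros(order)
    for n in range(order, len(series)):
        x = np.asarray(series[n - order:n], dtype=float)
        err = series[n] - float(w @ x)      # prediction error at step n
        w += mu * err * x / (eps + x @ x)   # normalized LMS update
    # Predict the value that would follow the series
    x = np.asarray(series[-order:], dtype=float)
    return float(w @ x)
```

On slowly varying series the prediction tracks the next sample closely, so the stored difference remains small.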
In a possible design, the storage device does not store the predicted data of the current data, which saves storage overhead. On this basis, the storage device predicts the current data based on the historical information to obtain the first predicted data, and then recovers the current data based on the first predicted data and the stored information used to recover the current data; for details, refer to the technical solution provided in the second aspect below.

In a possible design, the algorithm used to perform the compression includes a dictionary compression algorithm and/or a deduplication algorithm.

In a possible design, the algorithm used to perform the compression includes a dictionary compression algorithm. The dictionary of the dictionary compression algorithm includes at least two sets, each set includes one or more mapping relationships, each mapping relationship is a mapping between one piece of first data and one piece of second data, and the storage space occupied by the first data is larger than the storage space occupied by the second data; each set corresponds to a hit-rate range, and different sets correspond to different hit-rate ranges. The method further includes: obtaining the hit rate of the first difference; determining a target set among the at least two sets based on the hit rate of the first difference, where the hit rate of the first difference is used to determine the hit rate of the target mapping relationship in which the first difference is located, and the determined hit rate of the target mapping relationship falls within the hit-rate range corresponding to the target set; and searching the first data of the target set for the first difference to determine the second data corresponding to the first difference, where the second data corresponding to the first difference is the value obtained by compressing the first difference. In this technical solution, the mapping relationships included in the storage device are grouped into different sets, so that the set containing the data to be compressed (that is, the first difference) can be located directly based on its hit rate, which helps narrow the search scope for the data to be compressed and thus saves compression time.
In a possible design, the storage media of the storage device include a cache, a memory, and a hard disk. The algorithm used to perform the compression includes a dictionary compression algorithm, whose dictionary includes one or more mapping relationships; each mapping relationship is a mapping between one piece of first data and one piece of second data, and the storage space occupied by the first data is larger than that occupied by the second data. The hit rate of the mapping relationships in the cache is greater than or equal to the hit rate of the mapping relationships in the memory, and the hit rate of the mapping relationships in the memory is greater than or equal to the hit rate of the mapping relationships in the hard disk. The method further includes: obtaining the hit rate of the first difference; determining a target storage medium based on the hit rate of the first difference, where the hit rate of the first difference is used to determine the hit rate of the target mapping relationship in which the first difference is located; when the determined hit rate of the target mapping relationship falls within the hit-rate range of the mapping relationships in the cache, the target storage medium is the cache; when it does not fall within the hit-rate range of the mapping relationships in the cache but falls within that of the memory, the target storage medium is the memory; when it does not fall within the hit-rate range of the mapping relationships in the memory, the target storage medium is the hard disk; and searching the first data of the target storage medium for the first difference to determine the second data corresponding to the first difference, where the second data corresponding to the first difference is the value obtained by compressing the first difference. In this technical solution, based on the hit rate of the data to be compressed (specifically, the first difference) and the hit-rate ranges of the mapping relationships stored in different storage media, the highest-performance storage medium containing the data to be compressed can be located directly; the read/write performance of the cache is higher than that of the memory, and that of the memory is higher than that of the hard disk. This helps narrow the search scope for the data to be compressed and thus saves compression time.
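The tiered medium selection described here can be sketched as follows; the tier thresholds and dictionary contents are illustrative assumptions. The highest-performance medium whose hit-rate range contains the datum's hit rate is searched first and alone.

```python
# One dictionary fragment per storage medium; the cache holds the
# mappings with the highest hit rates, the hard disk the lowest.
TIERS = [
    ("cache",  0.8, {"AAAAAAAA": "a"}),
    ("memory", 0.4, {"BBBBBBBB": "b"}),
    ("disk",   0.0, {"CCCCCCCC": "c"}),
]

def pick_medium(hit_rate):
    """Return the fastest medium whose hit-rate range contains hit_rate."""
    for name, low, mapping in TIERS:
        if hit_rate >= low:
            return name, mapping

def compress_tiered(data, hit_rate):
    _, mapping = pick_medium(hit_rate)
    return mapping.get(data, data)
```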
In a possible design, the method further includes: when the storage space occupied by the first difference is greater than or equal to the storage space occupied by the current data, storing the current data or a value obtained by compressing the current data.

In a possible design, the method further includes: when the storage space occupied by the first difference is greater than or equal to the storage space occupied by the current data, storing identification information. The identification information is used to indicate whether the information used to recover the current data is the first difference or a value obtained by compressing the first difference. Optionally, the identification information may serve as the identification of, or be carried in, the information used to recover the current data. This technical solution helps the storage device identify the stored information used to recover the current data.
According to a second aspect, an embodiment of this application provides a data acquisition method applied to a storage device. The method may include: reading information used to recover current data, where the information used to recover the current data includes a difference or a value obtained by compressing the difference, the difference is the difference between the current data and predicted data of the current data, and the predicted data is data obtained by predicting the current data based on the change pattern of historical data; predicting the current data using the historical data to obtain the predicted data; and determining the current data based on the information used to recover the current data and the predicted data. For example, the historical data is one or more data items that have already been obtained.

In a possible design, the algorithm used to perform the decompression includes at least one of a dictionary decompression algorithm and a deduplication algorithm.

In a possible design, the information used to recover the current data includes a value obtained by compressing the difference, and determining the current data based on the information used to recover the current data and the predicted data of the current data includes: decompressing the compressed value of the difference to obtain the difference, and determining the current data based on the difference and the predicted data of the current data.
In a possible design, the algorithm used to perform the decompression includes a dictionary decompression algorithm. The dictionary of the dictionary decompression algorithm includes at least two sets, each set includes one or more mapping relationships, each mapping relationship is a mapping between one piece of first data and one piece of second data, the storage space occupied by the first data is larger than that occupied by the second data, each set corresponds to a hit-rate range, and different sets correspond to different hit-rate ranges. Decompressing the value obtained by compressing the difference to obtain the difference may include: obtaining the hit rate of the value obtained by compressing the difference; determining a target set among the at least two sets based on that hit rate, where the hit rate of the value obtained by compressing the difference is used to determine the hit rate of the target mapping relationship in which that value is located, and the determined hit rate of the target mapping relationship falls within the hit-rate range corresponding to the target set; and searching the second data of the target set for the value obtained by compressing the difference to determine the first data corresponding to that value, where the first data corresponding to the value obtained by compressing the difference is the difference. In this technical solution, the mapping relationships included in the storage device are grouped into different sets, so that the set containing the data to be decompressed (specifically, the value obtained by compressing the difference) can be located directly based on its hit rate, which helps narrow the search scope for the data to be decompressed and thus saves decompression time.

In a possible design, the storage media of the storage device include a cache, a memory, and a hard disk. The algorithm used to perform the decompression includes a dictionary decompression algorithm, whose dictionary includes one or more mapping relationships; each mapping relationship is a mapping between one piece of first data and one piece of second data, and the storage space occupied by the first data is larger than that occupied by the second data. The hit rate of the mapping relationships in the cache is greater than or equal to the hit rate of the mapping relationships in the memory, and the hit rate of the mapping relationships in the memory is greater than or equal to the hit rate of the mapping relationships in the hard disk. In this case, decompressing the value obtained by compressing the difference to obtain the difference may include: obtaining the hit rate of the value obtained by compressing the difference; determining a target storage medium based on that hit rate, where the hit rate of the value obtained by compressing the difference is used to determine the hit rate of the target mapping relationship in which that value is located; when the determined hit rate of the target mapping relationship falls within the hit-rate range of the mapping relationships in the cache, the target storage medium is the cache; when it does not fall within the hit-rate range of the mapping relationships in the cache but falls within that of the memory, the target storage medium is the memory; when it does not fall within the hit-rate range of the mapping relationships in the memory, the target storage medium is the hard disk; and searching the second data of the target storage medium for the value obtained by compressing the difference to determine the first data corresponding to that value, where the first data corresponding to the value obtained by compressing the difference is the difference. In this technical solution, based on the hit rate of the data to be decompressed and the hit rates of the mapping relationships stored in different storage media, the highest-performance storage medium containing the data to be decompressed can be located directly, which helps narrow the search scope for the data to be decompressed and thus saves decompression time.
In a possible design, when the information used to recover the current data read by the storage device does not carry identification information, the information used to recover the current data includes the difference or a value obtained by compressing the difference. For the description of the identification information, refer to the first aspect above.

It can be understood that when the information used to recover the current data read by the storage device carries identification information, the information used to recover the current data includes the current data or a value obtained by compressing the current data.

On this basis, the technical solution provided in the second aspect may be replaced with the following Solution 1 or Solution 2:

Solution 1: reading information used to recover current data, where the information carries identification information and includes the current data.

Solution 2: reading information used to recover current data, where the information carries identification information and includes a value obtained by compressing the current data; and then decompressing the compressed value of the current data to obtain the current data.

The technical solution provided in the second aspect may be combined with Solution 1 or Solution 2 to form a new technical solution.

The second aspect or its alternatives correspond to the technical solution provided in the first aspect and its corresponding designs; therefore, for specific implementations and beneficial effects, refer to the description of the first aspect.
According to a third aspect, an embodiment of this application provides a data storage method applied to a storage device. The method may include: obtaining current data and historical data of the current data; predicting the current data using the historical data to obtain predicted data of the current data, where the predicted data is data obtained by predicting the current data based on the change pattern of the historical data; obtaining the difference between the current data and the predicted data; and when the absolute value of the difference is less than or equal to a preset threshold, storing preset data. Optionally, the storage space occupied by the preset data is smaller than the storage space occupied by the current data. In this technical solution, because the preset data occupies less storage space than the current data, storage overhead can be saved compared with the prior-art solution of directly storing the current data.

Optionally, the preset data is predefined by the storage device. Optionally, the preset data may be an identifier indicating that the predicted data of the current data can be used as (or approximately as) the current data. Optionally, the storage space occupied by the preset data is smaller than the storage space occupied by most or all of the data to be stored.

Taking the case where the difference is a subtraction result as an example, when the absolute value of the difference is 0, the compression process in this technical solution is lossless; when the absolute value of the difference is not 0, the compression process is lossy. Setting the preset threshold reasonably helps limit the data loss rate to a certain range; in other words, the preset threshold can be set based on actual requirements (for example, an acceptable lossy compression rate). This technical solution can be applied to scenarios that tolerate a certain degree of data loss, such as video playback.
In a possible design, the method further includes: when the absolute value of the difference is greater than the preset threshold, storing the current data or a value obtained by compressing the current data. The algorithm used to perform the compression may be, for example but not limited to, a dictionary compression algorithm and/or a deduplication algorithm.

In a possible design, the method further includes: when the absolute value of the difference is greater than the preset threshold, storing identification information, where the identification information is used to indicate that the stored information used to recover the current data is a value obtained by compressing the current data; when the current data is stored, the identification information is used to indicate that the stored information used to recover the current data is the current data. The identification information may serve as the identification of, or be carried in, the information used to recover the current data. This technical solution helps the storage device identify the type of the stored information used to recover the current data, which may be the "preset data" type or the "current data or value obtained by compressing the current data" type, thereby facilitating the data acquisition procedure.

Based on any one of the technical solutions provided in the first or third aspect above, several possible designs are provided below:
In a possible design, the method further includes: storing the correspondence between the information used to recover the current data and the parameters of the AI neural algorithm used to perform the prediction. This helps recover the current data correctly. For example, each time the parameters of the AI neural algorithm are updated, the storage device performs a snapshot operation to record the correspondence between the information used to recover the current data and the parameters of the AI neural algorithm used to perform the prediction.

In a possible design, after storing the information used to recover the current data, the method further includes: updating the parameters of the AI neural algorithm through adaptive learning, and updating the information used to recover the current data based on the updated parameters of the AI neural algorithm. This helps recover the current data correctly.
在一种可能的设计中,将上文中执行预测所采用的AI神经算法的参数标记为AI神经算法的第一参数,将AI神经算法的第一参数更新后得到的参数标记为AI神经算法的第二参数。基于此,根据更新后的AI神经算法的参数,更新用于恢复当前数据的信息,包括:读取用于恢复当前数据的信息;根据AI神经算法的第一参数(即更新前的AI神经算法的参数)、所读取的用于恢复当前数据的信息和当前数据的历史数据,恢复当前数据;根据AI神经算法的第二参数(即更新后的AI神经算法的参数)和当前数据的历史数据,对当前数据进行预测,得到第二预测数据;第二预测数据是基于历史数据的变化规律和AI神经算法的第二参数对当前数据进行预测后的数据;获取当前数据与第二预测数据的第二差量;当第二差量所占的存储空间小于当前数据所占的存储空间时,将所存储的用于恢复当前数据的信息更新为第二差量或者第二差量经压缩得到的值。
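上述“先用更新前的第一参数恢复当前数据,再用更新后的第二参数重新预测并求第二差量”的流程,可以用如下Python片段示意。其中predict_v1、predict_v2以及各数值仅为便于说明而假设的简单函数和示例数据,并非本申请方案的限定性实现:

```python
# 示意:AI神经算法参数由第一参数更新为第二参数后,更新所存储的恢复信息
def predict_v1(history):
    # 假设的“第一参数”对应的预测算法:直接使用前一个数据作为预测值
    return history[-1]

def predict_v2(history):
    # 假设的“第二参数”对应的预测算法:用前两个数据做线性外推
    return 2 * history[-1] - history[-2]

history = [10, 12]        # 当前数据的历史数据(假设值)
stored_diff = 3           # 按第一参数存储的差量(假设值)

# 第一步:根据第一参数、所读取的差量和历史数据,恢复当前数据
current = predict_v1(history) + stored_diff      # 12 + 3 = 15
# 第二步:根据第二参数和历史数据重新预测,求第二差量
second_diff = current - predict_v2(history)      # 15 - 14 = 1
print(current, second_diff)
```

该片段只示意差量的恢复与重算;实际方案中,还需比较第二差量与当前数据所占的存储空间,再决定是否将所存储的信息更新为第二差量或其经压缩得到的值。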
在一种可能的设计中,存储设备包括AI计算卡,上述使用历史数据对当前数据进行预测,得到第一预测数据,包括:通过AI计算卡使用历史数据对当前数据进行预测,得到第一预测数据。
在一种可能的设计中,存储设备包括内存。获取至少两个待存储数据中的当前数据和当前数据的历史数据,包括:从内存中获取至少两个待存储数据中的当前数据和当前数据的历史数据。
在一种可能的设计中,对任一待存储数据来说,在该待存储数据不作为其他待存储数据的历史数据的情况下,从内存中删除该待存储数据,以节省内存的存储开销。
第四方面,本申请实施例提供了一种数据获取方法,应用于存储设备,该方法可以包括:读取用于恢复当前数据的信息;当用于恢复当前数据的信息包括预设数据时,使用历史数据对当前数据进行预测,得到当前数据的预测数据,该预测数据是基于历史数据的变化规律对当前数据进行预测后的数据;将该预测数据作为当前数据。例如,历史数据是已获取的一个或多个数据。
在一种可能的设计中,当存储设备所读取到的用于恢复当前数据的信息不携带标识信息时,说明用于恢复当前数据的信息包括预设数据。关于标识信息的相关描述可以参考上述第三方面,此处不再赘述。
可以理解的,当存储设备所读取到的用于恢复当前数据的信息携带标识信息时,说明用于恢复当前数据的信息包括当前数据或当前数据经压缩得到的值。该情况下,第四方面可以替换为上文中描述的方案1或方案2。
第四方面提供的技术方案与方案1/方案2可以结合,从而构成新的技术方案。
第四方面或第四方面的替换方案与第三方面提供的技术方案及其相应的设计方案相对应,因此其实现方式以及有益效果均可以参考第三方面。
基于上文第二方面或第四方面提供的任一种技术方案,以下提供几种可能的设计:
在一种可能的设计中,存储设备包括内存,在使用历史数据对当前数据进行预测,得到当前数据的预测数据之前,该方法还包括:从内存中获取历史数据。
在一种可能的设计中,该方法还包括:存储设备将当前数据存储到内存中,以作为其他待获取数据的历史数据。
在一种可能的设计中,该方法还包括:当已获取数据不再作为待获取数据的历史数据时,存储设备可以从内存中删除该已获取数据,以节省内存的存储开销。
在一种可能的设计中,该方法还包括:根据该用于恢复当前数据的信息与AI神经算法的参数之间的对应关系,获取对当前数据进行预测所采用的AI神经算法的参数;所述使用该历史数据对当前数据进行预测,得到该预测数据,包括:根据获取的AI神经算法的参数,使用该历史数据对当前数据进行预测,得到该预测数据。
在一种可能的设计中,存储设备包括AI计算卡,上述使用该历史数据对当前数据进行预测,包括:通过AI计算卡使用该历史数据对当前数据进行预测。
第五方面,本申请实施例提供了一种数据压缩方法,应用于存储设备,存储设备中存储有至少两个集合,每个集合包括一个或多个映射关系,每个映射关系是指一个第一数据与一个第二数据之间的映射关系,第一数据所占的存储空间大于第二数据所占的存储空间,每个集合对应一个命中率范围,不同集合对应的命中率范围不同。该方法可以包括:获取待压缩数据的命中率;根据待压缩数据的命中率,在至少两个集合中确定目标集合;待压缩数据的命中率用于确定待压缩数据所在的目标映射关系的命中率,所确定的目标映射关系的命中率属于目标集合对应的命中率范围;在目标集合的第一数据中查找待压缩数据,以确定待压缩数据对应的第二数据,并将待压缩数据对应的第二数据作为待压缩数据经压缩得到的值。
第六方面,本申请实施例提供了一种数据解压缩方法,应用于存储设备,存储设备中存储有至少两个集合,每个集合包括一个或多个映射关系,每个映射关系是指一个第一数据与一个第二数据之间的映射关系,第一数据所占的存储空间大于第二数据所占的存储空间,每个集合对应一个命中率范围,不同集合对应的命中率范围不同。该方法可以包括:获取待解压缩数据的命中率;根据待解压缩数据的命中率,在至少两个集合中确定目标集合;待解压缩数据的命中率用于确定待解压缩数据所在的目标映射关系的命中率,所确定的目标映射关系的命中率属于目标集合对应的命中率范围;在目标集合的第二数据中查找待解压缩数据,以确定与待解压缩数据对应的第一数据,并将与待解压缩数据对应的第一数据作为待解压缩数据经解压缩得到的值。
第七方面,本申请实施例提供了一种数据压缩方法,应用于存储设备,存储设备的存储介质包括缓存、内存和硬盘;缓存中的映射关系的命中率大于或等于内存中的映射关系的命中率,内存中的映射关系的命中率大于或等于硬盘中的映射关系的命中率;每个映射关系是指一个第一数据与一个第二数据之间的映射关系,第一数据所占的存储空间大于第二数据所占的存储空间。该方法包括:获取待压缩数据的命中率;根据待压缩数据的命中率,确定目标存储介质;待压缩数据的命中率用于确定待压缩数据所在的目标映射关系的命中率,当所确定的目标映射关系的命中率属于缓存中的映射关系的命中率范围时,目标存储介质是缓存;当所确定的目标映射关系的命中率 不属于缓存中的映射关系的命中率范围,但属于内存中的映射关系的命中率范围时,目标存储介质是内存;当所确定的目标映射关系的命中率不属于内存中的映射关系的命中率范围时,目标存储介质是硬盘;在目标存储介质的第一数据中查找待压缩数据,以确定待压缩数据对应的第二数据,并将待压缩数据对应的第二数据作为待压缩数据经压缩得到的值。
第八方面,本申请实施例提供了一种数据解压缩方法,应用于存储设备,存储设备的存储介质包括缓存、内存和硬盘;缓存中的映射关系的命中率大于或等于内存中的映射关系的命中率,内存中的映射关系的命中率大于或等于硬盘中的映射关系的命中率;每个映射关系是指一个第一数据与一个第二数据之间的映射关系,第一数据所占的存储空间大于第二数据所占的存储空间。该方法可以包括:获取待解压缩数据的命中率;根据待解压缩数据的命中率,确定目标存储介质;待解压缩数据的命中率用于确定待解压缩数据所在的目标映射关系的命中率,当所确定的目标映射关系的命中率属于缓存中的映射关系的命中率范围时,目标存储介质是缓存;当所确定的目标映射关系的命中率不属于缓存中的映射关系的命中率范围,但属于内存中的映射关系的命中率范围时,目标存储介质是内存;当所确定的目标映射关系的命中率不属于内存中的映射关系的命中率范围时,目标存储介质是硬盘;在目标存储介质的第二数据中查找与待解压缩数据对应的第一数据,并将与待解压缩数据对应的第一数据作为待解压缩数据经解压缩得到的值。
需要说明的是,第五或第七方面提供的数据压缩方法所能达到的有益效果可以参考第一方面的描述。第六或第八方面提供的数据解压缩方法所能达到的有益效果可以参考第二方面的描述。作为一个示例,第五和第七方面中描述的映射关系可以是字典型压缩算法的字典中包含的映射关系。
第九方面,本申请实施例提供了一种存储设备,该存储设备可以用于执行上述第一至第八方面提供的任一种方法。
在一种可能的设计中,可以根据上述第一方面至第八方面提供的任一种方法对该存储设备进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。
在另一种可能的设计中,该存储设备包括存储器和处理器,该存储器用于存储程序代码,处理器用于调用该程序代码,以执行第一方面至第八方面提供的任一方法。
应注意,本申请描述的存储器和处理器可以集成在一块芯片上,也可以分别设置在不同的芯片上,本申请对存储器的类型以及存储器与处理器的设置方式不做限定。
本申请实施例还提供了一种计算机可读存储介质,包括程序代码,该程序代码包括用于执行第一方面至第八方面提供的任一方法的部分或全部步骤的指令。
本申请实施例还提供了一种计算机可读存储介质,其上储存有计算机程序,当该计算机程序在计算机上运行时,使得计算机执行上述第一方面至第八方面提供的任一种可能的方法。
本申请实施例还提供了一种计算机程序产品,当其在计算机上运行时,使得第一方面至第八方面提供的任一方法被执行。
可以理解地,上述提供的任一种存储设备、计算机可读存储介质或计算机程序产品等均用于执行上文所提供的对应的方法,因此,其所能达到的有益效果可参考对应的方法中的有益效果,此处不再赘述。
附图说明
图1为可适用于本申请一实施例的系统架构的示意图;
图2为可适用于本申请一实施例的存储设备的硬件结构示意图;
图3为可适用于本申请另一实施例的存储设备的硬件结构示意图;
图4为可适用于本申请一实施例的AI神经算法的示意图;
图5为本申请实施例提供的一种数据存储方法的示意图一;
图5A为本申请实施例提供的一种待存储数据与实际存储的信息的示意图;
图6为本申请实施例提供的一种数据存储方法的示意图二;
图6A为本申请实施例提供的某一时刻内存和硬盘中存储的信息的示意图;
图7为本申请实施例提供的一种数据获取方法的示意图一;
图7A为本申请实施例提供的一种待获取数据与实际存储的信息的示意图;
图8为本申请实施例提供的一种数据获取方法的示意图二;
图9为本申请实施例提供的一种数据压缩方法的示意图一;
图10为本申请实施例提供的一种数据解压缩方法的示意图一;
图11为本申请实施例提供的一种数据压缩方法的示意图二;
图12为本申请实施例提供的一种数据解压缩方法的示意图二;
图13为本申请实施例提供的一种数据存储方法的示意图三;
图14为本申请实施例提供的一种数据获取方法的示意图三;
图15为本申请实施例提供的一种存储设备的示意图一;
图16为本申请实施例提供的一种存储设备的示意图二;
图17为本申请实施例提供的一种存储设备的示意图三;
图18为本申请实施例提供的一种存储设备的示意图四;
图19为本申请实施例提供的一种存储设备的示意图五;
图20为本申请实施例提供的一种存储设备的示意图六。
具体实施方式
如图1所示,为可适用于本申请一实施例的系统架构的示意图。图1所示的系统架构包括客户端100和存储设备200。
在写数据流程中,客户端100用于向存储设备200发送写请求,该写请求包括一个或多个待写数据,以及每个待写数据的地址信息。存储设备200接收到该写请求之后,依次将每个待写数据存储到该待写数据的地址信息所指示的存储空间中,或者将对该待写数据进行处理(如下文中的预测、求差量、压缩等中的一种或多种)得到的数据存储到该待写数据的地址信息所指示的存储空间中。
在读数据流程中,客户端100用于向存储设备发送读请求,该读请求包括一个或多个待读数据的地址信息。存储设备200接收到该读请求之后,依次从每个待读数据的地址信息所指示的存储空间中读取数据,然后将所读取到的数据反馈给客户端100,或者将对所读取到的数据进行处理(如下文中的预测、求差量、解压等中的一种或多种)得到的数据反馈给客户端100。
需要说明的是,图1所示的系统架构仅为可适用于本申请实施例的系统架构的一个示例,其不对可适用于本申请实施例的系统架构构成限定。例如,可适用于本申请实施例的系统架构可以包括一个存储设备200以及多个客户端100;或者,包括一个客户端以及多个存储设备200等。
可以理解的,客户端100是一个逻辑功能模块,该逻辑功能模块可实现的功能的示例可以参考上文。图1中是以客户端100是独立于存储设备200为例进行说明的。该情况下,在硬件实现上,客户端100可以集成在独立于存储设备200的一个设备上。另外,在一些实施例中,例如在超融合场景或存储计算一体机中,客户端100可以作为存储设备200中的一个逻辑功能模块。该情况下,在硬件实现上,客户端100可以是由存储设备200中的存储介质(如内存)和处理器(如中央处理器(central processing unit,CPU))共同实现的。具体的,该存储介质中存储有程序指令,当该程序指令被该处理器调用时,使得该处理器执行客户端100可实现的功能。当然,客户端100还可以是由存储设备200中的存储介质(如内存)、处理器以及其他硬件共同实现的,本申请实施例对此不进行限定。如果不加说明,下文中均是以本申请实施例提供的技术方案应用于图1所示的系统架构为例进行说明。
如图2所示,为可适用于本申请一实施例的存储设备200的硬件结构示意图。图2所示的存储设备包括:接口卡201、处理器202、主存储器(例如内存memory)203、辅助存储器204(例如硬盘等)、协议转换模块205和总线206。这些器件之间的连接关系可以参见图2。本申请实施例中,硬盘包括但不限于硬盘驱动器(hard disk drive,HDD)或固态硬盘(solid-state disk,SSD)等存储介质。需要说明的是,下文中均是以主存储器203是内存(标记为内存203),且辅助存储器204具体是硬盘(标记为硬盘204),协议转换模块205具体是硬盘协议转换模块(标记为硬盘协议转换模块205)为例进行说明的。在此统一说明,下文不再赘述。
其中,接口卡201、处理器202、内存203、硬盘204和硬盘协议转换模块205可以通过总线206相互连接。总线206可以包括以下至少一种:外设部件互连标准(peripheral component interconnect,PCI)总线、PCIE(PCI express)总线、串行连接SCSI(serial attached SCSI,SAS)、SATA串口硬盘(serial advanced technology attachment,SATA)、扩展工业标准结构(extended industry standard architecture,EISA)总线等。SCSI是小型计算机系统接口(small computer system interface)的英文缩写。总线206可以包括地址总线、数据总线和控制总线等中的一种或多种。为便于表示,图2中使用带箭头的线表示总线206,但并不表示仅有一根总线或一种类型的总线。
接口卡201,也可以称为前端协议转换模块,用于对接收到的信息进行传输协议转换。例如,将采用光网络通信协议或以太网通信协议接收到来自客户端100的信息转换为采用PCIE协议的信息。又如,将采用PCIE协议接收到来自处理器202的信息转换为采用光网络通信协议或以太网通信协议的信息。接口卡201可以包括以下至少一种:光纤通道(fibre channel,FC)接口卡、千兆以太网(gigabit ethernet,GE)接口卡、接口总线(interface bus,IB)接口卡等。
处理器202是存储设备200的控制中心,可以用于控制存储设备200中的其他器件如内存203、硬盘204和硬盘协议转换模块205等器件工作,从而实现本申请实施例提供的技术方案,具体示例可以参考下文。
可选的,处理器202可以包括CPU,具体可以包括一个或多个CPU。
可选的,处理器202可以包括CPU和缓存(即CPU缓存)。其中,缓存是介于CPU和内存203之间的高速存储器,主要用于提升存储设备200的读写性能。例如,缓存中存储的数据可以是内存203中存储的数据的一部分。若缓存中包括待访问数据(如待读数据或对待读数据进行处理后得到的数据等),则CPU可以从缓存中获取该待访问数据,而不用从内存203中获取该待访问数据,从而加快了数据读取速率。
内存203,一方面可以用于对来自接口卡201的信息(如写请求或读请求携带的信息)进行缓存,以便于处理器202调用内存203中缓存的信息,从而实现本申请实施例提供的技术方案;或者,用于对来自于处理器202的信息(如待读数据等)进行缓存,以便于处理器202调用内存203中缓存的信息,并发送给接口卡201,使得接口卡201依次对缓存的信息进行传输协议转换。另一方面,内存203是介于处理器202和硬盘204之间的存储器,用于提升存储设备200的读写性能。例如,内存203中存储的数据可以是硬盘204中存储的数据的一部分。若内存中包括待访问数据,则CPU可以从内存203中获取该待访问数据,而不用从硬盘204中获取该待访问数据,从而加快了数据读取速率。
硬盘204,用于存储数据。按照所支持的传输协议进行分类,硬盘204可以包括以下至少一种:SAS硬盘(或SAS级联框)、PCIE硬盘、SATA硬盘等。
硬盘协议转换模块205,也可以称为后端协议转换模块,介于处理器202与硬盘204之间,用于对接收到的信息进行传输协议转换。例如,将采用PCIE协议接收到的来自处理器202的信息转换为,采用可适用于硬盘204的协议如SAS协议或SATA协议等的信息。又如,将采用SAS协议或SATA协议等接收到的来自硬盘204的信息转换为,采用可适用于处理器202的协议如PCIE协议等的信息。以硬盘204是SAS硬盘为例,硬盘协议转换模块205具体可以是:SAS协议转换芯片,或SAS接口卡等。
在图2所示的存储设备200中,处理器202可以用于执行下文中描述的预测、求差量、压缩解压缩等步骤,具体示例可以参考下文。该情况下,可以认为处理器202通过调用程序来执行预测、求差量、压缩解压缩等步骤。
需要说明的是,图2所示的存储设备200仅为可适用于本申请实施例的存储设备的一个示例,其不对可适用于本申请实施例的存储设备构成限定。可适用于本申请实施例的存储设备还可以包括比存储设备200中更多或更少的器件。
例如,若处理器202和硬盘204所采用的协议相同,例如均是PCIE协议,则存储设备200可以不包括硬盘协议转换模块205。
再如,如图3所示,在图2所示的存储设备200的基础上,存储设备200还可以包括AI计算卡207,AI计算卡207用于在处理器202的控制下,实现AI计算功能,如下文中描述的执行预测和求差量等步骤,具体示例可以参考下文。AI计算卡例如可以是AI计算芯片,当然本申请实施例不限于此。该示例中,当AI计算卡用于执行预测和求差量等步骤时,处理器202可以不需要执行预测和求差量等步骤。
又如,存储设备200还可以包括压缩解压缩模块,用于在处理器202的控制下,执行压缩解压缩等步骤,具体示例可以参考下文。该示例中,处理器202可以不需要执行压缩解压缩等步骤。这里描述的压缩解压缩模块可以是一个硬件如芯片等。
可以理解的,在不冲突的情况下,上述任意两个或两个以上示例可以结合使用,从而构成存储设备200的新的硬件架构。例如,存储设备200可以既包括AI计算卡207又包括压缩解压缩模块。
上文中描述的存储设备200的硬件结构均是基于图1所示的系统架构为例进行说明的。在客户端100是存储设备200中的一个逻辑功能模块的实施例中,上文提供的任一种存储设备200的硬件结构可以不包括接口卡201以及接口卡201与处理器202之间的总线206。用于实现客户端100的功能的处理器与上述处理器202可以是同一个处理器,也可以是不同的处理器。
以下简单介绍本申请实施例涉及的相关技术。
1)、AI神经算法
如图4所示,为可适用于本申请一实施例的AI神经算法的示意图。在图4中,AI神经算法可以包括输入层31、隐含层32和输出层33。其中:
输入层31用于接收输入变量的取值,并将接收到的输入变量的取值直接或者经过处理后发送到隐含层32。其中,处理的作用是为了获得隐含层32能够识别的信息。输入变量是待预测数据之前的一个或多个数据。输入层31的输入变量的个数及输入变量具体是待预测数据之前的哪些数据可以根据预测精度要求进行灵活调整。例如,若待预测数据是第n个数据,标记为X(n),则输入变量可以是在待预测数据X(n)之前的n-1个数据(标记为X(1)、X(2)……X(n-1))中任意一个或多个数据。n≥1,n是整数。
隐含层32用于根据接收到的来自输入层31的输入变量的取值对待预测数据进行预测,并将预测结果发送到输出层33。隐含层32由y层神经网络构成,y≥1,y是整数。y的取值可以根据预测精度要求进行调整。每层神经网络包括一个或多个神经元,不同层神经网络包括的神经元的个数可以相同,也可以不相同。第1层神经网络包括的神经元可以表示为S11、S12、S13……,第2层神经网络包括的神经元可以表示为S21、S22、S23……,第y层神经网络包括的神经元可以表示为Sy1、Sy2、Sy3……。隐含层包括的任意两个神经元之间可以有连线,也可以没有连线。每个连线具有一个权重,第i个连线的权重可以表示为wi。i≥1,i是整数。作为一个示例,y、wi和每层神经网络包括的神经元个数等参数可以在存储设备启动时进行初始化赋值。初始化所赋的值可以是通过离线机器在一定预测精度要求下,对被存储数据(如大量的被存储数据)进行训练、验证所得。另外在数据存储过程中可以根据实际业务需求,选择性开启在线学习从而调整y、wi以及每层神经网络包括的神经元个数等中的一个或多个参数的取值,进而提升预测精度。
输出层33用于将隐含层32的预测结果直接或经过处理后输出。其中,处理的作用是为了获得接收该预测结果的器件/模块能够识别的信息。预测结果包括对待预测数据进行预测得到的预测数据。
在本申请实施例中,AI神经算法的类型可以包括以下任一种:NLMS类型、SLP类型、MLP类型或RNN类型等。其中,RNN类型的AI神经算法可以包括google的快速准确的图像超分辨率(rapid and accurate image super-resolution,RAISR)算法,或者,智能驾驶中的物体运动轨迹预测技术与算法如百度的Apollo(阿波罗)智能驾驶算法等。以下通过简单介绍这两种算法,来说明AI神经算法的应用示例。
google的RAISR算法可以描述为:通过机器学习图片,获得图片变化的内在规律,其中,该内在规律可以通过该算法中的参数(如上述y、wi以及每层神经网络包括的神经元个数等中的一项或多项)的取值来表征;然后,通过所获得的参数的取值和图片中已知的像素值,预测图片中每个丢失像素在原始高分辨率图片中的像素值,从而将低分辨率图片恢复为高分辨图片。对计算机而言就是计算机内部的一组二进制数据中有缺失部分,而google的RAISR算法的作用是通过机器学习预测该缺失部分。
百度的Apollo智能驾驶算法可以描述为:通过机器学习物体的运动参数,获得物体运动参数变化的内在规律,其中,该内在规律可以通过该算法中的参数(如上述y、wi以及每层神经网络包括的神经元个数等中的一项或多项)的取值来表征;然后,通过所获得的参数的取值和物体当前和/或历史的运动参数,预测物体将来的运动参数。对计算机而言就是预测一组已知二进制数据在将来的二进制数据里面出现位置变化或具体值的变化。
由此类推,本申请实施例所采用的AI神经算法可以描述为:通过机器学习被存储数据,获得被存储数据变化的内在规律,其中,该内在规律可以通过该算法中的参数(如上述y、wi以及每层神经网络包括的神经元个数等中的一项或多项)的取值来表征;然后,通过所获得的参数的取值和已知的被存储数据,预测未知的被存储数据。对计算机而言就是使用一组已知二进制数据预测将来出现的二进制数据的值。
2)、字典型压缩技术(或算法)、字典型解压缩技术(或算法)
字典型压缩技术是目前业界公认的高效存储技术,其基本原理为:在存储设备中预存一个字典,该字典包括至少两个映射关系,每个映射关系是指一个第一数据与一个第二数据之间的映射关系,其中,每个映射关系的第一数据所占的存储空间大于该映射关系的第二数据所占的存储空间。换句话说,每个映射关系是一个复杂符号(或复杂数据)与一个简单符号(或简单数据)之间的映射关系。一般地,字典中的任意两个第一数据不同,且任意两个第二数据不同。当有待压缩数据(如待写数据)需要压缩时,存储设备可以将待压缩数据与字典中的第一数据进行对比,如果字典中的第一数据中包括待压缩数据,则存储待压缩数据对应的第二数据;如果字典中的第一数据中不包括待压缩数据,则存储待压缩数据本身。
例如,假设字典中存储的第一数据和第二数据之间的对应关系如表1所示:
表1
第一数据 第二数据
Chinese 00
people 01
China 02
并且,假设需要对如下待压缩数据进行压缩:I am a Chinese people,I am from China,那么,基于表1所示的字典执行字典型压缩之后,存储设备中存储的信息(即用于恢复该待压缩数据的信息)可以是:I am a 00 01,I am from 02。
字典型解压缩技术的基本原理为:存储设备将待解压缩数据(如从存储空间中读取的数据)与字典中的第二数据进行对比,如果字典中的第二数据中包括待解压缩数据,则将待解压缩数据对应的第一数据作为解压后的数据;如果字典中的第二数据中 不包括待解压缩数据,则将待解压缩数据本身作为解压后的数据。
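基于表1所示字典的压缩与解压缩过程,可以用如下Python片段示意。为便于说明,示意中按空格分词并省略了标点处理,字典内容即表1,并非限定性实现:

```python
# 示意:基于表1字典的字典型压缩与解压缩
DICT = {"Chinese": "00", "people": "01", "China": "02"}   # 第一数据 -> 第二数据
INV_DICT = {v: k for k, v in DICT.items()}                # 第二数据 -> 第一数据

def dict_compress(text):
    # 将待压缩数据逐词与字典中的第一数据比对:命中则存第二数据,否则存原词
    return " ".join(DICT.get(w, w) for w in text.split(" "))

def dict_decompress(text):
    # 将待解压缩数据逐词与字典中的第二数据比对:命中则还原第一数据,否则保留原值
    return " ".join(INV_DICT.get(w, w) for w in text.split(" "))

src = "I am a Chinese people from China"
compressed = dict_compress(src)
print(compressed)                          # I am a 00 01 from 02
print(dict_decompress(compressed) == src)  # True
```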
另外,本申请中的术语“多个”是指两个或两个以上。本申请中的术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。当字符“/”用在公式中时,一般表示前后关联对象是一种“相除”的关系。例如,公式A/B表示A除以B。本申请中的术语“第一”、“第二”等是为了区分不同的对象,并不限定该不同对象的顺序。
以下,结合附图说明本申请实施例提供的存储及获取数据的方法。
如图5所示,为本申请实施例提供的一种数据存储方法的示意图。图5所示的方法可以包括如下步骤:
S100:存储设备获取当前数据和当前数据的历史数据。
例如,存储设备获取至少两个待存储数据中的当前数据(即当前待存储数据)和当前数据的历史数据(即历史待存储数据);该历史数据是至少两个待存储数据构成的序列中的在当前数据之前的一个或多个数据。
S101:存储设备使用该历史数据对当前数据进行预测,得到当前数据的预测数据。当前数据的预测数据是基于该历史数据的变化规律对当前数据进行预测后的数据。
其中,历史数据的变化规律具体是历史数据的内容或值的变化规律。
例如,假设序列中的各数据(即待存储数据)依次为:X(1)、X(2)、X(3)……X(n)……X(N),其中,1≤n≤N,且N≥2,n和N均是整数。那么,当当前数据是X(n)时,历史数据可以是在X(n)之前的任意一个或多个数据。可选的,历史数据是从X(n-1)开始的且在X(n-1)之前的连续的预设数量个数据。例如,假设预设数量是10,那么,当n=50时,历史数据可以是数据X(40)~X(49);当n=51时,历史数据可以是数据X(41)~X(50)。当然,历史数据也可以是从X(n-1)开始的且在X(n-1)之前的不连续的多个数据。
当前数据的历史数据具体是当前数据之前的哪个或哪些数据可以是与执行S101中的预测所采用的算法相关。本申请实施例对执行预测所采用的算法不进行限定。例如,该算法可以包括AI神经算法。该情况下,S101中所采用的历史数据具体是当前数据之前的哪个或哪些数据可以是依据图4所示的AI神经算法的输入变量确定的。例如,若图4所示的AI神经算法的输入变量是从X(n-1)开始的且在X(n-1)之前的连续的10个数据,则当当前数据是X(n)且n=50时,历史数据是X(40)~X(49)。又如,若图4所示的AI神经算法的输入变量是X(n-2)、X(n-4)、X(n-6)、X(n-8)和X(n-10),则当当前数据是X(n)且n=50时,历史数据是X(48)、X(46)、X(44)、X(42)和X(40)。
关于AI神经算法的具体实现可以参考上文,此处不再赘述。需要说明的是,执行S101之前,存储设备已经获得AI神经算法的各参数(上述如y、wi和每层神经网络包括的神经元的个数等)的取值。AI神经算法的各参数的取值可以是通过离线和/或在线训练被存储数据得到的。执行S101时,存储设备可以根据已获得的AI神经算法的各参数的取值以及历史数据对当前数据进行预测,以得到当前数据的预测数据。
作为一个示例,存储设备可以在接收到客户端发送的一个或多个写请求之后,根据该一个或多个写请求携带的待写数据得到至少两个待存储数据,其中,该一个或多个写请求所携带的待写数据是针对同一个或同一类主体的数据,例如,该主体可以是同一篇文章,或者同一个图片,或者同一类型的多个图片等。然后,对该至少两个待存储数据进行排序,得到该至少两个待存储数据构成的序列,接着,依次将该序列中的部分或全部待存储数据中的每个待存储数据作为当前数据,执行S100~S105。
其中,根据该一个或多个写请求携带的待写数据得到至少两个待存储数据可以包括:将该一个或多个写请求携带的每个待写数据作为一个待存储数据,或者将一个或多个写请求携带的待写数据进行重新组合和/或分割成至少两个待存储数据。也就是说,存储设备接收到的待写数据的粒度与存储设备处理(包括预测、求差量、存储等中的一项或多项)的粒度可以相同,也可以不同。例如,若该一个或多个写请求包括的每个待写数据是8比特,则当每个待存储数据是8比特时,每个待写数据是一个待存储数据;当每个待存储数据是16比特时,每个待存储数据可以是由2个待写数据组合得到;当每个待存储数据是4比特时,每2个待存储数据可以是由一个待写数据分割得到。为了便于描述,下文中均是以每个待写数据作为一个待存储数据为例进行说明。
本申请实施例对本示例中根据何种方式对至少两个待存储数据进行排序不进行限定。通常,本示例中执行排序所依据的排序规则与执行预测所采用的预测算法如AI神经算法相关。例如,本示例中执行排序所依据的排序规则,与存储设备获得AI神经算法的各参数(上述y、wi和每层神经网络包括的神经元的个数等)的取值的过程中,参与训练时被存储数据所依据的排序规则相同。例如,假设参与训练时被存储数据是针对同一篇文章的,则该排序规则可以是该文章中的各字符在该文章中的顺序或该顺序的倒序。又如,假设参与训练的被存储数据是针对同一个图片的,则该排序规则可以是该图片中的各像素点逐行或逐列进行排序的规则,或者将该图片分割为多个部分,并将相似部分进行组合后得到的新图片中的各像素点逐行或逐列进行排序的规则。
作为一个示例,存储设备可以在接收到客户端发送的一个或多个写请求之后,将该写请求携带的待写数据得到至少两个待存储数据,并将该至少两个待写数据的先后顺序构成的序列作为至少两个待存储数据构成的序列;然后,依次将该序列中的部分或全部待存储数据中的每个待存储数据作为当前数据,执行S101~S105。本示例中,存储设备可以不执行对待写数据的排序步骤。本示例的一种应用场景可以为:存储设备获得AI神经算法的各参数(如上述y、wi和每层神经网络包括的神经元的个数等)的取值的过程中,参与训练时被存储数据的顺序,为存储设备接收到客户端发送的被存储数据的顺序。本示例中相关参数的解释及实现方式可以参考上文。
需要说明的是,关于预测步骤,本申请实施例支持以下技术方案:
方案一:对于某个或某些待存储数据来说,预测步骤可以是缺省的。例如,假设历史数据是从X(n-1)开始的且在X(n-1)之前的连续的10个数据,则对于第1~10个待存储数据来说,预测步骤可以是缺省的。
基于方案一,存储设备可以按照现有技术中提供的技术方案对该待存储数据进行存储,如直接存储,或者按照字典型压缩算法和/或重删算法等算法进行压缩处理后进行存储。可以理解的,该情况下,S102~S104也可以是缺省的。
方案二:存储设备对不同待存储数据进行预测时,所采用的预测算法的各参数可以相同,也可以不相同。例如,对于第5~10个待存储数据来说,AI神经算法的输入变量可以是从X(n-1)开始的且在X(n-1)之前的连续的5个数据,也就是说,输入变量的个数是5;对于第10个及之后的待存储数据来说,AI神经算法的输入变量可以是从X(n-1)开始的且在X(n-1)之前的连续的10个数据,也就是说,输入变量的个数是10。
S102:存储设备获取当前数据与当前数据的预测数据的差量。
该差量是用于表征当前数据与当前数据的预测数据之间的差异的参数。例如,该差量可以是差值、比值、倍数或百分比等,当然本申请实施例不限于此。
例如,若差量是差值,则该差值可以是当前数据减去当前数据的预测数据得到的差值,或当前数据的预测数据减去当前数据得到的差值。具体是哪种差值可以是预定义的,本申请实施例不限于此。可以理解的,由于当前数据的预测数据可能大于、等于或小于当前数据,因此,该差值可以是大于、等于或小于0的值。差量是比值、倍数或百分比等时,差量的具体实现方式以及取值的原理与此类似,此处不再一一列举。
S103:存储设备判断该差量所占用的存储空间是否小于当前数据所占用的存储空间。若是,则执行S104;若否,则执行S105。
S103可以通过以下方式之一实现:
方式1:存储设备判断该差量的比特数是否小于当前数据的比特数。
方式2:存储设备对该差量和当前数据分别进行压缩(如采用字典型压缩算法或重删算法进行压缩),并判断该差量经压缩得到的值的比特数是否小于当前数据经压缩得到的值的比特数。
基于方式1和方式2任一种,若判断结果为“是”,说明差量所占用的存储空间小于当前数据所占用的存储空间;若判断结果为“否”,说明差量所占用的存储空间大于或等于当前数据所占用的存储空间。
S104:存储设备存储该差量或者存储该差量经压缩得到的值。其中,具体存储差量经压缩得到的值还是存储差量可以是预定义的,当然本申请实施例不限于此。
本申请实施例对执行压缩所采用的压缩算法不进行限定,例如可以包括字典型压缩算法和重删算法中的至少一种。具体采用使用哪个或哪些算法可以是预定义的,当然本申请实施例不限于此。
执行S104之后,针对当前数据的存储过程结束。
S105:存储设备存储当前数据或者存储当前数据经压缩得到的值。具体存储当前数据经压缩得到的值还是存储当前数据可以是预定义的,当然本申请实施例不限于此。
作为一个示例,执行S105时,存储设备采用的压缩算法与S104中执行压缩所采用的压缩算法是一致的,当然本申请实施例不限于此。
执行S105之后,针对当前数据的存储过程结束。
为了便于存储设备区分所存储的用于恢复当前数据的信息是“差量或差量经压缩得到的值”还是“当前数据或当前数据经压缩得到的值”,从而在读数据流程中,确定待读数据,可选的,若执行S105,则该方法还可以包括以下S105A:
S105A:存储设备存储第一标识信息,第一标识信息用于指示所存储的用于恢复当前数据的信息是S105中所存储的信息(即当前数据或者当前数据经压缩得到的值)。第一标识信息可以作为用于恢复当前数据的信息的标识信息,或者是用于恢复当前数据的信息携带的信息。
可以理解的,S105A可以被替换为在执行S104之后,执行以下S104A;或者,可以在执行S105A的情况下,在执行S104之后,还执行以下S104A:
S104A:存储设备存储第二标识信息,第二标识信息用于指示所存储的用于恢复当前数据的信息是S104中所存储的信息(即差量或者差量经压缩得到的值)。
由于通过调整预测算法的参数的取值,可以实现待存储数据的预测数据逼近待存储数据,从而有助于实现大部分待存储数据的预测数据与待存储数据之间的差量所占的存储空间小于该待存储数据所占的存储空间;因此,具体实现时,执行S105A且不执行S104A(即存储第一标识信息且不存储第二标识信息)可以在实现使存储设备区分所存储的用于恢复当前数据的信息是“差量经压缩得到的值或者差量”还是“当前数据经压缩得到的值或者当前数据”的同时,节省存储开销。下文中图6~图8所示的实施例均是基于数据存储方法中执行S105A且不执行S104A为例进行说明的。
本申请实施例提供的数据存储方法中,使用历史数据对当前数据进行预测,并且在当前数据与当前数据的预测数据的差量所占的存储空间小于当前数据所占的存储空间时,存储差量或者存储差量经压缩得到的值。由于差量所占的存储空间小于当前数据所占的存储空间,因此,预测和求差量的过程可以被认为是数据压缩过程,这样,相比现有技术,无论直接存储差量还是存储差量经压缩得到的值,均可以节省存储开销。另外,通过使用合适的预测算法或调整预测算法中的参数,有助于实现当前数据的预测数据逼近当前数据,从而使得差量所占的存储空间远小于当前数据所占的存储空间,从而更有效地节省存储开销。此外,存储差量经压缩得到的值的技术方案可以进一步节省存储开销。
图5所示的存储差量经压缩得到的值的技术方案,可以理解为:在采用传统压缩算法对数据处理之前,引入预测算法,其中,预测算法是基于数据内容规律、发展趋势、内在关系等构建的;使用预测算法和已经输入存储设备的数据对将要输入存储设备的数据的内容进行预测(或者对已经输入存储设备的多个数据构成的序列中的在前数据对在后数据进行预测),然后,对预测准确内容或相似内容,只调用传统数据压缩算法压缩真实值与预测值之间的差量,而不存储预测准确内容或相似内容。从而达成增加压缩率、主动减少传统压缩算法输入值的波动范围目标,实现当前压缩算法在解压缩率、解压缩速度方面优化突破。
例如,虽然对于存储设备端来说,存储对象是二进制的序列,但是将序列恢复成可理解的语义时会发现其内部蕴含着一些变化规律。如将二进制序列Xn={10,101,1010,10001,11010,100101,110010,1000001,1010010,1100101,1111010,10010001,10101011}转换为十进制后得到Xn'={2,5,10,17,26,37,50,65,82,101,122,145,171};经分析可以发现Xn'中的前12个数据满足如下变化规律:x²+1,且x=1~12。
基于此,对于Xn'来说,存储设备根据预测算法x²+1以及图5所示的实施例,实际存储的序列可以为{10,101,1010,0,0,0,0,0,0,0,0,0,01}。其中,对于Xn'中的前3个数据来说,预测步骤是缺省的。结合S105A可知,对于这3个数据来说,存储设备还可以分别存储第一标识信息。所存储的“01”为待存储数据171与该待存储数据的预测数据170之间的差值。由此可见,需要压缩存储的数据的范围显著降低,其数据的重复概率明显增加。因此可以显著提升数据压缩比及压缩效率。
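上述由Xn'得到实际存储序列的过程,可以用如下Python片段示意。其中预测算法为x²+1,前3个数据的预测步骤缺省;为简化示意,差量按最小位数的二进制存储(因此最后一个差量记为“1”,而正文示例将其记为“01”,二者仅是编码位数的约定不同),并非限定性实现:

```python
# 示意:按图5所示流程,对序列Xn'用预测规律 x^2+1 求差量后存储
data = [2, 5, 10, 17, 26, 37, 50, 65, 82, 101, 122, 145, 171]

def predict(x):
    # 预测算法:第x个数据的预测值为 x*x + 1(x从1开始计数)
    return x * x + 1

stored = []
for x, value in enumerate(data, start=1):
    if x <= 3:
        # 前3个数据的预测步骤缺省,直接存原始数据的二进制表示
        stored.append(format(value, "b"))
    else:
        # 其余数据只存真实值与预测值之间的差量(二进制表示)
        stored.append(format(value - predict(x), "b"))

print(stored)
```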
需要说明的是,对于多个待存储数据来说,由于每个待存储数据均可以按照图5所示的数据存储方法进行存储,而每个待存储数据与该待存储数据的预测数据相比可能出现以下情形之一:完全相同,部分相同,完全不同;对于完全相同或部分相同的情形,可以节省存储空间;对于完全不同的情形,与现有技术中对应方法的效果等同。因此,总体来说,可以节省存储空间。并且,对于存储经压缩得到的值的技术方案,可以显著提升数据压缩比及压缩效率。
如图5A所示,为本申请实施例提供的一种待存储数据与实际存储的信息(即用于恢复待存储数据的信息)的示意图。图5A中是以历史数据是当前数据之前的,且从当前数据的前一个待存储数据开始的连续5个待存储数据为例进行说明的,因此,对于序列的前5个待存储数据来说,所对应的实际存储的信息分别为待存储数据(或待存储数据经压缩得到的值)以及第一标识信息。每个阴影小方格表示一个待存储数据对应的实际存储的信息,且对应关系如虚线箭头所示。“A”表示第一标识信息。
结合图2所示的存储设备,图5所示的数据存储方法的一个示例可以如图6所示。图6所示的方法可以包括如下步骤:
S201:存储设备通过接口卡接收客户端发送的写请求,该写请求中包括至少两个待写数据和至少两个待写数据中的每个待写数据的地址信息。
S202:接口卡将对该至少两个待写数据和每个待写数据的地址信息进行传输协议转换,例如将采用以太网通信协议的这些信息转换为采用PCIE协议的信息。
S203:接口卡将传输协议转换后得到的至少两个待写数据和每个待写数据的地址信息发送给处理器。
S204:处理器将接收到的来自接口卡的至少两个待写数据中的每个待写数据作为一个待存储数据,并对所得到的至少两个待存储数据进行排序。
S205:处理器将排序后得到的序列和每个待存储数据(即每个待写数据)的地址信息存储到内存中。后续,处理器可以依次将该序列中的部分或全部待存储数据中的每个待存储数据作为当前数据,执行以下S206~S219。可以理解的,对于该序列中的任意两个待存储数据来说,在该序列中所处的位置靠前的待存储数据可以作为位置靠后的待存储数据的历史数据。
可选的,S204和S205可以替换为:处理器将接收到的来自接口卡的至少两个待写数据中的每个待写数据作为一个待存储数据,并将所得到的待存储数据和待存储数据的地址信息写入内存中。然后,处理器可以对写入内存的至少两个待存储数据进行排序;或者,处理器可以依次将接收到的接口卡发送的至少两个待写数据的先后顺序作为至少两个待存储数据的先后顺序,从而构成一个序列,并依次将该序列中的部分或全部待存储数据中的每个待存储数据作为当前数据,执行以下S206~S219。
可以理解的,由于序列中所处的位置靠前的待存储数据可以作为位置靠后的待存储数据的历史数据,然而,根据上文中的描述可知,并非在一个待存储数据之前的所有待存储数据均会被用作该待存储数据的历史数据。基于此,可选的,对任意一个待存储数据来说,处理器可以在该待存储数据不再作为其他待存储数据的历史数据的情况下,从内存中删除该待存储数据,以节省内存的存储开销。
如图6A所示,为某一时刻内存和硬盘中存储的信息的示意图。图6A是基于图5A进行绘制的,因此,图6A中的各种图形或箭头等的解释可以参见图5A。由图6A可知,某一时刻,内存中的待存储数据构成的序列可以仅包含当前数据的历史数据、当前数据以及当前数据之后的待存储数据,这样可以节省内存的存储开销。
S206:处理器从内存中获取当前数据和该当前数据的历史数据。
S207:处理器使用历史数据对当前数据进行预测,得到当前数据的预测数据。
S208:处理器获取当前数据与当前数据的预测数据的差量。
S209:处理器判断差量所占的存储空间是否小于当前数据所占的存储空间。
若是,则执行S210;若否,则执行S215。
S210:处理器对差量进行压缩。
S211:处理器将差量经压缩得到的值和从内存中获取的当前数据的地址信息发送给硬盘协议转换模块。
S212:硬盘协议转换模块对接收到的差量经压缩得到的值和当前数据的地址信息进行传输协议转换,例如由PCIE协议转换为SAS协议。
S213:硬盘协议转换模块将经传输协议转换后的差量经压缩得到的值和当前数据的地址信息发送给硬盘如SAS硬盘。
S214:硬盘在当前数据的地址信息所指示的存储空间中存储差量经压缩得到的值。执行S214之后,针对当前数据的存储过程结束。
S215:处理器对当前数据进行压缩。
S216:处理器将第一标识信息、当前数据经压缩得到的值和从内存中获取的当前数据的地址信息发送给硬盘协议转换模块。第一标识信息用于指示所存储的用于恢复当前数据的信息是当前数据经压缩得到的值。
S217:硬盘协议转换模块对接收到的第一标识信息、当前数据经压缩得到的值和当前数据的地址信息进行传输协议转换,例如由PCIE协议转换为SAS协议。
S218:硬盘协议转换模块将经传输协议转换后的第一标识信息、当前数据经压缩得到的值和当前数据的地址信息发送给硬盘(如SAS硬盘)。
S219:硬盘在当前数据的地址信息所指示的存储空间中存储第一标识信息和当前数据经压缩得到的值。执行S219之后,针对当前数据的存储过程结束。
结合图3所示的存储设备,图5所示的数据存储方法的一个示例可以是对上述图6所示的实施例进行如下几点修改得到的实施例:第一,上述S207~S209是由AI计算卡执行的。第二,在执行S206之后且执行S207之前,上述方法还包括:处理器将从内存中获取的历史数据和当前数据发送给AI计算卡。第三,在执行S209之后且执行S210之前,上述方法还包括:AI计算卡将差量发送给处理器。第四,在执行S209之后且执行S215之前,上述方法还包括:AI计算卡将当前数据发送给处理器。
如图7所示,为本申请实施例提供的一种数据获取方法的示意图。本实施例与图5所示的数据存储方法相对应,因此,本实施例中相关内容的解释可以参考图5所示的实施例。图7所示的方法可以包括如下步骤:
S301:存储设备读取用于恢复当前数据(即当前待获取数据)的信息。用于恢复当前数据的信息包括“差量或差量经压缩得到的值”或者“当前数据或当前数据经压缩得到的值”。差量是当前数据与当前数据的预测数据的差量。当前数据的预测数据是基于历史数据的变化规律对当前数据进行预测后的数据。
其中,历史数据是已获取的一个或多个数据。
例如,存储设备可以在接收到客户端发送的一个或多个读请求之后,根据该一个或多个读请求所请求的待读数据得到至少两个待获取数据的地址信息,然后,根据该至少两个待获取数据的地址信息,读取用于恢复该至少两个待获取数据的信息。其中,该一个或多个读请求所请求的数据是针对同一个主体的数据,关于该主体的相关描述可以参考上述图5所示的实施例。其中,待读数据的粒度与待获取数据的粒度可以相同,也可以不同,例如,若一个待读数据是8比特,则一个待获取数据可以是4比特、8比特或16比特等。为了便于描述,下文中均是以每个待读数据是一个待获取数据为例进行说明。待读数据与待获取数据之间的对应关系,可以参考上文中待写数据与待存储数据之间的对应关系,此处不再赘述。存储设备可以将该至少两个待获取数据中的每个待获取数据作为当前数据,从而执行S301~S306。
若用于恢复当前数据的信息包括“差量或差量经压缩得到的值”,那么具体包括差量还是差量经压缩得到的值可以是预定义的,当然本申请不限于此。
若用于恢复当前数据的信息包括“当前数据或当前数据经压缩得到的值”,那么具体包括当前数据还是当前数据经压缩得到的值可以是预定义的,当然本申请不限于此。
S302:存储设备判断用于恢复当前数据的信息是否携带第一标识信息。
根据上述图5所示的实施例中的描述可知:
若S302的判断结果为否,则说明用于恢复当前数据的信息包括差量或差量经压缩得到的值。基于此,当用于恢复当前数据的信息是差量经压缩得到的值时,执行S303;当用于恢复当前数据的信息是差量时,执行S304。
若S302的判断结果为是,说明用于恢复当前数据的信息包括当前数据或当前数据经压缩得到的值。基于此,当用于恢复当前数据的信息是当前数据经压缩得到的值时,执行S306;当用于恢复当前数据的信息是当前数据时,针对当前数据的获取过程结束。
S303:存储设备对差量经压缩得到的值进行解压缩,得到差量。
可以理解的,S303中执行解压缩所采用的解压缩算法与上述S104中执行压缩所采用的压缩算法相对应。例如,若S104中执行压缩所采用的是字典型压缩算法,则S303中执行解压缩所采用的是字典型解压缩算法。又如,若S104中执行压缩所采用的是重删算法,则S303中执行解压缩所采用的是重删算法。
S304:存储设备使用历史数据对当前数据进行预测,得到当前数据的预测数据。
其中,历史数据是存储设备已获取的一个或多个数据。历史数据具体是已获取的一个还是多个数据,以及具体是哪一个或多个数据与预测算法相关。其具体实现方式可以参考图5所示的实施例,此处不再赘述。
如图7A所示,为本申请实施例提供的一种待获取数据与实际存储的信息(即用于恢复待获取数据的信息)的示意图。图7A中的实际存储的信息与图5A中所示的实际存储的信息相同,因此相关图形或箭头等的解释可以参考图5A。
本申请实施例对S303和S304的执行顺序不进行限定。例如,可以先执行S303再执行S304,或者先执行S304再执行S303,或者同时执行S303和S304。
S305:存储设备根据差量和当前数据的预测数据,确定当前数据。
例如,若差量是当前数据减去当前数据的预测数据得到的差值,则在S305中,将差量与当前数据的预测数据之和作为当前数据。又如,若差量是当前数据除以当前数据的预测数据得到的比值,则在S305中,将差量与当前数据的预测数据的乘积作为当前数据。其他示例不再一一列举。
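S305中根据差量类型恢复当前数据的逻辑,可以用如下Python片段示意。其中mode参数的取名仅为便于说明而假设:

```python
# 示意:根据差量和预测数据恢复当前数据(S305)
def restore(diff, predicted, mode="difference"):
    if mode == "difference":
        # 差量是“当前数据 - 预测数据”:当前数据 = 预测数据 + 差量
        return predicted + diff
    if mode == "ratio":
        # 差量是“当前数据 / 预测数据”:当前数据 = 预测数据 * 差量
        return predicted * diff
    raise ValueError("unsupported diff mode")

print(restore(1, 170))                 # 对应正文示例:170 + 1 = 171
print(restore(2.0, 85, mode="ratio"))  # 85 * 2.0 = 170.0
```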
执行S305之后,针对当前数据的获取过程结束。
S306:存储设备对当前数据经压缩得到的值进行解压缩,得到当前数据。
执行S306之后,针对当前数据的获取过程结束。
本实施例提供的数据获取方法与图5所示的数据存储方法相对应,因此,本实施例中的有益效果可以参考图5所示的实施例中所描述的有益效果,此处不再赘述。
结合图2所示的存储设备,图7所示的数据获取方法的一个示例可以如图8所示。图8所示的方法可以包括如下步骤:
S401:存储设备通过接口卡接收客户端发送的读请求,该读请求中包括一个或多个待读数据的地址信息。
S402:接口卡对该一个或多个待读数据的地址信息进行传输协议转换,例如将采用以太网通信协议的该一个或多个待读数据的地址信息转换为采用PCIE协议的信息。
S403:接口卡将经传输协议转换的一个或多个待读数据的地址信息发送给处理器。
S404:处理器将接收到的一个或多个待读数据的地址信息中的每个待读数据的地址信息作为一个待获取数据的地址信息。
S405:处理器将每个待获取数据的地址信息存储到内存中。
后续,处理器可以依次将部分或全部待获取数据中的每个待获取数据作为当前数据,执行S406~S415。并且,处理器在获取到每个当前数据时,可以将该当前数据存储在内存中,以便于后续将该当前数据作为其他当前数据的历史数据。
S406:处理器从硬盘的当前数据的地址信息所指示的存储空间中读取用于恢复当前数据的信息,并将所读取到的用于恢复当前数据的信息发送给硬盘协议转换模块进行传输协议转换,如将采用SAS协议的用于恢复当前数据的信息转换为采用PCIE协议的信息。
S407:处理器判断用于恢复当前数据的信息是否携带第一标识信息。
若否,说明用于恢复当前数据的信息包括差量或差量经压缩得到的值。基于此,当用于恢复当前数据的信息是差量经压缩得到的值时,执行S408;当用于恢复当前数据的信息是差量时,执行S409。
若是,说明用于恢复当前数据的信息包括当前数据或当前数据经压缩得到的值,则基于此,当用于恢复当前数据的信息是当前数据经压缩得到的值时,执行S412;当用于恢复当前数据的信息是当前数据时,执行S413。
S408:处理器对差量经压缩得到的值进行解压缩,得到差量。
S409:处理器从内存中获取历史数据。
S410:处理器使用历史数据对当前数据进行预测,得到当前数据的预测数据。
S411:处理器根据差量和当前数据的预测数据,确定当前数据。
执行S411之后,执行S413。
S412:处理器对当前数据经压缩得到的值进行解压缩,得到当前数据。
S413:处理器将当前数据发送给接口卡。
可选的,处理器还可以将当前数据存储到内存中,以作为其他待获取数据的历史数据。进一步可选的,当已获取数据不再作为待获取数据的历史数据时,处理器可以从内存中删除该已获取的数据,以节省内存的存储开销。例如,假设当前数据是X(n),历史数据是从X(n-1)开始的且在X(n-1)之前的连续的10个数据(即X(n-10)~X(n-1)),那么,X(n-11)及之前的数据均不再作为待获取数据的历史数据,因此,处理器可以从内存中删除这些数据。
S414:接口卡对当前数据进行传输协议转换,例如将PCIE协议转换为以太网通信协议。
S415:接口卡将采用以太网通信协议的当前数据反馈给客户端。
至此,针对当前数据的获取过程结束。
结合图3所示的存储设备,图7所示的数据获取方法的一个示例可以是对上述图8所示的实施例进行如下几点修改得到的实施例:第一,上述S410~S411是由AI计算卡执行的。第二,在执行S409之后且执行S410之前,上述方法还包括:处理器将从内存中获取的历史数据发送给AI计算卡。第三,在执行S411之后且执行S413之前,上述方法还包括:AI计算卡将当前数据发送给处理器。
目前,字典型压缩(或解压缩)算法的字典中的各映射关系是按照命中率由高到低进行排列的。通常,当需要对待压缩数据进行压缩时,按照映射关系的命中率由高到低的顺序在字典的第一数据中查找待压缩数据,并将待压缩数据对应的第二数据作为待压缩数据压缩后得到的值。当需要对待解压缩数据进行解压时,按照映射关系的命中率由高到低的顺序在字典的第二数据中查找待解压缩数据,并将待解压缩数据对应的第一数据作为待解压缩数据解压缩后得到的值。这样,当待压缩/待解压缩数据所在的映射关系的命中率较低时,执行压缩/解压缩的时间较长。
为此,本申请实施例提供了设计方案一:存储设备中存储有至少两个集合,每个集合包括一个或多个映射关系,每个映射关系是指一个第一数据与一个第二数据之间的映射关系,每个映射关系的第一数据所占的存储空间大于该映射关系的第二数据所占的存储空间,每个集合对应一个命中率范围,不同集合对应的命中率范围不同,每个集合中的映射关系的命中率属于该集合对应的命中率范围。
其中,存储设备包括的集合的个数,以及每个集合对应的命中率范围可以是预定义的,也可以根据被存储数据更新。另外,映射关系也可以更新。
示例的,每个映射关系可以是字典型压缩算法的字典中的一个映射关系。该至少两个集合所包含的映射关系可以是字典中的部分或全部映射关系。例如,该至少两个集合所包含的映射关系可以是存储设备中任意一个存储介质(如缓存、内存或硬盘)所存储的映射关系。如果该存储介质是缓存或内存,则该至少两个集合所包含的映射关系可以是字典中的部分映射关系;如果该存储介质是硬盘,则该至少两个集合所包含的映射关系可以是字典中的全部映射关系。
例如,字典中存储的各集合及其对应的命中率范围可以如表2所示:
表2
集合 命中率范围
集合1 (80%,100%]
集合2 (50%,80%]
集合3 (20%,50%]
集合4 [0%,20%]
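按表2将命中率映射到目标集合的过程,可以用如下Python片段示意。区间端点的开闭与表2一致,集合划分与数值仅为示例:

```python
# 示意:根据待压缩/待解压缩数据的命中率,在至少两个集合中确定目标集合
SETS = [
    ("集合1", 0.80, 1.00),   # (80%, 100%]
    ("集合2", 0.50, 0.80),   # (50%, 80%]
    ("集合3", 0.20, 0.50),   # (20%, 50%]
]

def target_set(hit_rate):
    # 命中率落在哪个集合对应的命中率范围内,目标集合就是哪个集合
    for name, low, high in SETS:
        if low < hit_rate <= high:
            return name
    return "集合4"            # [0%, 20%]

print(target_set(0.75))  # 命中率75%时,目标集合是集合2
```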
在一些实施例中,对于数据压缩来说,每个映射关系的命中率可以是该映射关系的第一数据的命中率,例如第一数据的命中率可以是根据预设时间段内该第一数据的被压缩次数除以执行被压缩总次数得到的值。对于数据解压缩来说,每个映射关系的命中率可以是该映射关系的第二数据的命中率,例如第二数据的命中率可以是根据预设时间段内该第二数据的被解压缩次数除以执行被解压缩总次数得到的值。
该实施例中,应用于数据压缩场景和解压缩场景时,同一个映射关系的命中率的获取机制不同。因此,设计方案一应用于数据压缩场景和解压缩场景中时,存储设备包括的集合可以相同,也可以不同;且同一集合对应的命中率范围可以相同,也可以不同。例如,假设存储设备包括100个映射关系,应用于数据压缩场景中,这100个映射关系中的每个映射关系可以归属于集合A1和集合A2的其中之一;应用于数据解压缩场景时,这100个映射关系中的每个映射关系可以归属于集合B1、集合B2和集合B3的其中之一。
在另一些实施例中,对于数据压缩和解压缩来说,每个映射关系的命中率可以是根据该映射关系的第一数据的命中率和该映射关系的第二数据的命中率得到。例如,假设存储设备写数据和读数据的比例是3:7,且对于某一映射关系来说,在写数据的过程中,该映射关系的第一数据的命中率是10%,在读数据的过程中,该映射关系的第二数据的命中率是50%,则该映射关系的命中率可以根据0.3*10%+0.7*50%得到。当然本申请实施例不限于此。
该实施例中,应用于数据压缩场景和解压缩场景时,同一个映射关系的命中率的获取机制相同。因此,设计方案一应用于数据压缩和解压缩场景中时,存储设备包括的集合相同,且同一集合对应的命中率范围相同。例如,假设存储设备包括100个映射关系,应用于数据压缩场景和解压缩场景中时,这100个映射关系中的每个映射关系可以归属于集合A1和集合A2的其中之一。
基于上述设计方案一,如图9所示,为本申请实施例提供的一种数据压缩方法的示意图。图9所示的方法可以包括如下步骤:
S501:存储设备获取待压缩数据的命中率。待压缩数据的命中率的获取方式可以参考上文中的第一数据的命中率的获取方式,当然本申请实施例不限于此。
例如,待压缩数据可以是上文中的差量或当前数据,当然本申请实施例不限于此。
S502:存储设备根据待压缩数据的命中率,在至少两个集合中确定目标集合。待压缩数据的命中率用于确定待压缩数据所在的映射关系(下文中称为目标映射关系)的命中率,所确定的目标映射关系的命中率属于目标集合对应的命中率范围。
根据上文中的描述可知,应用于数据压缩场景中时,映射关系的命中率可以是该映射关系的第一数据的命中率,或者是根据该映射关系的第一数据的命中率和第二数据的命中率得到的等。为了方便描述,下文中数据压缩方法的实施例中均以映射关系的命中率是该映射关系的第一数据的命中率为例进行说明。
例如,若待压缩数据的命中率是75%,则待压缩数据所在的目标映射关系的命中率可以是75%,该情况下,参见表2可以得出目标集合是集合2。
S503:存储设备在目标集合的第一数据中查找待压缩数据,从而查找待压缩数据所在的映射关系,以根据该映射关系确定待压缩数据对应的第二数据,并将待压缩数据对应的第二数据作为待压缩数据经压缩得到的值。
例如,基于S502中的示例,存储设备可以直接从集合2的第一数据中查找待压缩数据,从而实现对待压缩数据的压缩。而不用如现有技术中一样按照映射关系的命中率由高到低顺序依次从第一数据中查找差量,这样可以节省执行压缩的时间。
例如,如果待压缩数据是上文中的差量,则待压缩数据对应的第二数据可以是上文中描述的差量经压缩得到的值。如果待压缩数据是上文中的当前数据,则待压缩数据对应的第二数据可以是上文中描述的当前数据经压缩得到的值。
本实施例提供的数据压缩方法中,将存储设备包括的映射关系归为不同的集合,这样,根据待压缩数据的命中率可以直接锁定待压缩数据所在的集合,与现有技术相比,缩小了查找待压缩数据的范围,因此,可以节省执行压缩的时间。
基于上述设计方案一,如图10所示,为本申请实施例提供的一种数据解压缩方法的示意图。图10所示的方法可以包括如下步骤:
S601:存储设备获取待解压缩数据的命中率。待解压缩数据的命中率的获取方式可以参考上文中的第二数据的命中率的获取方式,当然本申请实施例不限于此。
例如,待解压缩数据可以是上文中描述的差量经压缩得到的值,或当前数据经压缩得到的值,当然本申请实施例不限于此。
S602:存储设备根据待解压缩数据的命中率,在至少两个集合中确定目标集合;待解压缩数据的命中率用于确定待解压缩数据所在的目标映射关系的命中率,所确定的目标映射关系的命中率属于目标集合对应的命中率范围。
根据上文中的描述可知,应用于数据解压缩场景中时,映射关系的命中率可以是该映射关系的第二数据的命中率,或者是根据该映射关系的第一数据的命中率和第二数据的命中率得到的等。为了方便描述,下文中数据解压缩方法的实施例中均以映射关系的命中率是该映射关系的第二数据的命中率为例进行说明。
例如,若待解压缩数据的命中率是75%,则待解压缩数据所在的目标映射关系的命中率可以是75%,该情况下,参见表2可以得出目标集合是集合2。
S603:存储设备在目标集合的第二数据中查找待解压缩数据,从而查找待解压缩数据所在的映射关系,以根据该映射关系确定待解压缩数据对应的第一数据,并将待解压缩数据对应的第一数据作为待解压缩数据经解压缩得到的值。
例如,如果待解压缩数据是上文中的差量经压缩得到的值,则待解压缩数据对应的第一数据可以是上文中描述的差量。如果待解压缩数据是上文中的当前数据经压缩得到的值,则待解压缩数据对应的第一数据可以是上文中描述的当前数据。
本实施例提供的数据解压缩方法中,将存储设备中包括的映射关系归为不同的集合,这样,根据待解压缩数据的命中率可以直接锁定待解压缩数据所在的集合,与现有技术相比,缩小了查找待解压缩数据的范围,因此,可以节省数据解压缩时间。
参见图2可知,存储设备的存储介质可以包括缓存、内存和硬盘。其中,缓存中存储的数据是内存中存储的一部分数据,内存中存储的数据是硬盘中存储的数据的一部分。目前,CPU读取数据的过程具体为:CPU先从缓存中查找待访问数据,查找到则直接读取;若没有查找到,则从内存中查找该待访问数据。进一步的,若查找到则直接读取,若没有查找到,则从硬盘中查找该待访问数据。其中,以存储技术是字典型压缩技术为例,缓存、内存和硬盘中存储的数据可以是字典中的映射关系。这样,当某一存储介质(如缓存或内存)中包含的映射关系的数量较多,而待压缩/待解压缩数据所在的映射关系不在该存储介质中时,执行压缩/解压缩的时间较长。
为此,本申请实施例提供了设计方案二:存储设备的存储介质包括缓存、内存和硬盘;缓存中的映射关系的命中率大于或等于内存中的映射关系的命中率,内存中的映射关系的命中率大于或等于硬盘中的映射关系的命中率;每个映射关系是指一个第一数据与一个第二数据之间的映射关系,第一数据所占的存储空间大于第二数据所占的存储空间。其中,每种存储介质中的映射关系的命中率所在的范围可以是预设的,也可以根据被存储数据更新。
示例的,每个映射关系可以是字典型压缩算法的字典中的一个映射关系。例如,存储设备的各存储介质及其对应的命中率范围可以如表3所示。
表3
存储介质 命中率范围
缓存 (80%,100%]
内存 (50%,100%]
硬盘 [0%,100%]
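按表3根据命中率确定目标存储介质的过程,可以用如下Python片段示意。其中各命中率范围取自表3,仅为示例性取值:

```python
# 示意:根据待压缩/待解压缩数据的命中率确定目标存储介质(S702/S802)
def target_medium(hit_rate):
    # 命中率属于缓存范围 -> 缓存;否则属于内存范围 -> 内存;否则 -> 硬盘
    if hit_rate > 0.80:
        return "cache"    # (80%, 100%]:缓存,读写性能最高
    if hit_rate > 0.50:
        return "memory"   # (50%, 80%]:内存
    return "disk"         # 其余:硬盘

print(target_medium(0.90), target_medium(0.60), target_medium(0.30))
```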
基于上述设计方案二,如图11所示,为本申请实施例提供的一种数据压缩方法的示意图。图11所示的方法可以包括如下步骤:
S701:存储设备获取待压缩数据的命中率。
S702:存储设备根据待压缩数据的命中率,确定目标存储介质。其中,待压缩数据的命中率用于确定待压缩数据所在的目标映射关系的命中率。当所确定的目标映射关系的命中率属于缓存中的映射关系的命中率范围时,目标存储介质是缓存;当所确定的目标映射关系的命中率不属于缓存中的映射关系的命中率范围,但属于内存中的映射关系的命中率范围时,目标存储介质是内存;当所确定的目标映射关系的命中率不属于内存中的映射关系的命中率范围时,目标存储介质是硬盘。
例如,若待压缩数据的命中率是90%,则待压缩数据所在的目标映射关系的命中率可以是90%,该情况下,参见表3可以得出,目标存储介质是缓存。类似的,若待压缩数据的命中率是60%,则目标存储介质可以是内存;若待压缩数据的命中率是30%,则目标存储介质可以是硬盘。
S703:存储设备在目标存储介质的第一数据中查找待压缩数据,从而查找待压缩数据所在的映射关系,以根据该映射关系确定待压缩数据对应的第二数据,并将待压缩数据对应的第二数据作为待压缩数据经压缩得到的值。
例如,如果目标存储介质中包括的映射关系如上述设计方案一所示,则S703的具体实现过程可以参考上述S501~S503。当然,S703也可以根据现有技术中的方法实现。
本实施例中的待压缩数据以及对待压缩数据进行压缩得到的值的示例可以参考上述图9所示的实施例。
如果存储设备在目标存储介质的第一数据中没有查找到待压缩数据,那么:当存储设备中不存在目标存储介质的下一级存储介质时,将该待压缩数据本身作为待压缩数据经压缩得到的值;当存储设备中存在目标存储介质的下一级存储介质时,可以在目标存储介质的下一级存储介质中查找待压缩数据,依次类推,直至查找到待压缩数据,或者在存储介质的最后一级存储介质的第一数据中没有查找到待压缩数据为止。其中,缓存的下一级存储介质是内存,内存的下一级存储介质是硬盘。
本实施例提供的数据压缩方法中,根据待压缩数据的命中率和不同存储介质中存储的映射关系的命中率,可以直接锁定待压缩数据所在的读写性能最高的存储介质,缓存的读写性能高于内存的读写性能,内存的读写性能高于硬盘的读写性能。与现有技术相比,缩小了查找待压缩数据的范围,因此,可以节省数据压缩时间。
基于上述设计方案二,如图12所示,为本申请实施例提供的一种数据解压缩方法的示意图。图12所示的方法可以包括如下步骤:
S801:存储设备获取待解压缩数据的命中率。
S802:存储设备根据待解压缩数据的命中率,确定目标存储介质。其中,待解压缩数据的命中率用于确定待解压缩数据所在的目标映射关系的命中率。当所确定的目标映射关系的命中率属于缓存中的映射关系的命中率范围时,目标存储介质是缓存;当目标映射关系的命中率不属于缓存中的映射关系的命中率范围,但属于内存中的映射关系的命中率范围时,目标存储介质是内存;当目标映射关系的命中率不属于内存中的映射关系的命中率范围时,目标存储介质是硬盘。
例如,若待解压缩数据的命中率是90%,则待解压缩数据所在的目标映射关系的命中率可以是90%,该情况下,参见表3可以得出,目标存储介质是缓存。类似的,若待解压缩数据的命中率是60%,则目标存储介质可以是内存;若待解压缩数据的命中率是30%,则目标存储介质可以是硬盘。
S803:存储设备在目标存储介质的第二数据中查找待解压缩数据,从而查找待解压缩数据所在的映射关系,以根据该映射关系确定待解压缩数据对应的第一数据,将待解压缩数据对应的第一数据作为待解压缩数据经解压缩得到的值。
例如,如果目标存储介质中包括的映射关系如上述设计方案一所示,则S803的具体实现过程可以参考上述S601~S603。当然,S803也可以根据现有技术中的方法实现。
本实施例中的待解压缩数据以及对待解压缩数据进行解压缩得到的值的示例可以参考上述图10所示的实施例。
本实施例提供的数据解压缩方法中,根据待解压缩数据的命中率和不同存储介质中存储的映射关系的命中率,可以直接锁定待解压缩数据所在的读写性能最高的存储介质,缓存的读写性能高于内存的读写性能,内存的读写性能高于硬盘的读写性能。与现有技术相比,缩小了查找待解压缩数据的范围,因此,可以节省数据解压缩时间。
如图13所示,为本申请实施例提供的一种数据存储方法的示意图。图13所示的方法可以包括如下步骤:
S900:存储设备获取至少两个待存储数据中的当前数据(即当前待存储数据)和当前数据的历史数据(即历史待存储数据);历史数据是至少两个待存储数据构成的序列中的在当前数据之前的一个或多个数据。
S901:存储设备使用历史数据对当前数据进行预测,得到当前数据的预测数据。当前数据的预测数据是基于历史数据的变化规律对当前数据进行预测后的数据。
S902:存储设备获取当前数据与当前数据的预测数据的差量。
关于S900~S902的实现方式可以参考上文对S100~S102实现方式的描述。
S903:存储设备判断该差量的绝对值是否小于或等于预设阈值。例如,假设差量是a,则差量的绝对值可以表示为|a|。
若是,则执行S904;若否,则执行S905。
S904:存储设备存储预设数据。示例的,预设数据所占的存储空间小于当前数据所占的存储空间。
可选的,预设数据是存储设备预先定义的。可选的,预设数据可以是一个标识符,该标识符用于指示当前数据的预测数据可以作为(或近似作为)当前数据。例如,预设数据是二进制数“0”或“1”等。
可选的,预设数据所占的存储空间小于大部分或全部待存储数据所占的存储空间。
需要说明的是,执行数据存储流程时,存储设备可以不需要判断预设数据所占的存储空间与当前数据所占的存储空间之间的大小关系。而是存储设备可以在预先定义预设数据时,基于“预设数据所占的存储空间小于大部分或全部待存储数据所占的存储空间”这一原则,将预设数据设置成所占存储空间较小的标识符。这样,即使对于某个当前数据来说,不满足“预设数据所占的存储空间小于当前数据所占的存储空间”,但是,从对多个待存储数据执行数据存储流程这一整体上来看,仍然可能满足“预设数据所占的存储空间小于大部分或全部待存储数据所占的存储空间”,因此,与现有技术相比,有助于节省存储空间。
可选的,存储设备可以基于存储开销等因素预先定义预设数据。
以差量是差值为例,当该差量的绝对值为0时,本技术方案中的压缩过程具体是无损压缩过程。当该差量的绝对值不为0时,本技术方案中的压缩过程具体是有损压缩过程。通过合理设置预设阈值,有助于实现将数据的损失率限制在一定范围内;换句话说,可以基于实际需求(例如可接受的有损压缩率需求)设置预设阈值。
执行S904之后,针对当前数据的存储过程结束。
S905:存储设备存储当前数据或者存储当前数据经压缩得到的值。
关于S905的实现方式,可以参考上文对S105实现方式的描述。
执行S905之后,针对当前数据的存储过程结束。
为了便于存储设备区分所存储的用于恢复当前数据的信息是“预设数据”还是“当前数据或当前数据经压缩得到的值”,从而在读数据流程中,确定待读数据,可选的,若执行S905,则该方法还可以包括以下S905A:
S905A:当差量的绝对值大于预设阈值时,存储标识信息。当存储当前数据经压缩得到的值时,该标识信息用于指示所存储的用于恢复当前数据的信息是当前数据经压缩得到的值;当存储当前数据时,该标识信息用于指示所存储的用于恢复当前数据的信息是当前数据。其中,可以将该标识信息作为用于恢复当前数据的信息的标识信息,或者是用于恢复当前数据的信息所携带的信息。
关于S905A的可替换方式以及有益效果,均可以参考上文对S105A的可替换方式以及有益效果的描述,此处不再赘述。
本申请实施例提供的数据存储方法中,使用历史数据对当前数据进行预测,并且在当前数据与当前数据的预测数据的差量的绝对值小于或等于预设阈值时,存储预设数据。由于预设数据所占的存储空间小于当前数据所占的存储空间,因此,相比现有技术中直接存储当前数据的技术方案,可以节省存储开销。本技术方案可以应用于允许一定数据损失的场景中,例如播放视频等场景中。
例如,对于待存储数据构成的序列Xn={2,5,10,17,26,37,50,65,82,101,122,145,171}来说,假设差量具体是待存储数据与待存储数据的预测数据之差,且预设阈值为2,那么,存储设备根据预测算法x²+1以及图13所示的实施例,实际存储的数据构成的序列可以为{10,101,1010,α,α,α,α,α,α,α,α,α,α}。其中,α为预设数据。对于Xn中的前3个数据来说,预测步骤是缺省的。结合S905A可知,对于这3个数据来说,存储设备还可以分别存储标识信息。由此可见,需要压缩存储的数据规模显著降低,其数据的重复概率明显增加。因此可以显著提升数据压缩比及压缩效率。
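上述有损存储过程(差量绝对值不超过预设阈值2时只存预设数据α)可以用如下Python片段示意。其中预测算法为x²+1,前3个数据的预测步骤缺省,各取值均为示意性假设:

```python
# 示意:按图13所示流程,差量绝对值不超过预设阈值时只存预设数据
data = [2, 5, 10, 17, 26, 37, 50, 65, 82, 101, 122, 145, 171]
THRESHOLD = 2
PRESET = "α"              # 预设数据,仅为示意所选的占位符

def predict(x):
    # 预测算法:第x个数据的预测值为 x*x + 1(x从1开始计数)
    return x * x + 1

stored = []
for x, value in enumerate(data, start=1):
    if x <= 3:
        # 前3个数据的预测步骤缺省,直接存原始数据的二进制表示
        stored.append(format(value, "b"))
    elif abs(value - predict(x)) <= THRESHOLD:
        stored.append(PRESET)              # 差量在阈值内:只存预设数据
    else:
        stored.append(format(value, "b"))  # 否则存当前数据(或其压缩值)

print(stored)
```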
如图14所示,为本申请实施例提供的一种数据获取方法的示意图。图14所示的方法可以包括如下步骤:
S1001:存储设备读取用于恢复当前数据的信息。用于恢复当前数据的信息包括“预设数据”或者“当前数据或当前数据经压缩得到的值”。
其中,当前数据的预测数据是基于历史数据的变化规律对当前数据进行预测后的数据;历史数据是已获取的一个或多个数据。
S1002:存储设备判断用于恢复当前数据的信息是否携带标识信息。
根据上述图13所示的实施例中的描述可知:若S1002的判断结果为否,说明用于恢复当前数据的信息包括预设数据,则执行S1003。若S1002的判断结果为是,说明用于恢复当前数据的信息包括当前数据或当前数据经压缩得到的值;当用于恢复当前数据的信息是当前数据经压缩得到的值时,执行S1004;当用于恢复当前数据的信息是当前数据时,针对当前数据的获取过程结束。
S1003:存储设备使用历史数据对当前数据进行预测,得到当前数据的预测数据,并将当前数据的预测数据作为当前数据;历史数据是已获取的一个或多个数据。
执行S1003之后,针对当前数据的获取过程结束。
S1004:存储设备对当前数据经压缩得到的值进行解压缩,得到当前数据。
执行S1004之后,针对当前数据的获取过程结束。
本实施例提供的数据获取方法与图13所示的数据存储方法相对应,因此,本实施例中的有益效果可以参考图13所示的实施例中所描述的有益效果,此处不再赘述。
需要说明的是,对于同一个数据来说,执行数据存储流程和数据获取流程的过程中,如果需要执行预测,则预测时采用同一个预测算法(即预测算法的参数的取值相同)。当预测算法是AI神经算法时,由于AI神经算法的参数的取值可以进行更新,因此,为了在数据获取流程中成功获取该数据,在数据存储流程中,本申请实施例提供了以下可选的实现方式:
可选的实现方式1:存储设备还可以存储AI神经算法的参数的取值与用于恢复数据的信息之间的对应关系。示例的,存储设备可以在AI神经算法的参数每次更新之后,执行一次快照操作,以记录用于恢复当前数据的信息和执行预测所采用的AI神经算法的参数之间的对应关系。当然本申请实施例不限于此。
例如,假设初始时刻是t1时刻,t1时刻AI神经算法的参数是第一参数,t2时刻AI神经算法的参数由第一参数更新为第二参数;并且,从t1时刻到t2时刻这一时间段、以及t2时刻之后的时间段所存储的用于恢复当前数据的信息分别是:信息1~100、信息101~500;那么,存储设备可以存储信息1~100与第一参数之间的对应关系,以及存储信息101~500与第二参数之间的对应关系。
可选的实现方式2:在存储用于恢复当前数据的信息之后,通过自适应学习更新AI神经算法的参数;根据更新后的AI神经算法的参数更新用于恢复当前数据的信息。
例如,假设将上文中执行预测(例如S101或S901中的预测等)所采用的AI神经算法的参数标记为AI神经算法的第一参数,将AI神经算法的第一参数更新后得到的参数标记为AI神经算法的第二参数;那么:根据更新后的AI神经算法的参数,更新用于恢复当前数据的信息包括:读取用于恢复当前数据的信息;根据AI神经算法的第一参数(即更新前的AI神经算法的参数)、所读取的用于恢复当前数据的信息和当前数据的历史数据,恢复当前数据;根据AI神经算法的第二参数(即更新后的AI神经算法的参数)和当前数据的历史数据,对当前数据进行预测,得到第二预测数据;第二预测数据是基于历史数据的变化规律和AI神经算法的第二参数对当前数据进行预测后的数据;获取当前数据与第二预测数据的第二差量;当第二差量所占的存储空间小于当前数据所占的存储空间时,将所存储的用于恢复当前数据的信息更新为第二差量或者第二差量经压缩得到的值(或者当前数据或者当前数据经压缩得到的值或者预设数据等,具体存储何种信息可以参考上文如图5或图13所示的方法)。也就是说,利用更新前的AI神经算法的参数执行一次数据获取流程,获取到当前数据之后,再利用更新后的AI神经算法的参数执行一次数据存储流程,这样有助于实现针对当前数据执行数据存储流程时所使用的AI神经算法的参数是最新的参数。
基于该可选的实现方式2,存储设备还可以存储AI神经算法的第二参数。或者,将所存储的AI神经算法的第一参数更新为第二参数,也就是说,存储设备中存储的是AI神经算法的最新的参数。
这两种可选的实现方式可以应用于上文如图5或图13所示的数据存储流程。对比这两种可选的实现方式,可选的实现方式1可以应用于存储设备中已存储的数据较多的场景中;可选的实现方式2可以应用于存储设备中已存储的数据较少的场景中,并且,随着AI神经算法的参数的更新,有助于使得当前数据的预测数据更接近于当前数据,因此,使用该可选的实现方式2可以进一步提高数据的压缩效率。具体实现的过程中,上述可选的实现方式1、2可以结合使用,从而构成新的技术方案。例如,针对已存储的部分数据,存储设备可以执行上述可选的实现方式1;针对已存储的另一部分数据,存储设备可以执行上述可选的实现方式2。
基于上述可选的实现方式1,在数据获取流程中,存储设备可以根据该用于恢复当前数据的信息与AI神经算法的参数之间的对应关系,获取对当前数据进行预测所采用的AI神经算法的参数。该情况下,使用历史数据对当前数据进行预测,得到当前数据的预测数据,可以包括:根据所获取的AI神经算法的参数,使用历史数据对当前数据进行预测,得到当前数据的预测数据。
其中,根据上述可选的实现方式1中的描述可知,该对应关系中的“AI神经算法的参数”,是指存储该对应关系中的“用于恢复当前数据的信息”的过程中所采用的AI神经算法的参数。所获取的“对当前数据进行预测所采用的AI神经算法的参数”是该映射关系中的“AI神经算法的参数”。例如基于上述可选的实现方式1中的示例,若用于恢复当前数据的信息是信息99,则该对应关系中的“AI神经算法的参数”是第一参数;假设若用于恢复当前数据的信息是信息200,则该对应关系中的“AI神经算法的参数”是第二参数。
另外,如果上述可选的实现方式1应用于图5所示的数据存储流程,则在数据获取流程中,存储设备可以按照上文图7所示的实施例获取当前数据。如果上述可选的实现方式1应用于图13所示的数据存储流程,则在数据获取流程中,存储设备可以按照上文提供的图14所示的实施例获取当前数据。
基于上述可选的实现方式2,在数据获取流程中,存储设备可以根据所存储的AI神经算法的最新的参数,执行上述图7或图14所示的实施例。
上述主要从方法的角度对本申请实施例提供的方案进行了介绍。为了实现上述功能,存储设备包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本申请实施例可以根据上述方法示例对存储设备进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
如图15所示,为本申请实施例提供的存储设备的结构示意图。图15所示的存储设备150可以用于执行图5或图6所示的数据存储方法。存储设备150可以包括:第一获取单元1500、预测单元1501、第二获取单元1502和存储单元1503。其中,第一获取单元1500,用于获取当前数据和当前数据的历史数据。预测单元1501,用于使用该历史数据对当前数据进行预测,得到当前数据的第一预测数据。第二获取单元1502,用于获取当前数据与第一预测数据的第一差量。存储单元1503,用于当第一差量所占的存储空间小于当前数据所占的存储空间时,存储第一差量或者存储第一差量经压缩得到的值。例如,结合图5,第一获取单元1500可以用于执行S100,预测单元1501可以用于执行S101。第二获取单元1502可以用于执行S102。存储单元1503可以用于执行S104。
可选的,第一获取单元1500具体用于从存储设备150的内存中获取当前数据和该历史数据。
可选的,存储单元1503还用于,存储所述用于恢复当前数据的信息和执行预测所采用的AI神经算法的参数之间的对应关系。
可选的,存储设备150还包括更新单元1504,用于通过自适应学习更新AI神经算法的参数;根据更新后的AI神经算法的参数,更新用于恢复当前数据的信息。
可选的,更新单元1504具体用于:读取用于恢复当前数据的信息;根据执行预测所采用的AI神经算法的参数、用于恢复当前数据的信息和当前数据的该历史数据,恢复当前数据;根据更新后的AI神经算法的参数和当前数据的该历史数据,对当前数据进行预测,得到第二预测数据;第二预测数据是基于该历史数据的变化规律和更新后的AI神经算法的参数对当前数据进行预测后的数据;获取当前数据与第二预测数据的第二差量;当第二差量所占的存储空间小于当前数据所占的存储空间时,将所存储的用于恢复当前数据的信息更新为第二差量或者第二差量经压缩得到的值。
可选的,存储设备150包括AI计算卡,预测单元1501具体用于:通过AI计算卡使用该历史数据对当前数据进行预测,得到第一预测数据。
可选的,执行压缩所采用的算法包括字典型压缩算法,字典型压缩算法的字典包括至少两个集合,每个集合包括一个或多个映射关系,每个映射关系是指一个第一数据与一个第二数据之间的映射关系,第一数据所占的存储空间大于第二数据所占的存储空间;每个集合对应一个命中率范围,不同集合对应的命中率范围不同;存储设备150还包括:第三获取单元1505,用于获取第一差量的命中率;确定单元1506,用于根据第一差量的命中率,在至少两个集合中确定目标集合;其中,第一差量的命中率用于确定第一差量所在的目标映射关系的命中率,所确定的目标映射关系的命中率属于目标集合对应的命中率范围;压缩单元1507,用于在目标集合的第一数据中查找第一差量,以确定与第一差量对应的第二数据;与第一差量对应的第二数据为第一差量经压缩得到的值。
可选的,存储设备150的存储介质包括缓存、内存和硬盘;执行压缩所采用的算法包括字典型压缩算法,字典型压缩算法的字典包括一个或多个映射关系,每个映射关系是指一个第一数据与一个第二数据之间的映射关系,第一数据所占的存储空间大于第二数据所占的存储空间;缓存中的映射关系的命中率大于或等于内存中的映射关系的命中率,内存中的映射关系的命中率大于或等于硬盘中的映射关系的命中率;存储设备150还包括:第三获取单元1505,用于获取第一差量的命中率;确定单元1506,用于根据第一差量的命中率,确定目标存储介质;第一差量的命中率用于确定第一差量所在的目标映射关系的命中率,当所确定的目标映射关系的命中率属于缓存中的映射关系的命中率范围时,目标存储介质是缓存;当所确定的目标映射关系的命中率不属于缓存中的映射关系的命中率范围,但属于内存中的映射关系的命中率范围时,目标存储介质是内存;当所确定的目标映射关系的命中率不属于内存中的映射关系的命中率范围时,目标存储介质是硬盘;压缩单元1507,用于在目标存储介质的第一数据中查找第一差量,以确定第一差量对应的第二数据;与第一差量对应的第二数据为第一差量经压缩得到的值。
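按命中率范围选择目标集合或目标存储介质、再在其中查找待压缩数据的过程,可以用如下Python草图示意(层级名称、命中率区间与字典内容均为假设的示例数据,并非本申请限定的实现方式):

```python
def select_tier(hit_rate, tiers):
    # tiers 按命中率从高到低排列:(名称, 命中率下限, 命中率上限, 映射字典{第一数据: 第二数据})
    for name, lo, hi, mapping in tiers:
        if lo <= hit_rate <= hi:
            return name, mapping
    # 不属于任何更高层的命中率范围时,落到最后一层(对应正文中的"硬盘")
    return tiers[-1][0], tiers[-1][3]

def compress(value, hit_rate, tiers):
    # 在目标层的第一数据中查找待压缩数据,返回对应的第二数据(即压缩得到的值)
    _, mapping = select_tier(hit_rate, tiers)
    return mapping.get(value)

tiers = [
    ("缓存", 0.7, 1.0, {"ABCD": "a"}),
    ("内存", 0.3, 0.7, {"EFGH": "e"}),
    ("硬盘", 0.0, 0.3, {"IJKL": "i"}),
]
```

例如 compress("ABCD", 0.9, tiers) 会在"缓存"层的字典中命中并返回压缩值 "a";解压缩方向只需将各层字典反转,在第二数据中查找即可。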
可选的,存储单元1503还用于,当第一差量所占的存储空间大于或等于当前数据所占的存储空间时,存储当前数据或者存储当前数据经压缩得到的值。例如,结合图5,存储单元1503可以用于执行S105。
可选的,存储单元1503还用于,当第一差量所占的存储空间大于或等于当前数据所占的存储空间时,存储标识信息;其中,当存储当前数据经压缩得到的值时,标识信息用于指示所存储的用于恢复当前数据的信息是当前数据经压缩得到的值;当存储当前数据时,标识信息用于指示所存储的用于恢复当前数据的信息是当前数据。例如结合图5,存储单元1503可以用于执行S105A。
例如,结合图2,第一获取单元1500、预测单元1501、第二获取单元1502、更新单元1504、第三获取单元1505、确定单元1506和压缩单元1507均可以通过处理器202实现。存储单元1503可以通过硬盘204实现。又如,结合图3,预测单元1501可以通过AI计算卡207实现。第一获取单元1500、第二获取单元1502、更新单元1504、第三获取单元1505、确定单元1506和压缩单元1507均可以通过处理器202实现。存储单元1503可以通过硬盘204实现。
如图16所示,为本申请实施例提供的存储设备160的结构示意图。图16所示的存储设备160可以用于执行图7或图8所示的数据获取方法。存储设备160可以包括读取单元1601、预测单元1602和确定单元1603。读取单元1601,用于读取用于恢复当前数据的信息;用于恢复当前数据的信息包括差量或差量经压缩得到的值;差量是当前数据与当前数据的预测数据的差量;当前数据的预测数据是基于历史数据的变化规律对当前数据进行预测后的数据。预测单元1602,用于使用历史数据对当前数据进行预测,得到当前数据的预测数据。确定单元1603,用于根据用于恢复当前数据的信息和当前数据的预测数据确定当前数据。例如,结合图7,读取单元1601可以用于执行S301。预测单元1602可以用于执行S304。确定单元1603可以用于执行S305。
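上述"读取信息、预测、确定当前数据"的获取流程可以用如下Python草图示意(其中预测同样以线性加权作为假设的占位实现;decompress参数用于信息是差量经压缩得到的值的情形):

```python
def recover(info, history, params, decompress=None):
    # info 是所存储的"用于恢复当前数据的信息":差量,或差量经压缩得到的值
    pred = sum(p * h for p, h in zip(params, history))  # 假设的占位预测
    delta = decompress(info) if decompress is not None else info
    # 当前数据 = 预测数据 + 差量
    return pred + delta
```

例如 recover(0.5, [1.0, 2.0], [0.5, 0.5]) 先得到预测数据1.5,再加上差量0.5,确定当前数据为2.0。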
可选的,存储设备160还包括获取单元1604,用于从存储设备160的内存中获取历史数据。
可选的,存储设备160还包括获取单元1604,用于根据用于恢复当前数据的信息与AI神经算法的参数之间的对应关系,获取对当前数据进行预测所采用的AI神经算法的参数。预测单元1602具体用于:根据获取的AI神经算法的参数,使用该历史数据对当前数据进行预测,得到当前数据的预测数据。
可选的,存储设备160包括AI计算卡,预测单元1602具体用于:通过AI计算卡使用该历史数据对当前数据进行预测,得到当前数据的预测数据。
可选的,用于恢复当前数据的信息包括差量经压缩得到的值。该情况下,确定单元1603包括:解压缩模块1603-1,用于对差量经压缩得到的值进行解压缩,得到差量;确定模块1603-2,用于根据差量和当前数据的预测数据,确定当前数据。例如,结合图7,解压缩模块1603-1可以用于执行S303。确定模块1603-2可以用于执行S305。
可选的,执行解压缩所采用的算法包括字典型解压缩算法,字典型解压缩算法的字典包括至少两个集合,每个集合包括一个或多个映射关系,每个映射关系是指一个第一数据与一个第二数据之间的映射关系,第一数据所占的存储空间大于第二数据所占的存储空间,每个集合对应一个命中率范围,不同集合对应的命中率范围不同。该情况下,解压缩模块1603-1具体用于:获取差量经压缩得到的值的命中率;根据差量经压缩得到的值的命中率,在至少两个集合中确定目标集合;差量经压缩得到的值的命中率用于确定差量经压缩得到的值所在的目标映射关系的命中率,所确定的目标映射关系的命中率属于目标集合对应的命中率范围;在目标集合的第二数据中查找差量经压缩得到的值,以确定与差量经压缩得到的值对应的第一数据;与差量经压缩得到的值对应的第一数据为差量。
可选的,存储设备160的存储介质包括缓存、内存和硬盘;执行解压缩所采用的算法包括字典型解压缩算法,字典型解压缩算法的字典包括一个或多个映射关系,每个映射关系是指一个第一数据与一个第二数据之间的映射关系,第一数据所占的存储空间大于第二数据所占的存储空间;缓存中的映射关系的命中率大于或等于内存中的映射关系的命中率,内存中的映射关系的命中率大于或等于硬盘中的映射关系的命中率。该情况下,解压缩模块1603-1具体用于:获取差量经压缩得到的值的命中率;根据差量经压缩得到的值的命中率,确定目标存储介质;其中,差量经压缩得到的值的命中率用于确定差量经压缩得到的值所在的目标映射关系的命中率,当所确定的目标映射关系的命中率属于缓存中的映射关系的命中率范围时,目标存储介质是缓存;当所确定的目标映射关系的命中率不属于缓存中的映射关系的命中率范围,但属于内存中的映射关系的命中率范围时,目标存储介质是内存;当所确定的目标映射关系的命中率不属于内存中的映射关系的命中率范围时,目标存储介质是硬盘;在目标存储介质的第二数据中查找差量经压缩得到的值,以确定与差量经压缩得到的值对应的第一数据;与差量经压缩得到的值对应的第一数据为差量。
例如,结合图2,读取单元1601、预测单元1602和确定单元1603均可以通过处理器202实现。又如,结合图3,预测单元1602可以通过AI计算卡207实现。读取单元1601和确定单元1603均可以通过处理器202实现。
如图17所示,为本申请实施例提供的存储设备170的结构示意图。图17所示的存储设备170可以用于执行图9或图11所示的数据压缩方法。存储设备170可以包括获取单元1701、确定单元1702和压缩单元1703。
在一种可能的设计中,存储设备170中存储有至少两个集合,每个集合包括一个或多个映射关系,每个映射关系是指一个第一数据与一个第二数据之间的映射关系,第一数据所占的存储空间大于第二数据所占的存储空间,每个集合对应一个命中率范围,不同集合对应的命中率范围不同。获取单元1701用于获取待压缩数据的命中率。确定单元1702用于根据待压缩数据的命中率,在至少两个集合中确定目标集合;待压缩数据的命中率用于确定待压缩数据所在的目标映射关系的命中率,所确定的目标映射关系的命中率属于目标集合对应的命中率范围。压缩单元1703用于在目标集合的第一数据中查找待压缩数据,以确定待压缩数据对应的第二数据,并将待压缩数据对应的第二数据作为待压缩数据经压缩得到的值。例如,结合图9,获取单元1701可以用于执行S501,和/或本申请实施例提供的其他步骤。确定单元1702可以用于执行S502,和/或本申请实施例提供的其他步骤。压缩单元1703可以用于执行S503,和/或本申请实施例提供的其他步骤。
在另一种可能的设计中,存储设备170的存储介质包括缓存、内存和硬盘;缓存中的映射关系的命中率大于或等于内存中的映射关系的命中率,内存中的映射关系的命中率大于或等于硬盘中的映射关系的命中率;每个映射关系是指一个第一数据与一个第二数据之间的映射关系,第一数据所占的存储空间大于第二数据所占的存储空间。获取单元1701用于获取待压缩数据的命中率。确定单元1702,用于根据待压缩数据的命中率,确定目标存储介质;待压缩数据的命中率用于确定待压缩数据所在的目标映射关系的命中率,当所确定的目标映射关系的命中率属于缓存中的映射关系的命中率范围时,目标存储介质是缓存;当所确定的目标映射关系的命中率不属于缓存中的映射关系的命中率范围,但属于内存中的映射关系的命中率范围时,目标存储介质是内存;当所确定的目标映射关系的命中率不属于内存中的映射关系的命中率范围时,目标存储介质是硬盘。压缩单元1703,用于在目标存储介质的第一数据中查找待压缩数据,以确定待压缩数据对应的第二数据,并将待压缩数据对应的第二数据作为待压缩数据经压缩得到的值。例如,结合图11,获取单元1701可以用于执行S701和/或本申请实施例提供的其他步骤。确定单元1702可以用于执行S702和/或本申请实施例提供的其他步骤。压缩单元1703可以用于执行S703和/或本申请实施例提供的其他步骤。
例如,结合图2或图3,获取单元1701、确定单元1702和压缩单元1703均可以通过处理器202实现。
如图18所示,为本申请实施例提供的存储设备180的结构示意图。图18所示的存储设备180可以用于执行图10或图12所示的数据解压缩方法。存储设备180可以包括获取单元1801、确定单元1802和解压缩单元1803。
在一种可能的设计中,存储设备180中存储有至少两个集合,每个集合包括一个或多个映射关系,每个映射关系是指一个第一数据与一个第二数据之间的映射关系,第一数据所占的存储空间大于第二数据所占的存储空间,每个集合对应一个命中率范围,不同集合对应的命中率范围不同。获取单元1801用于获取待解压缩数据的命中率。确定单元1802用于根据待解压缩数据的命中率,在至少两个集合中确定目标集合;待解压缩数据的命中率用于确定待解压缩数据所在的目标映射关系的命中率,所确定的目标映射关系的命中率属于目标集合对应的命中率范围。解压缩单元1803用于在目标集合的第二数据中查找待解压缩数据,以确定与待解压缩数据对应的第一数据,并将与待解压缩数据对应的第一数据作为待解压缩数据经解压缩得到的值。例如,结合图10,获取单元1801可以用于执行S601,和/或本申请实施例提供的其他步骤。确定单元1802可以用于执行S602,和/或本申请实施例提供的其他步骤。解压缩单元1803可以用于执行S603,和/或本申请实施例提供的其他步骤。
在一种可能的设计中,存储设备180的存储介质包括缓存、内存和硬盘;缓存中的映射关系的命中率大于或等于内存中的映射关系的命中率,内存中的映射关系的命中率大于或等于硬盘中的映射关系的命中率;每个映射关系是指一个第一数据与一个第二数据之间的映射关系,第一数据所占的存储空间大于第二数据所占的存储空间。获取单元1801用于获取待解压缩数据的命中率。确定单元1802用于根据待解压缩数据的命中率,确定目标存储介质;待解压缩数据的命中率用于确定待解压缩数据所在的目标映射关系的命中率,当所确定的目标映射关系的命中率属于缓存中的映射关系的命中率范围时,目标存储介质是缓存;当所确定的目标映射关系的命中率不属于缓存中的映射关系的命中率范围,但属于内存中的映射关系的命中率范围时,目标存储介质是内存;当所确定的目标映射关系的命中率不属于内存中的映射关系的命中率范围时,目标存储介质是硬盘。解压缩单元1803用于在目标存储介质的第二数据中查找与待解压缩数据对应的第一数据,将与待解压缩数据对应的第一数据作为待解压缩数据经解压缩得到的值。例如,结合图12,获取单元1801可以用于执行S801和/或本申请实施例提供的其他步骤。确定单元1802可以用于执行S802和/或本申请实施例提供的其他步骤。解压缩单元1803可以用于执行S803和/或本申请实施例提供的其他步骤。
例如,结合图2或图3,获取单元1801、确定单元1802和解压缩单元1803均可以通过处理器202实现。
如图19所示,为本申请实施例提供的存储设备190的结构示意图。图19所示的存储设备190可以用于执行图13所示的数据存储方法。存储设备190可以包括预测单元1901、获取单元1902和存储单元1903。获取单元1902用于获取当前数据和当前数据的历史数据。预测单元1901用于使用该历史数据对当前数据进行预测,得到当前数据的第一预测数据。获取单元1902还用于获取当前数据与当前数据的第一预测数据的第一差量。存储单元1903,用于当该第一差量的绝对值小于等于预设阈值时,存储预设数据。可选的,预设数据所占的存储空间小于当前数据所占的存储空间。例如,结合图13,预测单元1901可以用于执行S901。获取单元1902可以用于执行S901和S902。存储单元1903可以用于执行S904。
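图13一类"第一差量的绝对值不超过预设阈值时存储预设数据"的判断,可以用如下Python草图示意(其中预测为假设的占位线性加权实现;预设数据以0表示,仅为示例,实际可为任意占用空间更小的约定值):

```python
PRESET = 0  # 预设数据,其占用空间小于当前数据(此处以 0 示意)

def store(current, history, params, threshold):
    pred = sum(p * h for p, h in zip(params, history))  # 假设的占位预测
    delta = current - pred
    if abs(delta) <= threshold:
        # 差量足够小:只存预设数据,获取时直接以预测数据作为当前数据
        return ("preset", PRESET)
    # 否则存储当前数据(或其压缩值),对应正文中的 S905
    return ("raw", current)
```

例如历史数据为[1.0, 2.0]、参数为[0.5, 0.5]时预测数据为1.5:当前数据2.0的差量0.5落在阈值0.6内,只需存预设数据;当前数据4.0的差量2.5超过阈值,则存当前数据本身。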
可选的,存储单元1903还用于,存储所述用于恢复当前数据的信息和执行预测所采用的AI神经算法的参数之间的对应关系。
可选的,存储设备190还包括更新单元1904,用于通过自适应学习更新AI神经算法的参数;根据更新后的AI神经算法的参数,更新用于恢复当前数据的信息。
可选的,更新单元1904具体用于:读取用于恢复当前数据的信息;根据执行预测所采用的AI神经算法的参数、用于恢复当前数据的信息和当前数据的该历史数据,恢复当前数据;根据更新后的AI神经算法的参数和当前数据的该历史数据,对当前数据进行预测,得到第二预测数据;第二预测数据是基于该历史数据的变化规律和更新后的AI神经算法的参数对当前数据进行预测后的数据;获取当前数据与第二预测数据的第二差量;当第二差量所占的存储空间小于当前数据所占的存储空间时,将所存储的用于恢复当前数据的信息更新为第二差量或者第二差量经压缩得到的值。
可选的,存储设备190包括AI计算卡,预测单元1901具体用于:通过AI计算卡使用该历史数据对当前数据进行预测,得到第一预测数据。
可选的,存储单元1903还用于当第一差量的绝对值大于预设阈值时,存储当前数据或者当前数据经压缩得到的值。例如,结合图13,存储单元1903可以用于执行S905。
可选的,存储单元1903还用于,当第一差量的绝对值大于预设阈值时,存储标识信息;其中,当存储当前数据经压缩得到的值时,该标识信息用于指示所存储的用于恢复当前数据的信息是当前数据经压缩得到的值;当存储当前数据时,标识信息用于指示所存储的用于恢复当前数据的信息是当前数据。例如,结合图13,存储单元1903可以用于执行S905A。
例如,结合图2,预测单元1901和获取单元1902均可以通过处理器202实现,存储单元1903可以通过硬盘204实现。又如,结合图3,预测单元1901可以通过AI计算卡207实现。获取单元1902可以通过处理器202实现。存储单元1903可以通过硬盘204实现。
如图20所示,为本申请实施例提供的存储设备210的结构示意图。图20所示的存储设备210可以用于执行图14所示的数据获取方法。存储设备210可以包括:读取单元2101、预测单元2102和确定单元2103。读取单元2101用于读取用于恢复当前数据的信息,当前数据的预测数据是基于历史数据的变化规律对当前数据进行预测后的数据。预测单元2102用于当用于恢复当前数据的信息包括预设数据时,使用历史数据对当前数据进行预测,得到当前数据的预测数据。确定单元2103用于将当前数据的预测数据作为当前数据。例如,结合图14,读取单元2101可以用于执行S1001。预测单元2102可以用于执行S1003中的预测步骤。确定单元2103可以用于执行S1003中的确定当前数据的步骤。
可选的,存储设备210还包括获取单元2104,用于根据用于恢复当前数据的信息与AI神经算法的参数之间的对应关系,获取对当前数据进行预测所采用的AI神经算法的参数。预测单元2102具体用于:根据获取的AI神经算法的参数,使用该历史数据对当前数据进行预测,得到当前数据的预测数据。
可选的,存储设备210包括AI计算卡,预测单元2102具体用于:通过AI计算卡使用该历史数据对当前数据进行预测,得到当前数据的预测数据。
例如,结合图2,读取单元2101、预测单元2102和确定单元2103均可以通过处理器202实现。又如,结合图3,读取单元2101和确定单元2103均可以通过处理器202实现,预测单元2102可以通过AI计算卡207实现。
图15至图20所提供的任一种存储设备的实现方式及有益效果的描述,均可以参考上述对应的方法实施例,此处不再赘述。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件程序实现时,可以全部或部分地以计算机程序产品的形式来实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机程序指令时,全部或部分地产生按照本发明实施例的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,计算机指令可以从一个网站站点、计算机、服务器或者数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质,或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
尽管在此结合各实施例对本申请进行了描述,然而,在实施所要求保护的本申请过程中,本领域技术人员通过查看附图、公开内容、以及所附权利要求书,可理解并实现公开实施例的其他变化。在权利要求中,"包括"(comprising)一词不排除其他组成部分或步骤,"一"或"一个"不排除多个的情况。单个处理器或其他单元可以实现权利要求中列举的若干项功能。相互不同的从属权利要求中记载了某些措施,但这并不表示这些措施不能组合起来产生良好的效果。
尽管结合具体特征及其实施例对本申请进行了描述,显而易见的,在不脱离本申请的精神和范围的情况下,可对其进行各种修改和组合。相应地,本说明书和附图仅仅是所附权利要求所界定的本申请的示例性说明,且视为已覆盖本申请范围内的任意和所有修改、变化、组合或等同物。显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (30)

  1. 一种数据存储方法,其特征在于,应用于存储设备,所述方法包括:
    获取当前数据和所述当前数据的历史数据;
    使用所述历史数据对所述当前数据进行预测,得到第一预测数据;所述第一预测数据是基于所述历史数据的变化规律对所述当前数据进行预测后的数据;
    获取所述当前数据与所述第一预测数据的第一差量;
    当所述第一差量所占的存储空间小于所述当前数据所占的存储空间时,存储用于恢复所述当前数据的信息;其中,所述用于恢复所述当前数据的信息包括所述第一差量或所述第一差量经压缩得到的值。
  2. 根据权利要求1所述的数据存储方法,其特征在于,执行所述预测所采用的算法包括人工智能AI神经算法。
  3. 根据权利要求2所述的数据存储方法,其特征在于,所述AI神经算法的类型包括以下任一种:归一化最小均方自适应滤波NLMS类型、单层感知SLP类型、多层感知MLP类型或循环神经网络RNN类型。
  4. 根据权利要求2或3所述的数据存储方法,其特征在于,所述方法还包括:
    存储所述用于恢复所述当前数据的信息和执行所述预测所采用的所述AI神经算法的参数之间的对应关系。
  5. 根据权利要求2或3所述的数据存储方法,其特征在于,在所述存储用于恢复所述当前数据的信息之后,所述方法还包括:
    通过自适应学习更新所述AI神经算法的参数;
    根据更新后的所述AI神经算法的参数,更新所述用于恢复所述当前数据的信息。
  6. 根据权利要求5所述的数据存储方法,其特征在于,所述根据更新后的所述AI神经算法的参数,更新所述用于恢复所述当前数据的信息,包括:
    读取所述用于恢复所述当前数据的信息;
    根据执行所述预测所采用的所述AI神经算法的参数、所述用于恢复所述当前数据的信息和所述当前数据的历史数据,恢复所述当前数据;
    根据更新后的所述AI神经算法的参数和所述当前数据的历史数据,对所述当前数据进行预测,得到第二预测数据;所述第二预测数据是基于所述历史数据的变化规律和更新后的所述AI神经算法的参数对所述当前数据进行预测后的数据;
    获取所述当前数据与所述第二预测数据的第二差量;
    当所述第二差量所占的存储空间小于所述当前数据所占的存储空间时,将所存储的所述用于恢复所述当前数据的信息更新为所述第二差量或者所述第二差量经压缩得到的值。
  7. 根据权利要求2至6任一项所述的数据存储方法,其特征在于,所述存储设备包括AI计算卡,所述使用所述历史数据对所述当前数据进行预测,得到第一预测数据,包括:
    通过所述AI计算卡使用所述历史数据对所述当前数据进行预测,得到所述第一预测数据。
  8. 根据权利要求1至7任一项所述的数据存储方法,其特征在于,执行所述压缩所采用的算法包括字典型压缩算法,所述字典型压缩算法的字典包括至少两个集合,每个集合包括一个或多个映射关系,每个映射关系是指一个第一数据与一个第二数据之间的映射关系,所述第一数据所占的存储空间大于所述第二数据所占的存储空间;每个集合对应一个命中率范围,不同集合对应的命中率范围不同;所述方法还包括:
    获取所述第一差量的命中率;
    根据所述第一差量的命中率,在所述至少两个集合中确定目标集合;所述第一差量的命中率用于确定所述第一差量所在的目标映射关系的命中率,所述确定的目标映射关系的命中率属于所述目标集合对应的命中率范围;
    在所述目标集合的第一数据中查找所述第一差量,以确定与所述第一差量对应的第二数据;与所述第一差量对应的第二数据为所述第一差量经压缩得到的值。
  9. 根据权利要求1至7任一项所述的数据存储方法,其特征在于,所述存储设备的存储介质包括缓存、内存和硬盘;执行所述压缩所采用的算法包括字典型压缩算法,所述字典型压缩算法的字典包括一个或多个映射关系,每个所述映射关系是指一个第一数据与一个第二数据之间的映射关系,所述第一数据所占的存储空间大于所述第二数据所占的存储空间;所述缓存中的映射关系的命中率大于或等于所述内存中的映射关系的命中率,所述内存中的映射关系的命中率大于或等于所述硬盘中的映射关系的命中率;所述方法还包括:
    获取所述第一差量的命中率;
    根据所述第一差量的命中率,确定目标存储介质;其中,所述第一差量的命中率用于确定所述第一差量所在的目标映射关系的命中率;当所述确定的目标映射关系的命中率属于所述缓存中的映射关系的命中率范围时,所述目标存储介质是所述缓存;当所述确定的目标映射关系的命中率不属于所述缓存中的映射关系的命中率范围,但属于所述内存中的映射关系的命中率范围时,所述目标存储介质是所述内存;当所述确定的目标映射关系的命中率不属于所述内存中的映射关系的命中率范围时,所述目标存储介质是所述硬盘;
    在所述目标存储介质的第一数据中查找所述第一差量,以确定所述第一差量对应的第二数据;与所述第一差量对应的第二数据为所述第一差量经压缩得到的值。
  10. 一种数据获取方法,其特征在于,应用于存储设备,所述方法包括:
    读取用于恢复当前数据的信息;所述用于恢复当前数据的信息包括差量或差量经压缩得到的值;所述差量是所述当前数据与所述当前数据的预测数据的差量;所述预测数据是基于历史数据的变化规律对所述当前数据进行预测后的数据;
    使用所述历史数据对所述当前数据进行预测,得到所述预测数据;
    根据所述用于恢复当前数据的信息和所述预测数据确定所述当前数据。
  11. 根据权利要求10所述的数据获取方法,其特征在于,执行所述预测所采用的算法包括人工智能AI神经算法。
  12. 根据权利要求11所述的数据获取方法,其特征在于,所述AI神经算法的类型包括以下任一种:归一化最小均方自适应滤波NLMS类型、单层感知SLP类型、多层感知MLP类型或循环神经网络RNN类型。
  13. 根据权利要求11或12所述的数据获取方法,其特征在于,所述方法还包括:
    根据所述用于恢复当前数据的信息与所述AI神经算法的参数之间的对应关系,获取对所述当前数据进行预测所采用的所述AI神经算法的参数;
    所述使用所述历史数据对所述当前数据进行预测,得到所述预测数据,包括:
    根据所述获取的所述AI神经算法的参数,使用所述历史数据对所述当前数据进行预测,得到所述预测数据。
  14. 根据权利要求11至13任一项所述的数据获取方法,其特征在于,所述存储设备包括AI计算卡,所述使用所述历史数据对所述当前数据进行预测,得到所述预测数据,包括:
    通过AI计算卡使用所述历史数据对所述当前数据进行预测,得到所述预测数据。
  15. 根据权利要求10至14任一项所述的数据获取方法,其特征在于,所述用于恢复当前数据的信息包括所述差量经压缩得到的值;所述根据所述用于恢复当前数据的信息和所述预测数据确定所述当前数据,包括:
    对所述差量经压缩得到的值进行解压缩,得到所述差量;
    根据所述差量和所述预测数据,确定所述当前数据。
  16. 根据权利要求15所述的数据获取方法,其特征在于,执行所述解压缩所采用的算法包括字典型解压缩算法,所述字典型解压缩算法的字典包括至少两个集合,每个所述集合包括一个或多个映射关系,每个所述映射关系是指一个第一数据与一个第二数据之间的映射关系,所述第一数据所占的存储空间大于所述第二数据所占的存储空间,每个集合对应一个命中率范围,不同集合对应的命中率范围不同;所述对所述差量经压缩得到的值进行解压缩,得到所述差量,包括:
    获取所述差量经压缩得到的值的命中率;
    根据所述差量经压缩得到的值的命中率,在所述至少两个集合中确定目标集合;所述差量经压缩得到的值的命中率用于确定所述差量经压缩得到的值所在的目标映射关系的命中率,所述确定的目标映射关系的命中率属于所述目标集合对应的命中率范围;
    在所述目标集合的第二数据中查找所述差量经压缩得到的值,以确定与所述差量经压缩得到的值对应的第一数据;与所述差量经压缩得到的值对应的第一数据为所述差量。
  17. 根据权利要求15所述的数据获取方法,其特征在于,所述存储设备的存储介质包括缓存、内存和硬盘;执行所述解压缩所采用的算法包括字典型解压缩算法,所述字典型解压缩算法的字典包括一个或多个映射关系,每个所述映射关系是指一个第一数据与一个第二数据之间的映射关系,所述第一数据所占的存储空间大于所述第二数据所占的存储空间;所述缓存中的映射关系的命中率大于或等于所述内存中的映射关系的命中率,所述内存中的映射关系的命中率大于或等于所述硬盘中的映射关系的命中率;所述对所述差量经压缩得到的值进行解压缩,得到所述差量,包括:
    获取所述差量经压缩得到的值的命中率;
    根据所述差量经压缩得到的值的命中率,确定目标存储介质;其中,所述差量经压缩得到的值的命中率用于确定所述差量经压缩得到的值所在的目标映射关系的命中率,当所述确定的目标映射关系的命中率属于所述缓存中的映射关系的命中率范围时,所述目标存储介质是所述缓存;当所述确定的目标映射关系的命中率不属于所述缓存中的映射关系的命中率范围,但属于所述内存中的映射关系的命中率范围时,所述目标存储介质是所述内存;当所述确定的目标映射关系的命中率不属于所述内存中的映射关系的命中率范围时,所述目标存储介质是所述硬盘;
    在所述目标存储介质的第二数据中查找所述差量经压缩得到的值,以确定与所述差量经压缩得到的值对应的第一数据;与所述差量经压缩得到的值对应的第一数据为所述差量。
  18. 一种存储设备,其特征在于,包括:
    第一获取单元,用于获取当前数据和所述当前数据的历史数据;
    预测单元,用于使用所述历史数据对所述当前数据进行预测,得到第一预测数据;所述第一预测数据是基于所述历史数据的变化规律对所述当前数据进行预测后的数据;
    第二获取单元,用于获取所述当前数据与所述第一预测数据的第一差量;
    存储单元,用于当所述第一差量所占的存储空间小于所述当前数据所占的存储空间时,存储用于恢复所述当前数据的信息;其中,所述用于恢复所述当前数据的信息包括所述第一差量或所述第一差量经压缩得到的值。
  19. 根据权利要求18所述的存储设备,其特征在于,执行所述预测所采用的算法包括人工智能AI神经算法。
  20. 根据权利要求19所述的存储设备,其特征在于,所述AI神经算法的类型包括以下任一种:归一化最小均方自适应滤波NLMS类型、单层感知SLP类型、多层感知MLP类型或循环神经网络RNN类型。
  21. 根据权利要求19或20所述的存储设备,其特征在于,
    所述存储单元还用于,存储所述用于恢复所述当前数据的信息和执行所述预测所采用的所述AI神经算法的参数之间的对应关系。
  22. 根据权利要求19或20所述的存储设备,其特征在于,所述存储设备还包括:
    更新单元,用于通过自适应学习更新所述AI神经算法的参数,并根据更新后的所述AI神经算法的参数,更新所述用于恢复所述当前数据的信息。
  23. 根据权利要求22所述的存储设备,其特征在于,所述更新单元具体用于:
    读取所述用于恢复所述当前数据的信息;
    根据执行所述预测所采用的所述AI神经算法的参数、所述用于恢复所述当前数据的信息和所述当前数据的历史数据,恢复所述当前数据;
    根据更新后的所述AI神经算法的参数和所述当前数据的历史数据,对所述当前数据进行预测,得到第二预测数据;所述第二预测数据是基于所述历史数据的变化规律和更新后的所述AI神经算法的参数对所述当前数据进行预测后的数据;
    获取所述当前数据与所述第二预测数据的第二差量;
    当所述第二差量所占的存储空间小于所述当前数据所占的存储空间时,将所存储的所述用于恢复所述当前数据的信息更新为所述第二差量或者所述第二差量经压缩得到的值。
  24. 根据权利要求19至23任一项所述的存储设备,其特征在于,所述存储设备包括AI计算卡;所述预测单元具体用于:通过所述AI计算卡使用所述历史数据对所述当前数据进行预测,得到所述第一预测数据。
  25. 一种存储设备,其特征在于,包括:
    读取单元,用于读取用于恢复当前数据的信息;所述用于恢复当前数据的信息包括差量或差量经压缩得到的值;所述差量是所述当前数据与所述当前数据的预测数据的差量;所述预测数据是基于历史数据的变化规律对所述当前数据进行预测后的数据;
    预测单元,用于使用所述历史数据对所述当前数据进行预测,得到所述预测数据;
    确定单元,用于根据所述用于恢复当前数据的信息和所述预测数据确定所述当前数据。
  26. 根据权利要求25所述的存储设备,其特征在于,执行所述预测所采用的算法包括人工智能AI神经算法。
  27. 根据权利要求26所述的存储设备,其特征在于,所述AI神经算法的类型包括以下任一种:归一化最小均方自适应滤波NLMS类型、单层感知SLP类型、多层感知MLP类型或循环神经网络RNN类型。
  28. 根据权利要求26或27所述的存储设备,其特征在于,所述存储设备还包括:
    获取单元,用于根据所述用于恢复当前数据的信息与所述AI神经算法的参数之间的对应关系,获取对所述当前数据进行预测所采用的所述AI神经算法的参数;
    所述预测单元具体用于:根据所述获取的所述AI神经算法的参数,使用所述历史数据对所述当前数据进行预测,得到所述预测数据。
  29. 根据权利要求25至28任一项所述的存储设备,其特征在于,所述存储设备包括AI计算卡;所述预测单元具体用于:通过AI计算卡使用所述历史数据对所述当前数据进行预测,得到所述预测数据。
  30. 根据权利要求25至29任一项所述的存储设备,其特征在于,所述用于恢复当前数据的信息包括所述差量经压缩得到的值;所述确定单元包括:
    解压缩模块,用于对所述差量经压缩得到的值进行解压缩,得到所述差量;
    确定模块,用于根据所述差量和所述预测数据,确定所述当前数据。
PCT/CN2018/101597 2018-08-21 2018-08-21 数据存储及获取方法和装置 WO2020037511A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP18931202.8A EP3822795B1 (en) 2018-08-21 2018-08-21 Data storage and acquisition method and device
CN201880013245.4A CN111083933B (zh) 2018-08-21 2018-08-21 数据存储及获取方法和装置
PCT/CN2018/101597 WO2020037511A1 (zh) 2018-08-21 2018-08-21 数据存储及获取方法和装置
JP2021509809A JP7108784B2 (ja) 2018-08-21 2018-08-21 データ記憶方法、データ取得方法、及び機器
US17/179,591 US11960467B2 (en) 2018-08-21 2021-02-19 Data storage method, data obtaining method, and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/101597 WO2020037511A1 (zh) 2018-08-21 2018-08-21 数据存储及获取方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/179,591 Continuation US11960467B2 (en) 2018-08-21 2021-02-19 Data storage method, data obtaining method, and apparatus

Publications (1)

Publication Number Publication Date
WO2020037511A1 true WO2020037511A1 (zh) 2020-02-27

Family

ID=69592335

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/101597 WO2020037511A1 (zh) 2018-08-21 2018-08-21 数据存储及获取方法和装置

Country Status (5)

Country Link
US (1) US11960467B2 (zh)
EP (1) EP3822795B1 (zh)
JP (1) JP7108784B2 (zh)
CN (1) CN111083933B (zh)
WO (1) WO2020037511A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109587557B (zh) * 2019-01-11 2022-03-08 京东方科技集团股份有限公司 数据传输方法及装置、显示装置
CN111817722A (zh) * 2020-07-09 2020-10-23 北京奥星贝斯科技有限公司 数据压缩方法、装置及计算机设备
EP4038486A4 (en) * 2020-11-17 2023-02-22 Zhejiang Dahua Technology Co., Ltd DATA STORAGE AND PROCESSING SYSTEMS AND PROCEDURES
CN114095033B (zh) * 2021-11-16 2024-05-14 上海交通大学 基于上下文的图卷积的目标交互关系语义无损压缩系统及方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102222085A (zh) * 2011-05-17 2011-10-19 华中科技大学 一种基于相似性与局部性结合的重复数据删除方法
US20120166401A1 (en) * 2010-12-28 2012-06-28 Microsoft Corporation Using Index Partitioning and Reconciliation for Data Deduplication
CN103959259A (zh) * 2012-11-20 2014-07-30 华为技术有限公司 数据存储方法、数据存储装置及数据存储系统
US20170259944A1 (en) * 2016-03-10 2017-09-14 General Electric Company Using aircraft data recorded during flight to predict aircraft engine behavior
CN107357764A (zh) * 2017-06-23 2017-11-17 联想(北京)有限公司 数据分析方法、电子设备及计算机存储介质

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2995037B2 (ja) * 1997-07-04 1999-12-27 三洋電機株式会社 音声符号化復号化装置
US7509356B2 (en) * 2001-09-06 2009-03-24 Iron Mountain Incorporated Data backup
US7225208B2 (en) * 2003-09-30 2007-05-29 Iron Mountain Incorporated Systems and methods for backing up data files
JP2006259937A (ja) 2005-03-15 2006-09-28 Omron Corp データ収集装置およびデータ復元装置
CN101499094B (zh) * 2009-03-10 2010-09-29 焦点科技股份有限公司 一种数据压缩存储并检索的方法及系统
US20100293147A1 (en) * 2009-05-12 2010-11-18 Harvey Snow System and method for providing automated electronic information backup, storage and recovery
CN105025298B (zh) * 2010-01-19 2019-04-16 三星电子株式会社 对图像进行编码/解码的方法和设备
WO2011129819A1 (en) * 2010-04-13 2011-10-20 Empire Technology Development Llc Combined-model data compression
EP2387004B1 (en) * 2010-05-11 2016-12-14 Dassault Systèmes Lossless compression of a structured set of floating point numbers, particularly for CAD systems
CN102760250B (zh) * 2011-04-28 2016-03-30 国际商业机器公司 用于选择碳排放预测方案的方法、设备和系统
US9026505B1 (en) * 2011-12-16 2015-05-05 Emc Corporation Storing differences between precompressed and recompressed data files
JP5841297B1 (ja) * 2013-10-25 2016-01-13 株式会社ワコム 手書きデータ出力方法及びコンピュータシステム
CN104462422A (zh) * 2014-12-15 2015-03-25 北京百度网讯科技有限公司 对象的处理方法及装置
CN104636272A (zh) * 2015-02-25 2015-05-20 浪潮电子信息产业股份有限公司 一种基于差值预测算法的Cache替换策略
CN105205014B (zh) * 2015-09-28 2018-12-07 北京百度网讯科技有限公司 一种数据存储方法和装置
CN106909990A (zh) * 2017-03-01 2017-06-30 腾讯科技(深圳)有限公司 一种基于历史数据的预测方法及装置


Also Published As

Publication number Publication date
EP3822795A4 (en) 2021-08-04
US11960467B2 (en) 2024-04-16
EP3822795A1 (en) 2021-05-19
CN111083933A (zh) 2020-04-28
JP2021534505A (ja) 2021-12-09
US20210173824A1 (en) 2021-06-10
EP3822795B1 (en) 2023-07-26
JP7108784B2 (ja) 2022-07-28
CN111083933B (zh) 2023-02-03


Legal Events

121 — Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18931202; Country of ref document: EP; Kind code of ref document: A1)
ENP — Entry into the national phase (Ref document number: 2018931202; Country of ref document: EP; Effective date: 20210211) (Ref document number: 2021509809; Country of ref document: JP; Kind code of ref document: A)
NENP — Non-entry into the national phase (Ref country code: DE)