WO2016141735A1 - Method and apparatus for determining cached data - Google Patents

Method and apparatus for determining cached data

Info

Publication number
WO2016141735A1
WO2016141735A1 PCT/CN2015/095608 CN2015095608W WO2016141735A1 WO 2016141735 A1 WO2016141735 A1 WO 2016141735A1 CN 2015095608 W CN2015095608 W CN 2015095608W WO 2016141735 A1 WO2016141735 A1 WO 2016141735A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
time window
cache
identifier
cache memory
Prior art date
Application number
PCT/CN2015/095608
Other languages
English (en)
French (fr)
Inventor
柴云鹏
孙东旺
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP15884413.4A (published as EP3252609A4)
Publication of WO2016141735A1
Priority to US15/699,406 (published as US20170371807A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0888Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12Replacement control
    • G06F12/121Replacement control using replacement algorithms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0871Allocation or management of cache space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0891Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893Caches characterised by their organisation or structure
    • G06F12/0897Caches characterised by their organisation or structure with two or more cache hierarchy levels
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1016Performance improvement
    • G06F2212/1021Hit rate improvement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1032Reliability improvement, data loss prevention, degraded operation etc
    • G06F2212/1036Life time enhancement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/22Employing cache memory using specific memory technology
    • G06F2212/222Non-volatile memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/50Control mechanisms for virtual memory, cache or TLB
    • G06F2212/502Control mechanisms for virtual memory, cache or TLB using adaptive policy
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/60Details of cache memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/60Details of cache memory
    • G06F2212/6022Using a prefetch buffer or dedicated prefetch cache
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/60Details of cache memory
    • G06F2212/6024History based prefetching

Definitions

  • The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for determining cached data.
  • In the process of processing data, a processor reads data from a cache memory faster than from a disk. To speed up data processing, as processing proceeds, the processor continually selects good data with a relatively high access count and stores a copy of this good data in the cache memory. On this basis, when the processor processes data, if the required data is already stored in the cache memory, it is read directly from the cache memory; such data is cache hit data. If the required data is not yet stored in the cache memory, it is read from the disk; such data is cache miss data. To speed up data processing, cache miss data with a relatively high access count can be written into the cache memory, ensuring that the data can be read directly from the cache memory when the processor subsequently needs it.
  • When determining which cache miss data may be written into the cache memory, the method commonly used in the related art is to randomly select part of the cache miss data from all the cache miss data and treat the selected cache miss data as the cache miss data that may be written into the cache memory.
  • Cache miss data can be divided into good data with a high access count, average data with a medium access count, and poor data with a low access count. Because the related art selects cache miss data only at random when determining the cache miss data, good, average, and poor data are selected with equal probability, so they are written into the cache memory in equal proportions, resulting in a low hit rate when the processor subsequently reads data from the cache memory.
  • The embodiments of the present invention provide a method and an apparatus for determining cached data.
  • The technical solutions are as follows:
  • In one aspect, a method for determining cached data is provided, comprising: acquiring a data identifier of read cache miss data; selecting data identifiers of to-be-determined data based on the acquired data identifiers of the cache miss data; recording the data identifiers of the respective to-be-determined data in groups; counting the number of occurrences of the data identifiers of the respective to-be-determined data in each group; and
  • selecting target to-be-determined data according to the number of occurrences, and determining the target to-be-determined data as the cache miss data to be written into a cache memory.
  • Optionally, the groups form a time window sequence; the time window sequence includes at least two time windows; each time window includes a first preset number of first storage units, and a second preset number of second storage units are spaced between every two time windows.
  • Recording the data identifiers of the respective to-be-determined data in groups includes: recording the data identifiers of the respective to-be-determined data, in order, into the first storage units of the time window sequence.
  • Counting the number of occurrences of the data identifiers of the respective to-be-determined data in each group includes: counting the number of occurrences of the data identifiers of the respective to-be-determined data in each time window of the time window sequence.
  • Counting the number of occurrences of the data identifiers in each time window of the time window sequence includes: counting the number of first storage units in each time window of the time window sequence that record the same data identifier, and determining, according to that number, the number of occurrences of the data identifiers of the respective to-be-determined data in each time window of the time window sequence.
  • Counting the number of first storage units that record the same data identifier includes: performing the count when all the first storage units in the time window sequence are filled.
  • After counting the number of first storage units that record the same data identifier, the method further includes: clearing the data identifiers stored in the first storage units of the time window sequence, so as to record, through the time window sequence, the data identifiers of the to-be-determined data selected in a subsequent data reading process.
  • Selecting the target to-be-determined data according to the number of occurrences and determining it as the cache miss data to be written into the cache memory includes: selecting the to-be-determined data whose number of occurrences is not less than a preset threshold, using that to-be-determined data as the target to-be-determined data, and determining the target to-be-determined data as the cache miss data to be written into the cache memory.
  • After the target to-be-determined data is determined as the cache miss data to be written into the cache memory, the method further includes: adding the data identifier of the target to-be-determined data to a preset whitelist, where the preset whitelist includes a fourth preset number of third storage units, and each third storage unit can record the data identifier of one piece of target to-be-determined data; and, when subsequently read cache miss data hits the target to-be-determined data corresponding to any data identifier in the preset whitelist, writing the hit target to-be-determined data into the cache memory.
  • Writing the hit target to-be-determined data into the cache memory includes: when the storage space of the cache memory is not yet full, writing the hit target to-be-determined data directly into the cache memory according to its data size; and when the storage space of the cache memory is full, evicting, according to a preset cache replacement algorithm, the data stored in a storage space of that data size in the cache memory, and writing the hit target to-be-determined data into the storage location corresponding to the evicted data.
  • In another aspect, an apparatus for determining cached data is provided, comprising:
  • an obtaining module, configured to acquire a data identifier of read cache miss data, where the data identifier is used to distinguish different cache miss data;
  • a selection module, configured to select data identifiers of to-be-determined data based on the acquired data identifiers of the cache miss data;
  • a recording module, configured to record the data identifiers of the respective to-be-determined data in groups;
  • a statistics module, configured to count the number of occurrences of the data identifiers of the respective to-be-determined data in each group; and
  • a determining module, configured to select target to-be-determined data according to the number of occurrences, and determine the target to-be-determined data as the cache miss data to be written into the cache memory.
  • Optionally, the groups form a time window sequence; the time window sequence includes at least two time windows; each time window includes a first preset number of first storage units, and a second preset number of second storage units are spaced between every two time windows;
  • the recording module is configured to record the data identifiers of the respective to-be-determined data, in order, into the first storage units of the time window sequence; and
  • the statistics module is configured to count the number of occurrences of the data identifiers of the respective to-be-determined data in each time window of the time window sequence.
  • Optionally, the statistics module includes:
  • a statistical unit, configured to count the number of first storage units that record the same data identifier in each time window of the time window sequence; and
  • a first determining unit, configured to determine, according to the number of first storage units that record the same data identifier, the number of occurrences of the data identifiers of the respective to-be-determined data in each time window of the time window sequence.
  • Optionally, the statistical unit is configured to count the number of first storage units that record the same data identifier in each time window of the time window sequence when all the first storage units in the time window sequence are filled.
  • Optionally, the statistics module further includes:
  • a clearing unit, configured to clear the data identifiers stored in the first storage units of the time window sequence, so as to record, through the time window sequence, the data identifiers of the to-be-determined data selected in a subsequent data reading process.
  • Optionally, the determining module is configured to select the to-be-determined data whose number of occurrences is not less than the preset threshold as the target to-be-determined data, and determine the target to-be-determined data as the cache miss data to be written into the cache memory.
  • Optionally, the apparatus further includes:
  • an adding module, configured to add the data identifier of the target to-be-determined data to a preset whitelist, where the preset whitelist includes a fourth preset number of third storage units, and each third storage unit can record the data identifier of one piece of target to-be-determined data; and
  • a writing module, configured to write the hit target to-be-determined data into the cache memory when subsequently read cache miss data hits the target to-be-determined data corresponding to any data identifier in the preset whitelist.
  • The writing module is configured to:
  • when the storage space of the cache memory is not yet full, write the hit target to-be-determined data directly into the cache memory according to its data size; and
  • when the storage space of the cache memory is full, evict, according to a preset cache replacement algorithm, the data stored in a storage space of that data size in the cache memory, and write the hit target to-be-determined data into the storage location corresponding to the evicted data.
  • The target to-be-determined data is selected according to the number of occurrences and determined as the cache miss data to be written into the cache memory. Because a high number of occurrences indicates that a piece of cache miss data is read relatively often, the frequently read good data can be selected from the read cache miss data, which increases the proportion of good data stored in the cache memory and, in turn, increases the hit rate when the processor subsequently reads data from the cache memory.
  • FIG. 1 is a system architecture diagram of a method for determining cached data according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a method for determining cache data according to another embodiment of the present invention.
  • FIG. 3 is a flowchart of a method for determining cache data according to another embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a sequence of time windows according to another embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a device for determining cache data according to another embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a device for determining cached data according to another embodiment of the present invention.
  • Referring to FIG. 1, it shows the system architecture involved in the method for determining cached data provided by an embodiment of the present invention.
  • the system architecture includes a processor 101, a memory 102, a cache memory 103, and a disk 104.
  • the data required by the processor 101 can be stored in three types of storage media: the memory 102, the cache memory 103, and the disk 104.
  • the order of media access speeds of the three storage media is memory 102 > cache memory 103 > disk 104
  • the order of capacity size is memory 102 < cache memory 103 < disk 104.
  • the memory 102 and the cache memory 103 are often used for the first and second level caches, respectively, to speed up the data processing.
  • When the processor 101 reads data, it generally uses DMA (Direct Memory Access) technology; that is, when the processor 101 reads data from the cache memory 103 or the disk 104, the data is first read into the memory 102 and then read from the memory 102. In other words, when reading data, the processor 101 first reads from the memory 102; if the memory 102 does not hold the required data, it reads the required data from the cache memory 103; and if the cache memory 103 still does not hold the required data, it reads the data from the disk 104.
  • The embodiment of the present invention provides a method for determining cached data for the case where the memory 102 does not hold the data required by the processor 101 and the processor 101 needs to read the data from the cache memory 103 or the disk 104. Specifically, the processor 101 reads data from the cache memory 103 faster than from the disk 104. Therefore, as data processing proceeds, the processor 101 stores good data with a relatively high access count in the cache memory 103, so that when that data is needed later the processor 101 can read it directly from the cache memory 103 without accessing the disk 104.
  • The data already stored in the cache memory 103 is defined as cache hit data, and the data that is not stored in the cache memory 103 and that the processor 101 needs to read from the disk 104 is defined as cache miss data.
  • Some cache memories 103, such as an SSD, support only a limited number of writes. If the data written to the cache memory 103 is not frequently accessed, that is, it is not data the processor 101 often needs, the processor 101 may rarely obtain the required data from the cache memory 103 afterwards; in other words, the cache hit rate is not high. In that case, to raise the cache hit rate, the cached data in the cache memory 103 must be replaced: the rarely accessed data is evicted and new data is written into the cache memory 103. This replacement behavior causes the data in the cache memory 103 to be updated frequently, and when the number of writes to the cache memory 103 is limited, frequently updating the data shortens the useful life of the cache memory 103.
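  • The tiered read path described above can be sketched roughly as follows. This is a minimal illustration only; the storage objects and the record_cache_miss hook are hypothetical stand-ins for the components of the embodiment, not an API defined by the patent.

```python
# Minimal sketch of the tiered read path (memory -> cache memory -> disk).
# All storage objects are assumed to be simple dict-like stores keyed by data identifier.

class TieredStorage:
    def __init__(self, memory, cache, disk, record_cache_miss):
        self.memory = memory            # fastest, smallest
        self.cache = cache              # cache memory, e.g. an SSD
        self.disk = disk                # slowest, largest
        self.record_cache_miss = record_cache_miss  # callback invoked for every cache miss

    def read(self, data_id):
        # 1. Try the memory first.
        if data_id in self.memory:
            return self.memory[data_id]
        # 2. Then the cache memory: this is cache hit data.
        if data_id in self.cache:
            data = self.cache[data_id]
            self.memory[data_id] = data     # staged through the memory, DMA-style
            return data
        # 3. Finally the disk: this is cache miss data, so the determination
        #    method is notified through the callback.
        data = self.disk[data_id]
        self.memory[data_id] = data
        self.record_cache_miss(data_id)
        return data
```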
  • FIG. 2 is a flowchart of a method for determining cache data according to an exemplary embodiment.
  • the method process provided by the embodiment of the present invention includes:
  • Optionally, the groups constitute a time window sequence; the time window sequence includes at least two time windows, each time window includes a first preset number of first storage units, and a second preset number of second storage units are spaced between every two time windows;
  • Recording the data identifiers of the respective to-be-determined data in groups includes: recording the data identifiers of the respective to-be-determined data, in order, into the first storage units of the time window sequence.
  • Counting the number of occurrences of the data identifiers of the respective to-be-determined data in each group includes: counting the number of occurrences of the data identifiers of the respective to-be-determined data in each time window of the time window sequence.
  • Counting the number of occurrences in each time window of the time window sequence includes: counting the number of first storage units in each time window of the time window sequence that record the same data identifier, and determining, according to that number, the number of occurrences of the data identifiers of the respective to-be-determined data in each time window of the time window sequence.
  • Counting the number of first storage units that record the same data identifier includes: performing the count when all the first storage units in the time window sequence are filled.
  • Optionally, the method further includes:
  • clearing the data identifiers stored in the first storage units of the time window sequence, so as to record, through the time window sequence, the data identifiers of the to-be-determined data selected in a subsequent data reading process.
  • Select target to-be-determined data according to the number of occurrences, and determine the target to-be-determined data as the cache miss data to be written into the cache memory.
  • Optionally, selecting the target to-be-determined data according to the number of occurrences and determining it as the cache miss data to be written into the cache memory includes:
  • selecting the to-be-determined data whose number of occurrences is not less than a preset threshold, using that to-be-determined data as the target to-be-determined data, and determining the target to-be-determined data as the cache miss data to be written into the cache memory.
  • Optionally, the method further includes:
  • adding the data identifier of the target to-be-determined data to a preset whitelist, where the preset whitelist includes a fourth preset number of third storage units, and each third storage unit can record the data identifier of one piece of target to-be-determined data; and
  • when subsequently read cache miss data hits the target to-be-determined data corresponding to any data identifier in the preset whitelist, writing the hit target to-be-determined data into the cache memory.
  • Optionally, writing the hit target to-be-determined data into the cache memory includes:
  • when the storage space of the cache memory is not yet full, writing the hit target to-be-determined data directly into the cache memory according to its data size; and
  • when the storage space of the cache memory is full, evicting, according to a preset cache replacement algorithm, the data stored in a storage space of that data size in the cache memory, and writing the hit target to-be-determined data into the storage location corresponding to the evicted data.
  • In the method provided by the embodiment of the present invention, the target to-be-determined data is selected according to the number of occurrences and determined as the cache miss data to be written into the cache memory. Because a high number of occurrences indicates that a piece of cache miss data is read relatively often, the frequently read good data can be selected from the read cache miss data, which increases the proportion of good data stored in the cache memory and, in turn, increases the hit rate when the processor subsequently reads data from the cache memory.
  • FIG. 3 is a flowchart of a method for determining cache data according to an exemplary embodiment of the present invention. As shown in FIG. 3, the method process provided by the embodiment of the present invention includes:
  • When the data required by the processor is not stored in the memory or the cache memory, the processor needs to read the data from the disk; the data read from the disk is cache miss data. Among all the cache miss data read by the processor, some of it may be good data that the processor needs many times. Such good data can be written into the cache memory so that the processor can later read it directly from the cache memory, which improves the data processing speed. Therefore, reading cache miss data triggers the processor to determine whether to write it into the cache memory.
  • A data identifier is used to distinguish different cache miss data; that is, each piece of cache miss data corresponds to one data identifier.
  • In the embodiment of the present invention, only the data identifier of the cache miss data needs to be processed; therefore, after cache miss data is read, the data identifier of the read cache miss data is acquired.
  • Each piece of cache miss data may carry its corresponding data identifier. On this basis, when the data identifiers of the cache miss data are acquired, each piece of read cache miss data can be parsed to obtain its data identifier.
  • When acquiring the data identifiers of the read cache miss data, the data identifier may be acquired each time one piece of cache miss data is read, or acquired once after several pieces of cache miss data have been read.
  • In addition, the process of determining cache miss data may be periodic, for example once a day or once an hour. In that case, the data identifiers of all the cache miss data read in a cycle may be acquired together when the cycle ends.
  • The processor may process the data identifier of every piece of read cache miss data and decide, according to the processing result, whether to write that data into the cache memory; alternatively, it may process only the data identifiers of part of the read cache miss data and decide accordingly.
  • In the embodiment of the present invention, the data identifier of the cache miss data that needs to be processed is defined as the data identifier of to-be-determined data; that is, whether a piece of to-be-determined data is written into the cache memory is decided based on the processing result of its data identifier.
  • Therefore, the processor needs to select the data identifiers of the to-be-determined data based on the acquired data identifiers of the cache miss data.
  • The selection may be made each time a data identifier of cache miss data is acquired, by deciding whether that data identifier becomes the data identifier of a piece of to-be-determined data; or after a certain number of data identifiers of cache miss data have been acquired, by selecting the data identifiers of to-be-determined data from that set; or after the data identifiers of cache miss data have been acquired for a certain period of time, by selecting the data identifiers of to-be-determined data from those acquired in that period.
  • When the processor selects the data identifiers of the to-be-determined data based on the acquired data identifiers of the cache miss data, the implementations include, but are not limited to, the following two.
  • The first implementation: randomly sample the data identifiers of the read cache miss data, and use the data identifier of each sampled piece of cache miss data as the data identifier of a selected piece of to-be-determined data.
  • If the processor decided, every time it acquired a data identifier of cache miss data, whether to write that data into the cache memory, a considerable load would be added, reducing the data processing speed of the processor.
  • Therefore, the data identifiers of all the read cache miss data may be randomly sampled, with only the sampled data identifiers treated as data identifiers of to-be-determined data and processed, while the data identifiers of the cache miss data that are not sampled are not processed. This lightens the load of the processor and reduces the impact of the cache-miss determination process on the speed at which the processor handles other data.
  • When the data identifiers of the read cache miss data are randomly sampled and the sampled data identifiers are used as the data identifiers of the selected to-be-determined data, the methods include, but are not limited to, the following.
  • The first method: every preset duration, one data identifier is sampled from the read cache miss data and used as the data identifier of a piece of to-be-determined data.
  • The specific value of the preset duration can be set according to experience.
  • For example, the preset duration may be 1 minute, 2 minutes, 5 minutes, and the like.
  • Each time the preset duration elapses, the processor may use the data identifier of the cache miss data acquired next as the data identifier of a selected piece of to-be-determined data.
  • For example, suppose the processor samples the read cache miss data every 3 minutes and starts timing at 14:20:15. If the processor is acquiring the data identifier of cache miss data a when 14:23:15 arrives, it uses the data identifier of cache miss data a as the data identifier of a selected piece of to-be-determined data. If the processor has not acquired any data identifier of cache miss data by 14:23:15 and acquires the data identifier of the next cache miss data b at 14:23:20, it uses the data identifier of cache miss data b as the data identifier of a sampled piece of to-be-determined data.
  • If the processor acquires no data identifier of cache miss data during a whole preset duration, no data identifier of to-be-determined data is obtained in that period; the processor keeps timing and continues to select data identifiers of to-be-determined data after the next preset duration elapses.
  • The second method: after the data identifiers of a third preset number of cache miss data have been acquired, one of the acquired data identifiers is selected and used as the data identifier of a piece of to-be-determined data.
  • The third preset number may be set as needed.
  • For example, the third preset number may be 10, 30, 50, or the like.
  • In specific implementation, the processor may count the number of acquired data identifiers of cache miss data from the start, with the count initialized to 0.
  • Each time the processor acquires a data identifier of cache miss data, the count is incremented by one, until the count reaches the third preset number.
  • When the count reaches the third preset number, the data identifier of the cache miss data acquired at that moment is used as the data identifier of a piece of to-be-determined data. For example, if the processor is acquiring the data identifier of cache miss data c when the count reaches the third preset number, the data identifier of cache miss data c is used as the data identifier of a sampled piece of to-be-determined data.
  • The third method: each time a piece of cache miss data is read, a corresponding random value is generated; whether the random value generated each time is not greater than a preset probability is determined; if the generated random value is not greater than the preset probability, the data identifier of the cache miss data read when that random value was generated is used as the data identifier of a piece of to-be-determined data.
  • The preset probability is a preset probability of sampling the read cache miss data.
  • The specific value of the preset probability is not limited in the embodiment of the present invention.
  • For example, the preset probability may be 0.2, 0.3, or the like.
  • In the embodiment of the present invention, the random value may be generated according to a predefined random function.
  • The specific form of the random function is not limited in the embodiment of the present invention, as long as the value generated according to the random function lies between 0 and 1.
  • Each random value is generated during the data reading process in which one piece of cache miss data is read.
  • The generated value is compared with the preset probability; when the generated value is not greater than the preset probability, the data identifier of the cache miss data read when that value was generated is used as the data identifier of a piece of to-be-determined data. For example, if the random value generated when cache miss data d is read is not greater than the preset probability, the data identifier of d is used as the data identifier of a piece of to-be-determined data.
  • In the above manner, the problem of processing the data identifiers of all read cache miss data and consuming too much processor load can be avoided, which reduces the impact on the data processing speed of the processor.
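  • The three sampling schemes above can be sketched as follows. The function and parameter names are illustrative only, and the concrete default values (a 3-minute interval, a count of 50, a probability of 0.2) are merely examples consistent with the values mentioned in the text, not values fixed by the embodiment.

```python
import random
import time

def sample_by_time(stream, preset_duration=180):
    """First method: yield the data identifier acquired next after each preset duration elapses."""
    last = time.monotonic()
    for data_id in stream:                    # stream yields identifiers of read cache miss data
        if time.monotonic() - last >= preset_duration:
            last = time.monotonic()
            yield data_id                     # data identifier of one piece of to-be-determined data

def sample_by_count(stream, third_preset_value=50):
    """Second method: yield every Nth acquired data identifier (N = third preset number)."""
    count = 0                                 # count starts at 0
    for data_id in stream:
        count += 1
        if count == third_preset_value:
            count = 0
            yield data_id

def sample_by_probability(stream, preset_probability=0.2):
    """Third method: yield a data identifier when its random value is not greater than the preset probability."""
    for data_id in stream:
        if random.random() <= preset_probability:   # random value in [0, 1)
            yield data_id
```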
  • The second implementation: the data identifier of every acquired piece of cache miss data is used as the data identifier of a piece of to-be-determined data.
  • In this implementation, the data identifier of every piece of cache miss data is processed, ensuring that every piece of read cache miss data is taken into account, so that the subsequently determined target to-be-determined data is more accurate.
  • After the data identifiers of the respective to-be-determined data are selected, they are recorded in groups, and the number of occurrences of the data identifier of each piece of to-be-determined data in each group is counted.
  • When the data identifier of a piece of to-be-determined data appears a relatively high number of times across the groups, its access count is relatively high, so that piece of to-be-determined data can be regarded as good data.
  • If only a single group were used, the appearance of the data identifier of a piece of to-be-determined data in that group could be accidental. By setting multiple groups, a high number of occurrences of a data identifier across the groups rules out appearances caused by chance and ensures that good data with a genuinely high access count is selected accurately.
  • Depending on how the groups are organized, the number of occurrences of the data identifiers of the respective to-be-determined data in the groups can carry different meanings. For example, for the data identifier of a piece of to-be-determined data, if the same data identifier is recorded at most once per group, its number of occurrences represents the number of groups that record that data identifier; if each group may record the same data identifier more than once, its number of occurrences represents the sum of the occurrences of that data identifier over all the groups.
  • Optionally, the groups may constitute a time window sequence, where the time window sequence comprises at least two time windows, each time window comprises a first preset number of first storage units, and a second preset number of second storage units are spaced between every two time windows.
  • the specific number of time windows included in the time window sequence is not specifically limited in the embodiment of the present invention.
  • the time window sequence may include 3 time windows, may also include 5 time windows, and the like.
  • the specific values of the first preset value and the second preset value are also not limited in the embodiment of the present invention.
  • the first preset value may be 3, 4, 5, etc.
  • the second preset value may be 1 or 2, and the like.
  • the first storage unit and the second storage unit may be storage units of different sizes, or may be storage units of the same size, which is not limited in the embodiment of the present invention.
  • FIG. 4 shows a structural diagram of a sequence of time windows.
  • The time window sequence shown in FIG. 4 includes three time windows; each time window can store the data identifiers of three pieces of to-be-determined data, and the gap between every two time windows has the same size as a first storage unit that stores one data identifier.
  • When the data identifiers of the respective to-be-determined data are recorded, they may be recorded one by one into the first storage units of the time window sequence. Specifically, the first storage units of each time window are filled in sequence, following the order of the time windows in the time window sequence.
  • For example, the data identifier of the first acquired piece of to-be-determined data may be recorded into the first first storage unit of the first time window, the data identifier of the second piece into the second first storage unit of the first time window, and so on; once the first time window is full, the data identifier of the fourth piece of to-be-determined data is recorded into the first first storage unit of the second time window, and so on.
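  • As a minimal sketch of this recording scheme, the time window sequence can be represented as an in-memory structure. The class and parameter names are illustrative; the default layout of three windows of three first storage units each simply mirrors FIG. 4 and is not mandated by the embodiment.

```python
class TimeWindowSequence:
    """Groups of first storage units that record the data identifiers of to-be-determined data."""

    def __init__(self, num_windows=3, units_per_window=3):
        self.num_windows = num_windows
        self.units_per_window = units_per_window
        # Each window is a list of first storage units; None marks an unoccupied unit.
        self.windows = [[None] * units_per_window for _ in range(num_windows)]
        self.next_slot = 0                    # units are filled in window order, one by one

    def is_full(self):
        return self.next_slot >= self.num_windows * self.units_per_window

    def record(self, data_id):
        """Record one data identifier into the next unoccupied first storage unit."""
        if self.is_full():
            raise ValueError("time window sequence is full; count and clear it first")
        window, unit = divmod(self.next_slot, self.units_per_window)
        self.windows[window][unit] = data_id
        self.next_slot += 1

    def clear(self):
        """Clear all first storage units so the next round of determination can be recorded."""
        self.windows = [[None] * self.units_per_window for _ in range(self.num_windows)]
        self.next_slot = 0
```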
  • The number of occurrences of the data identifier of each piece of to-be-determined data in the time windows of the time window sequence can then be obtained through the following two counting steps.
  • Step 1: Count the number of first storage units in the time windows of the time window sequence that record the same data identifier.
  • The number of occurrences of the data identifier of a piece of to-be-determined data in the time window sequence can be determined from the number of first storage units that record that data identifier. Therefore, to determine the number of occurrences of each piece of to-be-determined data in the time windows of the time window sequence, the number of first storage units recording the same data identifier needs to be counted first.
  • One round of the process of determining cache miss data may be delimited by whether all the first storage units of the time window sequence are filled.
  • When all the first storage units are filled, one round of determining cache miss data can be regarded as finished; otherwise, the round has not yet finished and data identifiers of to-be-determined data still need to be recorded.
  • In that case, data identifiers continue to be recorded into the unoccupied first storage units until all the first storage units are filled, at which point the round of determining cache miss data ends. Accordingly, the number of first storage units recording the same data identifier is counted when all the first storage units in the time window sequence are filled.
  • Of course, the end of a round of determining cache miss data is not limited to the moment when all the first storage units are filled; a round may also be regarded as finished when the occupancy rate of the first storage units in the time window sequence reaches a preset threshold.
  • The embodiment of the present invention does not limit the criterion that delimits a round of determining cache miss data. In the latter case, the number of first storage units recording the same data identifier is counted when the occupancy rate of the first storage units of the time window sequence reaches the preset threshold.
  • Step 2: Determine the number of occurrences of the data identifiers of the respective to-be-determined data in each time window of the time window sequence according to the number of first storage units that record the same data identifier.
  • The number of first storage units recording the same data identifier is the number of occurrences of that data identifier in the time window sequence. For example, if three first storage units record the data identifier of to-be-determined data a, the number of occurrences of to-be-determined data a in the time windows of the time window sequence is three.
  • When a piece of to-be-determined data is accessed relatively often, its data identifier has more chances to be recorded into the first storage units of the time window sequence. Therefore, good data with a relatively high access count can be selected according to the number of occurrences of the data identifier of each piece of to-be-determined data in the time window sequence. Referring to FIG. 4, because the data identifier a of to-be-determined data a appears most frequently in the time window sequence, to-be-determined data a can be determined to be good data with a relatively high access count.
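  • The two counting steps reduce to tallying, over the recorded windows, how many first storage units hold each data identifier. The sketch below operates on plain lists of identifiers (None marking an unoccupied unit); the toy example at the end is hypothetical and only loosely in the spirit of FIG. 4.

```python
from collections import Counter

def count_occurrences(windows):
    """Count how many first storage units across all time windows record each data identifier.

    `windows` is a list of time windows, each a list of recorded data identifiers,
    with None marking an unoccupied first storage unit. The returned count of a
    data identifier is its number of occurrences in the time window sequence.
    """
    counts = Counter()
    for window in windows:
        for data_id in window:
            if data_id is not None:
                counts[data_id] += 1
    return counts

# Toy example: identifier "a" appears most often, so to-be-determined data a is good data.
windows = [["a", "b", "a"], ["c", "a", "d"], ["a", "b", "e"]]
print(count_occurrences(windows)["a"])   # -> 4
```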
  • After the counting is done, one round of determining cache miss data ends.
  • To record the data identifiers of the to-be-determined data selected in the subsequent data reading process, the data identifiers stored in the first storage units of the time window sequence should also be cleared in time.
  • Through the above recording and counting, good data with a high access count can be identified from all the to-be-determined data, so that this good data can be written into the cache memory later while cache miss data with a low access count is not written into the cache memory. This increases the proportion of good data in the cache memory.
  • Moreover, if only a single time window were used, a high number of occurrences of the data identifier of a piece of to-be-determined data within it could be accidental.
  • When the data identifier of a piece of to-be-determined data appears many times across several time windows, occurrences caused by chance are ruled out, and it is ensured that data with a relatively high access count can be selected accurately.
  • Select target to-be-determined data according to the number of occurrences, and determine the target to-be-determined data as the cache miss data to be written into the cache memory.
  • The to-be-determined data includes good to-be-determined data with a high access count, average to-be-determined data with a medium access count, and poor to-be-determined data with a low access count.
  • The target to-be-determined data may therefore be selected according to the number of occurrences of each piece of to-be-determined data in each group, and the target to-be-determined data is determined as the cache miss data to be written into the cache memory.
  • Specifically, when selecting the target to-be-determined data according to the number of occurrences and determining it as the cache miss data to be written into the cache memory, the to-be-determined data whose number of occurrences is not less than a preset threshold may be selected as the target to-be-determined data and determined as the cache miss data to be written into the cache memory.
  • The preset threshold may be determined according to the number of time windows in the time window sequence and whether the first storage units within one time window may record the same data identifier. If the first storage units within one time window never record the same data identifier, the preset threshold is not greater than the number of time windows; for example, when the number of time windows is 5 and no two first storage units in a time window record the same data identifier, the preset threshold may be 3, 4, 5, and the like. If the first storage units within one time window may record the same data identifier, the preset threshold may be greater than the number of time windows; for example, when the number of time windows is 5 and the first storage units in a time window may record the same data identifier, the preset threshold may be 4, 5, 7, 9, and the like.
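  • Selecting the target to-be-determined data then reduces to a threshold test on the counted occurrences. A small sketch, with an assumed occurrence map and an assumed preset threshold chosen as described above:

```python
def select_targets(occurrence_counts, preset_threshold):
    """Return the data identifiers whose number of occurrences is not less than the preset threshold.

    `occurrence_counts` maps data identifiers to their occurrence counts in the time
    window sequence; the selected identifiers correspond to the target to-be-determined
    data, i.e. the cache miss data to be written into the cache memory.
    """
    return {data_id for data_id, count in occurrence_counts.items()
            if count >= preset_threshold}

# With 5 time windows and at most one record of an identifier per window,
# a threshold of 3 (not greater than the number of windows) is one valid choice.
targets = select_targets({"a": 4, "b": 2, "c": 1}, preset_threshold=3)
print(targets)   # -> {'a'}
```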
  • Add the data identifier of the target to-be-determined data to a preset whitelist, where the preset whitelist includes a fourth preset number of third storage units, and each third storage unit can record the data identifier of one piece of target to-be-determined data; when subsequently read cache miss data hits the target to-be-determined data corresponding to any data identifier in the preset whitelist, write the hit target to-be-determined data into the cache memory.
  • This step is optional. After determining, through steps 301 to 305, the target to-be-determined data that can be written into the cache memory, the target to-be-determined data hit in a subsequent data reading process can be written into the cache memory through this optional step.
  • When the hit target to-be-determined data is written into the cache memory, the current storage condition of the cache memory is taken into account; this includes, but is not limited to, the following two cases.
  • The first case: the storage space of the cache memory is not yet full.
  • In this case, the hit target to-be-determined data can be written directly into the cache memory according to its data size. For example, if the data size of the hit target to-be-determined data is 20k, it can be written into a 20k storage space of the cache memory that does not yet store data.
  • The second case: the storage space of the cache memory is full.
  • In this case, according to a preset cache replacement algorithm, the data stored in a storage space of that data size in the cache memory may be evicted, and the hit target to-be-determined data is written into the storage location corresponding to the evicted data.
  • The preset cache replacement algorithm may be a FIFO (First In First Out) algorithm or an LRU (Least Recently Used) algorithm.
  • If the preset cache replacement algorithm is the FIFO algorithm, then according to the data size of the hit target to-be-determined data, the earliest-written data occupying a storage space of that size in the cache memory is evicted, and the hit target to-be-determined data is written into the storage location corresponding to the evicted data.
  • If the preset cache replacement algorithm is the LRU algorithm, then according to the data size of the hit target to-be-determined data, the least recently used data occupying a storage space of that size in the cache memory is evicted, and the hit target to-be-determined data is written into the storage location corresponding to the evicted data.
  • Of course, other cache replacement algorithms may also be used as the preset cache replacement algorithm.
  • The embodiment of the present invention does not limit the specific content of the preset cache replacement algorithm.
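  • A hedged sketch of this optional whitelist step is shown below: target data identifiers are added to a bounded whitelist, and a later cache miss that hits a whitelisted identifier is written into the cache memory, with FIFO eviction when the cache is full (an LRU policy would only change which entry is evicted). The class is illustrative; for simplicity the cache capacity is counted in entries rather than bytes.

```python
from collections import OrderedDict, deque

class WhitelistedCacheWriter:
    def __init__(self, whitelist_capacity, cache_capacity):
        # The whitelist plays the role of the fourth preset number of third storage units.
        self.whitelist = deque(maxlen=whitelist_capacity)
        self.cache = OrderedDict()          # insertion order gives FIFO eviction
        self.cache_capacity = cache_capacity

    def add_targets(self, target_ids):
        """Add the data identifiers of the target to-be-determined data to the preset whitelist."""
        for data_id in target_ids:
            if data_id not in self.whitelist:
                self.whitelist.append(data_id)   # the oldest identifiers fall out when full

    def on_cache_miss(self, data_id, data):
        """When later cache miss data hits a whitelisted identifier, write it into the cache memory."""
        if data_id not in self.whitelist:
            return
        if data_id not in self.cache and len(self.cache) >= self.cache_capacity:
            self.cache.popitem(last=False)       # evict the earliest-written entry (FIFO)
        self.cache[data_id] = data               # write into the freed storage location
```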
  • By means of the preset whitelist, the target to-be-determined data is written into the cache memory as it is hit during normal data reading, without dedicating processor resources specifically to the write, so the process of writing the target to-be-determined data into the cache memory has little effect on the processing speed of the processor.
  • In addition, when the preset whitelist is full, some of the data identifiers in it may be removed according to the FIFO algorithm or the LRU algorithm, and the data identifiers of subsequently determined target to-be-determined data are added to the preset whitelist.
  • The preset whitelist may also sort the data identifiers according to the occurrence counts obtained for them in a round of determining cache miss data.
  • In that case, removal can also be carried out according to the count of each data identifier; for example, the data identifier with the smallest count in the preset whitelist may be removed.
  • Of course, the determined target to-be-determined data may also be written directly into the cache memory without using this optional step; the embodiment of the present invention does not specifically limit this.
  • In the method provided by the embodiment of the present invention, the data identifiers of the respective to-be-determined data are recorded in groups, the number of occurrences of each data identifier in each group is counted, the target to-be-determined data is selected according to the number of occurrences, and the target to-be-determined data is determined as the cache miss data to be written into the cache memory.
  • Because a high number of occurrences indicates that a piece of cache miss data is read relatively often, the frequently read good data can be selected from the read cache miss data, which increases the proportion of good data stored in the cache memory.
  • This, in turn, increases the hit rate when the processor subsequently reads data from the cache memory.
  • Referring to FIG. 5, an embodiment of the present invention provides an apparatus for determining cached data, where the apparatus may be used to perform the method for determining cached data provided by the embodiment corresponding to FIG. 2 or FIG. 3.
  • The apparatus for determining cached data includes:
  • an obtaining module 501, configured to acquire a data identifier of read cache miss data, where the data identifier is used to distinguish different cache miss data;
  • a selection module 502, configured to select data identifiers of to-be-determined data based on the acquired data identifiers of the cache miss data;
  • a recording module 503, configured to record the data identifiers of the respective to-be-determined data in groups;
  • a statistics module 504, configured to count the number of occurrences of the data identifiers of the respective to-be-determined data in each group; and
  • a determining module 505, configured to select target to-be-determined data according to the number of occurrences, and determine the target to-be-determined data as the cache miss data to be written into the cache memory.
  • Optionally, the groups constitute a time window sequence; the time window sequence includes at least two time windows, each time window includes a first preset number of first storage units, and a second preset number of second storage units are spaced between every two time windows;
  • the recording module 503 is configured to record the data identifiers of the respective to-be-determined data, in order, into the first storage units of the time window sequence; and
  • the statistics module 504 is configured to count the number of occurrences of the data identifiers of the respective to-be-determined data in each time window of the time window sequence.
  • Optionally, the statistics module 504 includes:
  • a statistical unit, configured to count the number of first storage units that record the same data identifier in each time window of the time window sequence; and
  • a determining unit, configured to determine, according to the number of first storage units that record the same data identifier, the number of occurrences of the data identifiers of the respective to-be-determined data in each time window of the time window sequence.
  • Optionally, the statistical unit is configured to count the number of first storage units that record the same data identifier in each time window of the time window sequence when all the first storage units in the time window sequence are filled.
  • Optionally, the statistics module 504 further includes:
  • a clearing unit, configured to clear the data identifiers stored in the first storage units of the time window sequence, so as to record, through the time window sequence, the data identifiers of the to-be-determined data selected in a subsequent data reading process.
  • Optionally, the determining module 505 is configured to select the to-be-determined data whose number of occurrences is not less than the preset threshold as the target to-be-determined data, and determine the target to-be-determined data as the cache miss data to be written into the cache memory.
  • Optionally, the apparatus further includes:
  • an adding module, configured to add the data identifier of the target to-be-determined data to a preset whitelist, where the preset whitelist includes a fourth preset number of third storage units, and each third storage unit can record the data identifier of one piece of target to-be-determined data; and
  • a writing module, configured to write the hit target to-be-determined data into the cache memory when subsequently read cache miss data hits the target to-be-determined data corresponding to any data identifier in the preset whitelist.
  • The writing module is configured to:
  • when the storage space of the cache memory is not yet full, write the hit target to-be-determined data directly into the cache memory according to its data size; and
  • when the storage space of the cache memory is full, evict, according to the preset cache replacement algorithm, the data stored in a storage space of that data size in the cache memory, and write the hit target to-be-determined data into the storage location corresponding to the evicted data.
  • The apparatus provided by the embodiment of the present invention records the data identifiers of the respective to-be-determined data in groups, counts the number of occurrences of each data identifier in each group, selects the target to-be-determined data according to the number of occurrences, and determines the target to-be-determined data as the cache miss data to be written into the cache memory.
  • Because a high number of occurrences indicates that a piece of cache miss data is read relatively often, the frequently read good data can be selected from the read cache miss data, which increases the proportion of good data stored in the cache memory.
  • This, in turn, increases the hit rate when the processor subsequently reads data from the cache memory.
  • Referring to FIG. 6, an embodiment of the present invention provides an apparatus for determining cached data, which may be used to perform the method for determining cached data provided by the embodiment corresponding to FIG. 2 or FIG. 3.
  • The apparatus for determining cached data includes a processor 601, a memory 604, a cache memory 602, and a disk 603, where:
  • the disk 603 is configured to store cache miss data;
  • the cache memory 602 is configured to store cache hit data;
  • the memory 604 is configured to store data read by the processor 601 from the cache memory 602 or the disk 603; and
  • the processor 601 is configured to:
  • acquire the data identifier of read cache miss data; select the data identifiers of to-be-determined data based on the acquired data identifiers; record the data identifiers of the respective to-be-determined data in groups; count the number of occurrences of the data identifiers of the respective to-be-determined data in each group; and select target to-be-determined data according to the number of occurrences, and determine the target to-be-determined data as the cache miss data to be written into the cache memory.
  • Optionally, the groups constitute a time window sequence; the time window sequence includes at least two time windows, each time window includes a first preset number of first storage units, and a second preset number of second storage units are spaced between every two time windows;
  • Optionally, the processor 601 is further configured to: record the data identifiers of the respective to-be-determined data, in order, into the first storage units of the time window sequence; and count the number of occurrences of the data identifiers of the respective to-be-determined data in each time window of the time window sequence.
  • Optionally, the processor 601 is further configured to: count the number of first storage units in each time window of the time window sequence that record the same data identifier, and
  • determine the number of occurrences of the data identifiers of the respective to-be-determined data in each time window of the time window sequence according to the number of first storage units that record the same data identifier.
  • Optionally, the processor 601 is further configured to: count the number of first storage units recording the same data identifier when all the first storage units in the time window sequence are filled.
  • Optionally, the processor 601 is further configured to:
  • clear the data identifiers stored in the first storage units of the time window sequence, so as to record, through the time window sequence, the data identifiers of the to-be-determined data selected in a subsequent data reading process.
  • Optionally, the processor 601 is further configured to:
  • select the to-be-determined data whose number of occurrences is not less than the preset threshold, use that to-be-determined data as the target to-be-determined data, and determine the target to-be-determined data as the cache miss data to be written into the cache memory.
  • Optionally, the processor 601 is further configured to:
  • add the data identifier of the target to-be-determined data to a preset whitelist, where the preset whitelist includes a fourth preset number of third storage units, and each third storage unit can record the data identifier of one piece of target to-be-determined data; and
  • when subsequently read cache miss data hits the target to-be-determined data corresponding to any data identifier in the preset whitelist, write the hit target to-be-determined data into the cache memory.
  • Optionally, the processor 601 is further configured to:
  • when the storage space of the cache memory is not yet full, write the hit target to-be-determined data directly into the cache memory according to its data size; and
  • when the storage space of the cache memory is full, evict, according to the preset cache replacement algorithm, the data stored in a storage space of that data size in the cache memory, and write the hit target to-be-determined data into the storage location corresponding to the evicted data.
  • The apparatus provided by the embodiment of the present invention records the data identifiers of the respective to-be-determined data in groups, counts the number of occurrences of each data identifier in each group, selects the target to-be-determined data according to the number of occurrences, and determines the target to-be-determined data as the cache miss data to be written into the cache memory.
  • Because a high number of occurrences indicates that a piece of cache miss data is read relatively often, the frequently read good data can be selected from the read cache miss data, which increases the proportion of good data stored in the cache memory.
  • This, in turn, increases the hit rate when the processor subsequently reads data from the cache memory.
  • It should be noted that, when the apparatus for determining cached data provided by the foregoing embodiments determines cached data, the division into the foregoing functional modules is merely illustrative. In practical applications, the foregoing functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or part of the functions described above.
  • In addition, the apparatus for determining cached data provided by the foregoing embodiments belongs to the same concept as the method embodiments for determining cached data; its specific implementation process is described in detail in the method embodiments and is not repeated here.
  • A person skilled in the art may understand that all or part of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium.
  • The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Disclosed are a method and an apparatus for determining cache data, belonging to the field of computer technologies. The method includes: acquiring data identifiers of read cache miss data; selecting data identifiers of pending data based on the acquired data identifiers; recording the data identifiers in groups; counting the number of occurrences of each data identifier in each group; and selecting target pending data according to the number of occurrences, and determining the target pending data as cache miss data to be written into a cache memory. The data identifiers are recorded in groups, the number of occurrences of each data identifier in each group is counted, target pending data is then selected according to the number of occurrences, and the target pending data is determined as the cache miss data to be written into the cache memory. Because a high number of occurrences indicates that the corresponding cache miss data is read relatively often, good data with a high read count can be selected, which raises the proportion of good data stored in the cache memory and, in turn, the hit rate of subsequent data reads.

Description

缓存数据的确定方法及装置 技术领域
本发明涉及计算机技术领域,特别涉及一种缓存数据的确定方法及装置。
背景技术
计算机的处理器在处理数据的过程中,由于处理器从缓存存储器读取数据的速度比从磁盘中读取数据的速度快,为了加快数据处理速度,随着数据处理过程的不断进行,处理器会不断挑选出访问量比较高的好数据,并将这些好数据存储于缓存存储器中一份。在此基础上,处理器在处理数据时,如果所需数据已存储于缓存存储器,则直接从缓存存储器读取数据,该数据为缓存命中数据;如果所需数据还未存储于缓存存储器,则从磁盘读取所需数据,该数据为缓存缺失数据。为了加快数据处理速度,对于缓存缺失数据中访问量比较高的数据,可以将其写入缓存存储器,确保后续处理器需要该数据时,可以直接从缓存存储器读取该数据。然而,由于有些缓存存储器的写入次数有限,如SSD(Solid State Disk,固态硬盘),频繁向该类缓存存储器写入数据会降低该类缓存存储器的使用寿命。因此,在向该类缓存存储器写入数据之前,需要从所有的缓存缺失数据中,确定可以写入该类缓存存储器的缓存缺失数据。
相关技术在确定哪些缓存缺失数据可以写入缓存存储器时,通常采用的方法为:从所有缓存缺失数据中,随机选择部分缓存缺失数据,将该选择的缓存缺失数据作为可以写入缓存存储器的缓存缺失数据。
在实现本发明的过程中,发明人发现相关技术至少存在以下问题:
缓存缺失数据分为访问量高的好数据、访问量居中的一般数据和访问量低的差数据。由于相关技术在确定缓存缺失数据时,仅是随机选择缓存缺失数据, 因此,好数据、一般数据和差数据被选中的概率相等,使写入缓存存储器的好数据、一般数据和差数据的比例相等,导致处理器后续从缓存存储器读取数据时的命中率不高。
发明内容
为了提高缓存存储器中存储的访问次数比较高的好数据的比例,从而提高处理器后续从缓存存储器读取数据时的命中率,本发明实施例提供了一种缓存数据的确定方法及装置。所述技术方案如下:
第一方面,提供了一种缓存数据的确定方法,所述方法包括:
获取所读取的缓存缺失数据的数据标识,所述数据标识用于区分不同的缓存缺失数据;
基于获取的缓存缺失数据的数据标识选择待定数据的数据标识;
将各个待定数据的数据标识进行分组记录;
统计所述各个待定数据的数据标识在各个分组中的出现次数;
根据所述出现次数选择目标待定数据,将所述目标待定数据确定为写入缓存存储器的缓存缺失数据。
结合第一方面,在第一方面的第一种可能的实现方式中,所述各个分组构成一个时间窗口序列,所述时间窗口序列包括至少两个时间窗口,每个时间窗口包括第一预设数值个第一存储单元,每两个时间窗口之间间隔第二预设数值个第二存储单元;
所述将各个待定数据的数据标识进行分组记录,包括:
将各个待定数据的数据标识按顺序分别记录至所述时间窗口序列的各个第一存储单元中;
所述统计所述各个待定数据的数据标识在各个分组中的出现次数,包括:
统计所述各个待定数据的数据标识在所述时间窗口序列的各个时间窗口中的出现次数。
结合第一方面的第一种可能的实现方式,在第一方面的第二种可能的实现方式中,所述统计所述各个待定数据的数据标识在所述时间窗口序列的各个时间窗口中的出现次数,包括:
统计所述时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量;
根据所述记录同一数据标识的第一存储单元的数量,确定所述各个待定数据的数据标识在所述时间窗口序列的各个时间窗口中的出现次数。
结合第一方面的第二种可能的实现方式,在第一方面的第三种可能的实现方式中,所述统计所述时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量,包括:
当所述时间窗口序列中的所有第一存储单元均被填满时,统计所述时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量。
结合第一方面的第二种可能的实现方式,在第一方面的第四种可能的实现方式中,所述统计所述时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量之后,还包括:
清除所述时间窗口序列的各个第一存储单元中存储的数据标识,以通过所述时间窗口序列记录后续数据读取过程中选择的待定数据的数据标识。
结合第一方面至第一方面的第四种可能的实现方式中的任一种可能的实现方式,在第一方面的第五种可能的实现方式中,所述根据所述出现次数选择目标待定数据,将所述目标待定数据确定为写入缓存存储器的缓存缺失数据,包括:
选择出现次数不小于预设阈值的待定数据,将所述出现次数不小于预设阈值的待定数据作为目标待定数据,将所述目标待定数据确定为写入缓存存储器的缓存缺失数据。
结合第一方面至第一方面的第五种可能的实现方式中的任一种可能的实现方式,在第一方面的第六种可能的实现方式中,所述将所述目标待定数据确 定为写入缓存存储器的缓存缺失数据之后,还包括:
将所述目标待定数据的数据标识添加至预设白名单中,所述预设白名单包括第四预设数值个第三存储单元,每个第三存储单元可记录一个目标待定数据的数据标识;
当后续读取的缓存缺失数据命中所述预设白名单中的任一个数据标识对应的目标待定数据时,将命中的目标待定数据写入所述缓存存储器。
结合第一方面的第六种可能的实现方式,在第一方面的第七种可能的实现方式中,所述将命中的目标待定数据写入所述缓存存储器,包括:
当所述缓存存储器的存储空间还未满时,根据所述命中的目标待定数据的数据大小直接将所述命中的目标待定数据写入所述缓存存储器;
当所述缓存存储器的存储空间已满时,根据预设缓存替换算法淘汰所述缓存存储器中所述数据大小的存储空间所存储的数据,将所述命中的目标待定数据写入被淘汰数据对应的存储位置。
第二方面,提供了一种缓存数据的确定装置,所述装置包括:
获取模块,用于获取所读取的缓存缺失数据的数据标识,所述数据标识用于区分不同的缓存缺失数据;
选择模块,用于基于获取的缓存缺失数据的数据标识选择待定数据的数据标识;
记录模块,用于将各个待定数据的数据标识进行分组记录;
统计模块,用于统计所述各个待定数据的数据标识在各个分组中的出现次数;
确定模块,用于根据所述出现次数选择目标待定数据,将所述目标待定数据确定为写入缓存存储器的缓存缺失数据。
结合第二方面,在第二方面的第一种可能的实现方式中,所述各个分组构成一个时间窗口序列,所述时间窗口序列包括至少两个时间窗口,每个时间窗口包括第一预设数值个第一存储单元,每两个时间窗口之间间隔第二预设数值 个第二存储单元;
所述记录模块,用于将各个待定数据的数据标识按顺序分别记录至所述时间窗口序列的各个第一存储单元中;
所述统计模块,用于统计所述各个待定数据的数据标识在所述时间窗口序列的各个时间窗口中的出现次数。
结合第二方面的第一种可能的实现方式,在第二方面的第二种可能的实现方式中,所述统计模块包括:
统计单元,用于统计所述时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量;
第一确定单元,用于根据所述记录同一数据标识的第一存储单元的数量,确定所述各个待定数据的数据标识在所述时间窗口序列的各个时间窗口中的出现次数。
结合第二方面的第二种可能的实现方式,在第二方面的第三种可能的实现方式中,所述统计单元,用于当所述时间窗口序列中的所有第一存储单元均被填满时,统计所述时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量。
结合第二方面的第二种可能的实现方式,在第二方面的第四种可能的实现方式中,所述统计模块还包括:
清除单元,用于清除所述时间窗口序列的各个第一存储单元中存储的数据标识,以通过所述时间窗口序列记录后续数据读取过程中选择的待定数据的数据标识。
结合第二方面至第二方面的第四种可能的实现方式中的任一种可能的实现方式,在第二方面的第五种可能的实现方式中,所述确定模块,用于选择出现次数不小于预设阈值的待定数据作为目标待定数据,将所述目标待定数据确定为写入缓存存储器的缓存缺失数据。
结合第二方面至第二方面的第五种可能的实现方式中的任一种可能的实 现方式,在第二方面的第六种可能的实现方式中,所述装置还包括:
添加模块,用于将所述目标待定数据的数据标识添加至预设白名单中,所述预设白名单包括第四预设数值个第三存储单元,每个第三存储单元可记录一个目标待定数据的数据标识;
写入模块,用于当后续读取的缓存缺失数据命中所述预设白名单中的任一个数据标识对应的目标待定数据时,将命中的目标待定数据写入所述缓存存储器。
结合第二方面的第六种可能的实现方式,在第二方面的第七种可能的实现方式中,所述写入模块用于:
当所述缓存存储器的存储空间还未满时,根据所述命中的目标待定数据的数据大小直接将所述命中的目标待定数据写入所述缓存存储器;
当所述缓存存储器的存储空间已满时,根据预设缓存替换算法淘汰所述缓存存储器中所述数据大小的存储空间所存储的数据,将所述命中的目标待定数据写入被淘汰数据对应的存储位置。
本发明实施例提供的技术方案带来的有益效果是:
通过将各个待定数据的数据标识进行分组记录,并统计各个待定数据的数据标识在各个分组中的出现次数后,根据出现次数选择目标待定数据,并将目标待定数据确定为写入缓存存储器的缓存缺失数据。由于出现次数多能够标识该缓存缺失数据的被读取次数比较多,因而可以从所读取的缓存缺失数据中选择出被读取次数多的好数据,从而能够提高缓存存储器中存储的好数据的比例,进而能够提高处理器后续从缓存存储器读取数据时的命中率。
附图说明
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下, 还可以根据这些附图获得其他的附图。
图1是本发明一实施例提供的缓存数据的确定方法所涉及的系统架构图;
图2是本发明另一实施例提供的一种缓存数据的确定方法的流程图;
图3是本发明另一实施例提供的一种缓存数据的确定方法的流程图;
图4是本发明另一实施例提供的一种时间窗口序列的示意图;
图5是本发明另一实施例提供的一种缓存数据的确定装置的结构示意图;
图6是本发明另一实施例提供的一种缓存数据的确定装置的结构示意图。
具体实施方式
为使本发明的目的、技术方案和优点更加清楚,下面将结合附图对本发明实施方式作进一步地详细描述。
如图1所示,其示出了本发明实施例提供的缓存数据的确定方法所涉及的系统架构图。如图1所示,该系统架构包括处理器101、内存102、缓存存储器103和磁盘104。处理器101所需数据可以存储在内存102、缓存存储器103和磁盘104这三种存储介质中。其中,这三种存储介质的介质访问速度的顺序是内存102>缓存存储器103>磁盘104,而容量大小的顺序是内存102<缓存存储器103<磁盘104。综合考虑成本,内存102和缓存存储器103常分别用于第一、二级缓存,以加速数据处理速度。具体地,处理器101在读取数据时,一般通过DMA(Direct Memory Access,直接内存存取)技术实现,即处理器101在从缓存存储器103或磁盘104读取数据时,先将数据读到内存102中,然后再从内存102中读取。也就是说,处理器101在读取数据时,先从内存102读取数据,如果内存102没有所需数据,则从缓存存储器103读取所需数据,如果缓存存储器103仍然没有所需数据,再从磁盘104读取数据。
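The tiered read path just described (memory first, then the cache memory, then the disk, with cache hit data and cache miss data distinguished by where the read is finally served) can be summarized in a minimal Python sketch. This is only an illustration of the architecture of Figure 1, not an implementation taken from the patent; the dictionaries standing in for the three storage media and the class name TieredStore are assumptions.

```python
class TieredStore:
    """Illustrative three-tier read path: memory -> cache memory -> disk."""

    def __init__(self, memory, cache, disk):
        # Each tier is modelled as a dict mapping data identifier -> data.
        self.memory, self.cache, self.disk = memory, cache, disk

    def read(self, data_id):
        if data_id in self.memory:          # fastest tier, checked first
            return self.memory[data_id], "memory"
        if data_id in self.cache:           # cache hit data
            data = self.cache[data_id]
            self.memory[data_id] = data     # staged into memory before use
            return data, "cache hit"
        data = self.disk[data_id]           # cache miss data, read from disk
        self.memory[data_id] = data
        return data, "cache miss"


store = TieredStore(memory={}, cache={"b": 2}, disk={"a": 1, "b": 2})
print(store.read("b"))   # served from the cache memory -> cache hit
print(store.read("a"))   # not cached -> cache miss served from disk
```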
其中，本发明实施例针对当内存102中没有处理器101所需的数据时，处理器101需要从缓存存储器103或磁盘104读取数据的情况，提出一种缓存数据的确定方法。具体地，由于处理器101从缓存存储器103读取数据的速度比从磁盘104读取数据的速度快，因此，随着数据处理过程的不断进行，处理器101会在缓存存储器103中存储一份访问量比较高的好数据，以便后续读取数据时，处理器101可以直接从缓存存储器103读取这些数据，而无需再访问磁盘104。为了便于说明，在本发明实施例中，将缓存存储器103中已经存储的数据定义为缓存命中数据，而将缓存存储器103中未存储、处理器101需要从磁盘104读取的数据定义为缓存缺失数据。
另外,由于有些缓存存储器103的写入次数有限,如SSD,如果向该类缓存存储器103写入的数据为访问量不高的数据,即该数据不是处理器101经常需要的数据。在此种情况下,处理器101后续可能不能从缓存存储器103中获取到所需数据,即缓存命中率不高。此时,为了提高缓存命中率,会对缓存存储器103中已缓存的数据进行替换,以淘汰掉那些访问量不高的数据,并进一步重新向缓存存储器103写入新数据。而这种数据替换行为将会使缓存存储器103的数据进行频繁更新。然而,如果缓存存储器103的写入次数有限,则频繁更新数据将会降低缓存存储器103的使用寿命。综上,向该类缓存存储器103写入数据时,应该保证写入的缓存缺失数据中有较高比例的访问量比较高的好数据,以避免该类缓存存储器103因频繁数据更新数据而降低使用寿命。
结合上述内容,在向该类缓存存储器103写入数据之前,需要从所有的缓存缺失数据中,确定可以写入该类缓存存储器103的缓存缺失数据,以提高缓存存储器103中好数据的比例,进而提高处理器101从缓存存储器103读取数据时的命中率。具体的缓存数据的确定方法详见下述各个实施例:
结合图1所示的系统架构示意图,图2是根据一示例性实施例提供的一种缓存数据的确定方法的流程图。如图2所示,本发明实施例提供的方法流程包括:
201:获取所读取的缓存缺失数据的数据标识,其中,数据标识用于区分不同的缓存缺失数据。
202:基于获取的缓存缺失数据的数据标识选择待定数据的数据标识。
203:将各个待定数据的数据标识进行分组记录。
204:统计各个待定数据的数据标识在各个分组中的出现次数。
在另一个实施例中,各个分组构成一个时间窗口序列,时间窗口序列包括至少两个时间窗口,每个时间窗口包括第一预设数值个第一存储单元,每两个时间窗口之间间隔第二预设数值个第二存储单元;
将各个待定数据的数据标识进行分组记录,包括:
将各个待定数据的数据标识按顺序分别记录至时间窗口序列的各个第一存储单元中;
统计各个待定数据的数据标识在各个分组中的出现次数,包括:
统计各个待定数据的数据标识在时间窗口序列的各个时间窗口中的出现次数。
在另一个实施例中,统计各个待定数据的数据标识在时间窗口序列的各个时间窗口中的出现次数,包括:
统计时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量;
根据记录同一数据标识的第一存储单元的数量,确定各个待定数据的数据标识在时间窗口序列的各个时间窗口中的出现次数。
在另一个实施例中,统计时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量,包括:
当时间窗口序列中的所有第一存储单元均被填满时,统计时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量。
在另一个实施例中,统计时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量之后,还包括:
清除时间窗口序列的各个第一存储单元中存储的数据标识,以通过时间窗口序列记录后续数据读取过程中选择的待定数据的数据标识。
205:根据出现次数选择目标待定数据,将目标待定数据确定为写入缓存存储器的缓存缺失数据。
在另一个实施例中,根据出现次数选择目标待定数据,将目标待定数据确定为写入缓存存储器的缓存缺失数据,包括:
选择出现次数不小于预设阈值的待定数据,将出现次数不小于预设阈值的待定数据作为目标待定数据,将目标待定数据确定为写入缓存存储器的缓存缺失数据。
在另一个实施例中,根据出现次数选择目标待定数据,将目标待定数据确定为写入缓存存储器的缓存缺失数据之后,还包括:
将目标待定数据的数据标识添加至预设白名单中,预设白名单包括第四预设数值个第三存储单元,每个第三存储单元可记录一个目标待定数据的数据标识;
当后续读取的缓存缺失数据命中预设白名单中的任一个数据标识对应的目标待定数据时,将命中的目标待定数据写入缓存存储器。
在另一个实施例中,将命中的目标待定数据写入缓存存储器,包括:
当缓存存储器的存储空间还未满时,根据命中的目标待定数据的数据大小直接将命中的目标待定数据写入缓存存储器;
当缓存存储器的存储空间已满时,根据预设缓存替换算法淘汰缓存存储器中数据大小的存储空间所存储的数据,将命中的目标待定数据写入被淘汰数据对应的存储位置。
本发明实施例提供的方法,通过将各个待定数据的数据标识进行分组记录,并统计各个待定数据的数据标识在各个分组中的出现次数后,根据出现次数选择目标待定数据,并将目标待定数据确定为写入缓存存储器的缓存缺失数据。由于出现次数多能够标识该缓存缺失数据的被读取次数比较多,因而可以从所读取的缓存缺失数据中选择出被读取次数多的好数据,从而能够提高缓存存储器中存储的好数据的比例,进而能够提高处理器后续从缓存存储器读取数 据时的命中率。
上述所有可选技术方案,可以采用任意结合形成本发明的可选实施例,在此不再一一赘述。
结合图1所示的系统架构示意图及图2所示实施例的内容,图3是根据一示例性实施例提供的一种缓存数据的确定方法的流程图。如图3所示,本发明实施例提供的方法流程包括:
301:读取缓存缺失数据。
结合图1所示的系统架构图,当处理器所需的数据未存储于内存或缓存存储器时,处理器需要从磁盘读取数据。其中,处理器从磁盘读取的数据即为缓存缺失数据。然而,在处理器读取的所有缓存缺失数据中,有些缓存缺失数据也可能是处理器需要次数比较多的好数据。针对于这些好数据,可以将其写入缓存存储器,以便于处理器后续直接从缓存存储器读取这些好数据,从而提高数据处理速度。因此,当处理器读取到缓存缺失数据后,触发处理器确定是否将其写入缓存存储器。
302:获取所读取的缓存缺失数据的数据标识,其中,数据标识用于区分不同的缓存缺失数据。
其中,数据标识可以用于区分不同的缓存缺失数据,即一个缓存缺失数据对应一个数据标识。
在本发明实施例中,为了节省确定缓存数据的过程中的系统资源,在确定缓存数据时,可以仅针对缓存数据的数据标识进行处理,因此,在读取到缓存缺失数据后,可以获取所读取的缓存缺失数据的数据标识。
其中,每个缓存缺失数据中可以携带其对应的数据标识。在此基础上,在获取所读取的缓存缺失数据的数据标识时,对于每个缓存缺失数据,可以对其进行解析,以得到其数据标识。
具体地,在获取所读取的缓存缺失数据的数据标识时,可以每读取一个缓 存缺失数据,获取一次该缓存缺失数据的数据标识;也可以读取一些缓存缺失数据后,获取一次这些缓存缺失数据的数据标识。另外,由于确定缓存缺失数据的过程可能为周期性的,如每一天确定一次,或每小时确定一次。因此,在获取缓存缺失数据的数据标识时,也可以在达到一个周期时,统一获取一次该周期中所读取的所有缓存缺失数据的数据标识。
303:基于获取的缓存缺失数据的数据标识选择待定数据的数据标识。
具体地,处理器在获取到所读取的缓存缺失数据的数据标识后,可以对读取到的每个缓存缺失数据的数据标识均进行处理,以根据处理结果确定是否将其写入缓存存储器,也可以仅针对所读取的部分缓存缺失数据的数据标识进行处理,以根据处理结果确定是否将其写入缓存存储器。在本发明实施例中,将需要进行处理的缓存缺失数据的数据标识定义为待定数据的数据标识,即本发明实施例中基于对待定数据的数据标识的处理结果,确定是否将该待定数据写入缓存存储器。
结合上述内容,处理器在获取到所读取的缓存缺失数据的数据标识后,需要基于获取的缓存缺失数据的数据标识选择待定数据的数据标识。其中,处理器可以每获取一个缓存缺失数据的数据标识,即确定一次是否将该缓存数据的数据标识确定为待定数据的数据标识;也可以获取一定数量的缓存缺失数据的数据标识后,基于该一定数量的缓存缺失数据的数据标识,从这些一定数量的缓存缺失数据的数据标识中选择一次待定数据的数据标识;还可以在获取缓存缺失数据的数据标识的时间达到一定时长时,基于该一定时长内获取的缓存缺失数据的数据标识,从该一定时长的缓存缺失数据的数据标识中选择一次待定数据的数据标识等。
具体地,处理器在基于获取到的缓存缺失数据的数据标识选择待定数据的数据标识时,包括但不限于有如下两种实现方式:
第一种实现方式:对所读取的缓存缺失数据的数据标识进行随机抽样,将抽中的各个缓存缺失数据的数据标识作为选择的待定数据的数据标识。
具体地,如果处理器每获取一个缓存缺失数据的数据标识,即确定一次是否将其写入缓存存储器,则会增加很多负载,进而降低处理器的数据处理速度。为了避免该种情况发生,可以对所读取的所有缓存缺失数据的数据标识进行随机抽样,将抽中的缓存缺失数据的数据标识作为待定数据的数据标识,并仅针对待定数据的数据标识进行处理,而针对于未抽中的其它缓存缺失数据的数据标识,则不进行处理,从而优化处理器的负载,以减小确定缓存缺失数据过程中对处理器处理其它数据时的处理速度的影响。
其中,在对所读取的缓存缺失数据的数据标识进行随机抽样,将抽中的各个缓存缺失数据的数据标识作为选择的待定数据的数据标识时,包括但不限于有以下几种方式:
第一种:每隔预设时长抽选一个所读取的缓存缺失数据的数据标识,将抽选的缓存缺失数据的数据标识作为一个待定数据的数据标识。
关于预设时长的具体数值,可以根据经验设定。例如,该预设时长可以为1分钟、2分钟、5分钟等。
具体地,如果在到达预设时长时,处理器获取到一个缓存缺失数据的数据标识,则可以将该缓存缺失数据的数据标识作为抽选的一个待定数据的数据标识;如果到达预设时长时,处理器没有获取到缓存缺失数据的数据标识,则处理器可以将下一次获取的缓存缺失数据的数据标识作为抽选的一个待定数据的数据标识。
例如,处理器每3分钟对所读取的缓存缺失数据的数据标识进行一次抽样,计时起点是14:20:15,如果在到达14:23:15时,处理器正获取到缓存缺失数据a的数据标识,则处理器将缓存缺失数据a的数据标识作为一个抽选的待定数据的数据标识。如果在到达14:23:15时,处理器并未获取缓存缺失数据的数据标识,而在14:23:20时,处理器获取到下一个缓存缺失数据b的数据标识,则处理器将缓存缺失数据b的数据标识作为抽选的一个待定数据的数据标识。
当然,如果达到预设时长时,处理器未获取到缓存缺失数据的数据标识, 则也可以确定这个时间周期未获取到待定数据的数据标识,并继续计时,当再次历时预设时长后,继续抽选待定数据的数据标识。
第二种:每获取第三预设数值个缓存缺失数据的数据标识后抽选一个所获取的缓存缺失数据的数据标识,将抽选的缓存缺失数据的数据标识作为一个待定数据的数据标识。
关于第三预设数值的个数,可以根据需要设定,例如,第三预设数值可以为10、30、50等。
具体地,处理器可以从计时起点开始对所获取的缓存缺失数据的数据标识的数量进行统计,并设置计数数量的初始值为0。在后续获取数据标识的过程中,处理器每获取一个缓存缺失数据的数据标识,则将计数数量加1,直至计数数量达到第三预设数值时,将该达到第三预设数值时获取的缓存缺失数据的数据标识作为抽选的一个待定数据的数据标识。
例如,如果第三预设数值为10,计数数量的初始值为0,当计数数量增加至10时,处理器正获取到缓存缺失数据c的数据标识,则将缓存缺失数据c的数据标识作为抽选的一个待定数据的数据标识。
第三种:在每次读取缓存缺失数据时,均生成一个对应的随机概率;确定每次生成的随机概率是否不大于预设概率;如果每次生成的随机概率不大于预设概率,则将生成随机概率时读取的缓存缺失数据的数据标识作为一个待定数据的数据标识。
具体地,预设概率为预先设置的对所读取的缓存缺失数据进行抽样的概率。关于该预设概率的具体数值,本发明实施例不作具体限定。例如,该预设概率可以为0.2、0.3等。
其中,在生成随机概率时,可以根据预先定义的随机函数实现。关于随机函数的具体内容,本发明实施例不作具体限定,保证根据该随机函数生成的数值在0-1之间即可。
对于一次读取缓存缺失数据的数据读取过程，生成该次数据读取过程的随机概率后，为了确定是否将该缓存缺失数据的数据标识作为一个待定数据的数据标识，可以将生成的随机概率与预设概率进行比对，当生成的随机概率不大于预设概率时，将生成该随机概率时读取的缓存缺失数据的数据标识作为一个待定数据的数据标识。
例如,如果读取缓存缺失数据d时生成的随机概率为0.13,预设概率为0.2,则由于读取缓存缺失数据d时生成的随机概率0.13小于该预设概率0.2,因此,将缓存缺失数据d的数据标识作为一个待定数据的数据标识。
通过对所读取的各个缓存缺失数据进行抽样,并对抽中的待定数据的数据标识进行处理,而对于未抽中的其它缓存缺失数据的数据标识,则不进行处理,可以避免因对所读取的所有缓存缺失数据的数据标识均进行该处理而消耗处理器过多负载,从而能够减小对数据处理速度的影响。
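Two of the three sampling strategies described above (keeping one identifier out of every "third preset number" of acquired identifiers, and keeping an identifier only when a per-read random number does not exceed a preset probability) are sketched below. The concrete values 0.2 and 3, and the helper names, are illustrative assumptions; the patent leaves these parameters to the implementer.

```python
import random

def sample_by_probability(miss_id, preset_probability=0.2, rng=random.random):
    """Keep the identifier only when the fresh random number does not exceed
    the preset probability (the probability-based strategy)."""
    return miss_id if rng() <= preset_probability else None

class CountSampler:
    """Keep one identifier out of every `every_n` acquired identifiers
    (the count-based strategy; `every_n` plays the third preset number)."""
    def __init__(self, every_n=10):
        self.every_n = every_n
        self.count = 0
    def offer(self, miss_id):
        self.count += 1
        if self.count == self.every_n:
            self.count = 0           # restart counting for the next batch
            return miss_id
        return None

sampler = CountSampler(every_n=3)
print([sampler.offer(i) for i in "abcdef"])  # [None, None, 'c', None, None, 'f']
```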
第二种实现方式:将获取的每个缓存缺失数据的数据标识均作为一个待定数据的数据标识。
在该种实现方式下,针对每次获取的缓存缺失数据的数据标识均会进行处理。通过该种方式能够确保对读取的每个缓存缺失数据均进行识别,使得后续确定的目标待定数据更准确。
304:将各个待定数据的数据标识进行分组记录,并统计各个待定数据的数据标识在各个分组中的出现次数。
在本发明实施例中,为了能够从所有的缓存缺失数据中选择访问次数比较高的好数据,通过将各个待定数据的数据标识进行分组记录,并进一步通过统计各个待定数据的数据标识在各个分组中出现的次数来实现。当某一个待定数据的数据标识在各个分组中的出现次数比较高时,说明该待定数据访问次数比较高,从而可以将该待定数据确定为好数据。另外,如果仅设置一个分组,则某一个待定数据的数据标识在该组数据中的出现次数高具有偶然性,而当设置多个分组时,当某一个待定数据的数据标识在各个分组中的出现次数比较多时,能够排除偶然原因导致的出现次数多,确保能够准确选择出访问次数比较 高的好数据。
其中,各个待定数据的数据标识在各个分组中的出现次数可以表示不同的意义。例如,对于某一个待定数据的数据标识,当各个分组中不记录同一待定数据的数据标识时,该待定数据的数据标识在各个分组中的出现次数表示记录该待定数据的数据标识的组数。当各个分组可以用于记录同一待定数据的数据标识时,该待定数据的数据标识在各个分组中的出现次数表示该待定数据的数据标识在各个分组中的出现次数之和。
关于各个分组的形式,可以有很多种,本发明实施例对此不作具体限定。当各个分组的形式不同时,将各个待定数据的数据标识进行分组记录及统计各个待定数据的数据标识在各个分组中的出现次数的方式也不同。例如,各个分组可以构成一个时间窗口序列,该时间窗口序列包括至少两个时间窗口,每个时间窗口包括第一预设数值个第一存储单元,每两个时间窗口之间间隔第二预设数值个第二存储单元。
其中,时间窗口序列包括的时间窗口的具体数量,本发明实施例不作具体限定。例如,时间窗口序列可以包括3个时间窗口,也可以包括5个时间窗口等。关于第一预设数值和第二预设数值的具体数值,本发明实施例同样不作限定。例如,第一预设数值可以为3、4、5等,第二预设数值可以为1或2等。另外,第一存储单元和第二存储单元可以为大小不同的存储单元,也可以为大小相同的存储单元,本发明实施例对此也不进行限定。
如图4所示,其示出了一种时间窗口序列的结构示意图。图4所示的时间窗口序列包括3个时间窗口,每个时间窗口可以存储3个待定数据的数据标识,每两个时间窗口之间的间隙与存储1个数据标识的第一存储单元的大小相同。
结合上述时间窗口序列的形式,在将各个待定数据的数据标识进行分组记录时,可以将各个待定数据的数据标识分别记录至该时间窗口序列的各个第一存储单元中。具体地,在将各个待定数据的数据标识分别记录至时间窗口序列的第一存储单元中时,可以按照时间窗口序列中各个时间窗口的排列顺序,按 顺序填充各个时间窗口的第一存储单元。
例如,结合图4,可以将获取到的第一个待定数据的数据标识记录至第一个时间窗口的第一个第一存储单元中,将获取到的第二个待定数据的数据标识记录至第一个时间窗口的第二个第一存储单元中,将获取到的第四个待定数据的数据标识记录至第二个时间窗口的第一个第一存储单元中,依此类推。
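To make the time window sequence concrete, here is a minimal sketch in the spirit of Figure 4: three windows of three first storage units each, filled sequentially. Representing the windows as Python lists and modelling the second storage units (the gaps between windows) implicitly, by simply not recording identifiers into them, are assumptions of this sketch rather than details given in the patent.

```python
class TimeWindowSequence:
    """Illustrative time window sequence: `num_windows` windows, each with
    `slots_per_window` first storage units, filled in order."""

    def __init__(self, num_windows=3, slots_per_window=3):
        self.windows = [[] for _ in range(num_windows)]
        self.slots_per_window = slots_per_window

    def record(self, data_id):
        """Write one pending data identifier into the next free first storage unit."""
        for window in self.windows:
            if len(window) < self.slots_per_window:
                window.append(data_id)
                return True
        return False    # every first storage unit is already filled

    def is_full(self):
        return all(len(w) == self.slots_per_window for w in self.windows)


seq = TimeWindowSequence()
for data_id in ["a", "b", "a", "c", "a", "b", "a", "d", "b"]:
    seq.record(data_id)
print(seq.windows)    # [['a', 'b', 'a'], ['c', 'a', 'b'], ['a', 'd', 'b']]
print(seq.is_full())  # True -> one determination round can be evaluated
```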
进一步地,结合上述时间窗口序列的形式,在统计各个待定数据的数据标识在各个分组中的出现次数时,可以通过统计各个待定数据的数据标识在时间窗口序列的各个时间窗口中的出现次数来实现。
具体地,在统计各个待定数据的数据标识在时间窗口序列的各个时间窗口中的出现次数时,包括但不限于通过如下步骤一和步骤二实现:
步骤一:统计时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量。
由于同一待定数据的数据标识相同,且一个数据标识占用一个第一存储单元,因此,可以根据记录同一数据标识的第一存储单元的数量,确定同一待定数据的数据标识在时间窗口序列中的出现次数。结合该内容,为了确定同一待定数据在时间窗口序列的各个时间窗口中的出现次数,需要先统计时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量。
具体地,每一个确定缓存缺失数据的过程可以以该时间窗口序列的所有第一存储单元是否被填满来界定。当该时间窗口序列的所有第一存储单元均被填满时,可以看作一个确定缓存缺失数据的过程结束;否则,一个确定缓存缺失数据的过程还未结束,仍需使用待定数据的数据标识填充未被占用的第一存储单元,直至所有第一存储单元均被填满时,该确定缓存缺失数据的过程结束。因此,在统计时间窗口序列的各个时间窗口中存储同一数据标识的第一存储单元的数量时,可以在时间窗口序列中的所有第一存储单元均被填满时实现。
当然,一个确定缓存缺失数据的过程也可以不仅限于为所有第一存储单元均被填满时结束,也可以为时间窗口序列第一存储单元的占用率达到预设阈值 时认为一个确定缓存缺失数据的过程结束,本发明实施例对一个确定缓存缺失数据的过程的界定标准不进行限定。在此种情况下,当时间窗口序列第一存储单元的占用率达到预设阈值时,统计一次时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量。
步骤二:根据记录同一数据标识的第一存储单元的数量,确定各个待定数据的数据标识在时间窗口序列的各个时间窗口中的出现次数。
其中,记录同一数据标识的第一存储单元的数量即为该待定数据的数据标识在时间窗口序列中的出现次数。例如,如果记录待定数据a的数据标识的第一存储单元的数量为3,则可以表明待定数据a在时间窗口序列的各个时间窗口中的出现次数为3次。
当某一个待定数据的被访问次数比较高时,该待定数据的数据标识将会有很多次机会被记录于时间窗口序列的各个第一存储单元中,因此,可以根据时间窗口序列中各个待定数据的数据标识的出现次数选择出访问次数比较高的好数据。结合图4,由于待定数据a的数据标识a在时间窗口序列中出现的次数最多,因此,可以确定待定数据a为访问次数比较高的好数据。
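The counting step can be sketched as follows. The description allows the occurrence count to be read either as the total number of first storage units recording an identifier across all windows, or as the number of windows that contain it; this sketch computes both, using toy window contents in the spirit of Figure 4. The data structures and the clearing step at the end are illustrative assumptions.

```python
from collections import Counter

def count_occurrences(windows):
    """Count, per time window, how many first storage units record each
    identifier, then total the per-window counts across the sequence."""
    per_window = [Counter(window) for window in windows]
    totals = Counter()
    for counts in per_window:
        totals.update(counts)
    return per_window, totals

windows = [["a", "b", "a"], ["c", "a", "b"], ["a", "d", "b"]]
per_window, totals = count_occurrences(windows)
print(totals["a"])                              # 4 first storage units record 'a'
print(sum(1 for c in per_window if "a" in c))   # 'a' appears in all 3 windows
windows = [[] for _ in windows]   # clear the first storage units for the next round
```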
可选地,结合上述对一个确定缓存缺失数据的过程的描述,当时间窗口序列被填满后,可知一个确定缓存缺失数据的过程结束。此时,为了确保后续还可以使用该时间窗口序列存储后续选择的待定数据的数据标识,当统计时间窗口序列中存储同一数据标识的第一存储单元的数量之后,还可以及时清除时间窗口序列的各个第一存储单元中存储的数据标识。其中,在进行清除时,可以删除所有第一存储单元中写入的待定数据的数据标识。
通过上述方式,能够从所有的待定数据中识别出访问量高的好数据,以便后续可以将这些好数据写入缓存存储器,而对于其它访问量不高的缓存缺失数据则不写入缓存存储器,从而能够提高缓存存储器中好数据的比例。
结合上述时间窗口序列的内容,如果时间窗口序列中仅设置一个时间窗口,则某一个待定数据的数据标识在该时间窗口中的出现次数高具有偶然性, 而当设置多个时间窗口时,当某一个待定数据的数据标识在各个时间窗口中的出现次数比较多时,能够排除偶然原因导致的出现次数多,确保能够准确选择出访问次数比较高的好数据。
305:根据出现次数选择目标待定数据,将目标待定数据确定为写入缓存存储器的缓存缺失数据。
各个待定数据包括访问量高的好待定数据,也包括访问量居中的一般待定数据和访问量低的差待定数据。然而,为了可以将好待定数据确定为写入缓存存储器的缓存缺失数据,可以根据各个待定数据在各个分组中的出现次数选择目标待定数据,将目标待定数据确定为写入缓存存储器的缓存缺失数据。
其中,在根据出现次数选择目标待定数据,将目标待定数据确定为写入缓存存储器的缓存缺失数据时,可以选择出现次数不小于预设阈值的待定数据作为目标的待定数据,并将目标待定数据确定为写入缓存存储器的缓存缺失数据。
具体地,预设阈值可以根据时间窗口序列中的时间窗口的数量以及每个时间窗口中的各个第一存储单元是否记录同一数据标识来确定。如果每个时间窗口中的各个第一存储单元不记录同一数据标识,则预设阈值不大于时间窗口的数量。例如,当时间窗口的数量为5,且每个时间窗口中的各个第一存储单元不记录同一数据标识,则预设阈值可以为3、4、5等。如果每个时间窗口中的各个第一存储单元可以记录同一数据标识,则预设阈值可能大于时间窗口的数量。例如,当时间窗口的数量为5,且每个时间窗口中的各个第一存储单元可以记录同一数据标识,则预设阈值可以为4、5、7、9等。
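A minimal sketch of the selection rule, assuming the summed-count variant from the previous sketch and an illustrative preset threshold of 4:

```python
def select_target_pending(totals, preset_threshold=4):
    """Return every identifier whose occurrence count is not less than the
    preset threshold; these become the target pending data."""
    return {data_id for data_id, count in totals.items() if count >= preset_threshold}

print(select_target_pending({"a": 4, "b": 3, "c": 1, "d": 1}))   # {'a'}
```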
通过将待定数据的数据标识记录至第一存储单元中,并根据数据标识在第一存储单元中的出现次数确定目标待定数据时,由于数据标识仅需占用很小的存储空间,因此,可以在不占用处理器很多资源的情况下,比较容易地确定需要写入缓存存储器的缓存缺失数据。另外,由于统计各个待定数据的数据标识在各个分组中的出现次数的过程需要一定的时间才能完成,因此,确定写入缓 存存储器的缓存缺失数据的过程需要一定的时间,相对于每次读取到缓存缺失数据时则将其写入缓存存储器的方式,能够减小缓存存储器的数据更新频率,因而,能够提高缓存存储器的寿命。
306:将目标待定数据的数据标识添加至预设白名单中,当后续读取的缓存缺失数据命中预设白名单中的任一个数据标识对应的目标待定数据时,将命中的目标待定数据写入缓存存储器,其中,预设白名单包括第四预设数值个第三存储单元,每个第三存储单元可记录一个目标待定数据的数据标识。
该步骤为可选步骤。当通过步骤301至步骤305确定可以写入缓存存储器的目标待定数据后，可以通过该可选步骤将后续数据读取过程中命中的目标待定数据写入缓存存储器。
其中,在将命中的目标待定数据写入缓存存储器时,结合缓存存储器当前的存储情况,包括但不限于有如下两种情况:
第一种情况:缓存存储器的存储空间还未满。
针对该种情况,可以根据命中的目标待定数据的数据大小直接将命中的目标待定数据写入缓存存储器。例如,如果命中的目标待定数据的数据大小为20k,则可以将该命中的目标待定数据写入缓存存储器还未存储数据的20k存储空间中。
第二种情况:缓存存储器的存储空间已满。
针对该种情况,可以根据预设缓存替换算法淘汰缓存存储器中数据大小的存储空间所存储的数据,将命中的目标待定数据写入被淘汰数据对应的存储位置。
其中,该预设缓存替换算法可以为FIFO(First Input First Output,先进先出)算法或LRU(Least Recently Used)算法等。当预设缓存替换算法为FIFO算法时,可以根据命中的目标待定数据的数据大小,淘汰掉缓存存储器中最先存储的该数据大小的存储空间所存储的数据,然后将命中的目标待定数据写入被淘汰数据对应的存储位置。当预设缓存替换算法为LRU算法时,可以根据 命中的目标待定数据的数据大小,淘汰掉缓存存储器中最近最少使用的该数据大小的存储空间所存储的数据,然后将该命中的目标待定数据写入被淘汰数据对应的存储位置。
当然,预设缓存替换算法还可以为其它缓存替换算法,本发明实施例不对预设缓存替换算法的具体内容进行限定。
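The write path with the LRU variant of the preset cache replacement algorithm can be sketched as follows. Tracking capacity in bytes and evicting least-recently-used entries until the hit target pending data fits are assumptions of this sketch; the patent only requires that data of the needed size be evicted and the hit data be written into the freed location.

```python
from collections import OrderedDict

class LruCacheStore:
    """Illustrative write path: write directly while free space suffices,
    otherwise evict least-recently-used entries until the hit target
    pending data fits, then write it into the freed space."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()          # data_id -> size, oldest access first

    def touch(self, data_id):
        self.entries.move_to_end(data_id)     # record a fresh access for LRU order

    def write(self, data_id, size):
        while self.used + size > self.capacity and self.entries:
            evicted_id, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size         # evict per the LRU replacement policy
        self.entries[data_id] = size
        self.used += size


cache = LruCacheStore(capacity_bytes=100)
cache.write("a", 60)
cache.write("b", 30)
cache.touch("a")            # 'a' becomes the most recently used entry
cache.write("c", 40)        # evicts 'b' (least recently used), then writes 'c'
print(list(cache.entries))  # ['a', 'c']
```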
通过该种方式将命中的目标待定数据写入缓存存储器时,可以不用专门消耗处理器资源来将目标待定数据写入缓存存储器,使得向缓存存储器写入目标待定数据的过程对处理器的数据处理速度的影响较小。
进一步地,当预设白名单被填充满后,也可以根据FIFO算法或LRU算法淘汰掉预设白名单中的部分数据,并将后续确定的目标待定数据的数据标识添加至预设白名单中。另外,预设白名单还可以根据在一个确定缓存缺失数据的过程中获取到的数据标识数量对各个数据标识进行排序。在此基础上,在淘汰预设白名单中的部分数据标识时,还可以根据各个数据标识的数量实现。例如,可以淘汰掉预设白名单中数据标识数量最少的数据标识。
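A minimal sketch of the preset whitelist follows: a fixed number of third storage units, each holding one target data identifier, with the oldest entry dropped once the list is full (the FIFO option mentioned above; evicting by LRU or by lowest recorded count would be equally valid). The capacity value and class name are illustrative assumptions.

```python
from collections import OrderedDict

class Whitelist:
    """Illustrative preset whitelist with a fixed number of third storage units."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.ids = OrderedDict()

    def add(self, data_id):
        if data_id in self.ids:
            return
        if len(self.ids) == self.capacity:
            self.ids.popitem(last=False)      # FIFO eviction of the oldest identifier
        self.ids[data_id] = True

    def hit(self, miss_id):
        """True when a subsequently read cache miss matches a whitelisted identifier."""
        return miss_id in self.ids


wl = Whitelist(capacity=2)
wl.add("a"); wl.add("b"); wl.add("c")    # 'a' is evicted once the list is full
print(wl.hit("c"), wl.hit("a"))           # True False
```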
当然,在确定目标待定数据后,也可以不通过该可选步骤而直接将确定的目标待定数据写入缓存存储器,本发明实施例对此不作具体限定。
本发明实施例提供的方法,通过将各个待定数据的数据标识进行分组记录,并统计各个待定数据的数据标识在各个分组中的出现次数后,根据出现次数选择目标待定数据,并将目标待定数据确定为写入缓存存储器的缓存缺失数据。由于出现次数多能够标识该缓存缺失数据的被读取次数比较多,因而可以从所读取的缓存缺失数据中选择出被读取次数多的好数据,从而能够提高缓存存储器中存储的好数据的比例,进而能够提高处理器后续从缓存存储器读取数据时的命中率。
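Putting the pieces of the method embodiment together, one determination round might look like the sketch below: sample the read cache-miss identifiers by probability, record the selected pending identifiers into the time window sequence, and, once every first storage unit is filled, pick the identifiers whose occurrence count reaches the preset threshold and clear the sequence. All parameter values and the flat data structures are illustrative assumptions, not the patent's prescribed implementation.

```python
import random
from collections import Counter

def determine_cache_data(miss_ids, num_windows=3, slots=3,
                         sample_probability=0.2, threshold=3, rng=random.random):
    """Sketch of repeated determination rounds over a stream of cache-miss
    identifiers; returns the identifiers selected as target pending data."""
    windows, targets = [[] for _ in range(num_windows)], set()
    for miss_id in miss_ids:
        if rng() > sample_probability:              # probability-based sampling
            continue
        for window in windows:                      # sequential recording
            if len(window) < slots:
                window.append(miss_id)
                break
        if all(len(w) == slots for w in windows):   # one round is complete
            totals = Counter(i for w in windows for i in w)
            targets |= {i for i, n in totals.items() if n >= threshold}
            windows = [[] for _ in range(num_windows)]   # clear the storage units
    return targets

reads = ["a", "b", "a", "c", "a", "b", "a", "d", "a", "b", "a", "e"] * 20
print(determine_cache_data(reads))   # typically the frequently read identifier 'a'
```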
参见图5,本发明实施例提供了一种缓存数据的确定装置,该缓存数据的确定装置可以用于执行上述图2或图3所对应实施例提供的缓存数据的确定方 法。如图5所示,该缓存数据的确定装置包括:
获取模块501,用于获取所读取的缓存缺失数据的数据标识,数据标识用于区分不同的缓存缺失数据;
选择模块502,用于基于获取的缓存缺失数据的数据标识选择待定数据的数据标识;
记录模块503,用于将各个待定数据的数据标识进行分组记录;
统计模块504,用于统计各个待定数据的数据标识在各个分组中的出现次数;
确定模块505,用于根据出现次数选择目标待定数据,将目标待定数据确定为写入缓存存储器的缓存缺失数据。
在另一个实施例中,各个分组构成一个时间窗口序列,时间窗口序列包括至少两个时间窗口,每个时间窗口包括第一预设数值个第一存储单元,每两个时间窗口之间间隔第二预设数值个第二存储单元;
记录模块503,用于将各个待定数据的数据标识按顺序分别记录至时间窗口序列的各个第一存储单元中;
统计模块504,用于统计各个待定数据的数据标识在时间窗口序列的各个时间窗口中的出现次数。
在另一个实施例中,统计模块504包括:
统计单元,用于统计时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量;
确定单元,用于根据记录同一数据标识的第一存储单元的数量,确定各个待定数据的数据标识在时间窗口序列的各个时间窗口中的出现次数。
在另一个实施例中,统计单元,用于当时间窗口序列中的所有第一存储单元均被填满时,统计时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量。
在另一个实施例中,统计模块504还包括:
清除单元,用于清除时间窗口序列的各个第一存储单元中存储的数据标识,以通过时间窗口序列存储后续数据读取过程中选择的待定数据的数据标识。
在另一个实施例中,确定模块505,用于选择出现次数不小于预设阈值的待定数据,将出现次数不小于预设阈值的待定数据作为目标待定数据,将目标待定数据确定为写入缓存存储器的缓存缺失数据。
在另一个实施例中,装置还包括:
添加模块,用于将目标待定数据的数据标识添加至预设白名单中,预设白名单包括第四预设数值个第三存储单元,每个第三存储单元可记录一个目标待定数据的数据标识;
写入模块,用于当后续读取的缓存缺失数据命中预设白名单中的任一个数据标识对应的目标待定数据时,将命中的目标待定数据写入缓存存储器。
在另一个实施例中,写入模块用于:
当缓存存储器的存储空间还未满时,根据命中的目标待定数据的数据大小直接将命中的目标待定数据写入缓存存储器;
当缓存存储器的存储空间已满时,根据预设缓存替换算法淘汰缓存存储器中数据大小的存储空间所存储的数据,将命中的目标待定数据写入被淘汰数据对应的存储位置。
本发明实施例提供的装置,通过将各个待定数据的数据标识进行分组记录,并统计各个待定数据的数据标识在各个分组中的出现次数后,根据出现次数选择目标待定数据,并将目标待定数据确定为写入缓存存储器的缓存缺失数据。由于出现次数多能够标识该缓存缺失数据的被读取次数比较多,因而可以从所读取的缓存缺失数据中选择出被读取次数多的好数据,从而能够提高缓存存储器中存储的好数据的比例,进而能够提高处理器后续从缓存存储器读取数据时的命中率。
上述所有可选技术方案,可以采用任意结合形成本发明的可选实施例,在 此不再一一赘述。
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
参见图6,本发明实施例提供了一种缓存数据的确定装置,该缓存数据的确定装置可以用于执行上述图2或图3所对应实施例提供的缓存数据的确定方法。如图6所示,该缓存数据的确定装置包括处理器601、内存604、缓存存储器602和磁盘603。其中:
磁盘603用于存储缓存缺失数据;
缓存存储器602用于存储缓存命中数据;
内存604用于存储处理器601从缓存存储器602或磁盘603读取的数据;
处理器601,用于:
获取所读取的缓存缺失数据的数据标识,数据标识用于区分不同的缓存缺失数据;
基于获取的缓存缺失数据的数据标识选择待定数据的数据标识;
将各个待定数据的数据标识进行分组记录;
统计各个待定数据的数据标识在各个分组中的出现次数;
根据出现次数选择目标待定数据,将目标待定数据确定为写入缓存存储器的缓存缺失数据。
在另一个实施例中,各个分组构成一个时间窗口序列,时间窗口序列包括至少两个时间窗口,每个时间窗口包括第一预设数值个第一存储单元,每两个时间窗口之间间隔第二预设数值个第二存储单元;
处理器601还用于:
将各个待定数据的数据标识按顺序分别记录至时间窗口序列的各个第一存储单元中;
统计各个待定数据的数据标识在时间窗口序列的各个时间窗口中的出现 次数。
在另一个实施例中,处理器601还用于:
统计时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量;
根据记录同一数据标识的第一存储单元的数量,确定各个待定数据的数据标识在时间窗口序列的各个时间窗口中的出现次数。
在另一个实施例中,处理器601还用于:
当时间窗口序列中的所有第一存储单元均被填满时,统计时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量。
在另一个实施例中,处理器601还用于:
清除时间窗口序列的各个第一存储单元中存储的数据标识,以通过时间窗口序列记录后续数据读取过程中选择的待定数据的数据标识。
在另一个实施例中,处理器601还用于:
选择出现次数不小于预设阈值的待定数据,将出现次数不小于预设阈值的待定数据作为目标待定数据,将目标待定数据确定为写入缓存存储器的缓存缺失数据。
在另一个实施例中,处理器601还用于:
将目标待定数据的数据标识添加至预设白名单中,预设白名单包括第四预设数值个第三存储单元,每个第三存储单元可记录一个目标待定数据的数据标识;
当后续读取的缓存缺失数据命中预设白名单中的任一个数据标识对应的目标待定数据时,将命中的目标待定数据写入缓存存储器。
在另一个实施例中,处理器601还用于:
当缓存存储器的存储空间还未满时,根据命中的目标待定数据的数据大小直接将命中的目标待定数据写入缓存存储器;
当缓存存储器的存储空间已满时,根据预设缓存替换算法淘汰缓存存储器 中数据大小的存储空间所存储的数据,将命中的目标待定数据写入被淘汰数据对应的存储位置。
上述所有可选技术方案,可以采用任意结合形成本发明的可选实施例,在此不再一一赘述。
本发明实施例提供的装置,通过将各个待定数据的数据标识进行分组记录,并统计各个待定数据的数据标识在各个分组中的出现次数后,根据出现次数选择目标待定数据,并将目标待定数据确定为写入缓存存储器的缓存缺失数据。由于出现次数多能够标识该缓存缺失数据的被读取次数比较多,因而可以从所读取的缓存缺失数据中选择出被读取次数多的好数据,从而能够提高缓存存储器中存储的好数据的比例,进而能够提高处理器后续从缓存存储器读取数据时的命中率。
需要说明的是:上述实施例提供的缓存数据的确定装置在确定缓存数据时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的缓存数据的确定装置与缓存数据的确定方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本发明的较佳实施例,并不用以限制本发明,凡在本发明的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。

Claims (16)

  1. 一种缓存数据的确定方法,其特征在于,所述方法包括:
    获取所读取的缓存缺失数据的数据标识,所述数据标识用于区分不同的缓存缺失数据;
    基于获取的缓存缺失数据的数据标识选择待定数据的数据标识;
    将各个待定数据的数据标识进行分组记录;
    统计所述各个待定数据的数据标识在各个分组中的出现次数;
    根据所述出现次数选择目标待定数据,将所述目标待定数据确定为写入缓存存储器的缓存缺失数据。
  2. 根据权利要求1所述的方法,其特征在于,所述各个分组构成一个时间窗口序列,所述时间窗口序列包括至少两个时间窗口,每个时间窗口包括第一预设数值个第一存储单元,每两个时间窗口之间间隔第二预设数值个第二存储单元;
    所述将各个待定数据的数据标识进行分组记录,包括:
    将各个待定数据的数据标识按顺序分别记录至所述时间窗口序列的各个第一存储单元中;
    所述统计所述各个待定数据的数据标识在各个分组中的出现次数,包括:
    统计所述各个待定数据的数据标识在所述时间窗口序列的各个时间窗口中的出现次数。
  3. 根据权利要求2所述的方法,其特征在于,所述统计所述各个待定数据的数据标识在所述时间窗口序列的各个时间窗口中的出现次数,包括:
    统计所述时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量;
    根据所述记录同一数据标识的第一存储单元的数量,确定所述各个待定数据的数据标识在所述时间窗口序列的各个时间窗口中的出现次数。
  4. 根据权利要求3所述的方法,其特征在于,所述统计所述时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量,包括:
    当所述时间窗口序列中的所有第一存储单元均被填满时,统计所述时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量。
  5. 根据权利要求3所述的方法,其特征在于,所述统计所述时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量之后,还包括:
    清除所述时间窗口序列的各个第一存储单元中存储的数据标识,以通过所述时间窗口序列记录后续数据读取过程中选择的待定数据的数据标识。
  6. 根据权利要求1至5中任一权利要求所述的方法,其特征在于,所述根据所述出现次数选择目标待定数据,将所述目标待定数据确定为写入缓存存储器的缓存缺失数据,包括:
    选择出现次数不小于预设阈值的待定数据,将所述出现次数不小于预设阈值的待定数据作为目标待定数据,将所述目标待定数据确定为写入缓存存储器的缓存缺失数据。
  7. 根据权利要求1至6中任一权利要求所述的方法,其特征在于,所述根据所述出现次数选择目标待定数据,将所述目标待定数据确定为写入缓存存储器的缓存缺失数据之后,还包括:
    将所述目标待定数据的数据标识添加至预设白名单中,所述预设白名单包括第四预设数值个第三存储单元,每个第三存储单元可记录一个目标待定数据的数据标识;
    当后续读取的缓存缺失数据命中所述预设白名单中的任一个数据标识对应的目标待定数据时,将命中的目标待定数据写入所述缓存存储器。
  8. 根据权利要求7所述的方法,其特征在于,所述将命中的目标待定数据写入所述缓存存储器,包括:
    当所述缓存存储器的存储空间还未满时,根据所述命中的目标待定数据的 数据大小直接将所述命中的目标待定数据写入所述缓存存储器;
    当所述缓存存储器的存储空间已满时,根据预设缓存替换算法淘汰所述缓存存储器中所述数据大小的存储空间所存储的数据,将所述命中的目标待定数据写入被淘汰数据对应的存储位置。
  9. 一种缓存数据的确定装置,其特征在于,所述装置包括:
    获取模块,用于获取所读取的缓存缺失数据的数据标识,所述数据标识用于区分不同的缓存缺失数据;
    选择模块,用于基于获取的缓存缺失数据的数据标识选择待定数据的数据标识;
    记录模块,用于将各个待定数据的数据标识进行分组记录;
    统计模块,用于统计所述各个待定数据的数据标识在各个分组中的出现次数;
    确定模块,用于根据所述出现次数选择目标待定数据,将所述目标待定数据确定为写入缓存存储器的缓存缺失数据。
  10. 根据权利要求9所述的装置,其特征在于,所述各个分组构成一个时间窗口序列,所述时间窗口序列包括至少两个时间窗口,每个时间窗口包括第一预设数值个第一存储单元,每两个时间窗口之间间隔第二预设数值个第二存储单元;
    所述记录模块,用于将各个待定数据的数据标识按顺序分别记录至所述时间窗口序列的各个第一存储单元中;
    所述统计模块,用于统计所述各个待定数据的数据标识在所述时间窗口序列的各个时间窗口中的出现次数。
  11. 根据权利要求10所述的装置,其特征在于,所述统计模块包括:
    统计单元,用于统计所述时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量;
    确定单元,用于根据所述记录同一数据标识的第一存储单元的数量,确定 所述各个待定数据的数据标识在所述时间窗口序列的各个时间窗口中的出现次数。
  12. 根据权利要求11所述的装置,其特征在于,所述统计单元,用于当所述时间窗口序列中的所有第一存储单元均被填满时,统计所述时间窗口序列的各个时间窗口中记录同一数据标识的第一存储单元的数量。
  13. 根据权利要求11所述的装置,其特征在于,所述统计模块还包括:
    清除单元,用于清除所述时间窗口序列的各个第一存储单元中存储的数据标识,以通过所述时间窗口序列记录后续数据读取过程中选择的待定数据的数据标识。
  14. 根据权利要求9至13中任一权利要求所述的装置,其特征在于,所述确定模块,用于选择出现次数不小于预设阈值的待定数据,将所述出现次数不小于预设阈值的待定数据作为目标待定数据,将所述目标待定数据确定为写入缓存存储器的缓存缺失数据。
  15. 根据权利要求9至14中任一权利要求所述的装置,其特征在于,所述装置还包括:
    添加模块,用于将所述目标待定数据的数据标识添加至预设白名单中,所述预设白名单包括第四预设数值个第三存储单元,每个第三存储单元可记录一个目标待定数据的数据标识;
    写入模块,用于当后续读取的缓存缺失数据命中所述预设白名单中的任一个数据标识对应的目标待定数据时,将命中的目标待定数据写入所述缓存存储器。
  16. 根据权利要求15所述的装置,其特征在于,所述写入模块用于:
    当所述缓存存储器的存储空间还未满时,根据所述命中的目标待定数据的数据大小直接将所述命中的目标待定数据写入所述缓存存储器;
    当所述缓存存储器的存储空间已满时,根据预设缓存替换算法淘汰所述缓存存储器中所述数据大小的存储空间所存储的数据,将所述命中的目标待定数 据写入被淘汰数据对应的存储位置。
PCT/CN2015/095608 2015-03-11 2015-11-26 缓存数据的确定方法及装置 WO2016141735A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP15884413.4A EP3252609A4 (en) 2015-03-11 2015-11-26 Cache data determination method and device
US15/699,406 US20170371807A1 (en) 2015-03-11 2017-09-08 Cache data determining method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510105461.XA CN104699422B (zh) 2015-03-11 2015-03-11 缓存数据的确定方法及装置
CN201510105461.X 2015-03-11

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/699,406 Continuation US20170371807A1 (en) 2015-03-11 2017-09-08 Cache data determining method and apparatus

Publications (1)

Publication Number Publication Date
WO2016141735A1 true WO2016141735A1 (zh) 2016-09-15

Family

ID=53346603

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/095608 WO2016141735A1 (zh) 2015-03-11 2015-11-26 缓存数据的确定方法及装置

Country Status (4)

Country Link
US (1) US20170371807A1 (zh)
EP (1) EP3252609A4 (zh)
CN (1) CN104699422B (zh)
WO (1) WO2016141735A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112511848A (zh) * 2020-11-09 2021-03-16 网宿科技股份有限公司 直播方法、服务端及计算机可读存储介质

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104699422B (zh) * 2015-03-11 2018-03-13 华为技术有限公司 缓存数据的确定方法及装置
KR102540765B1 (ko) * 2016-09-07 2023-06-08 에스케이하이닉스 주식회사 메모리 장치 및 이를 포함하는 메모리 시스템
CN109189822B (zh) * 2018-08-08 2022-01-14 北京大数据研究院 数据处理方法及装置
CN109491873B (zh) * 2018-11-05 2022-08-02 阿里巴巴(中国)有限公司 缓存监控方法、介质、装置和计算设备
CN109857680B (zh) * 2018-11-21 2020-09-11 杭州电子科技大学 一种基于动态页面权重的lru闪存缓存管理方法
CN111506524B (zh) * 2019-01-31 2024-01-30 华为云计算技术有限公司 一种数据库中淘汰、预加载数据页的方法、装置
CN112565870B (zh) 2019-09-26 2021-09-14 北京字节跳动网络技术有限公司 内容的缓存和读取方法、客户端及存储介质
CN112560118A (zh) * 2019-09-26 2021-03-26 杭州中天微系统有限公司 用于提供可重置的标识符的配置装置和配置方法
CN110716885B (zh) * 2019-10-23 2022-02-18 北京字节跳动网络技术有限公司 数据管理方法、装置、电子设备和存储介质
CN111708720A (zh) * 2020-08-20 2020-09-25 北京思明启创科技有限公司 一种数据缓存方法、装置、设备及介质
CN112163176A (zh) * 2020-11-02 2021-01-01 北京城市网邻信息技术有限公司 数据存储方法、装置、电子设备和计算机可读介质
CN114327672B (zh) * 2021-12-14 2024-04-05 中国平安财产保险股份有限公司 数据缓存时间设置方法、装置、计算机设备及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6507893B2 (en) * 2001-01-26 2003-01-14 Dell Products, L.P. System and method for time window access frequency based caching for memory controllers
CN1831824A (zh) * 2006-04-04 2006-09-13 浙江大学 缓存数据库数据组织方法
CN101388110A (zh) * 2008-10-31 2009-03-18 深圳市同洲电子股份有限公司 数据快速读取方法及装置
CN103177005A (zh) * 2011-12-21 2013-06-26 深圳市腾讯计算机系统有限公司 一种数据访问的处理方法和系统
CN103514106A (zh) * 2012-06-20 2014-01-15 北京神州泰岳软件股份有限公司 一种数据缓存方法
CN103631528A (zh) * 2012-08-21 2014-03-12 苏州捷泰科信息技术有限公司 用固态硬盘作为缓存器的读写方法、系统及读写控制器
CN104699422A (zh) * 2015-03-11 2015-06-10 华为技术有限公司 缓存数据的确定方法及装置

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6192450B1 (en) * 1998-02-03 2001-02-20 International Business Machines Corporation Destage of data for write cache
JP4933211B2 (ja) * 2006-10-10 2012-05-16 株式会社日立製作所 ストレージ装置、制御装置及び制御方法
CN102722448B (zh) * 2011-03-31 2015-07-22 国际商业机器公司 管理高速存储器的方法和装置
JP5175953B2 (ja) * 2011-06-02 2013-04-03 株式会社東芝 情報処理装置およびキャッシュ制御方法
BR112012023140A8 (pt) * 2011-09-15 2020-02-27 Ericsson Telecomunicacoes Sa método e sistema de substituição de cache.
CN103186350B (zh) * 2011-12-31 2016-03-30 北京快网科技有限公司 混合存储系统及热点数据块的迁移方法
US8966204B2 (en) * 2012-02-29 2015-02-24 Hewlett-Packard Development Company, L.P. Data migration between memory locations
US9021203B2 (en) * 2012-05-07 2015-04-28 International Business Machines Corporation Enhancing tiering storage performance
US20130339620A1 (en) * 2012-06-15 2013-12-19 Futurewei Technololgies, Inc. Providing Cache Replacement Notice Using a Cache Miss Request
US9135173B2 (en) * 2013-01-22 2015-09-15 International Business Machines Corporation Thinly provisioned flash cache with shared storage pool
WO2014209234A1 (en) * 2013-06-26 2014-12-31 Agency For Science, Technology And Research Method and apparatus for hot data region optimized dynamic management

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6507893B2 (en) * 2001-01-26 2003-01-14 Dell Products, L.P. System and method for time window access frequency based caching for memory controllers
CN1831824A (zh) * 2006-04-04 2006-09-13 浙江大学 缓存数据库数据组织方法
CN101388110A (zh) * 2008-10-31 2009-03-18 深圳市同洲电子股份有限公司 数据快速读取方法及装置
CN103177005A (zh) * 2011-12-21 2013-06-26 深圳市腾讯计算机系统有限公司 一种数据访问的处理方法和系统
CN103514106A (zh) * 2012-06-20 2014-01-15 北京神州泰岳软件股份有限公司 一种数据缓存方法
CN103631528A (zh) * 2012-08-21 2014-03-12 苏州捷泰科信息技术有限公司 用固态硬盘作为缓存器的读写方法、系统及读写控制器
CN104699422A (zh) * 2015-03-11 2015-06-10 华为技术有限公司 缓存数据的确定方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3252609A4 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112511848A (zh) * 2020-11-09 2021-03-16 网宿科技股份有限公司 直播方法、服务端及计算机可读存储介质

Also Published As

Publication number Publication date
CN104699422B (zh) 2018-03-13
CN104699422A (zh) 2015-06-10
US20170371807A1 (en) 2017-12-28
EP3252609A4 (en) 2018-01-17
EP3252609A1 (en) 2017-12-06

Similar Documents

Publication Publication Date Title
WO2016141735A1 (zh) 缓存数据的确定方法及装置
WO2021120789A1 (zh) 数据写入方法、装置及存储服务器和计算机可读存储介质
US10133679B2 (en) Read cache management method and apparatus based on solid state drive
US10698832B2 (en) Method of using memory allocation to address hot and cold data
CN106547476B (zh) 用于数据存储系统的方法和装置
CN104115134B (zh) 用于管理对复合数据存储设备进行访问的方法和系统
CN110555001B (zh) 数据处理方法、装置、终端及介质
EP3388935B1 (en) Cache management method, cache controller and computer system
CN104503703B (zh) 缓存的处理方法和装置
CN107122130B (zh) 一种数据重删方法及装置
US20160246724A1 (en) Cache controller for non-volatile memory
CN107491523A (zh) 存储数据对象的方法及装置
US10198180B2 (en) Method and apparatus for managing storage device
JP6167646B2 (ja) 情報処理装置、制御回路、制御プログラム、および制御方法
US20180081563A1 (en) Method and apparatus for reducing memory access latency
US20150212744A1 (en) Method and system of eviction stage population of a flash memory cache of a multilayer cache system
WO2017020735A1 (zh) 一种数据处理方法、备份服务器及存储系统
CN105512051A (zh) 一种自学习型智能固态硬盘缓存管理方法和装置
WO2020063355A1 (zh) 数据块的缓存方法、装置、计算机设备及计算机可读存储介质
CN106649143B (zh) 一种访问缓存的方法、装置及电子设备
CN111913913A (zh) 访问请求的处理方法和装置
US10503651B2 (en) Media cache band cleaning
US20040039869A1 (en) Information processing system
CN115495394A (zh) 数据预取方法和数据预取装置
CN110825652B (zh) 淘汰磁盘块上的缓存数据的方法、装置及设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15884413

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2015884413

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE