CN106575263A - Cache line compaction of compressed data segments - Google Patents

Cache line compaction of compressed data segments


Publication number
CN106575263A
Authority
CN
China
Prior art keywords
data
data segments
computing device
address
identified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201580041874.4A
Other languages
Chinese (zh)
Inventor
A. E. Turner
G. Patsilaras
B. Rychlik
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc
Publication of CN106575263A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0875 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877 Cache access modes
    • G06F12/0886 Variable-length word access
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 Providing a specific technical effect
    • G06F2212/1016 Performance improvement
    • G06F2212/1021 Hit rate improvement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/40 Specific encoding of data in memory or cache
    • G06F2212/401 Compressed data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/45 Caching of specific data in cache memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/60 Details of cache memory
    • G06F2212/608 Details relating to cache mapping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Methods, devices, and non-transitory process-readable storage media for compacting data within cache lines of a cache. An aspect method may include identifying, by a processor of the computing device, a base address (e.g., a physical or virtual cache address) for a first data segment, identifying a data size (e.g., based on a compression ratio) for the first data segment, obtaining a base offset based on the identified data size and the base address of the first data segment, and calculating an offset address by offsetting the base address with the obtained base offset, wherein the calculated offset address is associated with a second data segment. In some aspects, the method may include identifying a parity value for the first data segment based on the base address and obtaining the base offset by performing a lookup on a stored table using the identified data size and identified parity value.

Description

Cache line compaction of compressed data segments
Background
Lossless compression can compress data segments of a configurable size to smaller sizes. For example, a lossless compression algorithm may use different compression ratios (e.g., 4:1, 4:2, 4:3, 4:4) to compress 256-byte data segments into compressed data segments of various sizes (e.g., 64 bytes, 128 bytes, 192 bytes, 256 bytes). Conventional techniques may process a data segment (or data block) to reduce it to a fraction of its original size, which helps reduce the bandwidth and resources consumed during data reads and writes between memory units.
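The ratio-to-size arithmetic in the example above can be checked with a short sketch. The function name `compressed_size` is illustrative and does not come from the original text; the sketch assumes that a ratio written as 4:n means the compressed segment is n/4 of the source size.

```python
def compressed_size(source_bytes, ratio):
    """Return the compressed size for a ratio given as (uncompressed, compressed) parts."""
    uncompressed_parts, compressed_parts = ratio
    return source_bytes * compressed_parts // uncompressed_parts

# A 256-byte source segment at the four ratios named in the text.
sizes = [compressed_size(256, (4, n)) for n in (1, 2, 3, 4)]
print(sizes)
```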
Such compressed data may be stored in cache lines of a cache (e.g., an L3 cache). Typically, the compressed form of a data segment (or compressed data segment) is stored in the cache line corresponding to the physical address of the uncompressed form of the data segment (or source data segment). Each cache line may have a set size (e.g., 128 bytes) or capacity that must be filled on any load operation, regardless of the size of the required data. For example, when a 64-byte or 192-byte segment is loaded into 128-byte (B) cache lines, the computing device may be required to read and store an additional, potentially unneeded 64 bytes in a cache line to completely fill the 128 bytes of the cache line. Therefore, when the compressed data loaded into a cache line (or multiple cache lines) is not the same size as the source data segment (or the cache line), there may be "holes" in the physical address space that contain no useful data. This undesirable loading of unneeded, unusable data is referred to as "overfetching" and may cause suboptimal resource and bandwidth usage between memory units (e.g., double data rate (DDR) random access memory (RAM)) and caches. For example, holes can use up parts of cache lines, both increasing the requested memory bandwidth and wasting space in the cache, which causes uneven resource use in the computing device. Because compression schemes (e.g., the Burrows-Wheeler transform) may produce many compressed data segments that are smaller than a cache line, significant overfetching may occur, resulting in considerable inefficiency and increased workload that can diminish the benefits of using compression techniques. Further, a compressed block size may be smaller than the optimal DDR minimum access length (MAL), causing worse performance.
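As a rough illustration of the overfetch arithmetic described above, the following sketch computes how many unneeded bytes are loaded when a line-aligned segment must fill whole cache lines. The name `overfetch_bytes` and the constant `CACHE_LINE_BYTES` are assumptions made for this example; the 128-byte line size is the example value from the text.

```python
CACHE_LINE_BYTES = 128  # example cache line size from the text

def overfetch_bytes(segment_bytes, line_bytes=CACHE_LINE_BYTES):
    """Unneeded bytes loaded when a line-aligned segment must fill whole cache lines."""
    remainder = segment_bytes % line_bytes
    return 0 if remainder == 0 else line_bytes - remainder

for size in (64, 128, 192, 256):
    print(size, overfetch_bytes(size))
```

Under these assumptions, the 64-byte and 192-byte segments each drag in 64 extra bytes, while the 128-byte and 256-byte segments fill their cache lines exactly.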
Summary
Various aspects provide methods, devices, and non-transitory process-readable storage media for compacting data within cache lines of a cache of a computing device. An aspect method may include: identifying a base address for a first data segment; identifying a data size for the first data segment; obtaining a base offset based on the identified data size and the base address of the first data segment; and calculating an offset address by offsetting the base address with the obtained base offset, wherein the calculated offset address may be associated with a second data segment.
In some aspects, obtaining the base offset based on the identified data size and the base address of the first data segment may be performed by one of the processor of the computing device executing software or a dedicated circuit coupled to the processor of the computing device, and calculating the offset address by offsetting the base address with the obtained base offset may be performed by one of the processor of the computing device executing software or the dedicated circuit coupled to the processor of the computing device.
In some aspects, the base address may be a physical address or a virtual address. In some aspects, the identified data size may be identified based on a compression ratio associated with the first data segment. In some aspects, the compression ratio may be one of a 4:1 compression ratio, a 4:2 compression ratio, a 4:3 compression ratio, or a 4:4 compression ratio.
In some aspects, obtaining the base offset based on the identified compression ratio and the base address of the first data segment may include: identifying a parity value for the first data segment based on the base address, and obtaining the base offset using the identified compression ratio and the identified parity value.
In some aspects, obtaining the base offset using the identified compression ratio and the identified parity value may include obtaining the base offset by performing a lookup on a stored table using the identified compression ratio and the identified parity value. In some aspects, the parity value may indicate whether a bit in the base address of the first data segment is odd or even. In some aspects, the parity value may be based on two bits in the base address of the first data segment.
In some aspects, obtaining the base offset based on the identified data size and the base address of the first data segment may include obtaining a first base offset and a second base offset, and a first data size and a second data size, for the first data segment, and calculating the offset address by offsetting the base address with the obtained base offset may include calculating a first offset address for the first data size and a second offset address for the second data size.
In some aspects, the method may further include storing the first data segment at the calculated offset address, which may include: reading the first data segment at the base address as uncompressed data; compressing the first data segment at the identified data size; and storing the compressed first data segment at the calculated offset address.
In some aspects, calculating the offset address by offsetting the base address with the obtained base offset may be completed after the first data segment has been compressed. In some aspects, the method may further include reading the first data segment at the calculated offset address.
In some aspects, the method may further include determining whether the second data segment has a correct compression ratio to be contiguous with the first data segment, and reading the first data segment at the calculated offset address may include prefetching the second data segment with the first data segment in response to determining that the second data segment has the correct compression ratio. In some aspects, the method may further include decompressing the read first data segment.
Various aspects may include a computing device configured with processor-executable instructions to perform operations of the methods described above. Various aspects may include a computing device having means for performing functions of the operations of the methods described above. Various aspects may include a non-transitory processor-readable medium on which are stored processor-executable instructions configured to cause a processor of a computing device to perform operations of the methods described above.
Brief description of the drawings
The accompanying drawings, which are incorporated herein and constitute part of this specification, illustrate exemplary aspects of the invention and, together with the general description given above and the detailed description given below, serve to explain the features of the invention.
Fig. 1 is a memory structure diagram of exemplary placements of compressed data segments in cache lines, as known in the art.
Fig. 2 is a memory structure diagram showing offset placements of data segments in cache lines according to an aspect compaction technique.
Fig. 3 is a data structure diagram showing a data table, suitable for use in some aspects, that includes base offset values corresponding to compression ratios and base address parity values.
Fig. 4A is exemplary pseudocode of a function for returning a new address (e.g., a physical address), suitable for use in various aspects.
Fig. 4B is exemplary pseudocode for calling the exemplary function of Fig. 4A, suitable for use in various aspects.
Fig. 4C is exemplary pseudocode showing example values used during an implementation of the exemplary function of Fig. 4A, suitable for use in various aspects.
Fig. 5 is a memory structure diagram showing offset placements of data segments in cache lines according to another aspect compaction technique.
Fig. 6 is a data structure diagram showing a data table, suitable for use in various aspects, that includes base offset values corresponding to compression ratios and to base address parity values based on two bits of the address.
Fig. 7A is another exemplary pseudocode of a function for returning a new address (e.g., a physical address), suitable for use in various aspects.
Fig. 7B is another exemplary pseudocode for calling the exemplary function of Fig. 7A, suitable for use in various aspects.
Fig. 7C is another exemplary pseudocode showing example values used during an implementation of the exemplary function of Fig. 7A, suitable for use in various aspects.
Figs. 8A-8C are process flow diagrams showing aspect methods for a computing device to compact data within cache lines of a cache.
Fig. 9 is a process flow diagram showing an aspect method for a computing device to compress and compact data within cache lines of a cache.
Fig. 10 is a component block diagram of a computing device suitable for use in various aspects.
Detailed description
The various aspects will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References made to particular examples and implementations are for illustrative purposes and are not intended to limit the scope of the invention or the claims.
The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any implementation described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other implementations.
The various aspects include methods that may be implemented in a computing device to improve the compactness of compressed data segments in cache memory by compacting data segments within the cache so that the data segments overlap on cache lines. By overlapping data on cache lines, small compressed data segments can share cache lines, enabling more efficient memory accesses and less frequent unneeded overfetching.
The term "computing device" is used herein to refer to any one or all of: cellular telephones, smartphones, web-pads, tablet computers, Internet-enabled cellular telephones, WiFi-enabled electronic devices, personal digital assistants (PDAs), laptop computers, desktop computers (or personal computers), servers, and similar electronic devices equipped with at least a processor. Computing devices may use various architectures to enable their processors to execute software instructions, and may include one or more memory units, such as random access memory units (e.g., DDR RAM) and cache units (e.g., L2 cache, L3 cache).
The term "source data segment" is used herein to refer to an uncompressed data segment that may be stored in a memory or cache unit of a computing device. In various aspects, a source data segment may be 256 bytes in size. Further, a source data segment may be associated with an exclusive range of the physical address space that may be stored in the cache, the exclusive range being the same size as the source data segment (e.g., the size of an uncompressed tile or data block). For example, the exclusive range of a source data segment may be 256 bytes. Source data segments may be stored in more than one cache line, and thus their exclusive ranges may extend to cover (or partially cover) multiple cache lines. The term "base address" is used herein to refer to a physical or virtual address that represents the start of a source data segment's exclusive range in the cache. The term "offset address" is used herein to refer to a physical or virtual address that is an offset of a base address (physical or virtual).
The term "compressed data segment" is used herein to refer to a data segment that has been reduced in size by a computing device executing a conventional data compression algorithm. Compressed data segments may be smaller in size (e.g., in bytes) than their respective source data segments. For example, a compressed data segment may be 64 bytes whereas its corresponding source data segment may be 256 bytes.
Compression techniques are often used to improve the performance of computing devices by reducing the amount of data transferred between memory units and cache units. However, regardless of their smaller size, compressed data segments may still be associated with and aligned to particular addresses (e.g., physical addresses) of the cache, which often results in suboptimal use of cache lines. For example, when using 128-byte cache lines and compressing 256-byte data segments to 64-byte (i.e., using a 4:1 compression ratio) or 192-byte (i.e., using a 4:3 compression ratio) compressed data segments, 64 bytes of a cache line are wasted due to overfetching. With compressed data segments whose minimum compressed size (e.g., 64 bytes) is smaller than the cache line size (e.g., 128 bytes), data loaded into the cache via conventional techniques includes wasted data, which increases bandwidth and makes compression less effective. For example, compressed data segments of 64-byte and 192-byte sizes may make up a significant portion of cache transactions for some workloads (e.g., user interface (UI) workloads), and may therefore lead to inefficient use of cache space and unnecessary overfetching. In many cases, for example in scenarios involving compression of UI workloads, conventional techniques for placing compressed data in cache lines may be especially inefficient when handling data segments compressed with 4:1 or 4:3 compression ratios.
To further highlight the potential inefficiency of conventional techniques, Fig. 1 shows a diagram 101 illustrating problematic placements of compressed data segments in the cache lines of a computing device implementing such conventional compression techniques known in the art. An exemplary cache may include cache lines 102a-102d that are consecutively aligned and that each have a fixed size, such as 128 bytes. A first row 110 of the diagram 101 shows a first source data segment 112 (or tile or data block) and a second source data segment 114 that may be stored in the cache. The first source data segment 112 may correspond to the first cache line 102a and the second cache line 102b (i.e., the first source data segment 112 may be stored in both cache lines 102a-102b). Similarly, the second source data segment 114 may correspond to the third cache line 102c and the fourth cache line 102d (i.e., the second source data segment 114 may be stored in both cache lines 102c-102d).
A computing device configured to execute various conventional compression algorithms may compress the source data segments 112, 114 using different compression ratios, such as a 4:1 compression ratio (e.g., compressing a 256-byte data segment to a 64-byte data segment), a 4:2 compression ratio (e.g., compressing a 256-byte data segment to a 128-byte data segment), a 4:3 compression ratio (e.g., compressing a 256-byte data segment to a 192-byte data segment), and a 4:4 compression ratio (e.g., compressing a 256-byte data segment to a 256-byte data segment, or no compression).
For example, a second row 120 of the diagram 101 shows a first compression of the source data segments 112, 114. Specifically, the first source data segment 112 may be compressed by the computing device to a first compressed data segment 122 that is 192 bytes in size (i.e., using a 4:3 compression ratio). The first compressed data segment 122 may remain aligned with the first source data segment 112 (i.e., the start of the first compressed data segment 122 may be stored in the first cache line 102a without an offset). In the second row 120, with regard to the second source data segment 114, the computing device may use a 4:4 compression ratio (or no compression), and therefore the size of the second source data segment 114 may remain 256 bytes. Source data segments may be selectively compressed, for example when the compression algorithm used by the computing device is not configured to be capable of compressing a particular data segment, or when the computing device is otherwise configured to choose not to compress a data segment.
The first compressed data segment 122 may still correspond to (or be aligned with) the first cache line 102a and the second cache line 102b; however, due to its smaller size (i.e., less than the original 256 bytes), the first compressed data segment 122 may use only a portion (or half) of the second cache line 102b. In other words, the 192-byte first compressed data segment 122 may cause the 128-byte first cache line 102a to be fully used and the 128-byte second cache line 102b to be only half used. Because only half of the second cache line 102b is used, a first 64-byte overfetch data segment 123 of unused data (referred to as "OF" in Fig. 1) may also be present in the second cache line 102b. For example, the first overfetch data segment 123 may be loaded into the second cache line 102b when the first compressed data segment 122 is read from DDR or other memory.
As another example, a third row 130 of the diagram 101 shows a second compression of the source data segments 112, 114. Specifically, the first source data segment 112 may be compressed by the computing device to a second compressed data segment 132 that is 64 bytes in size (i.e., using a 4:1 compression ratio). The second compressed data segment 132 may remain aligned with the first source data segment 112 (i.e., the start of the second compressed data segment 132 may be stored in the first cache line 102a without an offset). The second compressed data segment 132 may not use the second cache line 102b, and therefore the second cache line 102b may be filled with two 64-byte unused data segments 139 (e.g., invalid, empty, or otherwise unused data segments). Alternatively, the second cache line 102b may be filled with one 128-byte invalid data segment. Because the 64-byte second compressed data segment 132 fills only half of the 128-byte first cache line 102a, a second 64-byte overfetch data segment 133 of overfetched, unused data may also be present in the first cache line 102a.
Similar to the second compressed data segment 132 in the third row 130, the second source data segment 114 may be compressed by the computing device to a third compressed data segment 134 that is 64 bytes in size (i.e., using a 4:1 compression ratio). The third compressed data segment 134 may remain aligned with the second source data segment 114 (i.e., the start of the third compressed data segment 134 may be stored in the third cache line 102c without an offset). The third compressed data segment 134 may not use the fourth cache line 102d, and therefore the fourth cache line 102d may be filled with two 64-byte unused data segments 139 (or alternatively one 128-byte unused data segment). However, because the 64-byte third compressed data segment 134 fills only half of the 128-byte third cache line 102c, a third 64-byte overfetch data segment 135 of overfetched, unused data may also be present in the third cache line 102c.
As another example, a fourth row 140 of the diagram 101 shows a third compression of the source data segments 112, 114. Specifically, the first source data segment 112 may be compressed by the computing device to a fourth compressed data segment 142 that is 128 bytes in size (i.e., using a 4:2 compression ratio). The fourth compressed data segment 142 may remain aligned with the first source data segment 112 (i.e., the start of the fourth compressed data segment 142 may be stored in the first cache line 102a without an offset). The fourth compressed data segment 142 may not use the second cache line 102b, and therefore the second cache line 102b may be filled with one 128-byte unused data segment 149 (or alternatively two 64-byte unused data segments).
The second source data segment 114 may be compressed by the computing device to a fifth compressed data segment 144 that is 192 bytes in size (i.e., using a 4:3 compression ratio). The fifth compressed data segment 144 may remain aligned with the second source data segment 114 (i.e., the start of the fifth compressed data segment 144 may be stored in the third cache line 102c without an offset). The fifth compressed data segment 144 may only partially use the fourth cache line 102d, and therefore the fourth cache line 102d may be filled with a fourth 64-byte overfetch data segment 145.
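The three compressed rows of Fig. 1 described above can be reproduced with a small model. This is only a sketch: the function name `conventional_layout` and the labels "data", "OF", and "unused" are chosen for this example, and it assumes 128-byte cache lines, 256-byte source segments, and compressed sizes that are multiples of 64 bytes, all stored without an offset.

```python
HALF_LINE = 64       # half of a 128-byte cache line
SOURCE_BYTES = 256   # address range of one source data segment (two cache lines)

def conventional_layout(compressed_sizes):
    """Label each 64-byte half of each cache line for segments stored with no offset."""
    halves = []
    for size in compressed_sizes:
        used = size // HALF_LINE
        for i in range(SOURCE_BYTES // HALF_LINE):  # four halves per source range
            if i < used:
                halves.append("data")
            elif i // 2 == (used - 1) // 2:
                halves.append("OF")       # rest of a partly filled line is overfetched
            else:
                halves.append("unused")   # a line the segment never touches
    # Group the halves into cache lines (102a, 102b, 102c, 102d).
    return [tuple(halves[i:i + 2]) for i in range(0, len(halves), 2)]

row_120 = conventional_layout([192, 256])  # 4:3 and 4:4
row_130 = conventional_layout([64, 64])    # 4:1 and 4:1
row_140 = conventional_layout([128, 192])  # 4:2 and 4:3
for row in (row_120, row_130, row_140):
    print(row)
```

In this model, the row with two 4:1 segments uses only one quarter of the four cache lines for useful data, which is the inefficiency that the placement in Fig. 1 is meant to illustrate.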
To improve upon the inefficient placement of compressed data segments by the conventional techniques shown in Fig. 1, the various aspects include methods, devices, and non-transitory process-readable storage media for compacting data segments within a cache so that the data segments overlap on cache lines. In other words, a data segment may be placed within an address range (e.g., a physical address range) of the cache that is typically associated with another data segment, rather than only within its own conventional address range, resulting in more efficient use of cache lines (i.e., fewer cache lines are needed to store the same amount of data). With such overlap, small compressed data segments can share cache lines, enabling more efficient memory accesses (e.g., DDR MAL) and less frequent unneeded overfetching.
In various aspects, the computing device may be configured to perform cache line compaction operations that shift (or offset) the placement of a data segment into an exclusive range (or address space) of the cache that is typically associated with another data segment. For example, the computing device may shift two consecutive exclusive ranges so that they each extend into half of a single cache line. To achieve such compaction, the computing device may identify a base offset to be applied to the base address of a first data segment (or data block) so that the first data segment is stored within a cache line (e.g., at a physical address) typically associated with a second data segment. The base offset may be determined by the computing device based at least on the data size of the data segment. The data size may be identified based on the compression ratio of the data segment (i.e., the ratio of uncompressed size to compressed size). For example, when a data segment is compressed at a first compression ratio (e.g., 4:4), the base offset for the data segment may be a first amount (in bytes) from the base address, and when the data segment is compressed at a second compression ratio (e.g., 4:1), the base offset may be a second amount from the base address. The computing device may identify such compression ratios based on the various compression algorithms it uses to compress data segments.
In addition to the data size (e.g., compression ratio), the computing device may also determine the base offset based on the base address itself. For example, a data segment compressed at a certain compression level (e.g., 4:1) may have a first offset (e.g., 64 bytes) when a certain bit (e.g., bit 8) of the data segment's base address is odd (i.e., '1'), and a second offset (e.g., 256 bytes) when that bit is even (i.e., '0'). In some aspects, the value of bit 8 of the base address may change every 256 bytes of the address space. For example, for a first address in the cache, bit 8 may have a '0' value, but for a second address 256 bytes away from the first address, bit 8 may have a '1' value.
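The bit-8 behavior described above can be seen in a few lines. This is a minimal sketch; the helper name `parity_bit` is an assumption for illustration, and the choice of bit 8 follows the example in the text.

```python
def parity_bit(base_address, bit=8):
    """Return the value (0 or 1) of the given bit of a base address."""
    return (base_address >> bit) & 1

# Base addresses of consecutive 256-byte segments: bit 8 alternates 0, 1, 0, 1.
parities = [parity_bit(addr) for addr in (0x000, 0x100, 0x200, 0x300)]
print(parities)
```

Because bit 8 has weight 256, it flips each time the base address advances by one 256-byte segment, so adjacent segments always see opposite parity values.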
In some aspects, the base offset for a compressed data segment may be less than the size of a 4 KB page. Therefore, the computing device may perform aspect compaction operations or techniques for any virtual or physical address of various memory or cache units.
In some aspects, the computing device may perform a lookup on a predefined data table, using the compression ratio and information about the base address (e.g., a physical address in the cache), to identify the corresponding base offset. However, in some aspects, the computing device may use logic, software, circuitry, and other functionality, rather than a predefined data table, to identify the offset values for compacting data segments within the cache. For example, the computing device may identify offset values and calculate offset addresses via the processor of the computing device executing software (e.g., operations for accessing a stored data table) and/or via a dedicated circuit coupled to the processor of the computing device.
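A table lookup of this kind might look like the following sketch. The names `BASE_OFFSETS` and `offset_address` are hypothetical, and the table holds only the two 4:1 example values stated in this section (64 bytes when bit 8 is '1', 256 bytes when it is '0'); the entries for the other compression ratios are not given here, so they are left out rather than guessed.

```python
# Only the 4:1 example values from the text are filled in; entries for the other
# ratios are not stated in this section and are deliberately omitted.
BASE_OFFSETS = {
    ("4:1", 0): 256,  # bit 8 of the base address is even ('0')
    ("4:1", 1): 64,   # bit 8 of the base address is odd ('1')
}

def offset_address(base_address, ratio, table=BASE_OFFSETS, parity_bit_index=8):
    """Look up the base offset by (ratio, parity) and add it to the base address."""
    parity = (base_address >> parity_bit_index) & 1
    base_offset = table[(ratio, parity)]  # raises KeyError for ratios not in the table
    return base_address + base_offset

first = offset_address(0x000, "4:1")   # bit 8 is 0, so 0 + 256
second = offset_address(0x100, "4:1")  # bit 8 is 1, so 256 + 64
print(first, second, first // 128 == second // 128)
```

With these example values, two adjacent 64-byte compressed segments land at addresses 256 and 320, which fall in the same 128-byte cache line without overlapping each other.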
In some aspects, the computing device may shift all exclusive ranges from their base addresses by half a cache line (equal to the smallest compressed size). In this way, a compressed data segment that does not fill a cache line may be shifted into a cache line shared with the exclusive range of another source data segment, enabling opportunistic use of the whole cache line (i.e., consecutive compressed segments may be mapped to a shared cache line).
The aspect techniques may be used by the computing device to compact a data segment independently of the other data segments in a buffer. For example, regardless of the compression ratios of adjacent data segments, a particular data segment may be offset within the cache lines based solely on its own compression ratio and base address. In other words, to compact a data segment, the computing device need not know the compression ratios and/or offsets of other data segments.
The following is an exemplary application of an aspect method performed by a computing device. The computing device may be configured to perform a read from a memory unit (e.g., DDR) to retrieve an uncompressed data segment (i.e., a source data segment). The retrieved, uncompressed data segment is typically associated with a base address (e.g., a physical address) in a cache. The computing device may evaluate the retrieved data segment to identify the compression level (or ratio) that a compression algorithm can apply. The identified compression ratio may be the maximum possible for the data segment, so as to maximize the use of cache space. The computing device may compress the data segment at that compression ratio. Based on the compression ratio and the base address associated with the uncompressed data segment, the computing device may identify an offset within the cache at which to store the compressed data segment, such as 64 bytes from the base address. The computing device may then load the compressed data segment at the offset address.
Various aspects may be beneficial for reducing the unnecessary bandwidth usage associated with overfetching in various types of computing devices for which improvements in power and/or memory bandwidth (e.g., DDR) usage are desirable. For example, the aspect operations may be used by mobile devices (e.g., smartphones) that utilize a system-on-chip. Further, the various aspects may be beneficial by enabling prefetching for an L3 cache or other similar memory or storage unit to improve the performance of the computing device. In particular, a data segment and an adjacent data segment (or adjacent segments) stored in a shared cache line may be fetched at the same time, reducing reads. For example, the computing device may retrieve a first compressed data segment of 64 bytes together with an adjacent second compressed data segment of 64 bytes.
The aspect cache line compaction operations described herein may be performed by a computing device to reduce overfetching for various workloads, including UI workloads. The aspect operations may be particularly useful for reducing overfetching for workloads that include data segments compressed at 4:1 and 4:3 compression ratios that occur in contiguous blocks, as well as data segments compressed at 4:1 or 4:2 compression ratios.
Conventional techniques include operations for configuring a cache to support compressed and uncompressed lines and operations for determining whether to compress data. For example, conventional techniques may include methods in which two adjacent uncompressed lines are compressed into a single element merely as a typical effect of the compression operation (e.g., two sequentially adjacent lines in the physical address space may, as a result of compression, be made to fit into one line).
Unlike conventional techniques, the various aspects do not simply address compression or the storage of compressed data in a cache. Instead, the aspect techniques provide functionality that may be used to modify such conventional compression techniques so that conventionally compressed data can be better placed within cache lines. In other words, the aspect techniques may be used by the computing device in a processing stage after data has been compressed, so that post-compression outputs (i.e., compressed data segments) that would normally be placed in different cache lines are packed into the cache more efficiently. For example, a computing device performing the aspect operations may take two different compressed data segments that would otherwise be loaded into two different, partially used cache lines and, by using offset shifts, store the two compressed data segments in a single cache line. In other words, the aspect features shift compressed data segments within the address space (e.g., physical address space) of the cache, changing the base addresses of the compressed data segments to improve cache usage.
For simplicity, the descriptions of the various aspects may refer to operations for compacting compressed data segments in a cache. However, the aspect operations may be applied by a computing device to compact any type of data (e.g., compressed or uncompressed) within any storage unit or architecture (e.g., SRAM, cache, etc.). In other words, the various aspects are not intended to be limited to use with data compression schemes. For example, the computing device may perform the aspect methods described herein to improve the compaction of any data set that includes data segments of variable length, or any data set with holes (e.g., unused data segments) in a buffer. Although base addresses may be referred to herein as physical addresses, it should be appreciated that in some aspects the base addresses may be virtual addresses.
In various aspects, the computing device may utilize contiguous (i.e., adjacently stored) cache lines, and hence the data associated with or stored in those cache lines. However, in some aspects, the computing device may be configured to store and access data segments at non-contiguous addresses in the cache. In other words, the aspect techniques may enable the compaction of data segments stored in cache lines that may or may not be at different locations in memory.
For simplicity, a computing device performing the aspect methods may be described as identifying or otherwise using the compression ratio associated with a data segment in order to calculate an offset address. However, it should be appreciated that a computing device performing these aspects may identify or otherwise use any form of indication of the data size of such a data segment in order to compact it according to the various aspects. In other words, the various aspects may be used to compact data segments that may or may not be compressed. A computing device performing the aspect techniques may identify the data sizes of data segments based on various compression ratios and data size distributions, and so the example compression ratios referred to in the descriptions of the various aspects are not intended to limit the scope of the claims to 4:1, 4:2, 4:3, and 4:4 compression ratios.
Fig. 2 shows the placement of data segments within cache lines 220-227 according to aspect cache line compaction operations performed by a computing device. The computing device may be configured to compress each data segment using a conventional compression algorithm. In particular, Fig. 2 shows an example in which the computing device may use a compression algorithm to compress 256-byte source data segments 200a-206a into 64-byte compressed segments via a 4:1 compression ratio, 128-byte compressed segments via a 4:2 compression ratio, 192-byte compressed segments via a 4:3 compression ratio, or 256-byte compressed segments via a 4:4 compression ratio. For clarity in the following description of Fig. 2, vertical lines indicate the fixed size (e.g., 128 bytes) of the sequentially aligned cache lines 220-227, and a dashed line 270 may indicate a 1 kB interleave.
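The relationship between the compression ratios and the compressed sizes listed above can be checked with a short calculation. The following is a minimal sketch, assuming a ratio is written as the string "uncompressed:compressed"; the names `SOURCE_SEGMENT_BYTES` and `compressed_size` are hypothetical and do not appear in the figures.

```python
# Hypothetical helper: derive the compressed size from a ratio string.
SOURCE_SEGMENT_BYTES = 256  # size of each source data segment in the Fig. 2 example


def compressed_size(ratio: str, source_bytes: int = SOURCE_SEGMENT_BYTES) -> int:
    """Return the compressed size in bytes for a ratio such as "4:1"."""
    uncompressed_parts, compressed_parts = (int(part) for part in ratio.split(":"))
    return source_bytes * compressed_parts // uncompressed_parts


# Sizes for the four example ratios used in the description.
sizes = {ratio: compressed_size(ratio) for ratio in ("4:1", "4:2", "4:3", "4:4")}
```

With 256-byte source segments, the four ratios map to 64, 128, 192, and 256 bytes, which are the segment sizes used in the rows of Fig. 2.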
Cache lines 220-227 may be associated with physical addresses that increase from left to right (i.e., the physical address of the first cache line 220 is numerically lower than the physical address of the second cache line 221, and so on). In some aspects, the physical addresses of cache lines 220-227 may be represented with at least 8 bits. Fig. 2 shows indications of bit 8 and bit 7 of the physical address associated with each cache line 220-227. For example, bit 8 of the first cache line 220 may be a '0' value and bit 7 of the first cache line 220 may be a '0' value; bit 8 of the second cache line 221 may be a '0' value and bit 7 of the second cache line 221 may be a '1' value; bit 8 of the third cache line 222 may be a '1' value and bit 7 of the third cache line 222 may be a '0' value; and so on.
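The bit pattern described for cache lines 220-227 follows directly from their addresses. The sketch below assumes, for illustration only, that the first cache line 220 starts at address 0 and that lines are 128 bytes wide; `address_bits` is a hypothetical name.

```python
CACHE_LINE_BYTES = 128  # fixed cache line size in the Fig. 2 example


def address_bits(line_index: int) -> tuple[int, int]:
    """Return (bit 8, bit 7) of the start address of the given cache line."""
    address = line_index * CACHE_LINE_BYTES  # assumes line 0 starts at address 0
    return (address >> 8) & 1, (address >> 7) & 1


# Index 0 stands for cache line 220, index 1 for 221, and so on up to 227.
bits = [address_bits(index) for index in range(8)]
```

Under these assumptions bit 7 toggles on every cache line and bit 8 toggles every two cache lines (every 256 bytes), so bit 8 distinguishes the even and odd 256-byte segments.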
The first row 228 of diagram 200 shows the default placement of the uncompressed source data segments 200a-206a within the 256-byte segment boundaries 210-216 when stored in the cache according to conventional techniques. In other words, the source data segments 200a-206a may be stored at their base addresses at 256-byte intervals. In particular, the first source data segment 200a (referred to as 'A' in Fig. 2) may be placed without an offset in the first cache line 220 and the second cache line 221. The second source data segment 202a (referred to as 'B' in Fig. 2) may be placed without an offset in the third cache line 222 and the fourth cache line 223. The third source data segment 204a (referred to as 'C' in Fig. 2) may be placed without an offset in the fifth cache line 224 and the sixth cache line 225. The fourth source data segment 206a (referred to as 'D' in Fig. 2) may be placed without an offset in the seventh cache line 226 and the eighth cache line 227.
Rows 229, 230, 240, 250, and 260 of diagram 200 show offset placements, overlapping the base addresses, of compressed forms of the source data segments 200a-206a according to the aspect techniques. In some aspects, the base offsets used by the computing device to place each compressed data segment may be based on a lookup performed on a predefined data table, as described below with reference to Fig. 3.
The second row 229 of diagram 200 shows a simple offset placement of 256-byte compressed data segments 200b-206b, each shifted 64 bytes from its respective base address. Such a mapping may require an additional 64 bytes to be allocated at the end of every compressed data buffer that uses this aspect. The first compressed data segment 200b, corresponding to the first source data segment 200a (i.e., 'A'), may be placed by the computing device with a 64-byte offset in half of the first cache line 220, the whole second cache line 221, and half of the third cache line 222. The first compressed data segment 200b may overlap into the third cache line 222 by a 64-byte overlap 280. The second compressed data segment 202b, corresponding to the second source data segment 202a (i.e., 'B'), may be placed with a 64-byte offset in half of the third cache line 222, the whole fourth cache line 223, and half of the fifth cache line 224. The second compressed data segment 202b may overlap into the fifth cache line 224 by a 64-byte overlap 282. The third compressed data segment 204b, corresponding to the third source data segment 204a (i.e., 'C'), may be placed with a 64-byte offset in half of the fifth cache line 224, the whole sixth cache line 225, and half of the seventh cache line 226. The third compressed data segment 204b may overlap into the seventh cache line 226 by a 64-byte overlap 284. The fourth compressed data segment 206b, corresponding to the fourth source data segment 206a (i.e., 'D'), may be placed with a 64-byte offset in half of the seventh cache line 226, the whole eighth cache line 227, and half of another cache line (not shown). The fourth compressed data segment 206b may cross the interleave and may overlap into the other cache line (not shown) by a 64-byte overlap 286. The compressed data segments 200b-206b may be compressed by the computing device at a 4:4 compression ratio or, alternatively, may be left uncompressed and merely shifted via the base offset.
The third row 230 of diagram 200 shows the offset placement of 64-byte data segments compressed by the computing device at a 4:1 compression ratio. In particular, a first 64-byte compressed data segment 231 corresponding to the first source data segment 200a (i.e., 'A') may be placed by the computing device with a 256-byte offset in one half of the third cache line 222. A second 64-byte compressed data segment 232 corresponding to the second source data segment 202a (i.e., 'B') may be placed by the computing device with a 64-byte offset in the other half of the third cache line 222. In other words, the first 64-byte compressed data segment 231 and the second 64-byte compressed data segment 232 may share the third cache line 222, even though the cache line 222 is typically associated only with the second source data segment 202a (i.e., 'B'). Because of the 256-byte and 64-byte offset placements of the first 64-byte compressed data segment 231 and the second 64-byte compressed data segment 232, respectively, the first cache line 220, the second cache line 221, and the fourth cache line 223 may contain unused data 299. In other words, the computing device may not be required to fetch any data for these cache lines 220, 221, 223. These unused cache lines (i.e., the cache lines 220, 221, 223 associated with unused data 299) may be free for the cache to allocate to other requests.
Still referring to the third row 230 of diagram 200, a third 64-byte compressed data segment 234 corresponding to the third source data segment 204a (i.e., 'C') may be placed by the computing device with a 256-byte offset in one half of the seventh cache line 226. A 192-byte compressed data segment 236 corresponding to the fourth source data segment 206a (i.e., 'D') may be placed by the computing device with a 64-byte offset in the other half of the seventh cache line 226 and the whole eighth cache line 227. The 192-byte compressed data segment 236 may have been compressed by the computing device at a 4:3 compression ratio. The third 64-byte compressed data segment 234 and the 192-byte compressed data segment 236 may share the seventh cache line 226, even though the cache line 226 is typically associated only with the fourth source data segment 206a (i.e., 'D'). Because of the 256-byte and 64-byte offset placements of the third 64-byte compressed data segment 234 and the 192-byte compressed data segment 236, respectively, the fifth cache line 224 and the sixth cache line 225 may contain unused data 299.
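The sharing of cache lines in the third row 230 can be reproduced by computing which 128-byte lines each shifted segment touches. This is an illustrative sketch only: it assumes the source segments 'A' through 'D' have base addresses 0, 256, 512, and 768, maps index 0 to cache line 220, and uses the hypothetical name `lines_touched`.

```python
CACHE_LINE_BYTES = 128


def lines_touched(base_address: int, offset: int, size: int) -> list[int]:
    """Return the indices of the cache lines covered by a shifted segment."""
    start = base_address + offset
    end = start + size  # exclusive end address
    return list(range(start // CACHE_LINE_BYTES, (end - 1) // CACHE_LINE_BYTES + 1))


# (base address, base offset, compressed size) for row 230 of Fig. 2.
row_230 = {
    "A": (0, 256, 64),     # 4:1 segment 231
    "B": (256, 64, 64),    # 4:1 segment 232
    "C": (512, 256, 64),   # 4:1 segment 234
    "D": (768, 64, 192),   # 4:3 segment 236
}
touched = {name: lines_touched(*placement) for name, placement in row_230.items()}
used = {index for lines in touched.values() for index in lines}
unused = sorted(set(range(8)) - used)
```

Under these assumptions 'A' and 'B' both land in index 2 (cache line 222), 'C' and 'D' share index 6 (cache line 226), and indices 0, 1, 3, 4, and 5 (cache lines 220, 221, 223, 224, 225) are never touched.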
The fourth row 240 of diagram 200 shows the offset placement of 128-byte data segments compressed by the computing device at a 4:2 compression ratio. In particular, a first 128-byte compressed data segment 241 corresponding to the first source data segment 200a (i.e., 'A') may be placed by the computing device with a 128-byte offset in the second cache line 221. A second 128-byte compressed data segment 242 corresponding to the second source data segment 202a (i.e., 'B') may be placed by the computing device with a 128-byte offset in the fourth cache line 223. In this way, the first 128-byte compressed data segment 241 and the second 128-byte compressed data segment 242 may remain stored within their typical segment boundaries 210, 212, but may be shifted so that the first cache line 220 and the third cache line 222 contain unused data 299. In other words, because of the 128-byte offsets, the computing device may not be required to fetch any data for these cache lines 220, 222.
Still referring to the fourth row 240 of diagram 200, a third 128-byte compressed data segment 244 corresponding to the third source data segment 204a (i.e., 'C') may be placed by the computing device with a 128-byte offset in the sixth cache line 225. A 64-byte compressed data segment 246 corresponding to the fourth source data segment 206a (i.e., 'D') may be placed by the computing device with a 64-byte offset in the seventh cache line 226. In this way, the third 128-byte compressed data segment 244 and the 64-byte compressed data segment 246 may remain stored within their typical segment boundaries 214, 216, but may be shifted so that the fifth cache line 224 and the eighth cache line 227 contain unused data 299. However, because the 64-byte compressed data segment 246 does not completely fill the seventh cache line 226 and the third 128-byte compressed data segment 244 is not shifted into the fourth segment boundary 216, half of the seventh cache line 226 may be filled with overfetch data 291.
The fifth row 250 of diagram 200 shows the offset placement of 192-byte data segments compressed by the computing device at a 4:3 compression ratio. In particular, a first 192-byte compressed data segment 251 corresponding to the first source data segment 200a (i.e., 'A') may be placed by the computing device with a 128-byte offset in the second cache line 221 and half of the third cache line 222. A second 192-byte compressed data segment 252 corresponding to the second source data segment 202a (i.e., 'B') may be placed by the computing device with a 64-byte offset in half of the third cache line 222 and the fourth cache line 223. In this way, the first 192-byte compressed data segment 251 and the second 192-byte compressed data segment 252 may be shifted to share the third cache line 222, so that the first cache line 220 may contain unused data 299. In other words, because of the offsets applied to the 192-byte compressed data segments, the computing device may not be required to fetch any data for the first cache line 220.
Still referring to the fifth row 250 of diagram 200, a 64-byte compressed data segment 254 corresponding to the third source data segment 204a (i.e., 'C') may be placed by the computing device with a 256-byte offset in the seventh cache line 226. A 256-byte compressed data segment 256 corresponding to the fourth source data segment 206a (i.e., 'D') may be placed by the computing device with a 64-byte offset in half of the seventh cache line 226, the whole eighth cache line 227, and half of another cache line (not shown). In this way, the fifth cache line 224 and the sixth cache line 225 may contain unused data 299. However, the 256-byte compressed data segment 256 may cross the interleave, and so the computing device may need to use separate transactions.
The sixth row 260 of diagram 200 shows the offset placement of 256-byte data segments compressed by the computing device at a 4:4 compression ratio. In particular, a first 256-byte compressed data segment 261 corresponding to the first source data segment 200a (i.e., 'A') may be placed by the computing device with a 64-byte offset in half of the first cache line 220, the whole second cache line 221, and half of the third cache line 222. A second 256-byte compressed data segment 262 corresponding to the second source data segment 202a (i.e., 'B') may be placed by the computing device with a 64-byte offset in half of the third cache line 222, the whole fourth cache line 223, and half of the fifth cache line 224. Further, a third 256-byte compressed data segment 264 corresponding to the third source data segment 204a (i.e., 'C') may be placed by the computing device with a 64-byte offset in half of the fifth cache line 224, the whole sixth cache line 225, and half of the seventh cache line 226. A 64-byte compressed data segment 266 corresponding to the fourth source data segment 206a (i.e., 'D') may be placed by the computing device with a 64-byte offset in half of the seventh cache line 226. In this way, the eighth cache line 227 may contain unused data 299. However, because the first 256-byte compressed data segment 261 does not completely fill the first cache line 220, half of the first cache line 220 may be filled with overfetch data 290.
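The overfetch noted for rows 240 and 260 can be quantified as the bytes in the fetched cache lines that hold no segment data. The sketch below is a hedged illustration: it assumes base addresses 0, 256, 512, and 768 for segments 'A' through 'D', non-overlapping placements, and a hypothetical helper name `overfetch_bytes`.

```python
CACHE_LINE_BYTES = 128


def overfetch_bytes(placements: list[tuple[int, int, int]]) -> int:
    """Bytes fetched but not occupied, given (base, offset, size) tuples."""
    touched_lines = set()
    stored = 0
    for base, offset, size in placements:
        start = base + offset
        first_line = start // CACHE_LINE_BYTES
        last_line = (start + size - 1) // CACHE_LINE_BYTES
        touched_lines.update(range(first_line, last_line + 1))
        stored += size
    # Every touched line is fetched whole; the remainder is overfetch.
    return len(touched_lines) * CACHE_LINE_BYTES - stored


# Row 240: three 4:2 segments and one 4:1 segment.
row_240 = [(0, 128, 128), (256, 128, 128), (512, 128, 128), (768, 64, 64)]
# Row 260: three 4:4 segments and one 4:1 segment.
row_260 = [(0, 64, 256), (256, 64, 256), (512, 64, 256), (768, 64, 64)]
```

Under these assumptions each row overfetches 64 bytes, which corresponds to the half-filled seventh cache line 226 in row 240 and the half-filled first cache line 220 in row 260.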
In some aspects, the computing device may split a data segment into separate transactions at a boundary in order to mitigate cases in which the data segment is mapped across an interleave and/or page boundary. For example, and as described below with reference to Fig. 5, when the computing device places a 256-byte compressed data segment (e.g., a segment compressed at a 4:4 compression ratio) across an interleave and/or page boundary, two separate transactions may be required.
Fig. 3 shows a data table 300, suitable for use in some aspects, that includes offset values corresponding to compression ratios and base address information (i.e., base address parity values). A computing device implementing the aspect compaction techniques may use the compression ratio information 310 and the base address parity information 302 of a particular compressed data segment to perform a lookup on data table 300 and identify the size of the base offset that should be used to place (or access) the data segment in the cache. The base address parity information 302 may be an indication of whether bit 8 of the base address (e.g., physical address) of the data segment in the cache is odd (i.e., a '1' value) or even (i.e., a '0' value). In some aspects, data table 300 may be a two-dimensional (2D) array stored in the memory of the computing device. In some aspects, both hardware and software may be used to implement the techniques, such as by the computing device utilizing a table or logic.
For example, in response to performing a lookup on data table 300 for a data segment compressed at a 4:1 compression ratio and having a base address with an even-valued bit 8, the computing device may identify a 256-byte base offset. For a data segment compressed at a 4:1 compression ratio and having a base address with an odd-valued bit 8, the lookup may identify a 64-byte base offset. For a data segment compressed at a 4:2 compression ratio and having a base address with either an even-valued or odd-valued bit 8, the lookup may identify a 128-byte base offset. For a data segment compressed at a 4:3 compression ratio and having a base address with an even-valued bit 8, the lookup may identify a 128-byte base offset. For a data segment compressed at a 4:3 compression ratio and having a base address with an odd-valued bit 8, the lookup may identify a 64-byte base offset. For a data segment compressed at a 4:4 compression ratio and having a base address with either an even-valued or odd-valued bit 8, the lookup may identify a 64-byte base offset.
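The lookups described above can be collected into a small two-dimensional array. The following is a minimal sketch of one possible in-memory layout; the names `RATIOS`, `OFFSET_TABLE`, and `lookup_base_offset` are hypothetical, and only the offset values themselves come from the description of data table 300.

```python
# Row order of the table: one row per compression ratio.
RATIOS = ("4:1", "4:2", "4:3", "4:4")

# Columns: [bit 8 even (parity 0), bit 8 odd (parity 1)], values in bytes.
OFFSET_TABLE = [
    [256, 64],   # 4:1
    [128, 128],  # 4:2
    [128, 64],   # 4:3
    [64, 64],    # 4:4
]


def lookup_base_offset(ratio: str, parity: int) -> int:
    """Return the base offset in bytes for a ratio and a bit-8 parity."""
    return OFFSET_TABLE[RATIOS.index(ratio)][parity]
```

Only the 4:1 and 4:3 rows depend on parity, which is what lets two adjacent 64-byte or 192-byte segments meet in the same cache line.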
Fig. 4A shows exemplary pseudocode 400 of a function that may be executed by a computing device to identify a new address (e.g., physical address) for the offset placement of a compressed data segment in a cache. For example, the computing device may be configured with instructions as described below in order to place data segments as shown in Fig. 2 above. The function shown in pseudocode 400 may be executed by the computing device as part of an application, firmware, an application programming interface (API), a routine, and/or operations, for example during or after the execution of a compression routine (e.g., a compression routine on a data set). It should be appreciated that pseudocode 400 is provided for general illustration purposes and is therefore not intended to represent any particular programming language, structure, or formatting.
Pseudocode 400 (referred to as "Pseudocode A" in Fig. 4A) may represent a function called getNewAddress_oddOrEven() that may require a Base_Address input parameter and a Compression_Ratio input parameter related to a particular data segment. In some aspects, Base_Address may be a binary representation of a physical address or a virtual address. The function shown in pseudocode 400 may include an instruction for the computing device to perform a right-shift operation on the Base_Address input parameter (indicated as ">>8" in Fig. 4A) in order to move bit 8 of the base address of the data segment to the rightmost position for storage in a Shift variable. The function may also include an instruction for performing a bitwise AND operation ('&' in Fig. 4A) on the shifted bit 8 of the address (i.e., the Shift variable) and a '1' value (e.g., "...000000001"), which may generate a value stored in a Parity variable. In other words, the value of the Parity variable may be 0 when the rightmost bit of the Shift variable is 0, and 1 when the rightmost bit of the Shift variable is 1.
The function shown in pseudocode 400 may also include an instruction for the computing device to perform a lookup operation on a predefined table, such as the one described in Fig. 3, using the Compression_Ratio input parameter and the calculated Parity variable. The value retrieved by the lookup may be stored in an Offset variable. In some aspects, the lookup operation may be an operation for accessing information stored in a two-dimensional (2D) array in the memory of the computing device. Finally, the function shown in pseudocode 400 may include an instruction for the computing device to add the Base_Address and Offset variables to generate an offset address (e.g., a new offset physical address), which may be returned for use by the computing device in placing the data segment.
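The steps of pseudocode 400 (shift, mask, table lookup, add) can be sketched in Python as follows. This is an illustrative rendering rather than the pseudocode of Fig. 4A itself: the snake_case function name, the dictionary `_BASE_OFFSETS`, and the use of ratio strings as keys are assumptions made for the example, while the offset values follow the lookups described for data table 300.

```python
# Hypothetical encoding of data table 300: (ratio, bit-8 parity) -> offset in bytes.
_BASE_OFFSETS = {
    ("4:1", 0): 256, ("4:1", 1): 64,
    ("4:2", 0): 128, ("4:2", 1): 128,
    ("4:3", 0): 128, ("4:3", 1): 64,
    ("4:4", 0): 64,  ("4:4", 1): 64,
}


def get_new_address_odd_or_even(base_address: int, compression_ratio: str) -> int:
    """Return the offset address at which to place a compressed data segment."""
    shift = base_address >> 8    # move bit 8 to the rightmost position
    parity = shift & 1           # 1 if bit 8 was set (odd), otherwise 0 (even)
    offset = _BASE_OFFSETS[(compression_ratio, parity)]  # table lookup
    return base_address + offset  # new offset address
```

For instance, two adjacent 4:1 segments with base addresses 0 and 256 are placed at addresses 256 and 320, that is, in the two halves of the same 128-byte cache line.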
In some aspects, the computing device may be configured with simplified logic, circuitry, software instructions, and/or routines to generate the new address, instead of using a table lookup operation as indicated by pseudocode 400.
Figs. 4B-4C show exemplary pseudocode and values that may be used in calling the function shown in pseudocode 400 described in Fig. 4A. It should be appreciated that the example values in Figs. 4B-4C are for illustration purposes and are not intended to limit the scope of the aspects.
Fig. 4B shows exemplary pseudocode 450 that may be executed by a computing device to call the exemplary function, described in Fig. 4A, for determining a new offset address for a data segment. The exemplary pseudocode 450 may include instructions for setting the values of Base_Address and Compression_Ratio variables, which may serve as the input parameters of a call to the getNewAddress_oddOrEven() function. For example, Base_Address may be set to the physical address of a data segment (e.g., "...100000000"), where the physical address includes at least eight bits. As another example, the Compression_Ratio variable may be set to a value (e.g., 4:4) indicating the compression level (or ratio) at which the computing device compresses the data segment using a compression algorithm. Pseudocode 450 may also include a call to the function getNewAddress_oddOrEven() using these input parameters and values, and may enable the computing device to store the return value of the function in a new_address variable, for example the new physical address of the data segment that enables its offset placement in a cache line.
Fig. 4C shows exemplary pseudocode 460, which is similar to pseudocode 400 of Fig. 4A but also includes example values in each instruction, based on the input parameter values described above for Fig. 4B. In other words, pseudocode 460 is a functional equivalent of the exemplary pseudocode 400 of Fig. 4A, with the addition of indications of the example values calculated as part of executing the exemplary function getNewAddress_oddOrEven().
For example, in Fig. 4C, the function may be called with a Base_Address input parameter having a value of "...100000000" and a Compression_Ratio input parameter having a value of "4:4". In response to performing the right-shift operation on the Base_Address input parameter, the computing device may generate a value of "...000000001", moving bit 8 of the base address (e.g., physical address) of the data segment to the rightmost position for storage in the Shift variable. The function may include an instruction for performing a bitwise AND operation on the Shift variable and a 1 value (or "...000000001"), which may generate the value stored in the Parity variable. For example, when the Shift variable has a value of "...000000001", the bitwise AND operation may produce a Parity variable value of "...000000001" (or simply 1). With the exemplary Parity variable value of 1 and the exemplary Compression_Ratio value of 4:4, the lookup on the predefined table described in Fig. 3 may return an Offset variable value of 64 bytes. The function may then return the new offset address of the data segment, which is the sum of the Offset variable value (e.g., 64 bytes) and the Base_Address (e.g., "...100000000").
In some aspects, the computing device may be configured to offset-place data segments in cache lines in a way that does not cross page boundaries or interleaves. In other words, the computing device may map 64-byte data segments that would fall outside a 512-byte section (or two 256-byte segments) so that they wrap around to the beginning of the 512-byte section, avoiding the crossing of page boundaries. To illustrate these aspects, Fig. 5 shows offset placements of data segments compressed at various compression ratios according to another aspect cache line compaction technique that uses such wrap-around. Diagram 500 of Fig. 5 is similar to diagram 200 described above with reference to Fig. 2, except that the aspect offset placements shown in Fig. 5 may not allow data segments to be placed in the cache such that they cross page or interleave boundaries larger than 512 bytes. The aspect offset placements shown in Fig. 5 may also result in more even cache bank usage for data segment placement. However, compared to the aspect offset placements shown in Fig. 2, the alternative aspect shown in Fig. 5 may on average cause half of the data segments compressed at a 4:4 compression ratio to be split into two transactions. In other words, non-contiguous requests for placing data segments compressed at a 4:4 compression ratio may result in more split requests than the aspect technique shown in Fig. 2. The aspect technique shown in Fig. 5 may be beneficial for reducing most overfetching, particularly when used with an L3 cache. Further, simple prefetching may be enabled with or without an L3 cache. In various aspects, 4:4 may be the least likely compression ratio, and so splitting data segments into two transactions may be a corner case.
As shown in Fig. 5, cache lines 520-527 may be associated with physical addresses that increase from left to right (i.e., the physical address of the first cache line 520 is a smaller number than the physical address of the second cache line 521, and so on). In some aspects, the physical addresses of cache lines 520-527 may be represented with at least 9 bits. Fig. 5 shows indications of the ninth, eighth, and seventh bits of the physical address associated with each of cache lines 520-527. For example, the ninth bit (or bit 9) of the first cache line 520 may have a '0' value, the eighth bit (or bit 8) of the first cache line 520 may have a '0' value, and the seventh bit (or bit 7) of the first cache line 520 may have a '0' value, while the ninth bit (or bit 9) of the fifth cache line 524 may have a '1' value, the eighth bit (or bit 8) of the fifth cache line 524 may have a '0' value, and the seventh bit (or bit 7) of the fifth cache line 524 may have a '0' value.
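The bit values above can be reproduced with a short sketch. It assumes 128-byte cache lines (two per 256-byte section), the first cache line 520 at address 0, and zero-indexed bit positions so that bit 7 has weight 128; the helper name `address_bits` is hypothetical.

```python
LINE_SIZE = 128   # assumed cache line size in bytes
FIRST_LINE = 520  # reference numeral of the first cache line in Fig. 5


def address_bits(addr):
    """Return (bit 9, bit 8, bit 7) of an address, zero-indexed."""
    return (addr >> 9) & 1, (addr >> 8) & 1, (addr >> 7) & 1


for i in range(8):
    addr = i * LINE_SIZE  # starting address of cache line 520 + i
    print(FIRST_LINE + i, address_bits(addr))
```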
As described above with reference to Fig. 2, the first row 228 of diagram 500 shows the default placement of source data segments 200a-206a within the 256-byte section boundaries 210-216 when stored in the cache according to conventional techniques. The first source data segment 200a (referred to as 'A' in Fig. 5) may be placed by the computing device without an offset in the first cache line 520 and the second cache line 521. The second source data segment 202a (referred to as 'B' in Fig. 5) may be placed by the computing device without an offset in the third cache line 522 and the fourth cache line 523. The third source data segment 204a (referred to as 'C' in Fig. 5) may be placed by the computing device without an offset in the fifth cache line 524 and the sixth cache line 525. The fourth source data segment 206a (referred to as 'D' in Fig. 5) may be placed by the computing device without an offset in the seventh cache line 526 and the eighth cache line 527.
However, rows 529, 530, 540, 550, and 560 of diagram 500 show offset placements, overlapping the base addresses, of compressed forms of the source data segments 200a-206a according to aspect operations performed by the computing device. In some aspects, the base offset used by the computing device to place each compressed data segment may be based on a lookup performed on a predefined data table, such as the one described below with reference to Fig. 6.
The second row 529 of diagram 500 shows offset placements of 256-byte compressed data segments 501, 502a-502b, 504a-504b, and 506. The first compressed data segment 501, corresponding to the first source data segment 200a (i.e., 'A'), may be placed by the computing device with a 64-byte offset in half of the first cache line 520, the entire second cache line 521, and half of the third cache line 522. The first compressed data segment 501 may overlap into the third cache line 522 by 64 bytes. The second compressed data segment 502a, corresponding to the second source data segment 202a (i.e., 'B'), may be placed by the computing device with a 64-byte offset in half of the third cache line 522 and the entire fourth cache line 523. However, to avoid crossing an interleave, the computing device may be configured to store the 256-byte second compressed data segment 502a in two separate parts. Therefore, the remainder 502b of the 256-byte second compressed data segment 502a (referred to as "B-Rem" in Fig. 5) may be placed by the computing device in the first half of the first cache line 520. The third compressed data segment 504a, corresponding to the third source data segment 204a (i.e., 'C'), may be placed by the computing device without an offset in the fifth cache line 524 and half of the sixth cache line 525. Like the second compressed data segment 502a, the third compressed data segment 504a may be split into two parts to avoid crossing an interleave. Therefore, the remainder 504b of the third compressed data segment 504a may be placed in half of the eighth cache line 527. The fourth compressed data segment 506, corresponding to the fourth source data segment 206a (i.e., 'D'), may be placed by the computing device with a negative 64-byte offset in half of the sixth cache line 525, the entire seventh cache line 526, and half of the eighth cache line 527.
The third row 530 of diagram 500 shows offset placements of 64-byte data segments compressed by the computing device at a 4:1 compression ratio. Specifically, the first 64-byte compressed data segment 531, corresponding to the first source data segment 200a (i.e., 'A'), may be placed by the computing device with a 256-byte offset in half of the third cache line 522. The second 64-byte compressed data segment 532, corresponding to the second source data segment 202a (i.e., 'B'), may be placed by the computing device with a 64-byte offset in the other half of the third cache line 522. In other words, the first 64-byte compressed data segment 531 and the second 64-byte compressed data segment 532 may share the third cache line 522, even though cache line 522 is normally associated only with the second source data segment 202a (i.e., 'B'). Because of the 256-byte and 64-byte offset placements of the first 64-byte compressed data segment 531 and the second 64-byte compressed data segment 532, respectively, the first cache line 520, the second cache line 521, and the fourth cache line 523 may contain unused data 599. In other words, the computing device may not need to fetch any data for these cache lines 520, 521, 523. These unused cache lines (i.e., the cache lines 520, 521, 523 associated with unused data 599) may be free for the cache to allocate to other requests.
Still referring to the third row 530 of diagram 500, the third 64-byte compressed data segment 534, corresponding to the third source data segment 204a (i.e., 'C'), may be placed by the computing device with a 128-byte offset in half of the sixth cache line 525. The 192-byte compressed data segment 536, corresponding to the fourth source data segment 206a (i.e., 'D'), may be placed by the computing device with a negative 64-byte offset in the other half of the sixth cache line 525 and the entire seventh cache line 526. The 192-byte compressed data segment 536 may have been compressed by the computing device at a 4:3 compression ratio. Because of the 128-byte and negative 64-byte offset placements of the third 64-byte compressed data segment 534 and the 192-byte compressed data segment 536, respectively, the fifth cache line 524 and the eighth cache line 527 may contain unused data 599.
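The cache-line occupancy of the third row 530 can be checked with a small sketch. It assumes 128-byte cache lines, source segments A-D at base addresses 0, 256, 512, and 768, and uses a hypothetical helper `lines_touched`; the offsets and sizes are the ones stated in the text.

```python
LINE_SIZE = 128   # assumed cache line size in bytes
FIRST_LINE = 520  # reference numeral of the first cache line in Fig. 5


def lines_touched(start, size):
    """Return the reference numerals of the cache lines covering [start, start+size)."""
    first = start // LINE_SIZE
    last = (start + size - 1) // LINE_SIZE
    return [FIRST_LINE + i for i in range(first, last + 1)]


# (base address, base offset, compressed size) for segments A, B, C, D in row 530
placements = [(0, 256, 64), (256, 64, 64), (512, 128, 64), (768, -64, 192)]

used = set()
for base, offset, size in placements:
    used.update(lines_touched(base + offset, size))

unused = [line for line in range(520, 528) if line not in used]
print(sorted(used))  # lines that hold compressed data
print(unused)        # lines left with unused data 599
```

Under these assumptions only three of the eight cache lines need to be fetched, which is the saving the offset placement is meant to provide.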
The fourth row 540 of diagram 500 shows offset placements of 128-byte data segments compressed by the computing device at a 4:2 compression ratio. Specifically, the first 128-byte compressed data segment 541, corresponding to the first source data segment 200a (i.e., 'A'), may be placed by the computing device with a 128-byte offset in the second cache line 521. The second 128-byte compressed data segment 542, corresponding to the second source data segment 202a (i.e., 'B'), may be placed by the computing device with a 128-byte offset in the fourth cache line 523. In this way, the first 128-byte compressed data segment 541 and the second 128-byte compressed data segment 542 may remain stored within their typical section boundaries 210, 212, but may be offset so that the first cache line 520 and the third cache line 522 contain unused data 599. In other words, because of the 128-byte offsets, the computing device may not need to fetch any data for these cache lines 520, 522.
Still referring to the fourth row 540 of diagram 500, the third 128-byte compressed data segment 544, corresponding to the third source data segment 204a (i.e., 'C'), may be placed by the computing device without an offset in the fifth cache line 524. The 64-byte compressed data segment 546, corresponding to the fourth source data segment 206a (i.e., 'D'), may be placed by the computing device with a negative 64-byte offset in half of the sixth cache line 525. In this way, the third 128-byte compressed data segment 544 and the 64-byte compressed data segment 546 may still be offset so that the seventh cache line 526 and the eighth cache line 527 contain unused data 599. However, because the 64-byte compressed data segment 546 does not completely fill the sixth cache line 525, half of the sixth cache line 525 may be filled with overfetch data 590.
The fifth row 550 of diagram 500 shows offset placements of 192-byte data segments compressed by the computing device at a 4:3 compression ratio. Specifically, the first 192-byte compressed data segment 551, corresponding to the first source data segment 200a (i.e., 'A'), may be placed by the computing device with a 128-byte offset in the second cache line 521 and half of the third cache line 522. The second 192-byte compressed data segment 552, corresponding to the second source data segment 202a (i.e., 'B'), may be placed by the computing device with a 64-byte offset in half of the third cache line 522 and the fourth cache line 523. In this way, the first 192-byte compressed data segment 551 and the second 192-byte compressed data segment 552 may be offset to share the third cache line 522, so that the first cache line 520 may contain unused data 599. In other words, because of these offsets, the computing device may not need to fetch any data for the first cache line 520.
Still referring to the fifth row 550 of diagram 500, the 64-byte compressed data segment 554, corresponding to the third source data segment 204a (i.e., 'C'), may be placed by the computing device with a 128-byte offset in half of the sixth cache line 525. The 256-byte compressed data segment 556, corresponding to the fourth source data segment 206a (i.e., 'D'), may be placed by the computing device with a negative 64-byte offset in half of the sixth cache line 525, the entire seventh cache line 526, and half of the eighth cache line 527. In this way, the fifth cache line 524 may contain unused data 599. However, overfetch data 591 may be in half of the eighth cache line 527.
The sixth row 560 of diagram 500 shows offset placements of 256-byte data segments compressed by the computing device at a 4:4 compression ratio. Specifically, the first 256-byte compressed data segment 561, corresponding to the first source data segment 200a (i.e., 'A'), may be placed by the computing device with a 64-byte offset in half of the first cache line 520, the entire second cache line 521, and half of the third cache line 522. The second 256-byte compressed data segment 562a, corresponding to the second source data segment 202a (i.e., 'B'), may be placed by the computing device with a 64-byte offset in half of the third cache line 522 and the entire fourth cache line 523. However, the computing device may be configured to store the second 256-byte compressed data segment 562a in two separate parts. Therefore, the remainder 562b of the second 256-byte compressed data segment 562a (referred to as "B-Rem" in Fig. 5) may be placed by the computing device in the first half of the first cache line 520. The third compressed data segment 564a, corresponding to the third source data segment 204a (i.e., 'C'), may be placed by the computing device without an offset in the fifth cache line 524 and half of the sixth cache line 525. Like the second compressed data segment 502a, the third compressed data segment 564a may be split into two parts to avoid crossing an interleave. Therefore, the remainder 564b of the third compressed data segment 564a may be placed in half of the eighth cache line 527. The 64-byte compressed data segment 566, corresponding to the fourth source data segment 206a (i.e., 'D'), may be placed by the computing device with a negative 64-byte offset in half of the sixth cache line 525. However, overfetch data 592 may be in half of the eighth cache line 527.
Fig. 6 shows a diagram of a data table, suitable for use in some aspects, that includes base offsets corresponding to compression ratios and base address parity values. As described above with reference to Fig. 5, a computing device implementing an aspect compaction technique may perform a lookup on data table 600 using the compression ratio information 610 and the base address parity information 602 of a particular compressed data segment, in order to identify the size of the base offset that should be used to place (or access) the data segment in the cache. In some aspects, data table 600 may be a two-dimensional (2D) array stored in the memory of the computing device.
Data table 600 may be similar to data table 300 described above with reference to Fig. 3, except that the base address parity information 602 of Fig. 6 may be an indication of the values of bit 8 and bit 9 of the base address of the data segment (e.g., 0 or 1 for each of the two bits). In other words, a lookup performed on data table 600 may require the values of two bits and the compression ratio in order for the computing device to identify the base offset value to apply to the base address (e.g., physical address) of the data segment. Another difference between data table 600 and data table 300 described in Fig. 3 is that data table 600 may store more than one base offset value for certain combinations of compression ratio (e.g., the 4:4 compression ratio) and values of the two bits of the address (e.g., physical address) associated with the data segment. Specifically, when a segment needs to be split into two transactions to avoid crossing a page boundary or interleave, data table 600 may include a first base offset value for the first transaction and a second base offset value for the second transaction. In these cases, data table 600 may indicate the size of each transaction associated with each base offset value. In some aspects, the computing device may perform the lookup using other combinations of bits, where the combination may be based on a hash of the stored data.
For example, in response to performing a lookup on data table 600 for a data segment compressed at a 4:1 compression ratio and having a base address with a 0 value for its bit 9 and a 0 value for its bit 8, the computing device may identify a 256-byte base offset. For a data segment compressed at a 4:1 compression ratio and having a base address with a 0 value for its bit 9 and a 1 value for its bit 8, the computing device may identify a 64-byte base offset. For a data segment compressed at a 4:1 compression ratio and having a base address with a 1 value for its bit 9 and a 0 value for its bit 8, the computing device may identify a 128-byte base offset. For a data segment compressed at a 4:1 compression ratio and having a base address with a 1 value for its bit 9 and a 1 value for its bit 8, the computing device may identify a negative 64-byte base offset.
As another example, in response to performing a lookup on data table 600 for a data segment compressed at a 4:2 compression ratio and having a base address with a 0 value for its bit 9 and a 0 value for its bit 8, the computing device may identify a 128-byte base offset. For a data segment compressed at a 4:2 compression ratio and having a base address with a 0 value for its bit 9 and a 1 value for its bit 8, the computing device may identify a 128-byte base offset. For a data segment compressed at a 4:2 compression ratio and having a base address with a 1 value for its bit 9 and a 0 value for its bit 8, the computing device may identify a 0-byte base offset (i.e., no base offset). For a data segment compressed at a 4:2 compression ratio and having a base address with a 1 value for its bit 9 and a 1 value for its bit 8, the computing device may identify a 0-byte base offset (i.e., no base offset).
As another example, in response to performing a lookup on data table 600 for a data segment compressed at a 4:3 compression ratio and having a base address with a 0 value for its bit 9 and a 0 value for its bit 8, the computing device may identify a 128-byte base offset. For a data segment compressed at a 4:3 compression ratio and having a base address with a 0 value for its bit 9 and a 1 value for its bit 8, the computing device may identify a 64-byte base offset. For a data segment compressed at a 4:3 compression ratio and having a base address with a 1 value for its bit 9 and a 0 value for its bit 8, the computing device may identify a 0-byte base offset (i.e., no base offset). For a data segment compressed at a 4:3 compression ratio and having a base address with a 1 value for its bit 9 and a 1 value for its bit 8, the computing device may identify a negative 64-byte base offset.
As another example, in response to performing a lookup on data table 600 for a data segment compressed at a 4:4 compression ratio and having a base address with a 0 value for its bit 9 and a 0 value for its bit 8, the computing device may identify a 64-byte base offset. For a data segment compressed at a 4:4 compression ratio and having a base address with a 0 value for its bit 9 and a 1 value for its bit 8, the computing device may identify a negative 256-byte base offset for a first transaction of 64 bytes in size and a 64-byte base offset for a second transaction of 192 bytes in size. For a data segment compressed at a 4:4 compression ratio and having a base address with a 1 value for its bit 9 and a 0 value for its bit 8, the computing device may identify a 0-byte base offset for a first transaction of 192 bytes in size and a 448-byte base offset for a second transaction of 64 bytes in size. For a data segment compressed at a 4:4 compression ratio and having a base address with a 1 value for its bit 9 and a 1 value for its bit 8, the computing device may identify a negative 64-byte base offset.
Fig. 7A shows exemplary pseudocode 700 of a function that may be executed by a computing device to identify new addresses for the offset placement of compressed data segments in a cache according to another aspect cache line compaction technique. For example, the computing device may be configured with the instructions described below in order to place data segments as shown in Fig. 5 above. Pseudocode 700 of Fig. 7A is similar to pseudocode 400 of Fig. 4A described above, except that pseudocode 700 may include instructions that cause the computing device to use two bits (i.e., bit 8 and bit 9) of the base address (e.g., physical address) when performing the lookup operation to identify the base offset, and may also be configured to potentially return two offset addresses. Depending on whether the base address is a physical address or a virtual address, the offset address may be an offset physical address or an offset virtual address. The function shown by pseudocode 700 may be executed by the computing device as part of an application, firmware, an application programming interface (API), a routine, and/or an operation, for example during or after execution of a packing routine (e.g., a packing routine for a data set). It should be appreciated that pseudocode 700 is provided for purposes of general illustration and is therefore not intended to represent any particular operable programming language, structure, or formatting.
Pseudocode 700 (referred to as "Pseudocode B" in Fig. 7A) may represent a function called "getNewAddress_bit8-9()", which may require a Base_Address input parameter and a Compression_Ratio input parameter related to a particular data segment. In some aspects, Base_Address may be a binary representation of a physical address. The body of the function may include an instruction for the computing device to perform a right-shift operation on the Base_Address input parameter in order to move bit 8 of the base address of the data segment to the rightmost position for storage in a Shift variable. The shift may also move bit 9 of Base_Address to the second rightmost position in the Shift variable. The function may include an instruction to perform a bitwise AND (or '&' in Fig. 7A) of the shifted address (i.e., the Shift variable) with a value that has '1' in the two rightmost bits (e.g., "...011"), which generates the value stored in a Parity variable. In other words, the bitwise AND may zero out every other bit in the shifted address (i.e., the Shift variable), so that the shifted address can be used as the parity value for performing the lookup.
The function shown in pseudocode 700 may also include an instruction for the computing device to perform a lookup operation on a predefined table using the Compression_Ratio input parameter and the calculated Parity variable, as described with reference to Fig. 5. However, because there are two possible cases in which more than one base offset value may need to be returned from the table lookup, the function may include evaluating whether Compression_Ratio is equal to 4:4 and whether the value of the Parity variable indicates that bit 9 and bit 8 are the same (i.e., "...00" or "...11"). If the computing device determines that Compression_Ratio is not equal to 4:4, or that the value of the Parity variable indicates that bit 9 and bit 8 are the same (i.e., "...00" or "...11"), the computing device may perform the lookup operation using Compression_Ratio and Parity to retrieve a single offset value for storage in an Offset variable. The single offset value may be combined with Base_Address and returned for the offset placement.
However, if the computing device determines that Compression_Ratio is equal to 4:4 and that the value of the Parity variable indicates that bit 9 and bit 8 are different (i.e., "...01" or "...10"), the computing device may perform the lookup operation using Compression_Ratio and Parity to retrieve two offset values for storage in an Offset[] array variable. Each of the two offset values may be combined individually with Base_Address to generate the new addresses of the two transactions for the data segment. In various aspects, the sizes and offset values of the two transactions may be returned. In some aspects, a separate call or function may be used to calculate and/or return the sizes and offset values for these transactions.
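The lookup-based function described above can be sketched in Python as follows. This is an illustrative reading, not the pseudocode of Fig. 7A: the name `get_new_address_bit8_9` and the dictionary layout are hypothetical, the table entries are transcribed from the description of data table 600, and bit positions are assumed to be zero-indexed (bit 8 has weight 256, bit 9 has weight 512). Returning a list covers both the single-offset and the two-transaction cases without a separate branch.

```python
# Key: (compression ratio, bit 9, bit 8).
# Value: list of (base offset in bytes, transaction size in bytes).
TABLE_600 = {
    ("4:1", 0, 0): [(256, 64)],  ("4:1", 0, 1): [(64, 64)],
    ("4:1", 1, 0): [(128, 64)],  ("4:1", 1, 1): [(-64, 64)],
    ("4:2", 0, 0): [(128, 128)], ("4:2", 0, 1): [(128, 128)],
    ("4:2", 1, 0): [(0, 128)],   ("4:2", 1, 1): [(0, 128)],
    ("4:3", 0, 0): [(128, 192)], ("4:3", 0, 1): [(64, 192)],
    ("4:3", 1, 0): [(0, 192)],   ("4:3", 1, 1): [(-64, 192)],
    ("4:4", 0, 0): [(64, 256)],
    ("4:4", 0, 1): [(-256, 64), (64, 192)],  # split into two transactions
    ("4:4", 1, 0): [(0, 192), (448, 64)],    # split into two transactions
    ("4:4", 1, 1): [(-64, 256)],
}


def get_new_address_bit8_9(base_address, compression_ratio):
    """Return a list of (new address, size) pairs, one per transaction."""
    shift = base_address >> 8   # bit 8 moves to the rightmost position
    parity = shift & 0b11       # zero out everything except bits 9 and 8
    bit9, bit8 = parity >> 1, parity & 1
    entries = TABLE_600[(compression_ratio, bit9, bit8)]
    return [(base_address + offset, size) for offset, size in entries]


# Segment 'B' of Fig. 5 (assumed base address 256) at a 4:4 ratio.
print(get_new_address_bit8_9(256, "4:4"))
```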
In some aspects, the computing device may be configured to generate the new address using simplified logic, circuitry, software instructions, and/or routines rather than the table lookup indicated by pseudocode 700.
Figs. 7B-7C show exemplary pseudocode and values that may be used in a call to the function of pseudocode 700 described in Fig. 7A. It should be appreciated that the exemplary values in Figs. 7B-7C are for illustrative purposes and are not intended to limit the scope of the various aspects.
Fig. 7B shows exemplary pseudocode 750 that may be executed by the computing device to call the exemplary function, described in Fig. 7A above, for determining the new offset address of a data segment. Exemplary pseudocode 750 may include instructions for setting the values of the Base_Address and Compression_Ratio variables, which may be used as the input parameters of a call to the getNewAddress_bit8-9() function. For example, Base_Address may be set to the physical address of the data segment (e.g., "...100000000"), where the physical address includes at least nine bits. As another example, the Compression_Ratio variable may be set to a value (e.g., 4:4) indicating the compression level (or ratio) at which the computing device compressed the data segment using a compression algorithm. Pseudocode 750 may include a call to the function getNewAddress_bit8-9() using these input parameters and values, and may enable the computing device to store the return value of the function, for example in a new_address variable, as the new physical address of the data segment that implements the offset placement in the cache line.
Fig. 7C shows exemplary pseudocode 760, which is similar to pseudocode 700 of Fig. 7A but also includes exemplary values for each instruction based on the input parameter values described above for Fig. 7B. In other words, pseudocode 760 is the functional equivalent of the exemplary pseudocode 700 of Fig. 7A, annotated with the exemplary values calculated as part of executing the exemplary function getNewAddress_bit8-9().
For example, in Fig. 7C, the function may be called with a Base_Address input parameter having the value "...100000000" and a Compression_Ratio input parameter having the value "4:4". In response to performing the right-shift operation on the Base_Address input parameter, the computing device may generate the value "...000000010", moving bit 8 of the base address (e.g., physical address) of the data segment to the rightmost position for storage in the Shift variable. The function may include an instruction to perform a bitwise AND of the Shift variable with the value "...11", which generates the value stored in the Parity variable. For example, when the Shift variable has the value "...000000010", the bitwise AND yields a Parity value of "...000000010".
Because the exemplary Compression_Ratio is "4:4" and the exemplary Parity value is not equal to "...00" or "...11", the computing device may not perform the operation that retrieves a single base offset value from the lookup table. Instead, the computing device may execute the instruction that performs a lookup returning two base offset values. Specifically, using the exemplary Parity value of "...000000010" and the exemplary Compression_Ratio value of "4:4", the lookup operation on the predefined table described in Fig. 6 may return a first base offset value of 0 bytes and a second base offset value of 448 bytes. In some aspects, the result of the lookup may also include the sizes of the transactions associated with the base offset values. The function may then return two new offset addresses for the two transactions of the data segment: the first address is the combination of 0 bytes and Base_Address (e.g., "...100000000" + 0 bytes), and the second address is the combination of 448 bytes and Base_Address (e.g., "...100000000" + 448 bytes).
Fig. 8A shows an aspect method 800 for a computing device to compact data in the cache lines of a cache. Method 800 may be performed by the computing device as a routine, application, function, or other operation that may occur in conjunction with, or in response to, the computing device executing a conventional compression algorithm. For example, in response to compressing data segments, the computing device may perform the operations of method 800 to place the compressed data segments with offsets so that the data segments overlap on cache lines. Although the following description may refer to a single data segment that may be placed at an offset address (e.g., an offset physical address), it should be appreciated that the computing device may be configured to perform method 800 to place multiple data segments, for example in a loop. Additionally, it should be appreciated that the computing device may perform the operations of method 800 to place data of any variable length (compressed or uncompressed) into various types of memory units or cache units. In some aspects, the computing device may be configured to perform the operations of method 800 to place data segments compressed by another device.
In block 802, the processor of the computing device may identify a base address (e.g., a base physical address or base virtual address) for a data segment. As described above, the base address may be the cache line or cache address with which the data segment is typically associated. For example, the base address may be the address of the first cache line in which the data segment (or source data segment) is stored when in an uncompressed state. The base address may be the initial starting address of the data segment, but it may not indicate the number of cache lines (or amount of cache space) that may be required to store the data segment. In some aspects, the base address may be the starting address of a requested block in memory, which may be represented by an allocated line in the cache but generally may not be, because the cache may be allocated for the compressed version.
In block 804, the processor of the computing device may identify a data size (e.g., based on a compression ratio) for the data segment. For example, the identified data size may be based on the compression ratio of the data segment, such as a compression ratio identified by the computing device performing a read operation on another data buffer to obtain the compression ratio. In various aspects, depending on the type of compression algorithm the computing device is configured to use, there may be multiple available compression ratios for a data segment. For example, when the computing device is configured to use a compression algorithm, the data segment may be compressed at a 4:1 compression ratio, a 4:2 compression ratio (or 2:1 compression ratio), a 4:3 compression ratio, and/or a 4:4 compression ratio (or 1:1 compression ratio). The identified data size for the data segment may be based on the compression ratio that yields the greatest possible reduction in the size of the data segment (or maximum compression). In various aspects, the identified data size (or compressed size) may differ for different types of data, data structures, and/or contexts. For example, the computing device may identify a first data size for a first compression ratio of a first data segment of a first data type, and identify a second data size for a second compression ratio of a second data segment of a second data type.
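The relationship between a compression ratio and the identified data size can be sketched as follows, assuming the 256-byte source segments used in the figures; the helper name `compressed_size` and the string encoding of the ratio are hypothetical.

```python
SOURCE_SIZE = 256  # assumed uncompressed segment size in bytes (from the figures)


def compressed_size(ratio, source_size=SOURCE_SIZE):
    """Map a ratio such as "4:3" to the compressed size in bytes."""
    uncompressed_parts, compressed_parts = (int(p) for p in ratio.split(":"))
    return source_size * compressed_parts // uncompressed_parts


for ratio in ("4:1", "4:2", "4:3", "4:4"):
    print(ratio, compressed_size(ratio))
```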
In block 806, the processor of the computing device may obtain a base offset based on the identified data size and the base address of the data segment. The base offset may be a number of bytes, such as 64 bytes, 128 bytes, 192 bytes, and/or 256 bytes. In some aspects, the base offset may be a negative number of bytes, such as negative 64 bytes. In some aspects, the size of the base offset may be a multiple of half the size of a cache line of the cache in the computing device. In some aspects, the computing device may perform a lookup on a data table and retrieve the base offset value, as described above with reference to Fig. 3 or Fig. 6. In some aspects, the computing device may determine the base offset using an equation, routine, circuitry, and/or software module. Fig. 8B illustrates particular aspect operations for obtaining the base offset.
In block 808, the processor of the computing device may calculate an offset address by offsetting the base address with the obtained base offset. For example, the computing device may combine the obtained base offset (e.g., a number of bytes) with the base address to obtain a new address (e.g., a new physical address) that occurs before or after the base address in the cache. The calculation yields an offset address larger than the base address when the obtained base offset is a positive value, and an offset address smaller than the base address when the obtained base offset is a negative value. In optional block 810, the processor of the computing device may store the data segment at the calculated offset address. In other words, the calculated offset address may be used to populate the cache. For example, the computing device may read the data segment from a memory (e.g., DDR) and load the data segment into one or more cache lines starting at the offset address. In some aspects, the computing device may read the data segment as a compressed data segment from compressed memory. In optional block 812, the processor of the computing device may read the data segment at the calculated offset address. In other words, the calculated offset address may be used to retrieve or find data segments previously stored within cache lines. For example, the computing device may use the calculated offset address to retrieve a data segment from the cache for use in an operation of an application, etc. Further, the read performed in optional block 812 may be of a specified size, such as the data size determined above. In optional block 813, the processor of the computing device may decompress the read data segment. In some aspects, the decompressed data segment may be stored locally, such as at its associated base address.
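The address arithmetic of block 808 and the optional store and read of blocks 810 and 812 can be sketched as follows. This is a minimal illustration, assuming the cache is modelled as a flat Python `bytearray`; the function names and the bounds checks are hypothetical and are not part of the described aspects.

```python
def calculate_offset_address(base_address: int, base_offset: int) -> int:
    """Block 808: offset the base address by a (possibly negative) base offset."""
    offset_address = base_address + base_offset
    if offset_address < 0:
        raise ValueError("offset address falls before the start of the address space")
    return offset_address


def store_segment(cache: bytearray, offset_address: int, segment: bytes) -> None:
    """Block 810: place the segment in the modelled cache starting at the offset address."""
    end = offset_address + len(segment)
    if end > len(cache):
        raise ValueError("segment does not fit in the modelled cache")
    cache[offset_address:end] = segment


def read_segment(cache: bytearray, offset_address: int, data_size: int) -> bytes:
    """Block 812: read back a segment of the identified data size."""
    return bytes(cache[offset_address:offset_address + data_size])
```

For example, a base address of 0x200 with a base offset of negative 64 bytes gives an offset address of 0x1C0, which lies before the base address, while a positive base offset gives an address after it.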
In some aspects, the data segment may not be in the cache and may need to be fetched from memory and inserted into the cache. In some aspects, there may be no cache, and prefetching may still benefit from the techniques of the various aspects.
FIG. 8B illustrates an aspect method 850 for a computing device to compact data within cache lines of a cache. The operations of method 850 are similar to the operations of method 800 described above with reference to FIG. 8A, except that method 850 may include operations for performing look-up operations in predefined data tables of offset values. For example, the computing device may be configured to retrieve offset values from a data table using information about the base address and the compression ratio, as described above with reference to FIG. 3 and FIG. 6.
The operations in blocks 802-804 and 808-812 may be the same as described above with reference to FIG. 8A. In block 852, the processor of the computing device may identify a parity value for the data segment based on the base address. In some aspects, the computing device may evaluate bit 8 of the base address to identify a parity value that indicates whether bit 8 is an even value (i.e., 0) or an odd value (i.e., 1). Use of such a parity value is illustrated in FIGS. 2-4C. In some aspects, to identify the parity value, the computing device may perform a shift-right operation on the binary representation of the base address so that bit 8 becomes the right-most bit, and apply a bitwise AND operation to a "...001" binary value and the binary result of the shift-right operation. In some aspects, the computing device may evaluate bit 9 and bit 8 of the base address to identify whether the two bits match a predefined combination (e.g., "00", "01", "10", "11") that may serve as an index into a data table or 2D array. Use of such parity values is illustrated in FIGS. 5-7C. In some aspects, to identify the parity values, the computing device may perform a shift-right operation on the binary representation of the base address so that bit 8 becomes the right-most bit, and apply a bitwise AND operation to a "...011" binary value and the binary result of the shift-right operation (e.g., zeroing out all values in the binary result of the shift-right operation except bits 8 and 9 of the binary representation of the base address).
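The shift-and-mask operations described for block 852 can be written compactly. The following Python sketch is illustrative only; the function names are hypothetical.

```python
def parity_from_bit8(base_address: int) -> int:
    """Shift right so bit 8 becomes the right-most bit, then AND with binary ...001."""
    return (base_address >> 8) & 0b1


def parity_from_bits9_and_8(base_address: int) -> int:
    """Shift right by 8, then AND with binary ...011 to keep only bits 9 and 8."""
    return (base_address >> 8) & 0b11
```

The one-bit form returns 0 (even) or 1 (odd); the two-bit form returns a value from 0 to 3 that corresponds to the combinations "00", "01", "10", and "11" and can be used directly as a table index.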
In block 854, the processor of the computing device may obtain the base offset by performing a look-up on a stored table using the identified data size (e.g., a size based on the identified compression ratio) and the identified parity value. For example, the computing device may use representations or codes of the identified compression ratio (e.g., 4:1, etc.) and the parity value as indices into a data table or 2D array, as illustrated in FIG. 3 and FIG. 6. In some aspects, the obtained offset may include a first offset for a first transaction and a second offset for a second transaction, as described above. For example, to avoid allowing an offset data segment to cross an interleave or page boundary, the data segment may be split into two transactions that are associated with a first offset (e.g., 0 bytes) and a second offset (e.g., 448 bytes) from the base address. The computing device may continue with the operations in block 808 to calculate the offset address. In some aspects, when the obtained base offset includes two offsets, the computing device may calculate two offset addresses for placing the split segments into the cache.
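The table look-up of block 854, including the case where one entry holds two offsets for a split segment, might be sketched as below. The table contents are placeholders invented for this example and are not the values shown in FIG. 3 or FIG. 6; only the 0-byte and 448-byte pair echoes the split example in the text. All names are hypothetical.

```python
# Placeholder table keyed by (compression ratio, parity value of bit 8).
# The offsets are invented for illustration; they are NOT the values of FIG. 3 or FIG. 6.
BASE_OFFSET_TABLE = {
    ("4:1", 0): (256,),
    ("4:1", 1): (-64,),
    ("4:2", 0): (128,),
    ("4:2", 1): (64,),
    ("4:3", 0): (0, 448),  # two offsets: the segment is split into two transactions
    ("4:3", 1): (64,),
    ("4:4", 0): (0,),
    ("4:4", 1): (0,),
}


def parity_value(base_address: int) -> int:
    """Parity of bit 8 of the base address (0 = even, 1 = odd)."""
    return (base_address >> 8) & 0b1


def obtain_base_offsets(compression_ratio: str, base_address: int) -> tuple:
    """Block 854: look up the base offset(s) by compression ratio and parity value."""
    return BASE_OFFSET_TABLE[(compression_ratio, parity_value(base_address))]


def calculate_offset_addresses(compression_ratio: str, base_address: int) -> list:
    """Block 808: one offset address per obtained base offset."""
    offsets = obtain_base_offsets(compression_ratio, base_address)
    return [base_address + offset for offset in offsets]
```

Returning a tuple for every entry keeps the single-offset and split cases uniform, so the caller always iterates over the result and issues one transaction per offset address.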
FIG. 8C illustrates an aspect method 870 for a computing device to compact data within cache lines of a cache. The operations of method 870 are similar to the operations of method 800 described above with reference to FIG. 8A, except that method 870 may include operations for prefetching a next data segment. The operations in blocks 802-804 and 808-812 may be the same as described above with reference to FIG. 8A.
In determination block 872, the processor of the computing device may determine whether a next data segment (or second data segment) has a correct compression ratio to be contiguous with the data segment (or first data segment) read with the operations in block 812. For example, when the computing device is configured to perform optional prefetching with a larger read request size, the computing device may retrieve or read the next data block (or compressed data block) adjacent to the data segment (i.e., the data segment for which the operations of blocks 802-808 were performed to size and locate it in the cache). This determination may be based on a logic testing routine or circuitry configured to determine whether the data block adjacent to the read data segment has the correct compression level to be contiguous with the read data segment. In other words, the computing device may determine whether the next data segment is compressed to a size such that the read of the data segment may also include the next data segment. In response to determining that the next data segment has the correct compression ratio to be contiguous (i.e., determination block 872 = "Yes"), the processor of the computing device may prefetch the next data segment with the read of the other data segment in block 874. In response to determining that the next data segment does not have the correct compression ratio to be contiguous (i.e., determination block 872 = "No"), the processor of the computing device may end method 870.
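One possible reading of the determination in block 872 is sketched below in Python. It assumes, purely for illustration, that a next segment "has the correct compression ratio to be contiguous" when it starts exactly where the first segment ends and both segments fit inside a single read request; the description does not spell out the test, and the function and parameter names are hypothetical.

```python
def should_prefetch_next(
    first_offset_address: int,
    first_data_size: int,
    next_offset_address: int,
    next_data_size: int,
    read_request_size: int,
) -> bool:
    """Return True when one read of read_request_size bytes covers both segments."""
    # Assumed meaning of "contiguous": the next segment begins where the first ends.
    is_contiguous = next_offset_address == first_offset_address + first_data_size
    # Assumed size condition: both compressed segments fit in the larger read request.
    fits_in_read = first_data_size + next_data_size <= read_request_size
    return is_contiguous and fits_in_read
```

When the function returns True, a single read starting at the first offset address would also bring in the next segment (block 874); otherwise the method ends without prefetching.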
FIG. 9 illustrates an aspect method 900 for a computing device to compact data within cache lines of a cache. The operations of method 900 are similar to the operations of method 800 described above with reference to FIG. 8A, except that method 900 may include operations for the computing device to also compress data segments, such as by executing a compression algorithm before storing the data segments at offset addresses.
The operations in blocks 802 and 806-808 may be the same as described above with reference to FIG. 8A. In block 901, the processor of the computing device may identify a compression ratio for the data segment, such as the optimal compression ratio (e.g., 4:1, 4:2, 4:3, 4:4, etc.) to which the data segment may be compressed. The operations in block 901 may be similar to those described above with reference to block 804 of FIG. 8A. In block 902, the processor of the computing device may read the data segment as uncompressed data at the base address, such as by reading the data segment from the cache. In block 904, the processor of the computing device may compress the data segment at the identified compression ratio. For example, based on the identified maximum compression ratio available for the data segment (e.g., 4:1, 4:2, 4:3, 4:4, etc.), the computing device may execute a compression algorithm or routine to reduce the size of the data segment. As described above, the computing device may obtain the base offset in block 806 and calculate the offset address in block 808. In block 906, the processor of the computing device may store the compressed data segment at the calculated offset address.
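The sequence of blocks 901-906 can be outlined in a short Python sketch. Several things here are assumptions and not part of the description: the cache is modelled as a `bytearray`, the uncompressed segment size is taken to be 256 bytes, the standard-library `zlib` compressor stands in for whatever compression algorithm the device actually uses, and the base offset is supplied by a caller-provided function because the real table values are not reproduced here.

```python
import zlib

SEGMENT_BYTES = 256              # assumed uncompressed segment size
BUCKET_SIZES = (64, 128, 192)    # sizes assumed for 4:1, 4:2 and 4:3


def compress_to_bucket(segment: bytes):
    """Blocks 901/904: compress and pad to the smallest bucket that fits (zlib is a stand-in)."""
    packed = zlib.compress(segment)
    for size in BUCKET_SIZES:
        if len(packed) <= size:
            return packed.ljust(size, b"\x00"), size
    return segment, SEGMENT_BYTES  # 4:4, stored uncompressed


def compact_and_store(cache: bytearray, base_address: int, base_offset_for):
    """Read at the base address, compress, then store at the calculated offset address."""
    segment = bytes(cache[base_address:base_address + SEGMENT_BYTES])      # block 902
    payload, data_size = compress_to_bucket(segment)                       # blocks 901, 904
    offset_address = base_address + base_offset_for(data_size, base_address)  # blocks 806, 808
    if offset_address < 0 or offset_address + data_size > len(cache):
        raise ValueError("offset address outside the modelled cache")
    cache[offset_address:offset_address + data_size] = payload             # block 906
    return offset_address, data_size
```

A caller would pass `base_offset_for` as a hypothetical look-up such as `lambda data_size, base_address: -64`; a highly compressible 256-byte segment at base address 0x200 would then be stored as a 64-byte payload starting at 0x1C0.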
Various forms of computing devices (e.g., personal computers, smartphones, laptop computers, etc.) may be used to implement the various aspects. Such computing devices typically include the components illustrated in FIG. 10, which shows an example computing device 1000. In various aspects, the computing device 1000 may include a processor 1001 coupled to a touch screen controller 1004 and an internal memory 1002. The processor 1001 may be one or more multicore ICs designated for general or specific processing tasks. The internal memory 1002 may be volatile or non-volatile memory, and may also be secure and/or encrypted memory, or unsecure and/or unencrypted memory, or any combination thereof. The touch screen controller 1004 and the processor 1001 may also be coupled to a touch screen panel 1012, such as a resistive-sensing touch screen, capacitive-sensing touch screen, infrared-sensing touch screen, etc. In some aspects, the computing device 1000 may have one or more radio signal transceivers 1008 (e.g., Wi-Fi, RF radio) and antennae 1010 for sending and receiving, coupled to each other and/or to the processor 1001. The transceivers 1008 and antennae 1010 may be used with the above-mentioned circuitry to implement the various wireless transmission protocol stacks and interfaces. In some aspects, the computing device 1000 may include a cellular network wireless modem chip 1016 that enables communication via a cellular network and is coupled to the processor. The computing device 1000 may include a peripheral device connection interface 1018 coupled to the processor 1001. The peripheral device connection interface 1018 may be singularly configured to accept one type of connection, or multiply configured to accept various types of physical and communication connections, common or proprietary, such as USB, FireWire, Thunderbolt, or PCIe. The peripheral device connection interface 1018 may also be coupled to a similarly configured peripheral device connection port (not shown). The computing device 1000 may also include speakers 1014 for providing audio outputs. The computing device 1000 may also include a housing 1020, constructed of plastic, metal, or a combination of materials, for containing all or some of the components discussed herein. The computing device 1000 may include a power source 1022 coupled to the processor 1001, such as a disposable or rechargeable battery. The rechargeable battery may also be coupled to the peripheral device connection port to receive a charging current from a source external to the computing device 1000.
The processor 1001 may be any programmable microprocessor, microcomputer, or multiple processor chip or chips that can be configured by software instructions (e.g., applications) to perform a variety of functions, including the functions of the various aspects described above. In various devices, multiple processors may be provided, such as one processor dedicated to wireless communication functions and one processor dedicated to running other applications. Typically, software applications may be stored in the internal memory 1002 before they are accessed and loaded into the processor 1001. The processor 1001 may include internal memory sufficient to store the application software instructions. In many devices, the internal memory may be a volatile or non-volatile memory (e.g., flash memory), or a mixture of both. For the purposes of this description, a general reference to memory refers to memory accessible by the processor 1001, including internal memory, removable memory plugged into the various devices, and memory within the processor 1001.
The foregoing method descriptions and process flow diagrams are provided merely as illustrative examples and are not intended to require or imply that the steps of the various aspects must be performed in the order presented. As will be appreciated by one of skill in the art, the steps in the foregoing aspects may be performed in any order. Words such as "thereafter," "then," and "next" are not intended to limit the order of the steps; these words are simply used to guide the reader through the description of the methods. Further, any reference to claim elements in the singular, for example, using the articles "a," "an," or "the," is not to be construed as limiting the element to the singular.
The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the aspects disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The hardware used to implement the various illustrative logics, logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but, in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Alternatively, some steps or methods may be performed by circuitry that is specific to a given function.
In one or more exemplary aspects, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted over, a non-transitory processor-readable, computer-readable, or server-readable medium or a non-transitory processor-readable storage medium as one or more instructions or code. The steps of a method or algorithm disclosed herein may be embodied in a processor-executable software module or processor-executable instructions, which may reside on a non-transitory computer-readable storage medium, a non-transitory server-readable storage medium, and/or a non-transitory processor-readable storage medium. For example, such instructions may be stored processor-executable software instructions. Tangible, non-transitory computer-readable storage media may be any available media that may be accessed by a computer. By way of example, and not limitation, such non-transitory computer-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of non-transitory computer-readable media. Additionally, the operations of a method or algorithm may reside as one or any combination or set of codes and/or instructions on a tangible, non-transitory processor-readable storage medium and/or computer-readable medium, which may be incorporated into a computer program product.
The preceding description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the following claims and the principles and novel features disclosed herein.

Claims (62)

1. A method for compacting data within cache lines of a cache of a computing device, comprising:
identifying, by a processor of the computing device, a base address for a first data segment;
identifying, by the processor of the computing device, a data size for the first data segment;
obtaining, by the computing device, a base offset based on the identified data size and the base address of the first data segment; and
calculating, by the computing device, an offset address by offsetting the base address with the obtained base offset, wherein the calculated offset address is associated with a second data segment.
2. The method of claim 1, wherein obtaining the base offset based on the identified data size and the base address of the first data segment is performed by one of the processor of the computing device executing software or a dedicated circuit coupled to the processor of the computing device, and
wherein calculating the offset address by offsetting the base address with the obtained base offset is performed by one of the processor of the computing device executing software or the dedicated circuit coupled to the processor of the computing device.
3. The method of claim 1, wherein the base address is a physical address or a virtual address.
4. The method of claim 1, wherein the identified data size is identified based on a compression ratio associated with the first data segment.
5. The method of claim 4, wherein the compression ratio is one of: a 4:1 compression ratio, a 4:2 compression ratio, a 4:3 compression ratio, or a 4:4 compression ratio.
6. The method of claim 1, wherein obtaining, by the computing device, the base offset based on the identified compression ratio and the base address of the first data segment comprises:
identifying, by the processor of the computing device, a parity value for the first data segment based on the base address; and
obtaining, by the processor of the computing device, the base offset using the identified compression ratio and the identified parity value.
7. The method of claim 6, wherein obtaining the base offset using the identified compression ratio and the identified parity value comprises: obtaining, by the processor of the computing device, the base offset by performing a look-up on a stored table using the identified compression ratio and the identified parity value.
8. The method of claim 6, wherein the parity value indicates whether a bit in the base address of the first data segment is odd or even.
9. The method of claim 6, wherein the parity value is based on two bits in the base address of the first data segment.
10. The method of claim 1, wherein obtaining, by the computing device, the base offset based on the identified data size and the base address of the first data segment comprises:
obtaining, by the computing device, a first base offset and a second base offset and a first data size and a second data size for the first data segment; and
wherein calculating, by the computing device, the offset address by offsetting the base address with the obtained base offset comprises: calculating, by the computing device, a first offset address for the first data size and a second offset address for the second data size.
11. The method of claim 1, further comprising: storing, by the processor of the computing device, the first data segment at the calculated offset address.
12. The method of claim 11, wherein storing, by the processor of the computing device, the first data segment at the calculated offset address comprises:
reading, by the processor of the computing device, the first data segment at the base address as uncompressed data;
compressing, by the processor of the computing device, the first data segment at the identified data size; and
storing, by the processor of the computing device, the compressed first data segment at the calculated offset address.
13. The method of claim 11, wherein calculating, by the computing device, the offset address by offsetting the base address with the obtained base offset is accomplished after the first data segment has been compressed.
14. The method of claim 1, further comprising: reading, by the processor of the computing device, the first data segment at the calculated offset address.
15. The method of claim 14, further comprising: determining, by the computing device, whether the second data segment has a correct compression ratio to be contiguous with the first data segment,
wherein reading, by the processor of the computing device, the first data segment at the calculated offset address comprises: prefetching, by the processor of the computing device, the second data segment with the first data segment in response to determining that the second data segment has the correct compression ratio.
16. The method of claim 14, further comprising: decompressing, by the processor of the computing device, the read first data segment.
17. A computing device, comprising:
means for identifying a base address for a first data segment;
means for identifying a data size for the first data segment;
means for obtaining a base offset based on the identified data size and the base address of the first data segment; and
means for calculating an offset address by offsetting the base address with the obtained base offset, wherein the calculated offset address is associated with a second data segment.
18. The computing device of claim 17, wherein the base address is a physical address or a virtual address.
19. The computing device of claim 17, wherein the identified data size is identified based on a compression ratio associated with the first data segment.
20. The computing device of claim 19, wherein the compression ratio is one of: a 4:1 compression ratio, a 4:2 compression ratio, a 4:3 compression ratio, or a 4:4 compression ratio.
21. The computing device of claim 17, wherein the means for obtaining the base offset based on the identified compression ratio and the base address of the first data segment comprises:
means for identifying a parity value for the first data segment based on the base address; and
means for obtaining the base offset using the identified compression ratio and the identified parity value.
22. The computing device of claim 21, wherein the means for obtaining the base offset using the identified compression ratio and the identified parity value comprises: means for obtaining the base offset by performing a look-up on a stored table using the identified compression ratio and the identified parity value.
23. The computing device of claim 21, wherein the parity value indicates whether a bit in the base address of the first data segment is odd or even.
24. The computing device of claim 21, wherein the parity value is based on two bits in the base address of the first data segment.
25. The computing device of claim 17, wherein:
the means for obtaining the base offset based on the identified data size and the base address of the first data segment comprises: means for obtaining a first base offset and a second base offset and a first data size and a second data size for the first data segment; and
the means for calculating the offset address by offsetting the base address with the obtained base offset comprises: means for calculating a first offset address for the first data size and a second offset address for the second data size.
26. The computing device of claim 17, further comprising: means for storing the first data segment at the calculated offset address.
27. The computing device of claim 26, wherein the means for storing the first data segment at the calculated offset address comprises:
means for reading the first data segment at the base address as uncompressed data;
means for compressing the first data segment at the identified data size; and
means for storing the compressed first data segment at the calculated offset address.
28. The computing device of claim 26, wherein the means for calculating the offset address by offsetting the base address with the obtained base offset comprises: means for calculating the offset address by offsetting the base address with the obtained base offset after the first data segment has been compressed.
29. The computing device of claim 17, further comprising: means for reading the first data segment at the calculated offset address.
30. The computing device of claim 29, further comprising: means for determining whether the second data segment has a correct compression ratio to be contiguous with the first data segment,
wherein the means for reading the first data segment at the calculated offset address comprises: means for prefetching the second data segment with the first data segment in response to determining that the second data segment has the correct compression ratio.
31. The computing device of claim 29, further comprising: means for decompressing the read first data segment.
32. A computing device, comprising a processor configured with processor-executable instructions to perform operations comprising:
identifying a base address for a first data segment;
identifying a data size for the first data segment;
obtaining a base offset based on the identified data size and the base address of the first data segment; and
calculating an offset address by offsetting the base address with the obtained base offset, wherein the calculated offset address is associated with a second data segment.
33. The computing device of claim 32, further comprising a dedicated circuit coupled to the processor and configured to perform operations comprising:
obtaining the base offset based on the identified data size and the base address of the first data segment; and
calculating the offset address by offsetting the base address with the obtained base offset.
34. The computing device of claim 32, wherein the base address is a physical address or a virtual address.
35. The computing device of claim 32, wherein the identified data size is identified by the processor based on a compression ratio associated with the first data segment.
36. The computing device of claim 35, wherein the compression ratio is one of: a 4:1 compression ratio, a 4:2 compression ratio, a 4:3 compression ratio, or a 4:4 compression ratio.
37. The computing device of claim 32, wherein the processor is configured with processor-executable instructions to perform operations such that obtaining the base offset based on the identified compression ratio and the base address of the first data segment comprises:
identifying a parity value for the first data segment based on the base address; and
obtaining the base offset using the identified compression ratio and the identified parity value.
38. The computing device of claim 37, wherein the processor is configured with processor-executable instructions to perform operations such that obtaining the base offset using the identified compression ratio and the identified parity value comprises: obtaining the base offset by performing a look-up on a stored table using the identified compression ratio and the identified parity value.
39. The computing device of claim 37, wherein the parity value indicates whether a bit in the base address of the first data segment is odd or even.
40. The computing device of claim 37, wherein the parity value is based on two bits in the base address of the first data segment.
41. The computing device of claim 32, wherein the processor is configured with processor-executable instructions to perform operations such that:
obtaining the base offset based on the identified data size and the base address of the first data segment comprises: obtaining a first base offset and a second base offset and a first data size and a second data size for the first data segment; and
calculating the offset address by offsetting the base address with the obtained base offset comprises: calculating a first offset address for the first data size and a second offset address for the second data size.
42. The computing device according to claim 32, wherein the processor is configured with processor-executable instructions to perform operations further comprising: storing the first data segment at the calculated offset address.
43. The computing device according to claim 42, wherein the processor is configured with processor-executable instructions to perform operations such that storing the first data segment at the calculated offset address comprises:
reading the first data segment at the base address as uncompressed data;
compressing the first data segment at the identified data size; and
storing the compressed first data segment at the calculated offset address.
44. The computing device according to claim 42, wherein the processor is configured with processor-executable instructions to perform operations such that calculating the offset address by offsetting the base address with the obtained base offset is accomplished after the first data segment has been compressed.
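The store path of claims 42 to 44 (read uncompressed, compress, compute the offset address after compression, store) can be sketched as follows. Everything specific here is an assumption for illustration: `zlib` stands in for whatever compressor the device uses, the 256-byte segment size and the size buckets are assumed, and `base_offset_for` is a hypothetical callback that returns the base offset.

```python
import zlib
from typing import Callable, Tuple

SEGMENT_SIZE = 256               # assumed uncompressed segment size in bytes
COMPRESSED_SIZES = (64, 128, 192)  # assumed sizes for 4:1, 4:2 and 4:3


def store_compacted(memory: bytearray,
                    base_address: int,
                    base_offset_for: Callable[[int, int], int]) -> Tuple[int, int]:
    """Compress one segment in place and return (offset_address, data_size)."""
    # Read the first data segment at the base address as uncompressed data.
    raw = bytes(memory[base_address:base_address + SEGMENT_SIZE])
    # Compress it; zlib is only a stand-in compressor.
    packed = zlib.compress(raw)
    # Pick the smallest assumed size that holds the result, else keep it uncompressed (4:4).
    size = next((s for s in COMPRESSED_SIZES if len(packed) <= s), SEGMENT_SIZE)
    payload = raw if size == SEGMENT_SIZE else packed.ljust(size, b"\x00")
    # The offset address is calculated only after compression, as in claim 44.
    address = base_address + base_offset_for(size, base_address)
    # Store the (possibly compressed) segment at the calculated offset address.
    memory[address:address + size] = payload
    return address, size
```

A caller might pass `lambda size, base: SEGMENT_SIZE - size` as `base_offset_for` to pack each segment toward the end of its slot; that policy is likewise only an example.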
45. The computing device according to claim 32, wherein the processor is configured with processor-executable instructions to perform operations further comprising: reading the first data segment at the calculated offset address.
46. The computing device according to claim 45, wherein the processor is configured with processor-executable instructions to perform operations further comprising: determining whether the second data segment has a correct compression ratio to be contiguous with the first data segment,
wherein reading the first data segment at the calculated offset address comprises: prefetching the second data segment with the first data segment in response to determining that the second data segment has the correct compression ratio.
47. The computing device according to claim 45, wherein the processor is configured with processor-executable instructions to perform operations further comprising: decompressing the read first data segment.
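The read path of claims 45 and 46 can be sketched as below. The contiguity test is an assumption made for illustration: the second segment is treated as having a "correct" compression ratio when it starts exactly where the first one ends and both fit in one assumed 128-byte cache line. The names `shares_cache_line` and `read_with_prefetch` are hypothetical.

```python
from typing import Optional, Tuple

CACHE_LINE_SIZE = 128  # assumed cache line size in bytes


def shares_cache_line(first_addr: int, first_size: int,
                      second_addr: int, second_size: int) -> bool:
    """Hypothetical check that two compressed segments abut and fit in one line."""
    return (first_addr + first_size == second_addr
            and first_size + second_size <= CACHE_LINE_SIZE)


def read_with_prefetch(memory: bytearray,
                       first: Tuple[int, int],
                       second: Tuple[int, int]) -> Tuple[bytes, Optional[bytes]]:
    """Read the first segment; also return the second one if it can be prefetched."""
    first_addr, first_size = first
    second_addr, second_size = second
    if shares_cache_line(first_addr, first_size, second_addr, second_size):
        # One contiguous read covers both segments.
        burst = bytes(memory[first_addr:first_addr + first_size + second_size])
        return burst[:first_size], burst[first_size:]
    # Otherwise only the first segment is read.
    return bytes(memory[first_addr:first_addr + first_size]), None
```

The segments returned here are still compressed; decompressing the read first data segment (claim 47) would be a separate step.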
48. A non-transitory processor-readable storage medium having stored thereon processor-executable instructions configured to cause a processor of a computing device to perform operations comprising:
identifying a base address for a first data segment;
identifying a data size for the first data segment;
obtaining a base offset based on the identified data size and the base address of the first data segment; and
calculating an offset address by offsetting the base address with the obtained base offset, wherein the calculated offset address is associated with a second data segment.
49. The non-transitory processor-readable storage medium according to claim 48, wherein the base address is a physical address or a virtual address.
50. The non-transitory processor-readable storage medium according to claim 48, wherein the identified data size is identified by the processor based on a compression ratio associated with the first data segment.
51. The non-transitory processor-readable storage medium according to claim 50, wherein the compression ratio is one of: a 4:1 compression ratio, a 4:2 compression ratio, a 4:3 compression ratio, or a 4:4 compression ratio.
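How a data size follows from one of the four compression ratios in claims 50 and 51 can be sketched as below. The 256-byte uncompressed segment size and the function name `data_size_for_ratio` are assumptions; the arithmetic only expresses that an N:M ratio keeps M parts out of N.

```python
UNCOMPRESSED_SIZE = 256  # assumed uncompressed segment size in bytes


def data_size_for_ratio(ratio: str, uncompressed_size: int = UNCOMPRESSED_SIZE) -> int:
    """Map a ratio string such as "4:3" to the compressed data size in bytes."""
    whole, part = (int(field) for field in ratio.split(":"))
    if whole != 4 or not 1 <= part <= 4:
        raise ValueError(f"unsupported compression ratio: {ratio}")
    return uncompressed_size * part // whole
```

Under the assumed segment size, the four ratios correspond to 64, 128, 192 and 256 bytes, with 4:4 meaning the segment is effectively uncompressed.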
52. The non-transitory processor-readable storage medium according to claim 48, wherein the stored processor-executable instructions are configured to cause the processor of the computing device to perform operations such that obtaining the base offset based on the identified compression ratio and the base address of the first data segment comprises:
identifying a parity value for the first data segment based on the base address; and
obtaining the base offset using the identified compression ratio and the identified parity value.
53. The non-transitory processor-readable storage medium according to claim 52, wherein the stored processor-executable instructions are configured to cause the processor of the computing device to perform operations such that obtaining the base offset using the identified compression ratio and the identified parity value comprises: performing a lookup on a stored table using the identified compression ratio and the identified parity value to obtain the base offset.
54. The non-transitory processor-readable storage medium according to claim 52, wherein the parity value indicates whether a bit in the base address of the first data segment is odd or even.
55. The non-transitory processor-readable storage medium according to claim 52, wherein the parity value is based on two bits in the base address of the first data segment.
56. The non-transitory processor-readable storage medium according to claim 48, wherein the stored processor-executable instructions are configured to cause the processor of the computing device to perform operations such that:
obtaining the base offset based on the identified data size and the base address of the first data segment comprises: obtaining a first base offset and a second base offset, and a first data size and a second data size, for the first data segment; and
calculating the offset address by offsetting the base address with the obtained base offset comprises: calculating a first offset address for the first data size and a second offset address for the second data size.
57. The non-transitory processor-readable storage medium according to claim 48, wherein the stored processor-executable instructions are configured to cause the processor of the computing device to perform operations further comprising: storing the first data segment at the calculated offset address.
58. The non-transitory processor-readable storage medium according to claim 57, wherein the stored processor-executable instructions are configured to cause the processor of the computing device to perform operations such that storing the first data segment at the calculated offset address comprises:
reading the first data segment at the base address as uncompressed data;
compressing the first data segment at the identified data size; and
storing the compressed first data segment at the calculated offset address.
59. The non-transitory processor-readable storage medium according to claim 57, wherein the stored processor-executable instructions are configured to cause the processor of the computing device to perform operations such that calculating the offset address by offsetting the base address with the obtained base offset is accomplished after the first data segment has been compressed.
60. The non-transitory processor-readable storage medium according to claim 48, wherein the stored processor-executable instructions are configured to cause the processor of the computing device to perform operations further comprising: reading the first data segment at the calculated offset address.
61. The non-transitory processor-readable storage medium according to claim 60, wherein the stored processor-executable instructions are configured to cause the processor of the computing device to perform operations further comprising: determining whether the second data segment has a correct compression ratio to be contiguous with the first data segment,
wherein reading the first data segment at the calculated offset address comprises: prefetching the second data segment with the first data segment in response to determining that the second data segment has the correct compression ratio.
62. The non-transitory processor-readable storage medium according to claim 60, wherein the stored processor-executable instructions are configured to cause the processor of the computing device to perform operations further comprising: decompressing the read first data segment.
CN201580041874.4A 2014-08-05 2015-07-09 Cache line compaction of compressed data segments Pending CN106575263A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US14/451,639 2014-08-05
US14/451,639 US9361228B2 (en) 2014-08-05 2014-08-05 Cache line compaction of compressed data segments
PCT/US2015/039736 WO2016022247A1 (en) 2014-08-05 2015-07-09 Cache line compaction of compressed data segments

Publications (1)

Publication Number Publication Date
CN106575263A (en) 2017-04-19

Family

ID=53758529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201580041874.4A Pending CN106575263A (en) 2014-08-05 2015-07-09 Cache line compaction of compressed data segments

Country Status (5)

Country Link
US (2) US9361228B2 (en)
EP (1) EP3178005B1 (en)
JP (1) JP6370988B2 (en)
CN (1) CN106575263A (en)
WO (1) WO2016022247A1 (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9361228B2 (en) 2014-08-05 2016-06-07 Qualcomm Incorporated Cache line compaction of compressed data segments
JP2016091242A (en) * 2014-10-31 2016-05-23 富士通株式会社 Cache memory, access method to cache memory and control program
US10025956B2 (en) * 2015-12-18 2018-07-17 Intel Corporation Techniques to compress cryptographic metadata for memory encryption
US9916245B2 (en) * 2016-05-23 2018-03-13 International Business Machines Corporation Accessing partial cachelines in a data cache
US10042737B2 (en) 2016-08-31 2018-08-07 Microsoft Technology Licensing, Llc Program tracing for time travel debugging and analysis
US10031834B2 (en) 2016-08-31 2018-07-24 Microsoft Technology Licensing, Llc Cache-based tracing for time travel debugging and analysis
US10031833B2 (en) 2016-08-31 2018-07-24 Microsoft Technology Licensing, Llc Cache-based tracing for time travel debugging and analysis
US10489273B2 (en) 2016-10-20 2019-11-26 Microsoft Technology Licensing, Llc Reuse of a related thread's cache while recording a trace file of code execution
US10310977B2 (en) 2016-10-20 2019-06-04 Microsoft Technology Licensing, Llc Facilitating recording a trace file of code execution using a processor cache
US10324851B2 (en) 2016-10-20 2019-06-18 Microsoft Technology Licensing, Llc Facilitating recording a trace file of code execution using way-locking in a set-associative processor cache
US10310963B2 (en) 2016-10-20 2019-06-04 Microsoft Technology Licensing, Llc Facilitating recording a trace file of code execution using index bits in a processor cache
US10540250B2 (en) 2016-11-11 2020-01-21 Microsoft Technology Licensing, Llc Reducing storage requirements for storing memory addresses and values
US10318332B2 (en) 2017-04-01 2019-06-11 Microsoft Technology Licensing, Llc Virtual machine execution tracing
US10296442B2 (en) 2017-06-29 2019-05-21 Microsoft Technology Licensing, Llc Distributed time-travel trace recording and replay
US10459824B2 (en) 2017-09-18 2019-10-29 Microsoft Technology Licensing, Llc Cache-based trace recording using cache coherence protocol data
US10558572B2 (en) 2018-01-16 2020-02-11 Microsoft Technology Licensing, Llc Decoupling trace data streams using cache coherence protocol data
US11907091B2 (en) 2018-02-16 2024-02-20 Microsoft Technology Licensing, Llc Trace recording by logging influxes to an upper-layer shared cache, plus cache coherence protocol transitions among lower-layer caches
US10496537B2 (en) 2018-02-23 2019-12-03 Microsoft Technology Licensing, Llc Trace recording by logging influxes to a lower-layer cache based on entries in an upper-layer cache
US10642737B2 (en) 2018-02-23 2020-05-05 Microsoft Technology Licensing, Llc Logging cache influxes by request to a higher-level cache
KR20200006379A (en) * 2018-07-10 2020-01-20 에스케이하이닉스 주식회사 Controller and operating method thereof
US10942808B2 (en) * 2018-12-17 2021-03-09 International Business Machines Corporation Adaptive data and parity placement using compression ratios of storage devices
US10997085B2 (en) * 2019-06-03 2021-05-04 International Business Machines Corporation Compression for flash translation layer
US11601136B2 (en) 2021-06-30 2023-03-07 Bank Of America Corporation System for electronic data compression by automated time-dependent compression algorithm
US11567872B1 (en) * 2021-07-08 2023-01-31 Advanced Micro Devices, Inc. Compression aware prefetch
US11573899B1 (en) * 2021-10-21 2023-02-07 International Business Machines Corporation Transparent interleaving of compressed cache lines
US12014047B2 (en) * 2022-08-24 2024-06-18 Red Hat, Inc. Stream based compressibility with auto-feedback

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030131184A1 (en) * 2002-01-10 2003-07-10 Wayne Kever Apparatus and methods for cache line compression
US20040073747A1 (en) * 2002-10-10 2004-04-15 Synology, Inc. Method, system and apparatus for scanning newly added disk drives and automatically updating RAID configuration and rebuilding RAID data
US20060184734A1 (en) * 2005-02-11 2006-08-17 International Business Machines Corporation Method and apparatus for efficiently accessing both aligned and unaligned data from a memory
CN102141905A (en) * 2010-01-29 2011-08-03 上海芯豪微电子有限公司 Processor system structure
CN102541747A (en) * 2010-10-25 2012-07-04 马维尔国际贸易有限公司 Data compression and encoding in a memory system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07129470A (en) * 1993-11-09 1995-05-19 Hitachi Ltd Disk control method
JP3426385B2 (en) * 1995-03-09 2003-07-14 富士通株式会社 Disk controller
US6658552B1 (en) * 1998-10-23 2003-12-02 Micron Technology, Inc. Processing system with separate general purpose execution unit and data string manipulation unit
US7143238B2 (en) 2003-09-30 2006-11-28 Intel Corporation Mechanism to compress data in a cache
US7162584B2 (en) 2003-12-29 2007-01-09 Intel Corporation Mechanism to include hints within compressed data
US7162583B2 (en) 2003-12-29 2007-01-09 Intel Corporation Mechanism to store reordered data with compression
US7257693B2 (en) 2004-01-15 2007-08-14 Intel Corporation Multi-processor computing system that employs compressed cache lines' worth of information and processor capable of use in said system
US8341380B2 (en) 2009-09-22 2012-12-25 Nvidia Corporation Efficient memory translator with variable size cache line coverage
US9361228B2 (en) 2014-08-05 2016-06-07 Qualcomm Incorporated Cache line compaction of compressed data segments

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ALAMELDEEN A R ET AL: "Adaptive cache compression for high-performance processors", 《COMPUTER ARCHITECTURE, 2004. PROCEEDINGS. 31ST ANNUAL INTERNATIONAL SYMPOSIUM ON MUNCHEN》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111367831A (en) * 2020-03-26 2020-07-03 超验信息科技(长沙)有限公司 Deep prefetching method and component for translation page table, microprocessor and computer equipment
CN111367831B (en) * 2020-03-26 2022-11-11 超睿科技(长沙)有限公司 Deep prefetching method and component for translation page table, microprocessor and computer equipment
CN112699063A (en) * 2021-03-25 2021-04-23 轸谷科技(南京)有限公司 Dynamic caching method for solving storage bandwidth efficiency of general AI processor
CN112699063B (en) * 2021-03-25 2021-06-22 轸谷科技(南京)有限公司 Dynamic caching method for solving storage bandwidth efficiency of general AI processor
CN114422499A (en) * 2021-12-27 2022-04-29 北京奇艺世纪科技有限公司 File downloading method, system and device
CN114422499B (en) * 2021-12-27 2023-12-05 北京奇艺世纪科技有限公司 File downloading method, system and device

Also Published As

Publication number Publication date
JP6370988B2 (en) 2018-08-08
EP3178005A1 (en) 2017-06-14
US9361228B2 (en) 2016-06-07
WO2016022247A1 (en) 2016-02-11
EP3178005B1 (en) 2018-01-10
US10261910B2 (en) 2019-04-16
US20160203084A1 (en) 2016-07-14
JP2017529591A (en) 2017-10-05
US20160041905A1 (en) 2016-02-11

Similar Documents

Publication Publication Date Title
CN106575263A (en) Cache line compaction of compressed data segments
CN106537327B (en) Flash memory compression
US20170177497A1 (en) Compressed caching of a logical-to-physical address table for nand-type flash memory
US11010079B2 (en) Concept for storing file system metadata within solid-stage storage devices
US20150019834A1 (en) Memory hierarchy using page-based compression
TWI750243B (en) Nonvolatile memory storage device
CN106663059B (en) Power-aware filling
TWI619018B (en) Garbage collection method for data storage device
CN104281528A (en) Data storage method and device
CN105373369A (en) Asynchronous caching method, server and system
JP2017501504A (en) System and method for defragmenting memory
CN106575262B (en) The method and apparatus of supplement write-in cache command for bandwidth reduction
CN106687937B (en) Cache bank expansion for compression algorithms
CN106610790A (en) Repeated data deleting method and device
US9477605B2 (en) Memory hierarchy using row-based compression
CN107077423A (en) The local sexual system of efficient decompression for demand paging
CN109313609A (en) The system and method to interweave for odd mode storage channel
CN108604211A (en) System and method for the multi-tiling data transactions in system on chip
CN107078746A (en) The decompression time is reduced without influenceing compression ratio
US11079955B2 (en) Concept for approximate deduplication in storage and memory
CN107111560A (en) System and method for providing improved delay in non-Unified Memory Architecture
US20210200679A1 (en) System and method for mixed tile-aware and tile-unaware traffic through a tile-based address aperture
CN110235110A (en) It is reduced or avoided when the write operation of pause occurs from the uncompressed cache memory compressed in storage system through evicting the buffering of high-speed buffer memory data from
CN107844579B (en) Method, system and equipment for optimizing distributed database middleware access
CN108780422A (en) It is compressed using compression indicator CI hint directories to provide bandwidth of memory in the system based on central processing unit CPU

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170419