EP3476051A1 - General purpose data compression using simd engine - Google Patents

General purpose data compression using simd engine

Info

Publication number
EP3476051A1
Authority
EP
European Patent Office
Prior art keywords
hash
subsets
processed
data stream
subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP16738472.6A
Other languages
German (de)
English (en)
French (fr)
Inventor
Michael Hirsch
Yehonatan DAVID
Yair Toaff
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of EP3476051A1

Classifications

    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead, using a plurality of independent parallel functional units
    • G06F9/3887 Concurrent instruction execution using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • G06F9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G06F9/30178 Runtime instruction translation, e.g. macros, of compressed or encrypted instructions
    • G10L13/00 Speech synthesis; Text to speech systems
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084 Compression using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3086 Adaptive string matching employing a sliding window, e.g. LZ77
    • H03M7/3088 Adaptive string matching employing the use of a dictionary, e.g. LZ78
    • H03M7/3091 Data deduplication
    • H03M7/6023 Parallelization (implementation details to increase the throughput)

Definitions

  • the present invention, in some embodiments thereof, relates to data compression and, more specifically, but not exclusively, to data compression using a single instruction multiple data (SIMD) engine.
  • Data compression is widely used for a plurality of applications to reduce the data volume for storage and/or transfer in order to reduce storage space for storing the data and/or network bandwidth for transferring the data.
  • Data compression involves encoding the data using fewer bits than the original representation of the data. While the data compression may significantly reduce the storage and/or networking resources, it may require additional processing and/or computation resources, for example, processing engines, memory resources and/or processing time. Many data compression methods, techniques and/or algorithms are currently available each employing a trade-off between the compression ratio and the required processing resources.
  • a system for compressing an input data stream to create a compressed output data stream, comprising: a memory for storing a hash table, the hash table comprising a plurality of hash entries, each hash entry comprising a hash value of an associated one of a plurality of subsets of consecutive data items of a plurality of data items of an input data stream and a pointer to a memory location of the associated subset; and a processor coupled to the memory adapted to: execute the following operations while at least one of the operations is executed by instructing a single instruction multiple data, SIMD, engine to concurrently execute the at least one operation for each processed subset of a group of consecutive subsets of the plurality of subsets: calculate the hash value for each one of the processed subsets, search the hash table for a match of each calculated hash value, and update the hash table according to the match result; and update the compressed output data stream according to the match result and a comparison result of a comparison that depends on the match result.
  • each of the plurality of associated subsets includes a predefined number of data items defined according to the SIMD engine architecture.
  • the number of processed subsets in the group is set according to the SIMD engine architecture.
  • the match result indicates a match of the each calculated hash value with an existing hash value present in the hash table.
  • the comparison is conducted to produce the comparison result in case the match result indicates a match of the calculated hash value with a matching hash value in the hash table, and wherein the comparison comprises comparing the data items of the processed subset for which the hash value was calculated with the data items of the associated subset pointed to by the pointer in the matching hash entry.
  • in case the comparison result indicates the data items of the processed subset and the associated subset are identical, the processed subset is replaced with a pointer to the associated subset in the compressed output data stream, and in case the comparison result indicates the data items of the processed subset and the associated subset are not identical, the processed subset is updated in the compressed output data stream and the hash table is updated with a new hash entry for the processed subset.
  • the concurrent calculation comprises the processor loading the group of processed subsets to at least one SIMD register of the SIMD engine and the SIMD engine concurrently processing the group of processed subsets.
  • the concurrent processing comprises: spacing the processed subsets of the group from each other, shifting the processed subsets using a different shift value for each processed subset, and processing the processed subsets to create a hash value for each of the processed subsets.
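The space/shift/hash pipeline described above can be modeled with vector arithmetic. The 4-byte subset width, the 16-bit hash width and the Fibonacci multiplier below are illustrative assumptions, not values taken from this application:

```python
import numpy as np

def hash_group(data: bytes, start: int, lanes: int = 4, hash_bits: int = 16):
    """Model of the concurrent hash calculation: each SIMD lane holds one
    4-byte processed subset; a single vector multiply and shift then
    produces the hash value for every subset in the group at once."""
    # Space the consecutive subsets into separate lanes (one per position).
    subsets = [data[start + k: start + k + 4] for k in range(lanes)]
    packed = np.array([int.from_bytes(s, "little") for s in subsets],
                      dtype=np.uint64)
    # Multiplicative hash applied to all lanes together: multiply, mask
    # to 32 bits, then shift down to the hash-table index width.
    mult = (packed * np.uint64(2654435761)) & np.uint64(0xFFFFFFFF)
    return mult >> np.uint64(32 - hash_bits)
```

Identical subsets always land on the same hash value, which is what makes the later table search meaningful.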
  • the concurrent search for the match of each of the processed subsets in the hash table comprises the processor instructing the SIMD engine to search concurrently for a match of each of the calculated hash values with hash values stored in the hash table.
  • the concurrent update of the hash table with at least one processed subset comprises the processor instructing the SIMD engine to update concurrently the hash table with an entry associated with the at least one processed subset.
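A minimal sketch of the concurrent table update, assuming a direct-mapped table indexed by the hash value (the table layout is an illustrative assumption; the application does not fix one):

```python
import numpy as np

def update_table(table_pos: np.ndarray, hashes: np.ndarray, positions: np.ndarray):
    """Model of the concurrent hash table update: one scatter store writes
    the input-stream position of every processed subset in the group into
    the entry selected by its hash value. With NumPy fancy indexing, as
    with a hardware scatter, the last lane wins when two hashes collide."""
    table_pos[hashes % len(table_pos)] = positions
    return table_pos
```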
  • the compressed output data stream is compliant with a standard compressed output data stream produced using legacy compression methods, such that the compressed output data stream can be decompressed using legacy decompression methods.
  • a method for compressing an input data stream to create a compressed output data stream, comprising: storing a hash table comprising a plurality of hash entries, each hash entry comprising a hash value of an associated one of a plurality of subsets of data items of a plurality of data items of an input data stream and a pointer to a memory location of the associated subset; executing the following operations while at least one of the operations is executed by instructing a single instruction multiple data, SIMD, engine of a processor to execute concurrently the at least one operation for each processed subset of a group of consecutive subsets of the plurality of associated subsets: calculate the hash value for each one of the processed subsets, search the hash table for a match of each calculated hash value, and update the hash table according to the match result; updating the compressed output data stream according to the match result and a comparison result of a comparison that depends on the match result; and repeating the calculation, search and update throughout the plurality of subsets of the input data stream.
  • the concurrent calculation comprises the processor loading the group of processed subsets to at least one SIMD register of the SIMD engine, and the SIMD engine concurrently processing the group of processed subsets, the concurrent processing comprises: spacing the processed subsets of the group from each other, and shifting the processed subsets using a different shift value for each processed subset, and processing the processed subsets to create a hash value for each of the processed subsets.
  • the concurrent search for the match of each of the processed subsets in the hash table comprises the processor instructing the SIMD engine to search concurrently for a match of each of the calculated hash values with hash values stored in the hash table.
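Assuming a direct-mapped table indexed by the hash value (an illustrative assumption), the concurrent search can be modeled as one gather of the stored hash values followed by one vector compare:

```python
import numpy as np

def search_table(table_hash: np.ndarray, hashes: np.ndarray):
    """Model of the concurrent hash table search: gather the hash value
    stored in each candidate entry and compare all lanes at once; the
    returned boolean mask marks the subsets whose hash value matches."""
    stored = table_hash[hashes % len(table_hash)]   # gather, one entry per lane
    return stored == hashes                          # vector compare
```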
  • the concurrent update of the hash table with at least one processed subset comprises the processor instructing the SIMD engine to update concurrently the hash table with an entry associated with the at least one processed subset.
  • the compressed output data stream is compliant with a standard compressed output data stream produced using legacy compression methods, such that the compressed output data stream can be decompressed using legacy decompression methods.
  • FIG. 1 is a schematic illustration of an exemplary system for compressing an input data stream using a SIMD engine, according to some embodiments of the present invention
  • FIG. 2 is a flowchart of an exemplary process for compressing an input data stream using a SIMD engine, according to some embodiments of the present invention
  • FIG. 3A is a schematic illustration of an exemplary sequence for loading simultaneously a plurality of consecutive bytes of an input data stream into registers of a SIMD engine, according to some embodiments of the present invention
  • FIG. 3B is a schematic illustration of an exemplary sequence for calculating simultaneously a hash value for a plurality of subsets each comprising consecutive bytes of an input data stream using a SIMD engine, according to some embodiments of the present invention
  • FIG. 4 is a schematic illustration of an exemplary sequence for searching simultaneously for a match of a plurality of hash values in a hash table using a SIMD engine, according to some embodiments of the present invention.
  • FIG. 5 is a schematic illustration of an exemplary sequence for updating simultaneously a plurality of hash table entries using a SIMD engine, according to some embodiments of the present invention.
  • the present invention, in some embodiments thereof, relates to data compression and, more specifically, but not exclusively, to data compression using a SIMD engine.
  • the present invention presents systems and methods for general-purpose data compression using a SIMD engine of one or more processors for a plurality of applications requiring the data compression in order to reduce the amount (volume) of data, for example, data storage and/or data transfer.
  • An input data stream comprising a plurality of data items, for example, bytes, words, double-words and/or pixels is compressed by replacing repetitive data sequences with pointers to previous instances of the repetitive data sequences.
  • the compression systems and methods presented herein utilize lossless compression methods and/or algorithms as known in the art, for example, Lempel-Ziv (LZ77 and LZ78), Lempel-Ziv-Welch (LZW), Lempel-Ziv-Oberhumer (LZO) and/or LZ4.
  • the compression methods are explained in the present invention only to the extent required to demonstrate the compression operations executed by the SIMD engine of the processor(s) to enhance the compression process, for example, reduce compression resources and/or compression time. A person skilled in the art is, however, expected to be familiar with all aspects of the compression methods.
  • the compression scheme utilizes the SIMD engine for concurrent execution of one or more operations during the compression process, for example, processing subsets of consecutive data items to calculate respective hash values, searching for a match of the hash values in a hash table and/or updating the hash table with hash values and pointers to the associated subsets.
  • the SIMD engine supports execution of a single instruction (processor instruction) over multiple data items concurrently.
  • the compression methods and/or algorithms may be somewhat manipulated to support the concurrent execution by the SIMD engine.
  • SIMD engine technology may present significant advantages compared to currently existing sequential compression methods (legacy and/or standard compression methods).
  • Vector processing technology in general and SIMD technology in particular is rapidly advancing in many aspects, for example, a number of data items that may be processed in parallel and/or processing power of the processor(s).
  • the sequential data compression employed by the currently existing compression methods may be a major time consuming and/or processor intensive operation. Since the data items of the input data stream may be regarded as independent from each other with respect to the fundamental operation of the compression process, simultaneous processing of the input data stream may take full advantage of the SIMD engine and/or technology. The compression time and/or computation resources may be significantly reduced using the SIMD engine.
  • Executing even one of the compression operations concurrently may significantly increase the compression performance, therefore applying the SIMD engine to execute two or all the compression operations, for example, processing the subsets to calculate the hash values, searching for a match of the hash values and/or updating the hash table, may present an even more significant compression performance improvement.
  • the format of the compressed data (stream) compressed using the SIMD engine may be fully compliant with compressed data using some legacy compression methods.
  • the full compliance of the compressed data using the SIMD engine allows decompression of the compressed data using standard decompression methods, techniques and/or tools as known in the art for decompressing the compressed data.
  • the decompression methods, techniques and/or tools may need to be selected appropriately according to the used compression format.
  • for example, LZ4 decompression may be employed to decompress data that was compressed utilizing the SIMD engine according to the LZ4 compressed data format.
  • the present invention may be a system, a method, and/or a computer program product.
  • the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
  • the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
  • the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures.
  • two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • FIG. 1 is a schematic illustration of an exemplary system for compressing an input data stream using a SIMD engine, according to some embodiments of the present invention.
  • a system 100 includes an input/output (I/O) interface 102 for receiving and/or designating an input data stream 120 and outputting a compressed output data stream 130, a processor(s) 104 comprising a SIMD engine 106 for compressing the input data stream 120 to create the compressed data stream 130, a memory 108 and a program store 110.
  • the input data stream 120 may be received in one or more formats, for example, a data file, a media file, streaming data and the like.
  • the input data stream 120 includes a plurality of data items, for example, bytes, words, double-words and/or pixels which may be arranged in sequence as a stream.
  • the I/O interface 102 may include one or more interfaces, for example, a network interface, a memory interface and/or a storage interface.
  • the I/O interface 102 may be used by the processor(s) 104 to receive and/or transmit the data streams 120 and/or 130 over a network and/or one or more local peripheral interfaces, for example, a universal serial bus (USB), a secure digital (SD) card interface and/or the like.
  • the I/O interface 102 may also be used by the processor(s) 104 to fetch and/or store the data streams 120 and/or 130 to a memory device such as the memory 108 and/or a storage device such as the program store 110.
  • the processor(s) 104, homogeneous or heterogeneous, may be arranged for parallel processing, as clusters and/or as one or more multi-core processor(s), each having one or more SIMD engines 106.
  • the SIMD engine 106 comprises a plurality of processing pipelines for vector processing, for example processing multiple data items concurrently.
  • the program store 110 may include one or more non-transitory persistent memory storage devices, for example, a hard drive, a Flash array and/or the like.
  • the program store 110 may further comprise one or more network storage devices, for example, a storage server, a network accessible storage (NAS), a network drive and/or the like.
  • FIG. 2 is a flowchart of an exemplary process for compressing an input data stream using a SIMD engine, according to some embodiments of the present invention.
  • a compression process 200 for compressing the input data stream may be performed by a system such as the system 100.
  • the compression process 200 employs the SIMD engine 106 to process concurrently multiple data items of the input data stream 120 to produce the compressed output data stream 130.
  • the compression process 200 may be done by one or more software modules such as, for example, a compressor 112 that comprises a plurality of program instructions executed by the processor(s) 104 and/or the SIMD engine 106 from the program store 110.
  • the compressor 112 may be executed by a processing unit of the processor(s) 104 to manage and/or coordinate the compression process, for example, load data to the SIMD engine 106, collect data from the SIMD engine 106, synchronize data, synchronize tasks, update the compressed output data stream 130 and/or the like.
  • the processor(s) 104 executing the compressor 112 may instruct the SIMD engine 106 to process concurrently multiple data items of the input data stream 120 and/or interim products during the compression process 200 in order to expedite the compression process 200 thus reducing processing resources and/or processing time.
  • the concurrent processing is applied by the processor(s) 104 initiating a single instruction to the SIMD engine that executes concurrently the operation (instruction) over multiple data items and/or interim products using the plurality of processing pipelines.
  • the compressor 112 may create one or more data structures in the memory 108 to control the compression process 200, for example, a history array 114, a hash table 116 and/or the like.
  • the process 200 starts with the compressor 112 receiving the input data stream 120 using the I/O interface 102, for example, receiving the input data stream 120 from a remote device over the network(s), fetching the input data stream 120 from the local peripheral interface(s), from the memory 108 and/or the program store 110.
  • the system 100 executing the compression process 200 compresses the input data stream 120 using one or more lossless compression methods as known in the art, for example, Lempel-Ziv (LZ77 and LZ78), Lempel-Ziv-Welch (LZW), Lempel-Ziv-Oberhumer (LZO) and/or LZ4.
  • the basic concept of the compression methods is to identify duplicate sequences of data in the input data stream 120 and replace the duplicated sequences with pointers to a previous instance of the same sequence instead of placing the duplicated sequence itself in the compressed output data stream 130.
  • a sliding window is applied to the input data stream 120 to designate rolling sequences comprising consecutive data items of the input data stream 120.
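Applying the sliding window can be sketched in a couple of lines; the window size of four data items is an assumption for illustration (the application leaves it architecture-dependent):

```python
def rolling_sequences(data: bytes, n: int = 4):
    """Each slide of the window drops the earliest data item and adds
    the next one, yielding one n-item rolling sequence per position."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]
```

For example, `rolling_sequences(b"abcdef")` designates the rolling sequences `b"abcd"`, `b"bcde"` and `b"cdef"`.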
  • the data items of the rolling sequences are stored in a history array such as the history array 114.
  • a hash value is calculated for each of the rolling sequences and stored in a hash table entry of a hash table such as the hash table 116.
  • Each of the hash table entries comprises a pair of the calculated hash value and a pointer to an associated rolling sequence in the history array 114. For every new rolling sequence, the hash value is calculated and searched for a match in the hash table 116 to check if an identical hash value exists in the hash table 116. In case a match is found, the new rolling sequence may be identical to a previous rolling sequence associated with the matching hash value.
  • a plurality of hash functions may be used to calculate the hash value. Selection of the hash functions may present a trade-off between computation complexity and/or processing time and certainty of similarity of the two subsets. It is possible to calculate a complex hash value that will be unambiguous such that each of the rolling sequences is associated with a unique hash value.
  • the complex hash value calculation may be very computation demanding.
  • Lower-complexity hash functions may produce less complex hash values for the rolling sequences; however, some level of ambiguity may exist, for example, the calculated hash value may be the same for two or more dissimilar rolling sequences.
  • the actual data items of the new rolling sequence and the previous rolling sequence having the same hash value need to be compared to determine the match.
  • in case the match is detected, indicating the new rolling sequence is identical to the matching previous rolling sequence, the new rolling sequence may not be included in the compressed output data stream 130 but rather be replaced with a pointer to the location of the matching previous rolling sequence. The pointer may be placed at the appropriate position in the compressed output data stream 130 where the replaced rolling sequence needs to be inserted.
  • in case no match is found, the rolling sequence is included in the compressed output data stream 130.
  • the hash table may be updated accordingly.
  • the hash table may be updated to include the new hash value calculated for the new rolling sequence.
  • one or more hash values associated with the previous rolling sequences may be omitted from the hash table, for example, the least frequently matching hash entry and/or the like.
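The serial baseline that the SIMD engine accelerates can be sketched as follows. The one-byte additive hash is a deliberately weak assumption chosen to exercise the verify-then-emit path described above; real implementations use stronger hashes:

```python
def compress(data: bytes, n: int = 4):
    """Scalar sketch of the compression loop: hash each n-item rolling
    sequence, probe the hash table, verify the actual data items (the
    hash may be ambiguous), then emit a back-pointer or a literal."""
    table = {}                      # hash value -> position of associated sequence
    out, i = [], 0
    while i + n <= len(data):
        seq = data[i:i + n]
        h = sum(seq) & 0xFF         # deliberately weak hash: collisions possible
        j = table.get(h)
        table[h] = i                # update the table with the newest position
        if j is not None and data[j:j + n] == seq:
            out.append(("ptr", i - j, n))   # replace the sequence with a pointer
            i += n
        else:
            out.append(("lit", data[i]))    # keep the data item verbatim
            i += 1
    out.extend(("lit", b) for b in data[i:])  # tail too short to match
    return out

def decompress(tokens):
    """Inverse of the sketch above, resolving back-pointers byte by byte."""
    buf = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            buf.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                buf.append(buf[-dist])
    return bytes(buf)
```

Because every candidate match is verified against the actual data items before a pointer is emitted, the round trip is lossless even when the weak hash collides.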
  • the size of the history array 114 may vary.
  • as the size of the history array 114 increases, the probability for a match increases, leading to improved compression.
  • the size of the history array 114 is typically 2KB, 4KB, 8KB, 16KB and/or 32KB to achieve the best trade-off between the compression efficiency and the consumed processing and/or memory resources.
  • the compression methods employ a serial sequence to calculate the hash value for each of the rolling sequences, search the hash table and update the hash table accordingly.
  • the compression process 200 may perform one or more of the calculation, search and/or update operations concurrently using the SIMD engine 106 to expedite the compression process 200.
  • the compressor 112 creates the history array 114 in the memory 108 to store the most recent subsets.
  • the typical size of the history array 114 is 2KB, 4KB, 8KB, 16KB and/or 32KB. Other sizes may be assigned to the history array 114 according to availability of processing resources and/or a size of the memory 108.
  • the compressor 112 also creates the hash table 116 in the memory 108 to store hash entries comprising a pointer to one of the subsets stored in the history array 114 and a hash value calculated for the associated subset.
  • the history array 114 and the hash table 116 are empty and are gradually filled with the subsets (rolling sequences) as the sliding window is applied to the input data stream 120.
  • the compressor 112 applies the rolling window over the input data stream 120.
  • the size of the window that dictates the size of each of the subsets may be adapted according to the architecture of the processor 104 and/or the architecture of the SIMD engine 106.
  • the compressor 112 slides the sliding window over the input data stream 120 such that during every slide (shift) of the window, the earliest (first) data item of a previous rolling sequence is omitted and a new data item is added to create a new rolling sequence.
  • the compressor 112 calculates the hash values for the new rolling sequence using the SIMD engine 106. In order to allow concurrent processing of the rolling sequence, the rolling sequence is split into a plurality of processed subsets each comprising consecutive data items of the rolling sequence.
  • the group (rolling sequence) of processed subsets is processed concurrently using the SIMD engine 106.
  • the number of the processed subsets in the group may be adapted according to the architecture of the processor 104 and/or the architecture of the SIMD engine 106.
  • the compressor 112 loads the processed subsets to one or more registers of the SIMD engine 106 to calculate the hash value for each of the subsets.
  • the type, synopsis, characteristics and/or usage of the load instructions issued by the compressor 112 to the SIMD engine 106 may be adapted according to the architecture of the processor(s) 104 and/or the SIMD engine 106.
  • FIG. 3A is a schematic illustration of an exemplary sequence for loading simultaneously a plurality of consecutive bytes of an input data stream into registers of a SIMD engine, according to some embodiments of the present invention.
  • FIG. 3B is a schematic illustration of an exemplary sequence for calculating simultaneously a hash value for a group of subsets each comprising consecutive bytes of an input data stream using a SIMD engine, according to some embodiments of the present invention.
  • a compressor such as the compressor 112 loads consecutive data items 310 to four registers 302A through 302D of the SIMD engine 106, in such a way that each successive register will contain a window of the data slid by one item.
  • the number of consecutive data items loaded to registers 302 dictates the size of each of the processed subsets and/or the size of the group of subsets.
  • the exemplary sequence presented in FIG. 3A describes a SIMD engine utilizing a 16-byte architecture, for example, each register is 16 bytes wide, allowing concurrent processing of a group of 8 subsets, for example calculating 8 hash values 320, each calculated for a subset comprising 4 consecutive data items 310.
  • the data items 310 need to be spaced apart to allow the SIMD engine 106 to calculate concurrently the 8 hash values 320.
  • the data items 310 are spaced apart such that each byte (8-bits) occupies a word (16-bits) space, thus the 32 data items 310 occupy the four registers of 16 bytes to fit the register width of the exemplary SIMD engine 106.
  • Other architectures of the SIMD engine 106, for example, 32, 64, 128, 256 bytes and/or the like, may allow loading different numbers of consecutive data items 310 to the register 302 of the SIMD engine 106. Since the hash values 320 are calculated for every 4 consecutive data items 310, the 32 bytes loaded to the SIMD engine 106 are composed of 11 consecutive data items SK 310A through SK+10 310K.
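Under the 16-byte/8-lane assumptions of FIG. 3A, the load step can be simulated in scalar code as below. This is a sketch; the function name and the list-based "register" images are our own illustration.

```python
LANES = 8  # one 16-bit word per lane in a 16-byte register (FIG. 3A example)

def load_slid_registers(data, k):
    """Return four 8-lane 'register' images as in registers 302A..302D:
    each successive register holds the window of data slid by one item,
    so together they cover the 11 consecutive items S[k]..S[k+10]."""
    return [list(data[k + r:k + r + LANES]) for r in range(4)]
```

Because each register image is the previous one shifted by one data item, lane i across the four registers holds exactly the 4-item processed subset starting at S[k+i].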
  • the processor 104 is, for example, an Intel Architecture (IA) processor employing a Streaming SIMD Extensions (SSE) instruction set for operating the 16-byte SIMD engine 106.
  • the compressor 112 converts the data item bytes (8 bits) to words (16 bits) such that each data item occupies a word space as shown in FIG. 3B.
  • the compressor 112 instructs the SIMD engine 106 to shift the data items 310 loaded in the register 302.
  • a different shift is applied to each of the register portions 302A-302D such that: -
  • the data items SK 310A through SK+7 310H stored in the register portion 302A are shifted 6 bits to the left.
  • the data items SK+3 310D through SK+10 310K stored in the register portion 302D are not shifted at all.
  • the degree of shifting applied to the register 302 depends on the number of consecutive data items 310 loaded into the register 302 and hence on the architecture of the SIMD engine 106.
  • the compressor 112 may instruct the SIMD engine 106 to calculate 330 concurrently the hash value 320 for each of the processed subsets.
  • the type, synopsis, characteristics and/or usage of the calculation instructions issued by the compressor 112 to the SIMD engine 106 may be adapted according to the architecture of the processor(s) 104 and/or the SIMD engine 106.
  • the calculation 330 of the hash values 320 may be a simple XOR operation performed over subsets of consecutive data items 310. The subsets of the consecutive data items 310 are referred to as the processed subsets.
  • Each of the processed subsets comprises 4 consecutive data items 310, for example, a first processed subset includes the data items SK 310A through SK+3 310D, a second processed subset includes the data items SK+1 310B through SK+4 310E and so on to a last processed subset that includes the data items SK+7 310H through SK+10 310K.
  • the SIMD engine 106 calculates concurrently the hash values 320 for all the processed subsets by applying the calculation 330 that may be a simple XOR operation over the respective 4 data items 310 included in each of the processed subsets.
  • the SIMD engine 106 produces 8 hash values 320: a hash value 320A for the data items SK 310A through SK+3 310D, a hash value 320B for the data items SK+1 310B through SK+4 310E, a hash value 320C for the data items SK+2 310C through SK+5 310F, a hash value 320D for the data items SK+3 310D through SK+6 310G, a hash value 320E for the data items SK+4 310E through SK+7 310H, a hash value 320F for the data items SK+5 310F through SK+8 310I, a hash value 320G for the data items SK+6 310G through SK+9 310J, and a hash value 320H for the data items SK+7 310H through SK+10 310K.
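Lane by lane, the concurrent calculation 330 can be simulated as below. This is a sketch: the shift amounts of 6 bits for register portion 302A and 0 bits for 302D are stated above, while the intermediate values of 4 and 2 bits for 302B and 302C are our interpolation.

```python
SHIFTS = [6, 4, 2, 0]  # per register portion 302A..302D (partly assumed)

def simd_hash8(data, k):
    """Simulate the 8 concurrent XOR hash calculations 320A..320H:
    lane i XORs the shifted items S[k+i]..S[k+i+3], taking one item
    from each of the four slid register images."""
    regs = [data[k + r:k + r + 8] for r in range(4)]  # as in FIG. 3A
    hashes = []
    for lane in range(8):
        h = 0
        for reg, shift in zip(regs, SHIFTS):
            # items were widened from bytes to 16-bit words, so a
            # 6-bit left shift cannot overflow the lane
            h ^= reg[lane] << shift
        hashes.append(h)
    return hashes
```

Identical subsets produce identical lane values, which is what the subsequent table search relies on.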
  • the compressor 112 searches for a match of each of the calculated hash values 320 by comparing each of the calculated hash values 320 with each of a plurality of hash values available in hash entries of the hash table 116. A match is found for one of the calculated hash values 320 in case an identical hash value is found in one of the hash entries of the hash table 116.
  • the compressor 112 may issue instruction(s) to instruct the SIMD engine 106 to search concurrently for a match of each of the calculated hash values 320 in the hash table 116.
  • the type, synopsis, characteristics and/or usage of the search instructions issued by the compressor 112 to the SIMD engine 106 may be adapted according to the architecture of the processor(s) 104 and/or the SIMD engine 106.
  • the compressor 112 may use the "gather" instruction from the SSE instruction set as presented in function 1 below to instruct the SIMD engine 106 to execute the search operation.
  • CPUID Flags: AVX512F for AVX-512, KNCNI for KNC
  • the compressor 112 may issue the "gather" instruction as expressed in pseudo code excerpt 1 below to instruct the SIMD engine 106 to execute the search operation.
  • the compressor 112 may initiate further comparison to determine whether the processed subset is identical to the associated subset pointed by the hash entry that includes the matching stored hash value.
  • the further comparison may be required since the hash function calculation 330 used by the compressor 112 may be a simple XOR operation that may present ambiguous results, for example, an identical hash value 320 may be calculated for different subsets with different data items 310.
  • the further comparison includes comparing the data items 310 included in the processed subset and the data items 310 included in the associated subset associated with the matching stored hash value in the hash table 116. In case the data items 310 of both the processed subset and the associated subset are identical, the compressor 112 issues a match indication for the processed subset.
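The search-then-verify step can be simulated per lane as follows. This is an illustrative scalar stand-in for the concurrent gather-based search; the dict-based table and the function names are our own.

```python
def hash4(subset):
    """Illustrative XOR hash as in calculation 330; ambiguous by design,
    which is why the item-wise verification below is needed."""
    a, b, c, d = subset
    return (a << 6) ^ (b << 4) ^ (c << 2) ^ d

def search_and_verify(data, k, table):
    """table maps a hash value to the position of an earlier (associated)
    subset. Returns, per lane, the matching position or None."""
    results = []
    for lane in range(8):                      # one probe per hash value 320
        subset = data[k + lane:k + lane + 4]   # processed subset
        prev = table.get(hash4(subset))        # hash-table probe ("gather")
        # compare the actual data items of the processed subset and the
        # associated subset: an equal hash alone is not a definite match
        if prev is not None and data[prev:prev + 4] == subset:
            results.append(prev)
        else:
            results.append(None)
    return results
```

An empty probe (no entry for the hash value) and a hash collision both yield None, matching the "no match" outcomes described for the search operations 450.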
  • FIG. 4 is a schematic illustration of an exemplary sequence for searching simultaneously for a match of a plurality of hash values in a hash table using a SIMD engine, according to some embodiments of the present invention.
  • a compressor such as the compressor 112 instructs a SIMD engine such as the SIMD engine 106 to search concurrently for a match of a plurality of calculated hash values such as the hash values 320 with hash values stored in hash entries 402 in a hash table such as the hash table 116.
  • the exemplary sequence 400 follows the previous examples of the exemplary load sequence 300 and the exemplary concurrent calculation sequence 301.
  • the SIMD engine 106 initiates 8 comparison operations 450 concurrently to compare each of the calculated hash values 320 stored in the register 304 to each of the stored hash values available in hash entries 402 in the hash table 116.
  • Each search operation is associated with one of the calculated hash values 320, for example, a search operation 450A is associated with the calculated hash value 320A, a search operation 450B is associated with the calculated hash value 320B and so on, up to a search operation 450H associated with the calculated hash value 320H.
  • no match is found for the respective calculated hash value 320A.
  • an empty hash entry 402G is detected in the hash table 116.
  • a match is found between the respective calculated hash value 320B and a stored hash value in a hash entry 402C. Another match is detected during the search operation 450H between the respective calculated hash value 320H and a stored hash value in a hash entry 402K. Since the hash function calculation 330 used by the SIMD engine 106 may be a simple XOR operation over the data items 310, the hash value 320 may be ambiguous. Therefore, the actual data items 310 of the processed subset and the subset associated with the matching stored hash value in the hash table 116 need to be compared to determine a definite match.
  • the compressor 112 may initiate a comparison operation 460 for each matching calculated hash value 320 in the history array 114.
  • the SIMD engine 106 indicated a match between the calculated hash value 320B and a hash value stored in the hash entry 402C.
  • the compressor 112 therefore initiates a comparison operation 460A to compare the data items 310 of the subset associated with the hash entry 402C.
  • the compressor 112 compares data items SK+1 310B through SK+4 310E (producing the hash value 320B) with respective data items SK-5 310P through SK-2 310M to determine a match.
  • the compressor 112 may indicate a match.
  • the compressor 112 initiates a comparison operation 460B to compare the data items 310 of the subset associated with the hash entry 402K. For example, assuming the hash entry 402K is associated with a subset starting at data item SK-3 310N, the compressor 112 compares data items SK+7 310H through SK+10 310K (producing the hash value 320H) with respective data items SK-3 310N through SK 310A to determine a match. In case the data items SK+7 310H through SK+10 310K are identical to the data items SK-3 310N through SK 310A, the compressor 112 may indicate a match.
  • the compressor 112 updates the hash table 116 according to the match results.
  • the compressor 112 may issue instruction(s) to instruct the SIMD engine 106 to update concurrently one or more of the hash entries 402 with new hash entries 402 associated with respective one or more processed subsets.
  • each of the new hash entries 402 comprises the calculated hash value 320 for the respective subset and a pointer to the respective subset in the history array 114.
  • the hash table 116 may be updated in one or more scenarios using one or more update schemes. In one scheme, all the processed subsets are associated with entries in the hash table 116.
  • the compressor 112 may apply one or more schemes for updating the hash table 116. For example, in case one or more empty hash entries such as the hash entries 402 are detected during the match search operation of the SIMD engine 106, each of the empty hash entries may be updated to be associated with one of the processed subsets. This means that the respective hash entry 402 is created to include a calculated hash value such as the calculated hash value 320 of the respective processed subset and a pointer to the first data item such as the data item 310 of the processed subset.
  • one or more calculated hash values 320 match one or more hash values stored in the hash table 116.
  • the compressor 112 indicates the contents (data items) of the processed subset(s) and the contents (data items) of the associated subset (pointed to by the matching hash entry) are not the same.
  • the compressor 112 may update the respective hash entry 402 in the hash table 116 with a pointer pointing to the first data item 310 of the processed subset.
  • the hash value is naturally the same and therefore the compressor 112 does not alter it.
  • the compressor 112 may further apply one or more methods and/or techniques for dropping one or more of the hash entries 402 to make room for newly created hash entries 402 comprising new calculated hash values 320 associated with recent subsets.
  • the type, synopsis, characteristics and/or usage of the update instruction(s) issued by the compressor 112 to the SIMD engine 106 may be adapted according to the architecture of the processor(s) 104 and/or the SIMD engine 106.
  • the compressor 112 may use the "scatter" instruction from the SSE instruction set as presented in function 2 below to instruct the SIMD engine 106 to execute the update operation in the hash table 116.
  • CPUID Flags: AVX512F for AVX-512, KNCNI for KNC
  • the update operation 510A is directed at updating the hash entry 402G with the calculated hash value 320A calculated for the processed subset comprising the data items SK 310A through SK+3 310D and with an updated pointer pointing to the data item SK 310A that is the first data item 310 of the processed subset.
  • the processed subset comprising the data items SK 310A through SK+3 310D is considered an associated subset.
  • the update operation 510B is directed at updating the hash entry 402C with the calculated hash value 320B calculated for the processed subset comprising the data items SK+1 310B through SK+4 310E and with an updated pointer pointing to the data item SK+1 310B that is the first data item 310 of the processed subset.
  • the processed subset comprising the data items SK+1 310B through SK+4 310E is considered an associated subset.
  • the concurrent update operations 510 are similar for all the processed subsets, all the way to the update operation 510H that is directed at updating the hash entry 402K.
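The concurrent update operations 510 can be simulated per lane as below. This is a scalar stand-in for the scatter-based update; the dict-based table and the function name are our own illustration.

```python
def update_hash_table(data, k, table):
    """Simulate the 8 concurrent update operations 510: each processed
    subset writes (or overwrites) the hash entry for its calculated hash
    value with a pointer to its first data item, so every entry points
    at the most recent occurrence of that hash value."""
    for lane in range(8):
        a, b, c, d = data[k + lane:k + lane + 4]
        h = (a << 6) ^ (b << 4) ^ (c << 2) ^ d   # illustrative XOR hash
        table[h] = k + lane   # pointer to the first data item of the subset
    return table
```

After the update, each processed subset has become an "associated subset" that later searches can match against.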
  • the compressor 112 updates the compressed output stream 130 with the processed subsets. For each processed subset indicated as matching (having the same data items 310) an associated subset (previous subset) in the input data stream 120, the compressor 112 replaces the processed subset in the compressed output stream 130 with a pointer to the location of the associated subset. For each processed subset indicated as not matching any associated subset (previous subset) in the input data stream 120, the compressor 112 places the processed subset itself in the compressed output stream 130. As shown at 218, which is a decision point, the compressor 112 checks if additional data items are available in the input data stream 120.
  • the process 200 branches to step 206 and the steps 206 through 216 are repeated for additional groups of subsets.
  • the process 200 branches to 220. As shown at 220, after the compressor 112 processes the input data stream 120, the compressor 112 outputs the compressed output stream 130 using, for example, the I/O interface 102.
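Steps 216 through 220 — emit a pointer for a matched subset, the subset itself otherwise, and loop until the input is exhausted — can be sketched end to end as follows. This is a greedy scalar simplification with fixed-length matches, not the patented concurrent method.

```python
def compress_sketch(data, window=4):
    """Produce a token stream: ('ptr', position, length) for a subset that
    matches an earlier (associated) subset, ('lit', item) otherwise."""
    table = {}   # hash value -> position of an earlier subset
    out = []
    i = 0
    while i + window <= len(data):
        sub = data[i:i + window]
        a, b, c, d = sub
        h = (a << 6) ^ (b << 4) ^ (c << 2) ^ d   # illustrative XOR hash
        prev = table.get(h)
        table[h] = i                              # update the hash entry
        if prev is not None and data[prev:prev + window] == sub:
            out.append(("ptr", prev, window))     # pointer replaces subset
            i += window
        else:
            out.append(("lit", data[i]))          # emit the data item itself
            i += 1
    out.extend(("lit", x) for x in data[i:])      # trailing items
    return out
```

For an input like `b"abcdabcd"` the second occurrence of `abcd` is replaced by a pointer to the first, which is the substitution the compressed output stream 130 performs.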
  • the compression process 200 using the SIMD engine 106 presents a significant performance increase of ~40% compared to the legacy (standard) compression process.
  • composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
  • a compound or “at least one compound” may include a plurality of compounds, including mixtures thereof.
  • range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Advance Control (AREA)
  • Storage Device Security (AREA)
EP16738472.6A 2016-07-14 2016-07-14 General purpose data compression using simd engine Ceased EP3476051A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2016/066798 WO2018010800A1 (en) 2016-07-14 2016-07-14 General purpose data compression using simd engine

Publications (1)

Publication Number Publication Date
EP3476051A1 true EP3476051A1 (en) 2019-05-01

Family

ID=56409635

Family Applications (1)

Application Number Title Priority Date Filing Date
EP16738472.6A Ceased EP3476051A1 (en) 2016-07-14 2016-07-14 General purpose data compression using simd engine

Country Status (5)

Country Link
US (1) US10489160B2 (zh)
EP (1) EP3476051A1 (zh)
JP (1) JP6921936B2 (zh)
CN (1) CN108141225B (zh)
WO (1) WO2018010800A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109361398B (zh) * 2018-10-11 2022-12-30 南威软件股份有限公司 一种基于并行和流水线设计的lz过程硬件压缩方法及系统
CN110572160A (zh) * 2019-08-01 2019-12-13 浙江大学 一种指令集模拟器译码模块代码的压缩方法
CN110830938B (zh) * 2019-08-27 2021-02-19 武汉大学 一种针对室内信号源部署方案筛选的指纹定位快速实现方法
CN111370064B (zh) * 2020-03-19 2023-05-05 山东大学 基于simd的哈希函数的基因序列快速分类方法及系统
CN113886652B (zh) * 2021-10-09 2022-06-17 北京欧拉认知智能科技有限公司 一种内存优先的多模图数据存储与计算方法及系统

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3913004B2 (ja) * 2001-05-28 2007-05-09 キヤノン株式会社 データ圧縮方法及び装置及びコンピュータプログラム及び記憶媒体
KR101956031B1 (ko) * 2012-10-15 2019-03-11 삼성전자 주식회사 데이터 압축 장치 및 방법, 데이터 압축 장치를 포함하는 메모리 시스템
CN103023509A (zh) * 2012-11-14 2013-04-03 无锡芯响电子科技有限公司 一种硬件lz77压缩实现系统及其实现方法
US8766827B1 (en) * 2013-03-15 2014-07-01 Intel Corporation Parallel apparatus for high-speed, highly compressed LZ77 tokenization and Huffman encoding for deflate compression
JP6048251B2 (ja) * 2013-03-21 2016-12-21 富士通株式会社 データ圧縮装置、データ圧縮方法、およびデータ圧縮プログラム、並びにデータ復元装置、データ復元方法、およびデータ復元プログラム
US9690488B2 (en) * 2015-10-19 2017-06-27 Intel Corporation Data compression using accelerator with multiple search engines

Also Published As

Publication number Publication date
CN108141225A (zh) 2018-06-08
JP6921936B2 (ja) 2021-08-18
JP2019522940A (ja) 2019-08-15
US10489160B2 (en) 2019-11-26
US20190146801A1 (en) 2019-05-16
CN108141225B (zh) 2020-10-27
WO2018010800A1 (en) 2018-01-18

Similar Documents

Publication Publication Date Title
US10489160B2 (en) General purpose data compression using SIMD engine
Lemire et al. Stream VByte: Faster byte-oriented integer compression
EP4012928B1 (en) Methods, devices and systems for semantic-value data compression and decompression
US8392489B2 (en) ASCII to binary floating point conversion of decimal real numbers on a vector processor
US20090045991A1 (en) Alternative encoding for lzss output
CN112514270B (zh) 数据压缩
Andrzejewski et al. GPU-WAH: Applying GPUs to compressing bitmap indexes with word aligned hybrid
EP4030628A1 (en) Near-storage acceleration of dictionary decoding
US9137336B1 (en) Data compression techniques
US7319417B2 (en) Compression using multiple Markov chain modeling
US10637498B1 (en) Accelerated compression method and accelerated compression apparatus
US20190349001A1 (en) Compression and decompression engines and compressed domain processors
Afroozeh et al. The fastlanes compression layout: Decoding> 100 billion integers per second with scalar code
EP2888819A2 (en) Format identification for fragmented image data
CN107534445B (zh) 用于分割哈希值计算的向量处理
US10879926B2 (en) Accelerated compression method and accelerated compression apparatus
CN107430506B (zh) 发现向量内的重复值的多个实例的方法和装置及到排序的应用
EP3340038A1 (en) Processor instructions for determining two minimum and two maximum values
US12001237B2 (en) Pattern-based cache block compression
GB2524515A (en) Method to improve compression ratio for a compression engine
US10637499B1 (en) Accelerated compression method and accelerated compression apparatus
Lu et al. G-Match: a fast GPU-friendly data compression algorithm
CN107656756B (zh) 查找第一个目标数的方法和装置、查找单元和处理器
Choi et al. False history filtering for reducing hardware overhead of FPGA-based LZ77 compressor
US20220200623A1 (en) Method and apparatus for efficient deflate decompression using content-addressable data structures

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20190125

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20200103

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

REG Reference to a national code

Ref country code: DE

Ref legal event code: R003

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20210521