CN117394865A - Memory device, compression method of symbol stream and generation method - Google Patents


Publication number
CN117394865A
Authority
CN
China
Prior art keywords
symbol
value
state value
obtaining
frequency
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310845805.5A
Other languages
Chinese (zh)
Inventor
Gregory W. Cook (格雷戈里·W·库克)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Display Co Ltd
Original Assignee
Samsung Display Co Ltd
Priority claimed from US17/939,643 (US20240022260A1)
Application filed by Samsung Display Co Ltd
Publication of CN117394865A

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4031 Fixed length to variable length coding
    • H03M7/4037 Prefix coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a memory device, a method of compressing a symbol stream, and a method of generating a symbol stream. The memory device includes: a memory; and at least one processor configured to: obtain a symbol stream comprising a plurality of symbols; determine a Huffman tree corresponding to the symbol stream, wherein each symbol of the plurality of symbols is assigned a corresponding prefix code of a plurality of prefix codes based on the Huffman tree; generate a prefix length table based on the Huffman tree, wherein the prefix length table indicates the length of the corresponding prefix code of each symbol; generate a log-frequency table based on the prefix length table, wherein the log-frequency table indicates a logarithm of the frequency count for each symbol; generate a cumulative frequency table indicating cumulative frequency counts corresponding to each symbol; generate a compressed bit stream by iteratively applying an encoding function to the plurality of symbols based on the log-frequency table and the cumulative frequency table; and store the compressed bit stream in the memory.

Description

Memory device, compression method of symbol stream and generation method
Cross Reference to Related Applications
The present application is based on and claims priority from U.S. provisional patent application No. 63/388,352, filed on July 12, 2022, the disclosure of which is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to techniques for encoding and decoding digital data (e.g., compressing and decompressing digital data for storage in a memory device).
Background
Memory devices, such as embedded memory devices, may utilize coding techniques such as entropy coding. Entropy coding may refer to a type of lossless coding for compressing digital data. In entropy coding, frequently occurring patterns are encoded with fewer bits, and rarely occurring patterns are encoded with more bits. The limit of lossless compression is called the Shannon limit.
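As a rough illustration of the Shannon limit mentioned above, the following sketch computes the entropy of a short symbol stream, which is a lower bound on the average number of bits per symbol for a memoryless source with those symbol probabilities. The function name and the sample stream are hypothetical and are not taken from the disclosure.

```python
import math
from collections import Counter

def shannon_entropy_bits(symbols):
    """Entropy in bits per symbol of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical stream: 'a' occurs 4 times, 'b' twice, 'c' and 'd' once each.
stream = "aaaabbcd"
entropy = shannon_entropy_bits(stream)
print(entropy)
```

For this stream the symbol probabilities are 1/2, 1/4, 1/8 and 1/8, so a code that spends 1, 2, 3 and 3 bits on the respective symbols meets the bound exactly.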
Entropy coding is often difficult to parallelize, especially in decoders, because redundant information has been removed. Although some general parallelization techniques exist, such as resynchronization markers, sub-stream multiplexing, and parallel resynchronization, these techniques are often complex to implement.
Thus, there is a need for an embedded memory encoding/decoding or compression/decompression algorithm that has low storage space requirements, low complexity, high throughput (e.g., by being parallelizable), and near optimal compression.
Disclosure of Invention
According to some embodiments, a memory device includes: a memory; and at least one processor configured to: obtain a symbol stream comprising a plurality of symbols; determine a Huffman tree corresponding to the symbol stream, wherein each symbol of the plurality of symbols is assigned a corresponding prefix code of a plurality of prefix codes based on the Huffman tree; generate a prefix length table based on the Huffman tree, wherein the prefix length table indicates the length of the corresponding prefix code of each symbol; generate a log-frequency table based on the prefix length table, wherein the log-frequency table indicates a logarithm of the frequency count for each symbol; generate a cumulative frequency table indicating cumulative frequency counts corresponding to each symbol; generate a compressed bit stream by iteratively applying an encoding function to the plurality of symbols based on the log-frequency table and the cumulative frequency table; and store the compressed bit stream in the memory.
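The first steps of this pipeline (Huffman tree, then prefix length table) can be sketched as follows. This is a minimal illustration using a standard heap-based Huffman construction, not necessarily the construction used in the embodiments; the function name, the tie-breaking rule, and the sample stream are assumptions.

```python
import heapq
from collections import Counter

def huffman_prefix_lengths(symbols):
    """Return {symbol: prefix code length} for a Huffman tree of the stream.

    Assumes at least two distinct symbols (hypothetical restriction made
    to keep the sketch short).
    """
    counts = Counter(symbols)
    if len(counts) < 2:
        raise ValueError("sketch requires at least two distinct symbols")
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

lengths = huffman_prefix_lengths("aaaabbcd")
print(lengths)
```

Only the code lengths are kept here, because the later tables are derived from the lengths rather than from the prefix codes themselves.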
To generate the log-frequency table, the at least one processor may be configured to subtract the length of the corresponding prefix code for each symbol from the maximum length of the plurality of prefix codes.
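A minimal sketch of this subtraction, assuming symbols are identified by integer indices and the prefix lengths are given as a list (both hypothetical conventions):

```python
def log_frequency_table(prefix_lengths):
    """log2 of each symbol's frequency count: max length minus own length."""
    max_len = max(prefix_lengths)
    return [max_len - n for n in prefix_lengths]

# Hypothetical prefix length table for four symbols.
prefix_lengths = [1, 2, 3, 3]
log_freq = log_frequency_table(prefix_lengths)
print(log_freq)  # [2, 1, 0, 0]
```

A symbol with the shortest code gets the largest log-frequency, so every frequency count is a power of two.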
To generate the cumulative frequency table, the at least one processor may be configured to: obtaining a frequency count for each symbol by left-shifting the integer value 1 based on the logarithm of the frequency count for each symbol; and the cumulative frequency count for each symbol is obtained by adding the frequency count for each symbol to the sum of the frequency counts of previous symbols in the plurality of symbols.
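The two steps above might look like the following sketch. It assumes one interpretation of the text: the table stores, for each symbol, the sum of the frequency counts of all previous symbols, plus one trailing entry holding the total. The list layout and names are hypothetical.

```python
def cumulative_frequency_table(log_freq):
    """cum[i] = sum of frequency counts of symbols before i; cum[-1] = total."""
    cum = [0]
    for lf in log_freq:
        freq = 1 << lf               # frequency count = 2 ** log-frequency
        cum.append(cum[-1] + freq)   # running sum gives the next symbol's entry
    return cum

log_freq = [2, 1, 0, 0]              # hypothetical log-frequency table
cum = cumulative_frequency_table(log_freq)
print(cum)  # [0, 4, 6, 7, 8]
```

When the prefix lengths come from a complete Huffman tree, the final entry equals 1 shifted left by the maximum prefix length (8 for a maximum length of 3 in this example).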
To apply the encoding function to each symbol, the at least one processor may be configured to: obtaining a current state value; obtaining a shifted state value by right-shifting the current state value based on the logarithm of the frequency count of each symbol; obtaining a first value by shifting left the shifted state value based on a maximum length of the plurality of prefix codes; obtaining a frequency count for each symbol by left-shifting the integer value 1 based on the logarithm of the frequency count; obtaining a second value by performing a bitwise AND operation on the current state value AND the frequency count of each symbol minus 1; and obtaining an updated state value by adding the first value, the second value, and the cumulative frequency count corresponding to each symbol.
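The state update described above can be sketched with shifts and masks only. The function and argument names are hypothetical, and the tables reuse the illustrative four-symbol example with a maximum prefix length of 3.

```python
def encode_step(state, symbol, log_freq, cum, max_len):
    """One application of the encoding function to a single symbol."""
    lf = log_freq[symbol]
    first = (state >> lf) << max_len    # shifted state value, then left shift
    freq = 1 << lf                      # frequency count of the symbol
    second = state & (freq - 1)         # low bits of the state
    return first + second + cum[symbol]

log_freq = [2, 1, 0, 0]                 # hypothetical tables
cum = [0, 4, 6, 7, 8]
new_state = encode_step(13, 1, log_freq, cum, max_len=3)
print(new_state)  # 53
```

Because every frequency count is a power of two, the division and modulo of a general range coder reduce to a right shift and a bitwise AND.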
To apply the encoding function to each symbol, the at least one processor may be further configured to: determining whether a difference between a most significant set bit of an initial state value corresponding to a symbol stream and a logarithm of a frequency count corresponding to each symbol is greater than or equal to a minimum bit length of a codeword corresponding to the symbol stream, and based on determining that the difference is greater than or equal to the minimum bit length of the codeword: determining a third value by subtracting the logarithm of the frequency count for each symbol from the most significant set bit; obtaining a shifted third value by right-shifting the third value based on the logarithm of the minimum bit length of the codeword; determining a number of bits to be shifted out of the initial state value by shifting the shifted third value left based on the logarithm of the minimum bit length of the codeword; outputting the determined number of bits to the compressed bit stream; and obtaining the current state value by right-shifting the initial state value based on the determined number of bits.
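A sketch of the bit-count computation described above. It rests on several assumptions that the text does not settle: the "most significant set bit" is taken as a zero-based bit index, the minimum codeword bit length is a power of two given by its logarithm `log_k`, and the bits written out are the low-order bits of the state. All names are hypothetical.

```python
def bits_to_shift_out(state, log_freq_s, log_k):
    """Number of bits to move from the state to the bit stream."""
    msb = state.bit_length() - 1        # assumption: zero-based index
    diff = msb - log_freq_s
    if diff < (1 << log_k):             # below the minimum codeword length
        return 0
    # Round diff down to a multiple of the minimum codeword length.
    return (diff >> log_k) << log_k

def renormalize_out(state, log_freq_s, log_k):
    """Return (current state, emitted bits, number of emitted bits)."""
    n = bits_to_shift_out(state, log_freq_s, log_k)
    emitted = state & ((1 << n) - 1)    # assumption: low-order bits are emitted
    return state >> n, emitted, n

result = renormalize_out(0x12345, log_freq_s=2, log_k=3)
print(result)  # (291, 69, 8), i.e. state 0x123 after emitting byte 0x45
```

Rounding to a multiple of the codeword length means the encoder always emits whole codewords (whole bytes when `log_k` is 3).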
The at least one processor may include a plurality of processors configured to execute the encoding function on the plurality of symbols in parallel, wherein, to execute the encoding function, each processor of the plurality of processors may be assigned a corresponding initial state value, wherein, after each processor determines a number of bits to be transferred from the corresponding initial state value, each processor may be assigned a corresponding memory location to output the determined number of bits, and wherein, after the determined number of bits is output to the compressed bitstream, each processor may be further configured to determine a corresponding current state value.
According to some embodiments, a memory device includes: a memory; and at least one processor configured to: obtain a compressed bit stream from the memory, wherein the compressed bit stream corresponds to a symbol stream comprising a plurality of symbols; obtain a log-frequency table from the compressed bit stream, wherein the log-frequency table indicates a logarithm of the frequency count for each of the plurality of symbols; generate a cumulative frequency table based on the log-frequency table, wherein the cumulative frequency table indicates a cumulative frequency count corresponding to each symbol; generate an inverse symbol table based on the log-frequency table and the cumulative frequency table; and generate the symbol stream by iteratively applying a decoding function to the plurality of symbols based on the cumulative frequency table and the inverse symbol table.
To generate the cumulative frequency table, the at least one processor may be configured to: obtaining a frequency count for each symbol by left-shifting the integer value 1 based on the logarithm of the frequency count for each symbol; and the cumulative frequency count for each symbol is obtained by adding the frequency count for each symbol to the sum of the frequency counts of previous symbols in the plurality of symbols.
To generate the inverse symbol table, the at least one processor may be configured to determine an inverse symbol value for each symbol by performing a bitwise AND operation on the current state value AND a maximum length of a plurality of prefix codes corresponding to the compressed bit stream minus 1, wherein the inverse symbol value is greater than or equal to the cumulative frequency count of each symbol, AND wherein the inverse symbol value is less than the cumulative frequency count of the next symbol.
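One way to realize such a table is a flat lookup array with one entry per slot, as sketched below. The sketch assumes that the cumulative table stores each symbol's starting offset plus a trailing total, and that the lookup index is the state masked with (1 << maximum prefix length) - 1; the text is ambiguous about the mask, so this is an interpretation. Names are hypothetical.

```python
def inverse_symbol_table(cum):
    """table[slot] = symbol whose range [cum[s], cum[s + 1]) contains slot."""
    table = [0] * cum[-1]
    for symbol in range(len(cum) - 1):
        for slot in range(cum[symbol], cum[symbol + 1]):
            table[slot] = symbol
    return table

cum = [0, 4, 6, 7, 8]                   # hypothetical cumulative table
inverse = inverse_symbol_table(cum)
print(inverse)  # [0, 0, 0, 0, 1, 1, 2, 3]

max_len = 3
slot = 53 & ((1 << max_len) - 1)        # assumed mask: total count minus 1
print(inverse[slot])  # 1
```

The table trades memory (one entry per slot) for a constant-time symbol lookup during decoding.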
To apply a decoding function to each symbol, the at least one processor may be configured to: obtaining each symbol based on an inverse symbol value corresponding to each symbol in the inverse symbol table; obtaining a shifted state value by right-shifting the current state value based on the maximum lengths of the plurality of prefix codes; obtaining a first value by shifting the shifted state value to the left based on the logarithm of the frequency count of each symbol; obtaining a total frequency count by left-shifting the integer value 1 based on the maximum lengths of the plurality of prefix codes; obtaining a second value by performing a bitwise AND operation on the current state value AND a maximum length minus 1 of the plurality of prefix codes; and obtaining an updated state value by subtracting the cumulative frequency count corresponding to each symbol from the sum of the first value and the second value.
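The decoding step can be sketched next to the matching encoding step to show that the two invert each other. The sketch assumes the second value is the state masked with the total frequency count minus 1, which is how the total frequency count computed above would naturally be used; the names, tables, message, and the choice to decode a known number of symbols are all hypothetical.

```python
def encode_step(state, symbol, log_freq, cum, max_len):
    lf = log_freq[symbol]
    return ((state >> lf) << max_len) + (state & ((1 << lf) - 1)) + cum[symbol]

def decode_step(state, log_freq, cum, inverse, max_len):
    """Return (symbol, updated state) for one application of decoding."""
    total = 1 << max_len                # total frequency count
    second = state & (total - 1)        # assumed mask: total minus 1
    symbol = inverse[second]            # inverse symbol table lookup
    first = (state >> max_len) << log_freq[symbol]
    return symbol, first + second - cum[symbol]

log_freq = [2, 1, 0, 0]
cum = [0, 4, 6, 7, 8]
inverse = [0, 0, 0, 0, 1, 1, 2, 3]
max_len = 3

message = [0, 1, 0, 2, 0, 3, 1, 0]
state = 1
for s in message:
    state = encode_step(state, s, log_freq, cum, max_len)

decoded = []
for _ in message:
    s, state = decode_step(state, log_freq, cum, inverse, max_len)
    decoded.append(s)
decoded.reverse()                       # symbols come out last-in, first-out
print(decoded == message, state)
```

Python integers are unbounded, so this sketch needs no renormalization; a fixed-width implementation would combine it with the bit transfer steps described elsewhere in this section.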
To apply a decoding function to each symbol, the at least one processor may be further configured to: determining whether a difference between a maximum length of the plurality of prefix codes and a most significant set bit of an initial state value corresponding to the compressed bit stream is greater than 0; and based on determining that the difference is greater than 0: obtaining a third value by shifting left the integer value 1 based on the logarithm of the minimum bit length of the codeword corresponding to the symbol stream; obtaining a fourth value by adding the difference to the third value minus 1; obtaining a shifted fourth value by right-shifting the fourth value based on the logarithm of the minimum bit length of the codeword; determining the number of bits to be transferred into the initial state value by shifting the shifted fourth value to the left based on the logarithm of the minimum bit length of the codeword; obtaining additional bits from the compressed bit stream based on the determined number of bits, obtaining a shifted state value by shifting the initial state value to the left based on the determined number of bits; and the current state value is obtained by adding the shifted state value and the additional bit.
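The number of bits to transfer into the state, as described above, amounts to rounding the difference up to a multiple of the minimum codeword length. The sketch below assumes a zero-based index for the most significant set bit and a power-of-two codeword length given by `log_k`; the function names are hypothetical.

```python
def bits_to_shift_in(state, max_len, log_k):
    """Number of bits to read from the bit stream into the state."""
    msb = state.bit_length() - 1        # assumption: zero-based index
    diff = max_len - msb
    if diff <= 0:
        return 0
    third = 1 << log_k                  # minimum codeword bit length
    fourth = diff + third - 1
    # Round diff up to a multiple of the minimum codeword length.
    return (fourth >> log_k) << log_k

def refill(state, additional_bits, n):
    """Shift the state left by n bits and add the bits read from the stream."""
    return (state << n) + additional_bits

n = bits_to_shift_in(0x123, max_len=12, log_k=3)
state = refill(0x123, 0x45, n)
print(n, hex(state))  # 8 0x12345
```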
The at least one processor may comprise a plurality of processors configured to perform the decoding function on the plurality of symbols in parallel, wherein, to perform the decoding function, each processor of the plurality of processors may be assigned a corresponding initial state value, wherein, after each processor determines the number of bits to transfer into the corresponding initial state value, each processor may be assigned a corresponding memory location from which to obtain additional bits, and wherein, after the additional bits are obtained from the compressed bitstream, each processor may be further configured to determine a corresponding current state value.
According to some embodiments, a method of compressing a symbol stream for storage in a memory device is performed by at least one processor and comprises: obtaining a symbol stream comprising a plurality of symbols; determining a Huffman tree corresponding to the symbol stream, wherein each symbol of the plurality of symbols is assigned a corresponding prefix code of a plurality of prefix codes based on the Huffman tree; generating a prefix length table based on the Huffman tree, wherein the prefix length table indicates the length of the corresponding prefix code of each symbol; generating a log-frequency table based on the prefix length table, wherein the log-frequency table indicates a logarithm of the frequency count for each symbol; generating a cumulative frequency table indicating cumulative frequency counts corresponding to each symbol; generating a compressed bit stream by iteratively applying an encoding function to the plurality of symbols based on the log-frequency table and the cumulative frequency table; and storing the compressed bit stream in the memory device.
The generating of the log-frequency table may include: the length of the corresponding prefix code for each symbol is subtracted from the maximum length of the plurality of prefix codes.
The generating of the cumulative frequency table may include: obtaining a frequency count for each symbol by left-shifting the integer value 1 based on the logarithm of the frequency count for each symbol; and the cumulative frequency count for each symbol is obtained by adding the frequency count for each symbol to the sum of the frequency counts of previous symbols in the plurality of symbols.
Applying the encoding function to each symbol may include: obtaining a current state value; obtaining a shifted state value by right-shifting the current state value based on the logarithm of the frequency count of each symbol; obtaining a first value by shifting left the shifted state value based on a maximum length of the plurality of prefix codes; obtaining a frequency count for each symbol by left-shifting the integer value 1 based on the logarithm of the frequency count; obtaining a second value by performing a bitwise AND operation on the current state value AND the frequency count of each symbol minus 1; and obtaining an updated state value by adding the first value, the second value, and the cumulative frequency count corresponding to each symbol.
Applying the encoding function to each symbol may further include: determining whether a difference between a most significant set bit of an initial state value corresponding to the symbol stream and a logarithm of a frequency count corresponding to each symbol is greater than or equal to a minimum bit length of a codeword corresponding to the symbol stream; and based on determining that the difference is greater than or equal to the minimum bit length of the codeword: determining a third value by subtracting the logarithm of the frequency count for each symbol from the most significant set bit; obtaining a shifted third value by right-shifting the third value based on the logarithm of the minimum bit length of the codeword; determining a number of bits to be shifted out of the initial state value by shifting the shifted third value left based on the logarithm of the minimum bit length of the codeword; outputting the determined number of bits to the compressed bit stream; and obtaining the current state value by right-shifting the initial state value based on the determined number of bits.
The at least one processor may comprise a plurality of processors, wherein the encoding function is applied by the plurality of processors to the plurality of symbols in parallel, wherein each processor of the plurality of processors may be assigned a corresponding initial state value in order to perform the encoding function, wherein after each processor determines the number of bits to be transferred from the corresponding initial state value, each processor may be assigned a corresponding memory location to output the determined number of bits, and wherein after the determined number of bits is output to the compressed bitstream, the method may further comprise: a corresponding current state value for each processor is determined.
According to some embodiments, a method of generating a symbol stream based on a compressed bit stream is performed by at least one processor and comprises: obtaining a compressed bit stream from a memory, wherein the compressed bit stream corresponds to a plurality of symbols included in a symbol stream; obtaining a log-frequency table from the compressed bit stream, wherein the log-frequency table indicates a logarithm of the frequency count for each of the plurality of symbols; generating a cumulative frequency table based on the log-frequency table, wherein the cumulative frequency table indicates a cumulative frequency count corresponding to each symbol; generating an inverse symbol table based on the log-frequency table and the cumulative frequency table; and generating the symbol stream by iteratively applying a decoding function to the plurality of symbols based on the cumulative frequency table and the inverse symbol table.
The generating of the cumulative frequency table may include: obtaining a frequency count for each symbol by left-shifting the integer value 1 based on the logarithm of the frequency count for each symbol; and the cumulative frequency count for each symbol is obtained by adding the frequency count for each symbol to the sum of the frequency counts of previous symbols in the plurality of symbols.
The generation of the inverse symbol table may include: an inverse symbol value is determined for each symbol by performing a bitwise AND operation on the current state value AND a maximum length of a plurality of prefix codes corresponding to the compressed bit stream, wherein the inverse symbol value is greater than or equal to the cumulative frequency count of each symbol, AND wherein the inverse symbol value is less than the cumulative frequency count of the next symbol.
Applying the decoding function to each symbol may include: obtaining each symbol based on an inverse symbol value corresponding to each symbol in the inverse symbol table; obtaining a shifted state value by right-shifting the current state value based on the maximum lengths of the plurality of prefix codes; obtaining a first value by shifting the shifted state value to the left based on the logarithm of the frequency count of each symbol; obtaining a total frequency count by left-shifting the integer value 1 based on the maximum lengths of the plurality of prefix codes; obtaining a second value by performing a bitwise AND operation on the current state value AND a maximum length minus 1 of the plurality of prefix codes; and obtaining an updated state value by subtracting the cumulative frequency count corresponding to each symbol from the sum of the first value and the second value.
Applying the decoding function to each symbol may further include: determining whether a difference between a maximum length of the plurality of prefix codes and a most significant set bit of an initial state value corresponding to the compressed bit stream is greater than 0; and based on determining that the difference is greater than 0: obtaining a third value by shifting left the integer value 1 based on the logarithm of the minimum bit length of the codeword corresponding to the symbol stream; obtaining a fourth value by adding the difference to the third value minus 1; obtaining a shifted fourth value by right-shifting the fourth value based on the logarithm of the minimum bit length of the codeword; determining the number of bits to be transferred into the initial state value by shifting the shifted fourth value to the left based on the logarithm of the minimum bit length of the codeword; obtaining additional bits from the compressed bit stream based on the determined number of bits; obtaining a shifted state value by shifting the initial state value to the left based on the determined number of bits; and the current state value is obtained by adding the shifted state value and the additional bit.
The at least one processor may comprise a plurality of processors, wherein the decoding function is applied by the plurality of processors in parallel to the plurality of symbols, wherein, to perform the decoding function, each processor of the plurality of processors may be assigned a corresponding initial state value, wherein, after each processor determines the number of bits to transfer into the corresponding initial state value, each processor may be assigned a corresponding memory location from which to obtain additional bits, and wherein, after the additional bits are obtained from the compressed bitstream, the method may further comprise: a corresponding current state value for each processor is determined.
Drawings
These and/or other aspects will become apparent and more readily appreciated from the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram schematically illustrating an example of a memory system in accordance with some embodiments;
FIG. 2A is a block diagram schematically illustrating a controller according to some embodiments;
FIG. 2B is a block diagram schematically illustrating an example of an encoder/decoder in accordance with some embodiments;
FIG. 3 is a block diagram schematically illustrating an example of a communication system in accordance with some embodiments;
FIG. 4A is a diagram illustrating an example of a frequency table according to some embodiments;
FIG. 4B is a diagram illustrating an example of a forest of single-node trees in accordance with some embodiments;
FIG. 4C is a diagram illustrating an example of a Huffman tree in accordance with some embodiments;
FIG. 4D is a diagram illustrating an example of a table including information about a Huffman tree, in accordance with some embodiments;
FIG. 5 is a diagram illustrating an example of an algorithm for determining the most significant set bits, according to some embodiments;
FIG. 6A is a block diagram schematically illustrating an example of an encoder according to some embodiments;
FIG. 6B is a flow chart of a process for encoding a symbol stream to generate a compressed bitstream, according to some embodiments;
FIG. 7A is a block diagram schematically illustrating an example of a decoder according to some embodiments;
FIG. 7B is a flow chart of a process for decoding a compressed bitstream to generate a symbol stream, according to some embodiments;
FIG. 8A is a block diagram schematically illustrating an example of a streaming encoder in accordance with some embodiments;
FIG. 8B is a flow chart of a process for encoding a symbol stream to generate a compressed bitstream, according to some embodiments;
FIG. 8C is a diagram illustrating an example of an algorithm for determining the number of bits to be shifted out of a current state value, according to some embodiments;
FIG. 9A is a block diagram schematically illustrating an example of a streaming decoder according to some embodiments;
FIG. 9B is a flow chart of a process for decoding a compressed bitstream to generate a symbol stream, according to some embodiments;
FIG. 9C is a diagram illustrating an example of an algorithm for determining the number of bits to transfer into an intermediate state value, in accordance with some embodiments;
FIG. 10 is a block diagram schematically illustrating an example of a parallel encoder in accordance with some embodiments;
FIG. 11 is a block diagram schematically illustrating an example of a parallel decoder in accordance with some embodiments;
FIGS. 12A-12C are flowcharts of a process for encoding a symbol stream to generate a compressed bitstream, according to some embodiments; and
FIGS. 13A-13C are flowcharts of a process for decoding a compressed bitstream to generate a symbol stream, according to some embodiments.
Detailed Description
Hereinafter, example embodiments of the present disclosure will be described in detail with reference to the drawings, in which like reference numerals refer to like elements throughout. However, it will be understood that the present disclosure is not limited to the embodiments described herein, and features and components in one embodiment may be included in or omitted from another embodiment. Repeated descriptions of identical or similar elements may be omitted for convenience.
Furthermore, it will be understood that, as used herein, expressions such as "at least one of," when following a list of elements, modify the entire list of elements and do not modify the individual elements of the list. For example, the expression "at least one of [A], [B], and [C]" means only A, only B, only C, A and B, B and C, A and C, or A, B, and C.
It will be further understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms (e.g., should not be interpreted as specifying the relative order or importance). These terms are only used to distinguish one element from another element.
In addition, as used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The following description is presented to enable one of ordinary skill in the art to make and use the disclosure and to incorporate it into the context of a particular application. While the following is directed to particular examples, other and further examples may be devised without departing from the basic scope thereof.
Various modifications and various uses in different applications will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to a wide variety of embodiments. Thus, the present disclosure is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
In the description provided, numerous specific details are set forth in order to provide a more thorough understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure may be practiced without limitation to these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present disclosure.
All the features disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
Hereinafter, various features are described with reference to the drawings. It should be noted that the drawings are only intended to facilitate the description of the features. They are not intended as an exhaustive description of the claimed invention or as a limitation on the scope of the claimed invention. Additionally, the illustrated examples need not have all of the aspects or advantages shown. Aspects or advantages described in connection with a particular example are not necessarily limited to that example, and may be implemented in any other example, even if not so illustrated or otherwise explicitly described.
Furthermore, any element in the claims recited as "means for" or "step for" performing a specified function is not to be construed as a "means" or "step" clause as specified in the relevant provisions of China. In particular, the use of "a step" or "an action" in the claims herein is not intended to invoke the relevant provisions of China.
Note that if used, the labels left, right, front, back, top, bottom, forward, reverse, clockwise, and counterclockwise are for convenience only and are not intended to imply any particular fixed orientation. Rather, they are used to reflect the relative position and/or orientation between portions of an object.
Furthermore, the terms "system," "component," "module," "interface," or "model," and the like are generally intended to refer to a computer-related entity: hardware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
Unless expressly stated otherwise, each numerical value and range may be construed as being approximate as if the word "about" or "approximately" preceded the value or range. Signals and corresponding nodes or ports may be referred to by the same names and are interchangeable herein for each purpose.
Although the embodiments are described with respect to circuit functions, embodiments of the present disclosure are not limited thereto. Possible implementations may be embodied in a single integrated circuit, a multi-chip module, a single card, a system-on-chip, or a multi-card circuit package. As will be apparent to those of skill in the art, various embodiments may also be implemented as part of a larger system. Such embodiments may be used in conjunction with, for example, a digital signal processor, a microcontroller, a field programmable gate array, an application specific integrated circuit, or a general purpose computer.
As will be apparent to one of skill in the art, the various functions of the circuit elements may also be implemented as processing blocks in a software program. Such software may be used in, for example, a digital signal processor, a microcontroller, or a general purpose computer. Such software may be embodied in the form of program code embodied in tangible media, such as magnetic recording media, optical recording media, solid state memory, floppy diskettes, CD-ROMs, hard drives, or any other non-transitory machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the disclosure. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits. The described embodiments may also be manifested in the form of a bit stream or other sequence of signal values that are transmitted electronically or optically through a medium, stored magnetic field variations in a magnetic recording medium, etc., as produced using the methods and/or apparatus described herein.
Embodiments are described in terms of functional blocks, units, and/or modules and are illustrated in the drawings as conventional in the art. Those skilled in the art will appreciate that the blocks, units, and/or modules are physically implemented by electronic (or optical) circuits such as logic circuits, discrete components, microprocessors, hardwired circuits, memory elements, wired connections, or the like, which may be formed using semiconductor-based manufacturing techniques or other manufacturing techniques. Where the blocks, units, and/or modules are implemented by microprocessors or the like, they may be programmed using software (e.g., microcode) to perform the various functions recited herein, and may optionally be driven by firmware and/or software. Alternatively, each block, unit, and/or module may be implemented by dedicated hardware, or may be implemented as a combination of dedicated hardware for performing some functions and a processor (e.g., one or more programmed microprocessors and associated circuitry) for performing other functions. Furthermore, each block, unit, and/or module in embodiments may be physically separated into two or more interacting and discrete blocks, units, and/or modules without departing from the present scope. Furthermore, blocks, units, and/or modules in embodiments may be physically combined into more complex blocks, units, and/or modules without departing from the present scope.
FIG. 1 is a block diagram schematically illustrating a memory system 1000 according to some embodiments. Referring to fig. 1, a memory system 1000 may include a memory 100 and a controller 200.
The memory 100 may be configured to perform writing, reading, and erasing operations according to the control of the controller 200. In an embodiment, memory 100 may be, for example, a non-volatile memory or a volatile memory.
The controller 200 may be connected to a host (not shown) and the memory 100. The controller 200 may access the memory 100 in response to a request from a host. For example, the controller 200 may be configured to control write, read, and erase operations of the memory 100. The controller 200 may be configured to provide an interface between the memory 100 and a host. The controller 200 may be configured to drive firmware for controlling the memory 100.
The controller 200 may receive input data from a host. The controller 200 may encode the input data (e.g., the data DATA shown in fig. 1) to generate encoded data DATA_C. The controller 200 may be configured to provide the control signal CTRL and the address ADDR to the memory 100. The controller 200 may be configured to exchange the encoded data DATA_C with the memory 100. The controller 200 may receive the encoded data DATA_C from the memory 100 and decode it. The controller 200 may transmit the decoded data (e.g., the data DATA shown in fig. 1) to the host.
In an embodiment, the memory system 1000 may include or may be implemented as a solid state drive (SSD) in any of several form factors, including: a memory card form factor such as Secure Digital (SD) and variants thereof; a standard hard disk drive (HDD) form factor; a card form factor such as mini serial AT attachment (mSATA), PCI Express Mini Card, or M.2; a disk-on-module form factor with an interface such as parallel ATA (PATA) or SATA; a box form factor for applications such as rack-mounted systems; a die form factor including PCI Express (PCIe), mini PCIe, mini dual in-line memory module (mini-DIMM), or MO-297; and a ball grid array form factor.
The memory 100 may include, but is not limited to, a flash memory device, a NAND flash memory device, a phase change RAM (PRAM), a Ferroelectric RAM (FRAM), a Magnetic RAM (MRAM), and the like. Memory 100 may have a planar structure or a three-dimensional (3D) memory cell structure (with stacked memory cells). Each of the memory cells may include a level for storing a corresponding bit of data. The memory 100 may be implemented as, for example, a memory chip (e.g., a NAND chip). Although only one memory 100 is illustrated in fig. 1 for simplicity, the memory system 1000 may include several memory devices (e.g., memory chips) arranged in a variety of ways and connected to the controller 200 via a plurality of channels.
Fig. 2A is a block diagram schematically illustrating a controller 200 according to some embodiments. Referring to fig. 2A, the controller 200 may include a system bus 210, a processor 220, a RAM 230, a host interface 240, a memory interface 250, and an encoder/decoder 260. The controller 200 shown in fig. 2A may be or may correspond to the controller 200 shown in fig. 1.
The system bus 210 may provide a path between the components 220 through 260 of the controller 200. The processor 220 may control the overall operation of the controller 200. The RAM 230 may be used as at least one of a working memory, a cache memory, and a buffer memory. The host interface 240 may communicate with an external device (e.g., a host) via at least one of various communication standards such as USB (universal serial bus), MMC (multimedia card), PCI (peripheral component interconnect), PCI-E (PCI Express), ATA (advanced technology attachment), serial ATA, parallel ATA, SCSI (small computer system interface), ESDI (enhanced small disk interface), IDE (integrated drive electronics), and FireWire.
The memory interface 250 may be connected with a memory device (e.g., memory 100 as shown in fig. 1). The memory interface 250 may include a NAND circuit interface or a NOR circuit interface.
The encoder/decoder 260 may perform encoding on data received from an external host and decoding on data received from the memory 100. For example, the encoder/decoder 260 may encode received input data (e.g., the data DATA shown in fig. 1) to generate encoded data DATA_C. In addition, the encoder/decoder 260 may receive the encoded data DATA_C, decode it to reconstruct or recover the original data, and output the reconstructed or recovered data (such as the data DATA shown in fig. 1).
Fig. 2B is a block diagram schematically illustrating an example of an encoder/decoder 260 according to some embodiments. Referring to fig. 2A and 2B, the encoder/decoder 260 may include one or both of the encoder 202 and the decoder 204.
Encoder 202 may receive a plurality of information word bits. In an embodiment, the information word bits may be included in a symbol stream, which may include a plurality of symbols. The symbol stream and/or information word bits may be received, for example, from a host. Encoder 202 may perform encoding on the symbol stream and/or the information word bits to generate encoded values. In an embodiment, the encoded value may be, for example, an encoded or compressed bitstream that may include a plurality of prefix codes. The encoded values may be programmed at memory 100. The data programmed at the memory 100 may be read as encoded values. Decoder 204 may perform decoding on the read encoded values to generate a stream of information word bits and/or symbols, for example, by reconstructing or recovering the stream of information word bits and/or symbols received from the host.
In an embodiment, encoder 202 may be referred to as a compressor, decoder 204 may be referred to as a decompressor, and encoder/decoder 260 may be referred to as a compressor/decompressor. For example, the encoder 202 may compress the symbol stream into a compressed bitstream, and the decoder 204 may decompress the compressed bitstream to reconstruct or otherwise generate the symbol stream.
In an embodiment, one or more of the encoder 202 and decoder 204 may be included in a system or device other than the memory system 1000. For example, fig. 3 is a block diagram schematically illustrating an example of a communication system 3000 according to an embodiment. As can be seen in fig. 3, the communication system 3000 may include a transmitter 302 and a receiver 304 that may communicate with each other, for example, over a wired or wireless communication channel. In an embodiment, the transmitter 302 may include the encoder 202 shown in fig. 2B, and the encoder 202 may be used to generate a compressed bitstream based on the symbol stream. The transmitter 302 may transmit the compressed bit stream generated by the encoder 202 to the receiver 304 using a wired or wireless communication channel. In an embodiment, the receiver 304 may include the decoder 204 shown in fig. 2B that may decompress the compressed bitstream to reconstruct the symbol stream.
In an embodiment, one or more of the encoder 202 and decoder 204 may use entropy encoding techniques. For example, in an embodiment, the encoder 202 and decoder 204 may use one or more of arithmetic coding techniques and range coding techniques. These techniques may allow compression approaching the Shannon limit. However, these techniques are often complex and may require several multiplication operations in the encoder 202 and several division operations in the decoder 204.
As another example, in an embodiment, encoder 202 and decoder 204 may use range asymmetric numeral systems (rANS) codec techniques. The rANS codec technique may also allow compression approaching the Shannon limit. However, these techniques are also somewhat complex and may require one division operation in the encoder 202 and one multiplication operation in the decoder 204. The rANS codec technique may involve stack encoding/decoding, which may involve a last-in-first-out (LIFO) operation. In addition, rANS codec techniques may allow for local parallel encoding and decoding.
As another example, in an embodiment, encoder 202 and decoder 204 may use a tabled asymmetric numeral systems (tANS) codec technique. The tANS codec technique may also allow compression approaching the Shannon limit. In addition, the tANS codec technique may be less complex than the techniques discussed above, and may use only shifting, addition, and table lookup operations in the encoder 202 and decoder 204. However, the tANS codec technique may use a relatively large memory space and may require additional steps for creating the table. The tANS codec technique may involve stack encoding/decoding and may allow for local parallel encoding and decoding.
As yet another example, in an embodiment, encoder 202 and decoder 204 may use Huffman codec techniques. The Huffman codec technique may be optimal to within 1 bit per symbol of the Shannon limit. In addition, Huffman codec techniques may be less complex than some of the techniques described above, and may use only shifting and table lookup operations in the encoder 202 and decoder 204. However, Huffman codec techniques may be difficult to parallelize, especially in the decoder 204.
More detailed examples of several of the techniques discussed above are provided below. In an embodiment, several of the techniques discussed above (e.g., arithmetic codec, range codec, tANS codec, and Huffman codec) may use a static frequency table. The frequency table may be created by scanning the symbol stream to be encoded. The frequency table may then be compressed, typically by the encoder 202, and sent in the bitstream as a prefix message. The decoder 204 may then decompress and reconstruct the frequency table before decoding the bitstream.
Fig. 4A is a diagram illustrating an example of a frequency table for the example message "THIS IS HIS MESSAGE" in accordance with some embodiments. As can be seen in fig. 4A, each symbol included in the example message may be listed in the frequency table along with the number of times that symbol occurs within the example message. For example, as shown in fig. 4A, the frequency table may indicate that symbol "S" occurs 5 times in the example message, that symbol "I" occurs 3 times in the example message, and so on.
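The counting step behind such a frequency table can be sketched in C as follows. This is a minimal illustration only: the helper name count_frequencies and the choice to treat every byte, including the space character, as a symbol are assumptions and are not taken from fig. 4A.

```c
#include <stddef.h>

/* Hypothetical helper: count how often each byte value occurs in msg.
   freq must provide room for 256 counters. */
void count_frequencies(const char *msg, unsigned int freq[256])
{
    for (size_t i = 0; i < 256; i++)
        freq[i] = 0;
    /* Walk the NUL-terminated message and bump the counter of each byte. */
    for (const unsigned char *p = (const unsigned char *)msg; *p != '\0'; p++)
        freq[*p]++;
}
```

Calling count_frequencies on "THIS IS HIS MESSAGE" yields a count of 5 for 'S' and 3 for 'I', matching the counts quoted above.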
To perform encoding according to the example Huffman codec technique, after constructing the frequency table, a Huffman tree corresponding to the example message may be obtained. The process of building a Huffman tree may begin by placing the symbols into an ordered forest of single-node trees.
Fig. 4B is a diagram illustrating an example of a forest of single-node trees of example message "THIS IS HIS MESSAGE" in accordance with some embodiments. As can be seen in fig. 4B, each single-node tree includes symbols and corresponding frequencies.
Next, a recursive operation may be performed that includes selecting the two trees having the smallest frequencies at their roots, generating a new binary tree with the selected trees as subtrees, and storing the sum of their frequencies in the new root.
When only one tree remains, the recursion can end. This final tree may be referred to as a Huffman tree, a Huffman coding tree, or an optimal Huffman tree. Fig. 4C is a diagram illustrating an example of a Huffman tree of the example message "THIS IS HIS MESSAGE" in accordance with some embodiments.
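The merging procedure described above can be sketched in C as follows. This is a simple quadratic-time illustration, not the construction of fig. 4C: the function name huffman_lengths, the MAX_SYMBOLS limit, and the tie-breaking order are assumptions, so individual code lengths may differ from those in the figures while the total encoded size stays the same.

```c
#include <stddef.h>

#define MAX_SYMBOLS 64  /* assumed upper bound on the alphabet size */

/* Compute Huffman code lengths len[0..count-1] for positive frequencies
   freq[0..count-1], with 1 <= count <= MAX_SYMBOLS.
   Returns the maximum code length. */
unsigned int huffman_lengths(const unsigned int *freq, size_t count,
                             unsigned int *len)
{
    unsigned long weight[2 * MAX_SYMBOLS];
    size_t parent[2 * MAX_SYMBOLS];
    int is_root[2 * MAX_SYMBOLS];
    size_t nodes = count;

    /* Start with a forest of single-node trees. */
    for (size_t i = 0; i < count; i++) {
        weight[i] = freq[i];
        parent[i] = 0;
        is_root[i] = 1;
    }
    /* Each pass merges the two lightest roots; count - 1 merges leave one tree. */
    for (size_t merges = 0; merges + 1 < count; merges++) {
        size_t a = nodes, b = nodes;  /* 'nodes' doubles as "none found yet" */
        for (size_t i = 0; i < nodes; i++) {
            if (!is_root[i])
                continue;
            if (a == nodes || weight[i] < weight[a]) {
                b = a;
                a = i;
            } else if (b == nodes || weight[i] < weight[b]) {
                b = i;
            }
        }
        weight[nodes] = weight[a] + weight[b];  /* sum stored in the new root */
        parent[nodes] = 0;
        is_root[nodes] = 1;
        is_root[a] = is_root[b] = 0;
        parent[a] = parent[b] = nodes;
        nodes++;
    }
    /* The code length of a leaf is its depth below the final root (nodes - 1). */
    unsigned int max_len = 0;
    for (size_t i = 0; i < count; i++) {
        unsigned int depth = 0;
        for (size_t v = i; v != nodes - 1; v = parent[v])
            depth++;
        len[i] = depth;
        if (depth > max_len)
            max_len = depth;
    }
    return max_len;
}
```

For the counts of the example message (5, 3, 3, 2, 2, 1, 1, 1, 1), the weighted sum of the resulting lengths is 56 bits.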
An important aspect of the Huffman tree may involve a resulting log-frequency table derived from the lengths (e.g., bit lengths) of the prefix codes that may be used to represent the symbols. As used herein, "$s$" may represent a symbol, "$S$" may represent the number of symbols, with $s$ ranging from $0$ to $S-1$, "$l_s$" may represent the length of the Huffman tree for each symbol, which may also correspond to the length of the prefix code used to represent each symbol, "$n$" may represent the maximum of $l_s$ over the symbols $s$, "$p(s)$" may represent the probability of each symbol, "$F_s$" may represent the frequency count for each symbol, "$N$" may represent the sum of the frequency counts, and "$z_s$" may represent the logarithm of the frequency count for each symbol. In an embodiment, the logarithm of the frequency count may be referred to as a logarithmic frequency count, and may be, for example, equal to $\log_2 F_s$. As used herein, a logarithm may refer to a binary logarithm, that is, a logarithm to base 2, unless otherwise indicated.
Because each decision in the Huffman tree is binary, the following Equation 1 may hold:
In addition, it can be known that Equation 2 holds:
$\sum_{s} p(s) = 1$ (Equation 2)
Therefore, Equations 3 to 5 can also be satisfied:
Because $F_s \geq 1$ and [...], Equations 6 to 7 can be satisfied:
$N = 2^n$ (Equation 7)
Thus, Equation 8 may hold:
$z_s = n - l_s$ (Equation 8)
Thus, it can be seen that for each symbol, the logarithm of the frequency count for each symbol may be equal to the maximum length of the Huffman tree minus the length of the Huffman tree for each symbol. In other words, the logarithmic frequency count of each symbol may be equal to the difference between the maximum length of the prefix code and the length of the prefix code corresponding to each symbol.
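Several of the intermediate equations in this derivation are not legible in this text. One chain of steps that is consistent with Equations 2, 7, and 8 and with the definitions given above might read as follows. The steps marked "assumed" are a reconstruction for illustration and are not taken from the original equations.

```latex
\begin{align*}
p(s) &= 2^{-l_s} && \text{(assumed: each binary decision halves the probability)} \\
\sum_{s} p(s) &= 1 && \text{(Equation 2)} \\
F_s &= N\,p(s) = N\,2^{-l_s} && \text{(assumed: counts proportional to probabilities)} \\
z_s &= \log_2 F_s = \log_2 N - l_s \\
\min_s F_s &= N\,2^{-n} = 1 && \text{(assumed: the longest code receives count 1)} \\
N &= 2^{n} && \text{(Equation 7)} \\
z_s &= n - l_s && \text{(Equation 8)}
\end{align*}
```

As a worked instance, a symbol with $l_s = 2$ in a tree with $n = 4$ would have $z_s = 2$, $F_s = 4$, and $p(s) = 4/16 = 1/4$.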
Fig. 4D is a diagram illustrating an example of a table including information about the Huffman tree corresponding to the example message "THIS IS HIS MESSAGE" in accordance with some embodiments. The table of fig. 4D lists each symbol together with the corresponding length of the Huffman tree for that symbol, the bit sequence (or prefix code) used to represent the symbol, the frequency count for the symbol, and the logarithmic frequency count for the symbol.
To perform encoding according to the example rANS codec technique, the encoding function may be expressed as the following Equation 9:
$C(s, x) = \lfloor x / F_s \rfloor N + (x \bmod F_s) + B_s$ (Equation 9)
In Equation 9 above, $C$ may represent the encoding function, $x$ may represent a positive integer representing the state of the encoder 202 or decoder 204, $B_s$ may represent the cumulative frequency count corresponding to symbol $s$, and $N$ may represent the total frequency count. In an embodiment, the element $x$ may be referred to as a state value. The element $B_s$ may satisfy the following Equations 10 to 12:
$B_0 = 0$ (Equation 10)
$B_s = B_{s-1} + F_{s-1}$ (Equation 11)
$B_S = N$ (Equation 12)
To perform decoding according to the example rANS codec technique, a symbol $s$ can be found that satisfies Equation 13 below:
$B_s \leq x \bmod N < B_{s+1}$ (Equation 13)
Then, the decoding function $D(x)$ may be applied to the current state value $x$ according to the following Equation 14:
$D(x) = \lfloor x / N \rfloor F_s + (x \bmod N) - B_s$ (Equation 14)
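The rANS encoding and decoding steps described above can be sketched in C as follows. The sketch uses the standard rANS update with one division in each direction. The three-symbol model (F = {3, 2, 1}, N = 6) and all function names are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Illustrative three-symbol model (assumed): F = {3, 2, 1}, so N = 6. */
enum { RANS_SYMBOLS = 3, RANS_TOTAL = 6 };
static const uint64_t rans_F[RANS_SYMBOLS] = {3, 2, 1};
static const uint64_t rans_B[RANS_SYMBOLS + 1] = {0, 3, 5, 6};

/* Encoding step: floor(x / F_s) * N + (x mod F_s) + B_s. */
uint64_t rans_encode(unsigned int s, uint64_t x)
{
    return (x / rans_F[s]) * RANS_TOTAL + (x % rans_F[s]) + rans_B[s];
}

/* Symbol search: the s with B_s <= x mod N < B_{s+1}. */
unsigned int rans_symbol(uint64_t x)
{
    uint64_t slot = x % RANS_TOTAL;
    unsigned int s = 0;
    while (slot >= rans_B[s + 1])
        s++;
    return s;
}

/* Decoding step: floor(x / N) * F_s + (x mod N) - B_s, which undoes
   rans_encode for the symbol returned by rans_symbol. */
uint64_t rans_decode(unsigned int s, uint64_t x)
{
    return (x / RANS_TOTAL) * rans_F[s] + (x % RANS_TOTAL) - rans_B[s];
}
```

Because the state behaves like a stack, symbols come back in the reverse of the order in which they were encoded. For example, encoding the symbols 0, 1, 2 starting from state 0 gives states 0, 3, 23, and decoding state 23 returns symbol 2 and state 3.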
To perform encoding of the streaming bitstream $\sigma$ according to the example streaming rANS encoding technique, an output block size of $b$ bits may be defined, and the value $B$ may be defined such that $B = 2^b$. Then, while the state value $x$ is greater than or equal to $2^b F_s$, the updated bitstream $\sigma'$ and the updated state value $x'$ may be obtained by applying the following Equations 15 to 16:
$\sigma' = \sigma 2^b + (x \bmod 2^b)$ (Equation 15)
$x' = \lfloor x / 2^b \rfloor$ (Equation 16)
The encoding function of Equation 9 may then be applied to the updated state value $x'$.
To perform decoding according to the example streaming rANS codec technique, a symbol $s$ can be found that satisfies the following Equation 17:
$B_s \leq x \bmod N < B_{s+1}$ (Equation 17)
Then, the decoding function $D(x)$ of Equation 14 may be applied to the state value $x$, and while the resulting state value is less than $N$, the following Equations 18 to 19 may be applied:
$x' = x 2^b + (\sigma \bmod 2^b)$ (Equation 18)
$\sigma' = \lfloor \sigma / 2^b \rfloor$ (Equation 19)
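The streaming variant can be sketched in C as follows, under several stated assumptions: the block size is b = 8 bits, the bitstream σ is modelled as a LIFO stack of bytes, the encoder state starts at N, and the encoder shifts the state right by b bits after emitting each block. The three-symbol model and all names are illustrative, and no bounds checks are made on the stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed model: F = {3, 2, 1}, N = 6, block size b = 8 bits. */
enum { SR_SYMBOLS = 3, SR_TOTAL = 6, SR_BLOCK_BITS = 8, SR_STACK_MAX = 256 };
static const uint32_t sr_F[SR_SYMBOLS] = {3, 2, 1};
static const uint32_t sr_B[SR_SYMBOLS + 1] = {0, 3, 5, 6};

/* The bitstream sigma, modelled as a stack of b-bit blocks (assumption). */
static uint8_t sr_stack[SR_STACK_MAX];
static size_t sr_top = 0;

/* Encode one symbol; the state is expected to lie in [N, 2^b * N). */
uint32_t sr_encode(unsigned int s, uint32_t x)
{
    /* While x >= 2^b * F_s, push the low b bits of x and shift x right by b. */
    while (x >= (sr_F[s] << SR_BLOCK_BITS)) {
        sr_stack[sr_top++] = (uint8_t)(x & ((1u << SR_BLOCK_BITS) - 1u));
        x >>= SR_BLOCK_BITS;
    }
    return (x / sr_F[s]) * SR_TOTAL + (x % sr_F[s]) + sr_B[s];
}

/* Decode one symbol into *s_out and return the restored state. */
uint32_t sr_decode(uint32_t x, unsigned int *s_out)
{
    uint32_t slot = x % SR_TOTAL;
    unsigned int s = 0;
    while (slot >= sr_B[s + 1])
        s++;
    *s_out = s;
    x = (x / SR_TOTAL) * sr_F[s] + slot - sr_B[s];
    /* While x < N, pop one block and append it below the current state. */
    while (x < SR_TOTAL && sr_top > 0)
        x = (x << SR_BLOCK_BITS) | sr_stack[--sr_top];
    return x;
}
```

With these choices the encoder and decoder states stay synchronized: decoding the symbols in reverse order pops exactly the blocks that the encoder pushed, and the final state returns to the starting value N.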
Unlike range codec techniques such as those discussed above, example streaming rANS codec techniques may have the property of accurately synchronizing encoder and decoder states. In addition, the symbol can be immediately known when decoding. As a result, the example rANS codec technique may automatically set the number of bits to read. This is in contrast to Huffman coding, where the current bit sequence must be read to find the beginning of the symbol and the next bit sequence. This means that streams in the rANS can be interleaved even if decoded without metadata.
Another example codec technique is a parallel rANS codec technique. In an example parallel rANS codec technique, the various encoders and decoders can operate in parallel with a blocking coordination step of size O(log p). Each processor (e.g., each individual encoder or decoder) may exchange its block size and thus be able to write or read data in parallel.
Each of the above discussed codec techniques may have certain advantages and certain disadvantages. For example, none of the codec techniques discussed above exhibit all of the combination of low storage space requirements, low complexity, high throughput (e.g., by being easily parallelizable), and near optimal compression properties.
Thus, a codec technique according to some embodiments may be constructed in such a way that the advantages of several of the above-described codec techniques are utilized without exhibiting the same disadvantages. For example, a codec technique according to some embodiments may combine certain elements of a near-optimal, low-complexity Huffman codec technique with a local parallel implementation of a rANS codec technique. In embodiments, such a codec technique may be referred to as a Huffman asymmetric numeral systems (hANS) codec technique.
In an embodiment, the hANS codec technique may use Huffman coding trees to create a frequency table, which may enable near-optimal entropy encoding and decoding. In addition, the hANS codec technique may involve a table that can be transmitted at a lower cost than, for example, the static frequency table transmitted using the rANS codec technique. In addition, the hANS codec technique may have a complexity similar to that of the tANS codec technique. In addition, similar to the rANS codec technique, the hANS codec technique may not require the pre-computed table that the tANS codec technique uses. In addition, the hANS codec technique may involve embedding symbol lengths in the output bitstream, which may allow the hANS codec technique to be easily parallelized.
Thus, embodiments related to the hANS codec technique (e.g., embodiments according to fig. 5-13C discussed below) may have low storage space requirements, low complexity, high throughput (e.g., by being easily parallelizable), and near optimal compression.
Fig. 5-13C relate to an example process of encoding and decoding data according to hANS codec technology and an example device configured to perform such a process. For example, fig. 6A-6B relate to a basic hANS encoder and fig. 7A-7B relate to a basic hANS decoder. Similarly, fig. 8A-8C relate to a streaming hANS encoder and fig. 9A-9C relate to a streaming hANS decoder. In addition, fig. 10 relates to a parallel hANS encoder and fig. 11 relates to a parallel hANS decoder. Fig. 12A-12C and 13A-13C relate to an example process of encoding and decoding data using hANS codec technology. In at least some embodiments, the elements illustrated in fig. 5-13C can be included in, for example, the memory system 1000 or the communication system 3000 discussed above.
In the following description, a notation similar to the syntax of the C programming language may be used. According to this notation, the symbol "<<" may represent an unsigned integer left shift. For example, the expression "x << i" may represent the integer "x" shifted left by "i", and the expression "1 << i" may be equal to the expression "2^i". In addition, the symbol ">>" may represent an unsigned integer right shift. For example, the expression "x >> i" may represent the integer "x" shifted right by "i" and may be equal to the expression $\lfloor x / 2^i \rfloor$. Furthermore, the symbol "&" may represent an unsigned integer bitwise AND operation, the symbol "+" may represent an integer addition operation, and the symbol "-" may represent an integer subtraction operation. In an embodiment, the element "F[s]" may refer to the frequency count corresponding to the symbol "s", the element "z[s]" may refer to the logarithmic frequency count corresponding to the symbol "s" in the logarithmic frequency array "z" such that F[s] = 1 << z[s], the element "B[s]" may refer to the cumulative frequency count corresponding to the symbol "s" in the cumulative frequency array "B", and the element "n" may refer to the logarithm of the sum of the frequency counts such that 1 << n = N.
In an embodiment, the hANS codec technique may involve determining or calculating the most significant set bit. In an embodiment, the most significant set bit of x may be represented as [...]. Fig. 5 is a diagram illustrating an example of an algorithm for determining the most significant set bit, according to some embodiments. As can be seen in fig. 5, the most significant set bit of the integer v may be denoted as mssb and may be calculated using only bitwise shift operations and bitwise OR operations.
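Two common ways of finding the most significant set bit can be sketched in C as follows. These are generic bit-manipulation idioms and are not necessarily the algorithm of fig. 5. Both function names are hypothetical, and the second variant uses one XOR in addition to shifts and ORs.

```c
#include <stdint.h>

/* Index (0-based) of the most significant set bit of v.
   Returns 0 for v == 0 as well, so callers should treat 0 separately. */
unsigned int mssb_index(uint32_t v)
{
    unsigned int r = 0;
    while (v >>= 1)  /* shift right until nothing is left above bit 0 */
        r++;
    return r;
}

/* Value with only the most significant set bit of v kept (0 for v == 0). */
uint32_t mssb_value(uint32_t v)
{
    /* Smear the top set bit into every lower position using shifts and ORs. */
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    /* Clear everything below the top bit. */
    return v ^ (v >> 1);
}
```

For v = 19 (binary 10011), mssb_index returns 4 and mssb_value returns 16.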
In an embodiment, the hANS codec technique may include a function for writing bits to the bitstream, which may be referred to as a writebits function. In an embodiment, the writebits function may be defined as follows:
void writebits(unsigned int *ptr, unsigned int src, unsigned int bits);
In an embodiment, writebits may be a bit-oriented function that works as follows: at a given point in the bitstream (indicated by the integer value of ptr), the "bits" least significant bits (LSBs) of src are appended, and ptr is incremented by bits (ptr += bits). For example, if ptr = 17, src = 0101, and bits = 4, then the tail added to the bitstream will be 0101, and ptr will be incremented to 21.
In an embodiment, the hANS codec technique may include a function that reads bits from the bitstream, which may be referred to as a readbits function. In an embodiment, the readbits function may be defined as follows:
unsigned int readbits(unsigned int *ptr, unsigned int bits);
In an embodiment, readbits may be a bit-oriented function that operates as follows: at a given point in the bitstream (indicated by the integer value of ptr), the last "bits" bits are removed from the tail of the bitstream, their value is returned, and ptr is decremented by bits (ptr -= bits). Thus, for the above example with ptr = 21 and bits = 4, 0101 will be returned, and ptr will become 17.
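One possible realization of these two functions can be sketched in C as follows. The signatures follow the text above, but the backing buffer (a fixed-size global byte array named bitstream), its size, and the bit order within each byte are assumptions made for illustration. The sketch performs no bounds checking.

```c
#include <stdint.h>

#define STREAM_BYTES 1024  /* assumed buffer size */

/* Assumed backing store for the bitstream; *ptr counts bits from its start. */
static uint8_t bitstream[STREAM_BYTES];

/* Append the 'bits' least significant bits of src at bit position *ptr,
   then advance *ptr by 'bits'. */
void writebits(unsigned int *ptr, unsigned int src, unsigned int bits)
{
    for (unsigned int i = 0; i < bits; i++) {
        unsigned int pos = *ptr + i;
        uint8_t mask = (uint8_t)(1u << (pos & 7u));
        if ((src >> i) & 1u)
            bitstream[pos >> 3] |= mask;
        else
            bitstream[pos >> 3] &= (uint8_t)~mask;
    }
    *ptr += bits;
}

/* Remove the last 'bits' bits before position *ptr and return their value;
   *ptr is moved back by 'bits'. */
unsigned int readbits(unsigned int *ptr, unsigned int bits)
{
    unsigned int value = 0;
    *ptr -= bits;
    for (unsigned int i = 0; i < bits; i++) {
        unsigned int pos = *ptr + i;
        value |= ((unsigned int)(bitstream[pos >> 3] >> (pos & 7u)) & 1u) << i;
    }
    return value;
}
```

Reading in this way mirrors writing, so the pair behaves like a stack of bit fields: with ptr = 17, writing src = 0101 with bits = 4 moves ptr to 21, and a subsequent 4-bit read returns 0101 and moves ptr back to 17.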
For ease of description, examples of the hANS codec technology according to the embodiments are presented below in the order of the basic hANS codec technology, the streaming hANS codec technology, and the parallel hANS codec technology. Each of these techniques may be understood as being based on the former technique.
As discussed above, fig. 6A is a block diagram schematically illustrating an example of an encoder 612, which may be referred to as a basic hANS encoder, according to some embodiments. As can be seen in fig. 6A, the encoder 612 may receive the symbol s and the current state value x, and may apply the encoding function 614 to output an updated state value x'. Examples of encoding functions 614 that may be used for hANS codec are provided below.
Fig. 6B is a flow chart of a process 600 for encoding a symbol stream to generate a compressed bitstream according to some embodiments. In some implementations, one or more of the process blocks in fig. 6B may be performed by one or more of the elements discussed above (e.g., one or more of the encoder 612 and the elements included therein).
As shown in fig. 6B, process 600 may include finding a frequency table of the symbol stream (operation 601). For example, given a symbol stream of length M whose alphabet has S symbols, the frequency table may indicate the number of times each particular symbol s occurs in the symbol stream.
As further shown in fig. 6B, process 600 may include finding an optimal Huffman tree (operation 602). In an embodiment, the optimal Huffman tree may be found based on the frequency table. In an embodiment, each symbol s in the symbol stream may be assigned a prefix code such that the system (e.g., the encoder 612, or a system including the encoder 612 such as the memory system 1000 or the communication system 3000) achieves compression within 1 bit per symbol of the Shannon limit. In an embodiment, operation 602 may correspond to a step in a Huffman coding technique. However, in an embodiment, process 600 may not include sending a table, such as a frequency table, or a prefix code tree to the decoder. Instead, in an embodiment, a table indicating only the length of the prefix code for each symbol s may be transmitted in process 600. In an embodiment, a log-frequency table may be sent in process 600 that may indicate the logarithm of the frequency count for each symbol s.
As further shown in fig. 6B, process 600 may include creating a table based on the length of the prefix code of each symbol s, which may be represented as l[s], and n (operation 603). In an embodiment, this table may be referred to as a prefix length table. In an embodiment, the element z[s], which may represent the logarithm of the frequency count of the symbol s, may be equal to n - l[s]. In an embodiment, 1 << z[s] may be the frequency count corresponding to symbol s, and the sum of the frequency counts of the symbol stream may be equal to 1 << n.
As further shown in fig. 6B, process 600 may include creating a cumulative frequency table (operation 604). In an embodiment, the cumulative frequency table may be created based on one or more of the tables created in operations 602 and 603. In an embodiment, the cumulative frequency count corresponding to symbol s may be denoted as B[s]. In an embodiment, for the cumulative frequency table, the following Equations 20 to 22 may hold:
B[0] = 0 (Equation 20)
B[s] = B[s-1] + (1 << z[s-1]) (Equation 21)
B[S] = 1 << n (Equation 22)
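The table construction of operations 603 and 604 can be sketched in C as follows. The function name build_hans_tables and its return convention are assumptions. The example lengths in the usage note below are one valid set of Huffman lengths for the example message and are not taken from fig. 4D.

```c
#include <stddef.h>

/* From prefix-code lengths l[0..S-1] and the maximum length n, fill the
   log-frequency array z[0..S-1] and the cumulative array B[0..S].
   Returns 1 if the counts sum to 1 << n, and 0 otherwise. */
int build_hans_tables(const unsigned int *l, size_t S, unsigned int n,
                      unsigned int *z, unsigned long *B)
{
    B[0] = 0;
    for (size_t s = 0; s < S; s++) {
        z[s] = n - l[s];                  /* z[s] = n - l[s] */
        B[s + 1] = B[s] + (1ul << z[s]);  /* running sum of 1 << z[s] */
    }
    return B[S] == (1ul << n);
}
```

For the lengths {2, 3, 3, 3, 3, 4, 4, 4, 4} with n = 4, this gives z = {2, 1, 1, 1, 1, 0, 0, 0, 0} and B = {0, 4, 6, 8, 10, 12, 13, 14, 15, 16}, so the final entry equals 1 << 4.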
As further shown in fig. 6B, process 600 may include iterating the encoding function 614 for M symbols (operation 605). In an embodiment, the encoding function 614 may be represented as the following Equation 23:
C(s, x) = ((x >> z[s]) << n) + (x & ((1 << z[s]) - 1)) + B[s] (Equation 23)
In an embodiment, the result of applying the encoding function 614 to the symbol s and the current state value x may be an updated state value x'. In an embodiment, the updated state value x' may be used as the current state value of the next symbol. In an embodiment, after iterating operation 605 for all of the M symbols, the final updated state value x' may be a compressed bit stream corresponding to the input symbol stream. In an embodiment, operation 605 may correspond to a low complexity version of the step in the rANS encoding technique.
As can be seen above, according to process 600, encoder 612 may generate a compressed bitstream based on the input symbol stream using only a table lookup operation, a bitwise shift operation, a bitwise AND operation, AND an addition operation.
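The shift-and-mask encoding step can be sketched in C as follows, next to a division-based form for comparison. Both function names and the parameter layout are assumptions. The comparison function is only meaningful when the frequency count and the total are powers of two, as they are in the hANS setting described above.

```c
#include <stdint.h>

/* Shift-and-mask encoding step: z_s is the log-frequency of the symbol,
   B_s its cumulative frequency, and n the log of the total count. */
uint64_t hans_encode(uint64_t x, unsigned int z_s, uint64_t B_s, unsigned int n)
{
    return ((x >> z_s) << n) + (x & ((1ull << z_s) - 1)) + B_s;
}

/* Division-based form of the same update, with F_s = 1 << z_s and N = 1 << n. */
uint64_t rans_encode_div(uint64_t x, uint64_t F_s, uint64_t B_s, uint64_t N)
{
    return (x / F_s) * N + (x % F_s) + B_s;
}
```

With x = 5, z_s = 1, B_s = 4, and n = 4, both forms give 37, which illustrates how the division and remainder can be replaced by a shift and a mask.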
As discussed above, fig. 7A is a block diagram schematically illustrating an example of a decoder 712, which may be referred to as a basic hANS decoder, in accordance with some embodiments. As can be seen in fig. 7A, the decoder 712 may receive a current state value x, may use a table lookup function 714 to obtain a corresponding symbol s from an inverse symbol table based on the current state value x, and may apply a decoding function 716 to output an updated state value x'. Examples of decoding functions 716 that may be used for hANS codec are provided below.
Fig. 7B is a flow chart of a process 700 for decoding a compressed bitstream to generate a symbol stream, according to some embodiments. In some implementations, one or more of the process blocks in fig. 7B may be performed by one or more of the elements discussed above (e.g., one or more of decoder 712 and the elements included therein).
As shown in fig. 7B, process 700 may include decoding a log-frequency table (operation 701). In an embodiment, decoder 712 may receive the compressed bitstream and a table indicating a prefix length of each symbol (e.g., the prefix length table of operation 603), and may decode the log-frequency table based on the compressed bitstream and the prefix length.
As further shown in fig. 7B, process 700 may include creating a cumulative frequency table (operation 702). In an embodiment, the cumulative frequency table may be created by summing 1 << z[s].
As further shown in fig. 7B, process 700 may include creating an inverse symbol table (operation 703). In an embodiment, the inverse symbol table may correspond to the inverse symbol table used for the table lookup function 714. In an embodiment, the inverse symbol table may be created by finding S[x & ((1 << n) - 1)] such that the following Equation 24 holds:
B[s] <= x & ((1 << n) - 1) < B[s+1] (Equation 24)
In an embodiment, operation 703 may correspond to a step in one or more of an arithmetic codec technique, a range codec technique, a rANS codec technique, and a tANS codec technique.
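Operations 702 and 703 can be illustrated with a minimal Python sketch. It assumes that the log-frequency counts are held in a list `z` indexed by symbol and that the counts 1 << z[s] sum to 1 << n. The function name `build_tables` and the choice to store B with one extra trailing entry (so that both B_s and B_{s+1} of equation 24 are available) are illustrative assumptions, not taken from the embodiments.

```python
def build_tables(z, n):
    """Build a cumulative frequency table B and an inverse symbol table S.

    z[s] is the log2 frequency count of symbol s, and n is the log2 of the
    total frequency count, so the counts 1 << z[s] must sum to 1 << n.
    """
    total = 1 << n
    B = [0]
    for zs in z:
        B.append(B[-1] + (1 << zs))      # B[s + 1] = B[s] + (1 << z[s])
    if B[-1] != total:
        raise ValueError("frequency counts must sum to 1 << n")
    S = [0] * total
    for s in range(len(z)):
        for slot in range(B[s], B[s + 1]):
            S[slot] = s                   # B[s] <= slot < B[s + 1] (equation 24)
    return B, S


# Hypothetical four-symbol alphabet with frequency counts 4, 2, 1, 1 (n = 3).
B, S = build_tables([2, 1, 0, 0], 3)
print(B)  # [0, 4, 6, 7, 8]
print(S)  # [0, 0, 0, 0, 1, 1, 2, 3]
```

With this layout, looking up S[x & ((1 << n) - 1)] returns the symbol whose cumulative range contains the low n bits of the state.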
As further shown in fig. 7B, process 700 may include iterating the decoding operation for M symbols (operation 704). In an embodiment, operation 704 may include determining the symbol s by performing a table lookup function 714 and determining the updated state value x' by applying a decoding function 716.
In an embodiment, the table lookup function 714 may involve retrieving the symbol s from the inverse symbol table and may be performed according to equation 25 below:
s = S[x]    (Equation 25)
In an embodiment, the decoding function 716 may be represented according to the following equation 26:
D(x) = ((x >> n) << z[s]) + (x & ((1 << n) - 1)) - B_s    (Equation 26)
In an embodiment, the result of applying the decoding function 716 to the symbol s and the current state value x may be an updated state value x'. In an embodiment, the updated state value x' may be used as the current state value of the next symbol. In an embodiment, after the iterative operation 704 for all of the M symbols, the final updated state value x' may be the start state of the encoder.
As can be seen above, according to process 700, decoder 712 may reconstruct or otherwise generate a symbol stream based on the compressed bitstream using only a table lookup operation, a bitwise shift operation, a bitwise AND operation, and an addition operation.
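The iteration of operation 704 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the table values are a hypothetical four-symbol example, the function name `decode` is illustrative, and the inverse symbol table is indexed with the low n bits of the state, which is how equation 24 constructs it.

```python
# Hypothetical example tables: log-frequency counts z, total frequency 1 << n,
# cumulative frequency counts B, and inverse symbol table S.
n = 3
z = [2, 1, 0, 0]
B = [0, 4, 6, 7]
S = [0, 0, 0, 0, 1, 1, 2, 3]


def decode(x, num_symbols):
    """Decode num_symbols symbols from state x; return the symbols and final state."""
    mask = (1 << n) - 1
    symbols = []
    for _ in range(num_symbols):
        slot = x & mask
        s = S[slot]                           # table lookup (equation 25)
        x = ((x >> n) << z[s]) + slot - B[s]  # decoding function (equation 26)
        symbols.append(s)
    return symbols, x


symbols, start_state = decode(216, 4)
print(symbols, start_state)  # [0, 1, 0, 2] 1
```

In this example the state 216 was produced by encoding the symbols 0, 1, 0, 2 in reverse order starting from state 1, so the final decoder state equals the encoder's start state.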
As discussed above, fig. 8A-8C and fig. 9A-9C relate to streaming hANS codec technology. In particular, fig. 8A-8C relate to a streaming hANS encoding technique, according to some embodiments. In an embodiment, the streaming hANS encoding technique may include determining a value d, which may indicate the number of bits of the current state value x to be output to the compressed bitstream. In an embodiment, the value d may be used to avoid branching, for example the while loop used in the streaming rANS codec techniques discussed above. In an embodiment, if branching is avoided, the speed of the codec process may be increased.
In an embodiment, if n is an integer and x is a real number, the relationship between the value n and the current state value x corresponding to a specific symbol s may be represented according to the following equation 27:
If the n+1 constraint is removed, equation 27 can be expressed as equations 28 and 29 below:
Based on the above, the following equations 30 to 36 can be used to find the minimum value d such that d b-bit blocks are removed:
x' < 2^b F_s    (Equation 30)
Looking back at equation 27 above, it can be seen that the following equations 34 through 36 hold:
Fig. 8A is a block diagram schematically illustrating an example of an encoder 812, which may be referred to as a streaming hANS encoder, in accordance with some embodiments. As can be seen in fig. 8A, encoder 812 may include an encoding function 614, and may also include a streaming function 814 and a writebits function 816. In an embodiment, encoder 812 may receive the current state value x and the symbol s, and may apply streaming function 814 to generate value d, output state value x_m, and an intermediate state value x. Encoder 812 may apply a writebits function 816 to value d and output state value x_m to write the output bits to the compressed bitstream. Encoder 812 may then apply encoding function 614 to symbol s and intermediate state value x to generate updated state value x'.
Fig. 8B is a flow chart of a process 800 for encoding a symbol stream to generate a compressed bitstream according to some embodiments. In an embodiment, process 800 may be included in operation 605 of process 600 illustrated above in fig. 6B, or performed in lieu of operation 605. In some implementations, one or more of the process blocks in fig. 8B may be performed by one or more of the elements discussed above (e.g., one or more of encoder 812 and the elements included therein).
As shown in fig. 8B, process 800 may include setting a value b equal to 1 << a (operation 801). In an embodiment, the value b may represent the minimum bit length of the codeword of the encoder 812, and the logarithm of the minimum bit length of the codeword may be represented as a.
As further shown in fig. 8B, process 800 may include determining a most significant set bit of the current state value (operation 802). In an embodiment, the most significant set bits may be determined according to the algorithm illustrated in fig. 5.
As further shown in fig. 8B, process 800 may include determining whether a difference between the most significant set bit and a log frequency count z[s] of symbol s is greater than or equal to the value b (operation 803).
As further shown in fig. 8B, based on the difference being less than the value b (no at operation 803), the process 800 may proceed to operation 807, which may include applying the encoding function 614 to the current state value x and the symbol s to obtain an updated state value x'. In an embodiment, the updated state value x' may be used as the current state value corresponding to the next symbol.
As further shown in fig. 8B, based on the difference being greater than or equal to the value b (yes at operation 803), the process 800 may proceed to operation 804.
As further shown in fig. 8B, process 800 may include determining a value d, which may indicate a number of blocks of b bits to be transferred out of the current state value x and output to the compressed bitstream (operation 804).
As further shown in fig. 8B, process 800 may include outputting db bits (or, e.g., d << a bits) to the compressed bitstream (operation 805). In an embodiment, encoder 812 may use writebits function 816 to perform operation 805 based on the value d and the output state value x_m.
As further shown in fig. 8B, process 800 may include determining an intermediate state value x (operation 806). After performing operation 806, process 800 may proceed to operation 807, which may include applying encoding function 614 to intermediate state value x and symbol s to obtain updated state value x'. In an embodiment, the updated state value x' may be used as the current state value corresponding to the next symbol.
Fig. 8C is a diagram illustrating an example of an algorithm for determining the number of bits db to be shifted out of the current state value x and the intermediate state value x, in accordance with some embodiments. As can be seen in fig. 8C, the number of bits db and the intermediate state value x may be obtained using only a bitwise shift operation, a bitwise subtraction operation, a bitwise AND operation, and a bitwise OR operation. Thus, the multiplication and the while loop of the streaming rANS technique can be avoided.
Fig. 9A-9C relate to a streaming hANS decoding technique, according to some embodiments. In an embodiment, the streaming hANS decoding technique may include determining a value d, which may be different from the value d described above and may indicate the number of bits to be transferred from the compressed bitstream into the current state value x. In an embodiment, the value d may be used to avoid branching of the while loop used in, for example, the streaming rANS codec techniques discussed above. In an embodiment, if branching is avoided, the speed of the codec process may be increased.
In an embodiment, if n is an integer and x is a real number, the relationship between the value n and the current state value x corresponding to a specific symbol s may be expressed according to the following equations 37 to 39:
x < 2^n    (Equation 37)
log_2 x < n    (Equation 38)
Looking back at equation 27 above, according to the streaming hANS decoding technique, the value d can be found such that the following equations 40 to 43 hold:
Thus, the value d can be found according to the following equations 44 to 45:
Fig. 9A is a block diagram schematically illustrating an example of a decoder 912, which may be referred to as a streaming hANS decoder, in accordance with some embodiments. As can be seen in fig. 9A, the decoder 912 may include a decoding function 716 and a table lookup function 714, and may also include a streaming function 914, a renormalization function 916, and a readbits function 918. In an embodiment, decoder 912 may receive current state value x, may apply table lookup function 714 to obtain symbol s, and may apply decoding function 716 to current state value x and symbol s to obtain intermediate state value x. In an embodiment, the decoder 912 may apply the streaming function 914 to obtain a value d, which may indicate the number of bits to be transferred into the intermediate state value x, and may apply the readbits function 918 to obtain the read bits based on the value d. For example, in an embodiment, the number of bits to be transferred may be db bits. In an embodiment, the decoder 912 may apply a renormalization function 916 to the intermediate state value x to obtain a renormalized state value, and may add the renormalized state value to the read bits to generate the updated state value x'.
Fig. 9B is a flow diagram of a process 900 for decoding a compressed bitstream to generate a symbol stream, according to some embodiments. In an embodiment, process 900 may be included in operation 704 of process 700 illustrated above in fig. 7B, or performed in lieu of operation 704. In some implementations, one or more of the process blocks in fig. 9B may be performed by one or more of the elements discussed above (e.g., one or more of the decoder 912 and the elements included therein).
As shown in fig. 9B, process 900 may include setting a value b equal to 1 << a (operation 901). In an embodiment, the value b may represent the minimum bit length of the codeword of the decoder 912, and the logarithm of the minimum bit length of the codeword may be represented as a.
As further shown in fig. 9B, process 900 may include determining a symbol s (operation 902). In an embodiment, the decoder 912 may determine the symbol s by applying the table look-up function 714 to the current state value x.
As further shown in fig. 9B, process 900 may include applying decoding function 716 to the current state value x to obtain an intermediate state value x (operation 903). In an embodiment, the updated state value x' may be used as the current state value corresponding to the next symbol.
As further shown in fig. 9B, process 900 may include determining whether the difference between the value n and the most significant set bit of the current state value is greater than zero (operation 904).
As further shown in fig. 9B, based on the difference being less than or equal to zero (no at operation 904), process 900 may end and the intermediate state value x may be used as the updated state value x'.
As further shown in fig. 9B, based on the difference being greater than zero (yes at operation 904), process 900 may include determining a value d (operation 905), which may indicate the number of bits to transition into the intermediate state value x. For example, in an embodiment, the number of bits to be transferred may be db bits.
As further shown in fig. 9B, process 900 may include obtaining db read bits (or, e.g., d << a read bits) from the compressed bitstream (operation 906). In an embodiment, the decoder 912 may obtain the read bits using the readbits function 918 based on the value d.
As further shown in fig. 9B, process 900 may include re-normalizing the intermediate state value x (operation 907). In an embodiment, the decoder 912 may apply a renormalization function 916 to obtain renormalized intermediate state values x.
As further shown in fig. 9B, process 900 may include adding the read bits to the renormalized intermediate state value x to obtain an updated state value x' (operation 908). In an embodiment, the updated state value x' may be used as the current state value corresponding to the next symbol.
Fig. 9C is a diagram illustrating an example of an algorithm for determining a value d, which may indicate the number of bits db to be transferred into the intermediate state value x, and for performing the readbits function and determining the updated state value x', according to some embodiments. As can be seen in fig. 9C, the value d, the read bits, and the updated state value x' may be obtained using only a bitwise shift operation, a bitwise subtraction operation, and a bitwise AND operation. Thus, the division and the while loop of the streaming rANS technique may be avoided.
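The streaming encoder of process 800 and the streaming decoder of process 900 can be sketched together in Python to show that the two stay in step. Several points are assumptions made for illustration and are not taken from the figures: the most significant set bit is taken as a zero-based bit index, the low-order bits of the state are the ones shifted out, the compressed bitstream is modelled as a stack of (bits, bit count) chunks rather than packed memory, the encoder starts from state 1 << n, and all function names are hypothetical.

```python
def msb(x):
    """Zero-based index of the most significant set bit (assumed convention)."""
    return x.bit_length() - 1


def encode_symbol(x, s, z, B, n, a, chunks):
    diff = msb(x) - z[s]
    if diff >= (1 << a):                      # operation 803: compare with b = 1 << a
        nbits = (diff >> a) << a              # d << a bits, i.e. d blocks of b bits
        chunks.append((x & ((1 << nbits) - 1), nbits))  # writebits
        x >>= nbits                           # intermediate state value
    # Encoding function: shifts, AND, and addition only.
    return ((x >> z[s]) << n) + (x & ((1 << z[s]) - 1)) + B[s]


def decode_symbol(x, z, B, S, n, a, chunks):
    slot = x & ((1 << n) - 1)
    s = S[slot]                               # table lookup
    x = ((x >> n) << z[s]) + slot - B[s]      # decoding function (equation 26)
    diff = n - msb(x)
    if diff > 0:                              # operation 904
        nbits = ((diff + (1 << a) - 1) >> a) << a   # d << a bits to read
        bits, written = chunks.pop()          # readbits (last chunk written)
        assert written == nbits               # encoder and decoder agree on d
        x = (x << nbits) + bits               # renormalize and add the read bits
    return s, x


def encode(symbols, z, B, n, a):
    x, chunks = 1 << n, []
    for s in reversed(symbols):               # ANS decodes in reverse encode order
        x = encode_symbol(x, s, z, B, n, a, chunks)
    return x, chunks


def decode(x, count, z, B, S, n, a, chunks):
    out = []
    for _ in range(count):
        s, x = decode_symbol(x, z, B, S, n, a, chunks)
        out.append(s)
    return out, x


# Hypothetical tables: frequency counts 4, 2, 1, 1 with n = 3, and b = 1 << 1 bits.
z, B, S, n, a = [2, 1, 0, 0], [0, 4, 6, 7], [0, 0, 0, 0, 1, 1, 2, 3], 3, 1
state, chunks = encode([0, 1, 0, 2, 3, 0], z, B, n, a)
decoded, final_state = decode(state, 6, z, B, S, n, a, list(chunks))
print(decoded == [0, 1, 0, 2, 3, 0], final_state == 1 << n)
```

Under these assumptions the state always satisfies n <= msb(x) < n + b after each step, so the decoder recomputes the same number of bits that the encoder wrote without any loop.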
As discussed above, fig. 10-11 relate to parallel hANS codec technology. In an embodiment, such parallel hANS codec technique may correspond to a parallel approach that sums in parallel to find memory locations where bits are written/read concurrently. In an embodiment, the parallel version may take advantage of the fact that the streaming techniques discussed above keep the encoder and decoder in the same synchronization state. In fig. 10-11, an O(log p) parallel algorithm may be used to find the memory offset parameter o based on the value d, and o_{-1} may indicate a previous memory offset.
Fig. 10 is a block diagram schematically illustrating an example of an encoder 1012, which may be referred to as a parallel hANS encoder, in accordance with some embodiments. As can be seen in fig. 10, the encoder 1012 may include a first stage 1014, a second stage 1016, and a third stage 1018. In an embodiment, the first stage 1014 may include a plurality of parallel streaming functions 814a, 814 b-814 p, and the third stage 1018 may include a plurality of parallel encoding functions 614a, 614 b-614 p and a plurality of parallel writebits functions 816a, 816 b-816 p. In an embodiment, the combination of parallel streaming function 814i, parallel encoding function 614i, and parallel writebits function 816i may be referred to as a parallel encoding processor.
In an embodiment, in first stage 1014, each of the parallel encoding processors may be assigned an independent symbol (e.g., symbols s_0, s_1 to s_{p-1}) and an independent state (e.g., states x_0, x_1 to x_{p-1}). Then, in a second stage 1016, an independent memory offset (e.g., o_0 to o_{p-1}) may be assigned to each of the parallel encoding processors. Then, in a third stage 1018, each of the parallel encoding processors may write bits in parallel to the compressed codeword and generate an updated state value, e.g., updated state values x'_0, x'_1 to x'_{p-1}.
Fig. 11 is a block diagram schematically illustrating an example of a decoder 1112, which may be referred to as a parallel hANS decoder, according to some embodiments. As can be seen in fig. 11, the decoder 1112 may include a first stage 1114, a second stage 1116, and a third stage 1118. In an embodiment, the first stage 1114 may include a plurality of parallel table lookup functions 714a, 714b through 714p, a plurality of parallel decoding functions 716a, 716b through 716p, and a plurality of parallel streaming functions 914a, 914b through 914p, and the third stage 1118 may include a plurality of parallel renormalization functions 916a, 916b through 916p, and a plurality of parallel readbits functions 918a, 918b through 918p. In an embodiment, the combination of the parallel table lookup function 714i, the parallel decoding function 716i, the parallel streaming function 914i, the parallel renormalization function 916i, and the parallel readbits function 918i may be referred to as a parallel decoding processor.
In an embodiment, in the first stage 1114, each of the parallel decoding processors may be assigned an independent symbol (e.g., symbols s_0, s_1 to s_{p-1}) and an independent state (e.g., states x_0, x_1 to x_{p-1}). Then, in a second stage 1116, an independent memory offset (e.g., o_0 to o_{p-1}) may be assigned to each of the parallel decoding processors. Then, in a third stage 1118, each of the parallel decoding processors may read bits from the compressed codeword in parallel and generate an updated state value, e.g., updated state values x'_0, x'_1 to x'_{p-1}.
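The second stage of fig. 10-11 assigns each processor a memory offset derived from the per-processor bit counts. One way to obtain such offsets in O(log p) parallel rounds is a prefix sum. The Python sketch below simulates a Hillis-Steele style scan sequentially. The choice of this particular scan, the function name `exclusive_offsets`, and the `base` parameter (standing in for the previous memory offset o_{-1}) are assumptions for illustration, since the source does not name the algorithm.

```python
def exclusive_offsets(bit_counts, base=0):
    """Return (offsets, next_base) for per-processor bit counts.

    offsets[i] is where processor i starts writing or reading, and next_base
    is the first free offset afterwards. Each pass of the while loop models
    one parallel round in which every lane updates at the same time.
    """
    p = len(bit_counts)
    if p == 0:
        return [], base
    sums = list(bit_counts)
    step = 1
    while step < p:                            # about log2(p) rounds
        sums = [sums[i] + (sums[i - step] if i >= step else 0)
                for i in range(p)]
        step <<= 1
    # sums now holds inclusive prefix sums; shift by one lane for offsets.
    offsets = [base] + [base + v for v in sums[:-1]]
    return offsets, base + sums[-1]


# Four hypothetical processors that need 8, 0, 16 and 8 bits respectively.
print(exclusive_offsets([8, 0, 16, 8]))  # ([0, 8, 8, 24], 32)
```

Because every processor knows its own bit count before the scan, no processor has to wait for another one to finish writing or reading.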
According to some embodiments, the encoder and decoder discussed above may be included in one or more of the memory system 1000, the communication system 3000, and any other system or device that involves compression and decompression or encoding and decoding of data, such as digital data. For example, according to some embodiments, one or more of encoder 612, encoder 812, and encoder 1012 may correspond to encoder 202 discussed above. Similarly, according to some embodiments, one or more of decoder 712, decoder 912, and decoder 1112 may correspond to decoder 204 discussed above.
Fig. 12A-12C are flowcharts of a process for compressing a symbol stream for storage in a memory device, according to some embodiments. In some implementations, one or more of the process blocks in fig. 12A-12C may be performed by one or more of the elements discussed above (e.g., one or more of encoder 202, encoder 612, encoder 812, and encoder 1012, and elements included therein).
Fig. 12A is a flow diagram of a process 1200A for compressing a symbol stream for storage in a memory device, in accordance with some embodiments.
As shown in fig. 12A, process 1200A may include obtaining a symbol stream (operation 1211). In an embodiment, the symbol stream may comprise a plurality of symbols.
As further shown in fig. 12A, process 1200A may include determining a Huffman tree corresponding to the symbol stream (operation 1212). In an embodiment, each of the plurality of symbols may be assigned a corresponding prefix code among a plurality of prefix codes based on the Huffman tree. In an embodiment, the Huffman tree may correspond to the final Huffman tree or the optimal Huffman tree discussed above. In an embodiment, each symbol may correspond to symbol s discussed above.
As further shown in fig. 12A, process 1200A may include generating a prefix length table based on the Huffman tree (operation 1213). In an embodiment, the prefix length table may indicate the length of the corresponding prefix code for each symbol.
As further shown in fig. 12A, process 1200A may include generating a log-frequency table based on the prefix length table (operation 1214). In an embodiment, the log-frequency table may indicate the logarithm of the frequency count for each symbol.
As further shown in fig. 12A, process 1200A may include generating a cumulative frequency table (operation 1215). In an embodiment, the cumulative frequency table may indicate a cumulative frequency count corresponding to each symbol. In an embodiment, the cumulative frequency count may correspond to the cumulative frequency count B_s discussed above.
As further shown in fig. 12A, process 1200A may include generating a compressed bitstream by iteratively applying an encoding function to a plurality of symbols based on a logarithmic frequency table and a cumulative frequency table (operation 1216).
As further shown in fig. 12A, process 1200A may include storing the compressed bitstream in a memory device (operation 1217). In an embodiment, the memory device may correspond to one or more of the memory system 1000 and the memory 100 discussed above.
In an embodiment, generating the log-frequency table may include subtracting the length of the corresponding prefix code for each symbol from the maximum length of the plurality of prefix codes. In an embodiment, the maximum length of the plurality of prefix codes may correspond to the value n discussed above.
In an embodiment, generating the cumulative frequency table may include: obtaining a frequency count for each symbol by left-shifting an integer value 1 based on the logarithm of the frequency count for each symbol; and obtaining a cumulative frequency count for each symbol by adding the frequency count for each symbol to a sum of frequency counts of previous symbols in the plurality of symbols. In an embodiment, the logarithm of the frequency count may correspond to the logarithm of the frequency count z[s] discussed above.
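Operations 1212 to 1215 can be illustrated with a short Python sketch that derives Huffman code lengths, the log-frequency table, and the cumulative frequency table from a symbol stream. The heap-based construction and the function names are illustrative assumptions and are not the final or optimal Huffman tree procedure referred to above. The table B below stores, for each symbol, the sum of the frequency counts of the previous symbols (B_s in equation 24); adding the symbol's own frequency count gives B_{s+1}.

```python
import heapq
from collections import Counter


def huffman_lengths(symbols):
    """Return {symbol: prefix code length} for a standard Huffman code."""
    counts = Counter(symbols)
    # Heap entries are (weight, tie-breaker, symbols in this subtree).
    heap = [(c, i, [s]) for i, (s, c) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, g1 = heapq.heappop(heap)
        c2, _, g2 = heapq.heappop(heap)
        for s in g1 + g2:
            lengths[s] += 1                   # one level deeper after each merge
        heapq.heappush(heap, (c1 + c2, tie, g1 + g2))
        tie += 1
    return lengths


def frequency_tables(lengths):
    """Return (n, z, B, total) derived from a prefix length table."""
    n = max(lengths.values())                 # maximum prefix code length
    z, B, running = {}, {}, 0
    for s in sorted(lengths):
        z[s] = n - lengths[s]                 # log-frequency count
        B[s] = running                        # sum of previous frequency counts
        running += 1 << z[s]                  # frequency count is 1 << z[s]
    return n, z, B, running


lengths = huffman_lengths("aaaabbcd")
n, z, B, total = frequency_tables(lengths)
print(lengths)      # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print(n, z, B, total)
```

For a Huffman code over two or more symbols the frequency counts sum to exactly 1 << n, which is what lets the later steps replace multiplication and division with shifts.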
Fig. 12B is a flow diagram of a process 1200B for compressing a symbol stream for storage in a memory device, in accordance with some embodiments. In an embodiment, one or more of the operations of process 1200B may be combined with, included in, or performed in lieu of one or more of the operations of process 1200A. For example, one or more of the operations of process 1200B may be included in operation 1217 of process 1200A.
As shown in fig. 12B, process 1200B may include determining a most significant set bit of an initial state value (operation 1221).
As further shown in fig. 12B, process 1200B may include determining whether a difference between a most significant set bit of an initial state value corresponding to a symbol stream and a logarithm of a frequency count corresponding to each symbol is greater than or equal to a minimum bit length of a codeword corresponding to the symbol stream (operation 1222). In an embodiment, the minimum bit length of the codeword may correspond to the value b discussed above.
As further shown in fig. 12B, based on determining that the difference is less than the minimum bit length of the codeword (no at operation 1222), process 1200B may end. In an embodiment, as a result of the end of process 1200B, a current state value, which may correspond to the current state value x discussed above, may be determined as an initial state value.
As further shown in fig. 12B, based on determining that the difference is greater than or equal to the minimum bit length of the codeword (yes at operation 1222), process 1200B may include determining a third value by subtracting the logarithm of the frequency count for each symbol from the most significant set bit (operation 1223).
As further shown in fig. 12B, process 1200B may include obtaining a shifted third value by right shifting the third value based on a logarithm of a minimum bit length of the codeword (operation 1224). In an embodiment, the logarithm of the minimum bit length of the codeword may correspond to the value a discussed above.
As further shown in fig. 12B, process 1200B may include determining the number of bits to be shifted out of the initial state value by left-shifting the shifted third value based on the logarithm of the minimum bit length of the codeword (operation 1225). In an embodiment, the determined number of bits may correspond to the value d << a discussed above with respect to fig. 8B.
As further shown in fig. 12B, process 1200B may include outputting the determined number of bits to the compressed bitstream (operation 1226).
As further shown in fig. 12B, process 1200B may include obtaining the current state value by right shifting the initial state value based on the determined number of bits (operation 1227). In an embodiment, the current state value may correspond to the current state value x discussed above.
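Operations 1223 to 1227 can be summarized in closed form. In the sketch below, m denotes the most significant set bit of the initial state value x (assumed here to be a zero-based bit index), z_s the logarithm of the frequency count, and a the logarithm of the minimum bit length of the codeword. The symbols t, k, and x_out are names introduced only for this illustration, and the output bits are assumed to be the low-order bits of the state.

```latex
t = m - z_s, \qquad
d = \left\lfloor t / 2^{a} \right\rfloor, \qquad
k = d \cdot 2^{a}, \qquad
x_{\mathrm{out}} = x \bmod 2^{k}, \qquad
x \leftarrow \left\lfloor x / 2^{k} \right\rfloor
\quad \text{(applied only when } t \ge 2^{a}\text{)}
```

For example, with x = 22, z_s = 2, and a = 1, the most significant set bit is m = 4, so t = 2, d = 1, and k = 2. The two bits x_out = 22 mod 4 = 2 are output, and the current state value becomes 5.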
In an embodiment, the encoding function may be applied to the plurality of symbols in parallel by a plurality of processors. Each processor of the plurality of processors may be assigned a corresponding initial state value in order to perform the encoding function. After each processor determines the number of bits to be shifted out of its corresponding initial state value, each processor may be assigned a corresponding memory location to which the determined number of bits is output. After the determined number of bits is output to the compressed bitstream, process 1200B may include determining a corresponding current state value for each processor.
Fig. 12C is a flow diagram of a process 1200C for compressing a symbol stream for storage in a memory device, in accordance with some embodiments. In an embodiment, one or more of the operations of process 1200C may be combined with, included in, or performed in lieu of one or more of the operations of process 1200A. For example, one or more of the operations of process 1200C may be included in operation 1217 of process 1200A.
As shown in fig. 12C, process 1200C may include obtaining a current state value (operation 1231). In an embodiment, the current state value may correspond to the current state value x discussed above.
As further shown in fig. 12C, process 1200C may include obtaining a shifted state value by right shifting the current state value based on the logarithm of the frequency count of each symbol (operation 1232).
As further shown in fig. 12C, the process 1200C may include obtaining a first value by left shifting the shifted state value based on a maximum length of the plurality of prefix codes (operation 1233).
As further shown in fig. 12C, process 1200C may include obtaining a frequency count for each symbol by left shifting integer value 1 based on the logarithm of the frequency count (operation 1234).
As further shown in fig. 12C, process 1200C may include obtaining a second value by performing a bitwise AND operation on the current state value and the frequency count of each symbol minus 1 (operation 1235).
As further shown in fig. 12C, process 1200C may include obtaining an updated state value by adding the first value, the second value, and the cumulative frequency count corresponding to each symbol (operation 1236). In an embodiment, the updated state value may correspond to the updated state value x' discussed above.
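Operations 1232 to 1236 map directly onto a few lines of Python. The sketch below is a minimal illustration that uses a hypothetical four-symbol table. The function name `encode_step` and the list-based tables are assumptions, and B[s] holds the sum of the frequency counts of the symbols before s.

```python
def encode_step(x, s, z, B, n):
    """Apply the encoding function to state x and symbol s."""
    shifted = x >> z[s]              # operation 1232: right shift by log frequency
    first = shifted << n             # operation 1233: left shift by max prefix length
    freq = 1 << z[s]                 # operation 1234: frequency count of symbol s
    second = x & (freq - 1)          # operation 1235: low bits of the state
    return first + second + B[s]     # operation 1236: updated state value


# Hypothetical tables: frequency counts 4, 2, 1, 1 and n = 3.
z, B, n = [2, 1, 0, 0], [0, 4, 6, 7], 3
print(encode_step(26, 1, z, B, n))   # 108
```

Only shifts, a bitwise AND, additions, and table lookups are needed, which is the property noted for process 600 above.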
Fig. 13A-13C are flowcharts of a process for generating a symbol stream based on a compressed bitstream, according to some embodiments. In some implementations, one or more of the process blocks in fig. 13A-13C may be performed by one or more of the elements discussed above (e.g., one or more of decoder 204, decoder 712, decoder 912, and decoder 1112, and elements included therein).
Fig. 13A is a flow diagram of a process 1300A for generating a symbol stream based on a compressed bitstream, according to some embodiments.
As shown in fig. 13A, process 1300A may include obtaining a compressed bitstream from a memory (operation 1311). In an embodiment, the compressed bit stream may correspond to a plurality of symbols included in the symbol stream. In an embodiment, each symbol may correspond to symbol s discussed above.
As further shown in fig. 13A, process 1300A may include obtaining a log-frequency table from the compressed bitstream (operation 1312). In an embodiment, the log-frequency table may indicate a log of the frequency count for each of the plurality of symbols. In an embodiment, the logarithm of the frequency count may correspond to the logarithm of the frequency count z[s] discussed above.
As further shown in fig. 13A, process 1300A may include generating a cumulative frequency table based on the log-frequency table (operation 1313). In an embodiment, the cumulative frequency table may indicate a cumulative frequency count corresponding to each symbol. In an embodiment, the cumulative frequency count may correspond to the cumulative frequency count B_s discussed above.
As further shown in fig. 13A, the process 1300A may include generating an inverse symbol table based on the log-frequency table and the cumulative-frequency table (operation 1314).
As further shown in fig. 13A, process 1300A may include generating a symbol stream by iteratively applying a decoding function to a plurality of symbols based on a cumulative frequency table and an inverse symbol table (operation 1315).
In an embodiment, generating the cumulative frequency table may include: obtaining a frequency count for each symbol by left-shifting the integer value 1 based on the logarithm of the frequency count for each symbol; and obtaining the cumulative frequency count for each symbol by adding the frequency count for each symbol to the sum of the frequency counts of previous symbols in the plurality of symbols.
In an embodiment, generating the inverse symbol table may include: determining an inverse symbol value for each symbol by performing a bitwise AND operation on the current state value and a maximum length of a plurality of prefix codes corresponding to the compressed bitstream minus 1. The inverse symbol value may be greater than or equal to a cumulative frequency count of each symbol, and the inverse symbol value may be less than a cumulative frequency count of a next symbol. In an embodiment, the current state value may correspond to the current state value x discussed above. In an embodiment, the maximum length of the plurality of prefix codes may correspond to the value n discussed above.
Fig. 13B is a flow diagram of a process 1300B for generating a symbol stream based on a compressed bitstream, according to some embodiments. In embodiments, one or more of the operations of process 1300B may be combined with, included in, or performed in lieu of one or more of the operations of process 1300A. For example, one or more of the operations of process 1300B may be included in operation 1315 of process 1300A.
As shown in fig. 13B, the process 1300B may include obtaining each symbol based on an inverse symbol value corresponding to each symbol among the inverse symbol tables (operation 1321).
As further shown in fig. 13B, process 1300B may include obtaining a shifted state value by right shifting the current state value based on a maximum length of the plurality of prefix codes (operation 1322).
As further shown in fig. 13B, the process 1300B may include obtaining a first value by left-shifting the shifted state value based on the logarithm of the frequency count of each symbol (operation 1323).
As further shown in fig. 13B, the process 1300B may include obtaining a total frequency count by left-shifting the integer value 1 based on the maximum lengths of the plurality of prefix codes (operation 1324).
As further shown in fig. 13B, the process 1300B may include obtaining a second value by performing a bitwise AND operation on the current state value and a maximum length of the plurality of prefix codes minus 1 (operation 1325).
As further shown in fig. 13B, the process 1300B may include obtaining an updated state value by subtracting the accumulated frequency count corresponding to each symbol from the sum of the first value and the second value (operation 1326). In an embodiment, the updated state value may correspond to the updated state value x' discussed above.
Fig. 13C is a flow diagram of a process 1300C for generating a symbol stream based on a compressed bitstream, according to some embodiments. In embodiments, one or more of the operations of process 1300C may be combined with, included in, or performed in lieu of one or more of the operations of process 1300A. For example, one or more of the operations of process 1300C may be included in operation 1315 of process 1300A.
As shown in fig. 13C, process 1300C may include determining a difference between a maximum length of the plurality of prefix codes and a most significant set bit of an initial state value corresponding to the compressed bitstream (operation 1331).
As further shown in fig. 13C, process 1300C may include determining if the difference is greater than 0 (operation 1332).
As further shown in fig. 13C, based on determining that the difference is not greater than 0 (no at operation 1332), process 1300C may end. In an embodiment, as a result of the end of process 1300C, a current state value, which may correspond to the current state value x discussed above, may be determined as an initial state value.
As further shown in fig. 13C, based on determining that the difference is greater than 0 (yes at operation 1332), process 1300C may include obtaining a third value by left-shifting integer value 1 based on a logarithm of a minimum bit length of a codeword corresponding to the symbol stream (operation 1333). In an embodiment, the logarithm of the minimum bit length of the codeword may correspond to the value a discussed above.
As further shown in fig. 13C, process 1300C may include obtaining a fourth value by adding the difference to the third value minus 1 (operation 1334).
As further shown in fig. 13C, process 1300C may include obtaining a shifted fourth value by right shifting the fourth value based on a logarithm of a minimum bit length of the codeword (operation 1335).
As further shown in fig. 13C, process 1300C may include determining the number of bits to be transferred into the initial state value by shifting the shifted fourth value to the left based on the logarithm of the minimum bit length of the codeword (operation 1336). In an embodiment, the determined number of bits may correspond to the value d << a discussed above with respect to fig. 9B.
As further shown in fig. 13C, process 1300C may include obtaining additional bits from the compressed bitstream based on the determined number of bits (operation 1337).
As further shown in fig. 13C, process 1300C may include obtaining a shifted state value by shifting the initial state value to the left based on the determined number of bits (operation 1338).
As further shown in fig. 13C, process 1300C may include obtaining a current state value by adding the shifted state value and the additional bits (operation 1339). In an embodiment, the current state value may correspond to the current state value x discussed above.
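As a non-limiting illustration, operations 1331 to 1339 may be sketched in Python as follows. The function names, the `read_bits` callback, and the interpretation of the most significant set bit as a zero-based bit index are assumptions made for this sketch.

```python
def bits_to_read(x, max_len, a):
    """Number of bits to transfer into state x, a multiple of 2**a (sketch)."""
    msb = x.bit_length() - 1       # assumption: zero-based index of top set bit
    diff = max_len - msb           # operation 1331
    if diff <= 0:                  # operation 1332: nothing to transfer
        return 0
    third = 1 << a                 # operation 1333
    fourth = diff + third - 1      # operation 1334
    return (fourth >> a) << a      # operations 1335 and 1336


def refill_state(x, max_len, a, read_bits):
    """Shift the state left and append bits from the stream (sketch)."""
    n = bits_to_read(x, max_len, a)
    if n == 0:
        return x
    # Operations 1337 to 1339: fetch n bits, shift the state, add the bits.
    return (x << n) + read_bits(n)


# Hypothetical usage: maximum prefix length 3, codeword length 2**2 = 4 bits.
print(refill_state(2, 3, 2, lambda n: 0b0110))  # 38
```

Operations 1333 to 1336 together round the difference up to the next multiple of the minimum codeword bit length, so bits are always transferred in whole codewords.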
Although fig. 6B, 7B, 8B, 9B, 12A-12C, and 13A-13C illustrate example blocks of various processes, in some implementations, one or more of these processes may include additional blocks, fewer blocks, different blocks, or differently arranged blocks than the blocks depicted in fig. 6B, 7B, 8B, 9B, 12A-12C, and 13A-13C. Additionally or alternatively, two or more of the blocks of the illustrated process may be performed in parallel or combined in any order.
In an embodiment, the decoding function may be applied to the plurality of symbols in parallel by a plurality of processors. Each processor of the plurality of processors may be assigned a corresponding initial state value in order to perform the decoding function. After each processor determines the number of bits to transfer into the corresponding initial state value, each processor may be assigned a corresponding memory location from which to obtain the additional bits. After the additional bits are obtained from the compressed bitstream, process 1300C may include determining a corresponding current state value for each processor.
Accordingly, the above-described embodiments may provide hANS codec techniques that may have many benefits over other codec techniques. According to some embodiments, the hANS codec technique may have low storage space requirements. For example, the hANS codec technique may not use an intermediate table of compression results as used in the tANS codec technique. In an embodiment, the hANS codec technique may use only the cumulative frequency table and the inverse symbol table.
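As a non-limiting illustration, the cumulative frequency table and the inverse symbol table may be derived from the prefix lengths alone, as sketched below in Python. The function name `build_tables` is hypothetical, and the sketch assumes that the cumulative frequency count of a symbol is the sum of the frequency counts of the preceding symbols and that the prefix lengths come from a complete Huffman tree, so that the frequency counts sum to 2 raised to the maximum prefix length.

```python
def build_tables(prefix_lengths):
    """Derive hANS-style tables from Huffman prefix lengths (sketch)."""
    max_len = max(prefix_lengths)
    # Logarithm of each frequency count: maximum length minus prefix length.
    log_freq = [max_len - n for n in prefix_lengths]
    cum_freq, inv_symbol, total = [], [], 0
    for s, k in enumerate(log_freq):
        cum_freq.append(total)             # assumption: sum of earlier counts
        inv_symbol.extend([s] * (1 << k))  # slots cum_freq[s] .. next - 1 map to s
        total += 1 << k                    # frequency count is 1 << k
    return max_len, log_freq, cum_freq, inv_symbol


# Hypothetical prefix lengths for a four-symbol alphabet.
print(build_tables([1, 2, 3, 3]))
# (3, [2, 1, 0, 0], [0, 4, 6, 7], [0, 0, 0, 0, 1, 1, 2, 3])
```

Under these assumptions, the inverse symbol table has one entry per value of the low bits of the state, i.e., 2 raised to the maximum prefix length entries, and the cumulative frequency table has one entry per symbol.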
According to some embodiments, the hANS codec technique may have low complexity. For example, the hANS codec technique may not use multiplication or division operations as used in the rANS codec technique or the range codec technique. In an embodiment, the hANS codec technique may be slightly more complex than the tANS codec technique because the bit width may be larger.
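As a non-limiting illustration, because every frequency count is a power of two, the encoding state update may be written with shifts and a mask and gives the same result as the division and multiplication form commonly used to describe range ANS encoding. The Python sketch below compares the two forms; the function names and example values are hypothetical.

```python
def encode_step_shifts(x, k, max_len, cum):
    """State update using only shifts, a mask, and additions (sketch)."""
    first = (x >> k) << max_len       # shifted state moved above the slot bits
    second = x & ((1 << k) - 1)       # state AND (frequency count minus 1)
    return first + second + cum


def encode_step_muldiv(x, k, max_len, cum):
    """Same update in division and multiplication form, for comparison."""
    freq, total = 1 << k, 1 << max_len
    return (x // freq) * total + (x % freq) + cum


# Hypothetical example: log2(frequency) = 1, maximum length 3, cumulative count 4.
print(encode_step_shifts(8, 1, 3, 4), encode_step_muldiv(8, 1, 3, 4))  # 36 36
```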
According to some embodiments, the hANS codec technique may be easily parallelized, similar to the rANS codec technique and the tANS codec technique, and may achieve near-optimal compression, similar to the Huffman codec technique.
According to some embodiments, the near-optimal compression of the hANS codec technique may be within one bit per symbol, and thus may be more useful for worst-case compression than for general-case compression or skewed distributions. In an embodiment, the hANS codec technique may involve stack encoding/decoding, such as last-in, first-out encoding/decoding, and thus special methods of reversing the symbol stream and bitstream may be used to avoid buffering in the decoder. In embodiments, the hANS codec technique may be ideal for memory compression such as embedded memory compression.
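As a non-limiting illustration of the last-in, first-out behavior, the Python sketch below encodes a short symbol stream and decodes it again. All names are hypothetical. The sketch assumes an initial state of 1 shifted left by the maximum prefix length, a zero-based most significant set bit, and codewords of 2**a bits held on a simple stack. For simplicity, it reverses the decoded list at the end rather than using the stream reversal methods mentioned above.

```python
def build_tables(prefix_lengths):
    """Tables derived from prefix lengths (assumed complete Huffman tree)."""
    max_len = max(prefix_lengths)
    log_freq = [max_len - n for n in prefix_lengths]
    cum_freq, inv_symbol, total = [], [], 0
    for s, k in enumerate(log_freq):
        cum_freq.append(total)
        inv_symbol.extend([s] * (1 << k))
        total += 1 << k
    return max_len, log_freq, cum_freq, inv_symbol


def encode(symbols, prefix_lengths, a):
    """Return the final state and a stack of 2**a-bit codewords (sketch)."""
    max_len, log_freq, cum_freq, _ = build_tables(prefix_lengths)
    word_bits = 1 << a
    x, words = 1 << max_len, []          # assumed initial state
    for s in symbols:
        k = log_freq[s]
        diff = (x.bit_length() - 1) - k  # most significant set bit minus log2(freq)
        if diff >= word_bits:
            for _ in range(diff >> a):   # shift out whole codewords, low word first
                words.append(x & ((1 << word_bits) - 1))
                x >>= word_bits
        x = ((x >> k) << max_len) + (x & ((1 << k) - 1)) + cum_freq[s]
    return x, words


def decode(x, words, count, prefix_lengths, a):
    """Pop symbols in reverse order, then restore the original order (sketch)."""
    max_len, log_freq, cum_freq, inv_symbol = build_tables(prefix_lengths)
    word_bits = 1 << a
    words, out = list(words), []
    for _ in range(count):
        slot = x & ((1 << max_len) - 1)
        s = inv_symbol[slot]
        x = ((x >> max_len) << log_freq[s]) + slot - cum_freq[s]
        out.append(s)
        diff = max_len - (x.bit_length() - 1)
        if diff > 0:                     # refill with whole codewords
            for _ in range((diff + word_bits - 1) >> a):
                x = (x << word_bits) | words.pop()
    return out[::-1]


state, stack = encode([0, 1, 2, 3, 0, 0, 1], [1, 2, 3, 3], 2)
print(state, stack)                                   # 21 [4, 6, 3]
print(decode(state, stack, 7, [1, 2, 3, 3], 2))       # [0, 1, 2, 3, 0, 0, 1]
```

In this sketch the decoder produces the last encoded symbol first, which is why either the symbol stream or the bitstream is reversed in practice when decoder-side buffering is to be avoided.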
The foregoing disclosure provides illustration and description, but is not intended to be exhaustive or to limit the implementations to the precise form disclosed. Modifications and variations are possible in light of the above disclosure or may be acquired from practice of the implementations.
As used herein, the term "component" is intended to be broadly interpreted as hardware, software, firmware, or a combination thereof.
It will be apparent that the systems and/or methods described herein may be implemented in different forms of hardware, software, firmware, or combinations thereof. The actual specialized control hardware or software code used to implement the systems and/or methods is not limiting of the implementation. Thus, the operation and behavior of the systems and/or methods were described without reference to the specific software code, it being understood that software and hardware can be designed to implement the systems and/or methods based on the description herein.
Although specific combinations of features are recited in the claims and/or disclosed in the specification, such combinations are not intended to limit the disclosure of possible implementations. Indeed, many of these features may be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed may refer directly to only one claim, disclosure of a possible implementation includes the combination of each dependent claim with each other claim in the claim set.
No element, act, or instruction used herein should be construed as critical or essential unless explicitly described as such. Furthermore, as used herein, the article "a" is intended to include one or more items and may be used interchangeably with "one or more". Furthermore, as used herein, the term "collection" is intended to include one or more items (e.g., related items, unrelated items, combinations of related and unrelated items, etc.), and can be used interchangeably with "one or more". Where only one item is intended, the term "one" or similar language is used. Furthermore, as used herein, the term "having" or variants thereof and the like are intended to be open-ended terms. Furthermore, unless explicitly stated otherwise, the phrase "based on" is intended to mean "based, at least in part, on".
Although one or more example embodiments have been described above with reference to the accompanying drawings, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope as defined, at least in part, by the following claims.

Claims (24)

1. A memory device, comprising:
a memory; and
at least one processor configured to:
obtain a symbol stream comprising a plurality of symbols,
determine a Huffman tree corresponding to the symbol stream, wherein each symbol of the plurality of symbols is assigned a corresponding prefix code of a plurality of prefix codes based on the Huffman tree,
generate a prefix length table based on the Huffman tree, wherein the prefix length table indicates a length of the corresponding prefix code of each symbol,
generate a log-frequency table based on the prefix length table, wherein the log-frequency table indicates a logarithm of a frequency count for each symbol,
generate a cumulative frequency table indicating a cumulative frequency count corresponding to each symbol,
generate a compressed bit stream by iteratively applying an encoding function to the plurality of symbols based on the log-frequency table and the cumulative frequency table, and
store the compressed bit stream in the memory.
2. The memory device of claim 1, wherein to generate the log-frequency table, the at least one processor is configured to subtract the length of the corresponding prefix code for each symbol from a maximum length of the plurality of prefix codes.
3. The memory device of claim 1, wherein to generate the cumulative frequency table, the at least one processor is configured to:
obtain the frequency count of each symbol by left-shifting an integer value of 1 based on the logarithm of the frequency count of each symbol, and
obtain the cumulative frequency count for each symbol by adding the frequency count for each symbol to a sum of frequency counts of previous symbols in the plurality of symbols.
4. The memory device of claim 3, wherein to apply the encoding function to each symbol, the at least one processor is configured to:
obtain a current state value,
obtain a shifted state value by right-shifting the current state value based on the logarithm of the frequency count for each symbol,
obtain a first value by left-shifting the shifted state value based on a maximum length of the plurality of prefix codes,
obtain the frequency count for each symbol by left-shifting an integer value of 1 based on the logarithm of the frequency count,
obtain a second value by performing a bitwise AND operation on the current state value and the frequency count of each symbol minus 1, and
obtain an updated state value by adding the first value, the second value, and the cumulative frequency count corresponding to each symbol.
5. The memory device of claim 4, wherein to apply the encoding function to each symbol, the at least one processor is further configured to:
determine whether a difference between a most significant set bit of an initial state value corresponding to the symbol stream and the logarithm of the frequency count corresponding to each symbol is greater than or equal to a minimum bit length of a codeword corresponding to the symbol stream, and
based on determining that the difference is greater than or equal to the minimum bit length of the codeword:
determine a third value by subtracting the logarithm of the frequency count for each symbol from the most significant set bit,
obtain a shifted third value by right-shifting the third value based on a logarithm of the minimum bit length of the codeword,
determine a number of bits to be shifted out of the initial state value by left-shifting the shifted third value based on the logarithm of the minimum bit length of the codeword,
output the determined number of bits to the compressed bit stream, and
obtain the current state value by right-shifting the initial state value based on the determined number of bits.
6. The memory device of claim 5, wherein the at least one processor comprises a plurality of processors configured to execute the encoding function on the plurality of symbols in parallel,
wherein, to execute the encoding function, each processor of the plurality of processors is assigned a corresponding initial state value,
wherein after each processor determines the number of bits to be transferred from the corresponding initial state value, each processor is allocated a corresponding memory location to output the determined number of bits, and
wherein after the determined number of bits is output to the compressed bitstream, each processor is further configured to determine a corresponding current state value.
7. A memory device, comprising:
a memory; and
at least one processor configured to:
obtain a compressed bit stream from the memory, wherein the compressed bit stream corresponds to a symbol stream comprising a plurality of symbols,
obtain a log-frequency table from the compressed bit stream, wherein the log-frequency table indicates a logarithm of a frequency count for each symbol of the plurality of symbols,
generate a cumulative frequency table based on the log-frequency table, wherein the cumulative frequency table indicates a cumulative frequency count corresponding to each symbol,
generate an inverse symbol table based on the log-frequency table and the cumulative frequency table, and
generate the symbol stream by iteratively applying a decoding function to the plurality of symbols based on the cumulative frequency table and the inverse symbol table.
8. The memory device of claim 7, wherein to generate the cumulative frequency table, the at least one processor is configured to:
obtain the frequency count of each symbol by left-shifting an integer value of 1 based on the logarithm of the frequency count of each symbol, and
obtain the cumulative frequency count for each symbol by adding the frequency count for each symbol to a sum of frequency counts of previous symbols in the plurality of symbols.
9. The memory device of claim 7, wherein to generate the inverse symbol table, the at least one processor is configured to determine an inverse symbol value for each symbol by performing a bitwise AND operation on a current state value and a maximum length, minus 1, of a plurality of prefix codes corresponding to the compressed bit stream,
wherein the inverse symbol value is greater than or equal to the cumulative frequency count for each symbol, and
wherein the inverse symbol value is less than the cumulative frequency count of the next symbol.
10. The memory device of claim 9, wherein to apply the decoding function to each symbol, the at least one processor is configured to:
obtain each symbol based on the inverse symbol value corresponding to each symbol in the inverse symbol table,
obtain a shifted state value by right-shifting the current state value based on the maximum length of the plurality of prefix codes,
obtain a first value by left-shifting the shifted state value based on the logarithm of the frequency count for each symbol,
obtain a total frequency count by left-shifting an integer value of 1 based on the maximum length of the plurality of prefix codes,
obtain a second value by performing a bitwise AND operation on the current state value and the maximum length, minus 1, of the plurality of prefix codes, and
obtain an updated state value by subtracting the cumulative frequency count corresponding to each symbol from the sum of the first value and the second value.
11. The memory device of claim 10, wherein to apply the decoding function to each symbol, the at least one processor is further configured to:
determine whether a difference between the maximum length of the plurality of prefix codes and a most significant set bit of an initial state value corresponding to the compressed bit stream is greater than 0, and
based on determining that the difference is greater than 0:
obtain a third value by left-shifting an integer value of 1 based on a logarithm of a minimum bit length of a codeword corresponding to the symbol stream,
obtain a fourth value by adding the difference to the third value minus 1,
obtain a shifted fourth value by right-shifting the fourth value based on the logarithm of the minimum bit length of the codeword,
determine a number of bits to be transferred into the initial state value by left-shifting the shifted fourth value based on the logarithm of the minimum bit length of the codeword,
obtain additional bits from the compressed bit stream based on the determined number of bits,
obtain a shifted state value by left-shifting the initial state value based on the determined number of bits, and
obtain the current state value by adding the shifted state value and the additional bits.
12. The memory device of claim 11, wherein the at least one processor comprises a plurality of processors configured to perform the decoding function on the plurality of symbols in parallel,
wherein, to perform the decoding function, each processor of the plurality of processors is assigned a corresponding initial state value,
wherein after each processor determines the number of bits to transfer into the corresponding initial state value, each processor is allocated a corresponding memory location from which to obtain the additional bits, and
wherein after the additional bits are obtained from the compressed bitstream, each processor is further configured to determine a corresponding current state value.
13. A method of compressing a symbol stream for storage in a memory device, the method being performed by at least one processor and comprising:
obtaining the symbol stream comprising a plurality of symbols;
determining a Huffman tree corresponding to the symbol stream, wherein each symbol of the plurality of symbols is assigned a corresponding prefix code of a plurality of prefix codes based on the Huffman tree;
generating a prefix length table based on the Huffman tree, wherein the prefix length table indicates a length of the corresponding prefix code of each symbol;
generating a log-frequency table based on the prefix length table, wherein the log-frequency table indicates a logarithm of a frequency count for each symbol;
generating a cumulative frequency table indicating a cumulative frequency count corresponding to each symbol;
generating a compressed bit stream by iteratively applying an encoding function to the plurality of symbols based on the log-frequency table and the cumulative frequency table; and
storing the compressed bit stream in the memory device.
14. The method of claim 13, wherein the generating of the log-frequency table comprises subtracting the length of the corresponding prefix code for each symbol from a maximum length of the plurality of prefix codes.
15. The method of claim 13, wherein the generating of the cumulative frequency table comprises:
obtaining the frequency count for each symbol by left-shifting an integer value of 1 based on the logarithm of the frequency count for each symbol; and
obtaining the cumulative frequency count for each symbol by adding the frequency count for each symbol to a sum of frequency counts of previous symbols in the plurality of symbols.
16. The method of claim 15, wherein applying the encoding function to each symbol comprises:
obtaining a current state value;
obtaining a shifted state value by right shifting the current state value based on the logarithm of the frequency count of each symbol;
obtaining a first value by shifting the shifted state value to the left based on a maximum length of the plurality of prefix codes;
obtaining the frequency count for each symbol by left-shifting an integer value of 1 based on the logarithm of the frequency count;
obtaining a second value by performing a bitwise AND operation on the current state value and the frequency count of each symbol minus 1; and
obtaining an updated state value by adding the first value, the second value, and the cumulative frequency count corresponding to each symbol.
17. The method of claim 16, wherein applying the encoding function to each symbol further comprises:
determining whether a difference between a most significant set bit of an initial state value corresponding to the symbol stream and the logarithm of the frequency count corresponding to each symbol is greater than or equal to a minimum bit length of a codeword corresponding to the symbol stream; and
based on determining that the difference is greater than or equal to the minimum bit length of the codeword:
determining a third value by subtracting the logarithm of the frequency count for each symbol from the most significant set bit;
obtaining a shifted third value by right-shifting the third value based on a logarithm of the minimum bit length of the codeword;
determining a number of bits to be shifted out of the initial state value by left-shifting the shifted third value based on the logarithm of the minimum bit length of the codeword;
outputting the determined number of bits to the compressed bitstream; and
obtaining the current state value by right-shifting the initial state value based on the determined number of bits.
18. The method of claim 17, wherein the at least one processor comprises a plurality of processors,
wherein the encoding function is applied by the plurality of processors to the plurality of symbols in parallel,
wherein, to execute the encoding function, each processor of the plurality of processors is assigned a corresponding initial state value,
wherein after each processor determines the number of bits to be transferred from the corresponding initial state value, each processor is allocated a corresponding memory location to output the determined number of bits, and
wherein after the determined number of bits is output to the compressed bitstream, the method further comprises determining a corresponding current state value for each processor.
19. A method of generating a symbol stream based on a compressed bit stream, the method being performed by at least one processor and comprising:
obtaining the compressed bit stream from a memory, wherein the compressed bit stream corresponds to a plurality of symbols included in the symbol stream;
obtaining a log-frequency table from the compressed bit stream, wherein the log-frequency table indicates a log of a frequency count for each symbol of the plurality of symbols;
generating an accumulated frequency table based on the logarithmic frequency table, wherein the accumulated frequency table indicates an accumulated frequency count corresponding to each symbol;
generating an inverse symbol table based on the log-frequency table and the cumulative frequency table; and
generating the symbol stream by iteratively applying a decoding function to the plurality of symbols based on the cumulative frequency table and the inverse symbol table.
20. The method of claim 19, wherein the generating of the cumulative frequency table comprises:
obtaining the frequency count for each symbol by left-shifting an integer value of 1 based on the logarithm of the frequency count for each symbol; and
obtaining the cumulative frequency count for each symbol by adding the frequency count for each symbol to a sum of frequency counts of previous symbols in the plurality of symbols.
21. The method of claim 19, wherein the generating of the inverse symbol table comprises determining an inverse symbol value for each symbol by performing a bitwise AND operation on a current state value and a maximum length, minus 1, of a plurality of prefix codes corresponding to the compressed bit stream,
wherein the inverse symbol value is greater than or equal to the cumulative frequency count for each symbol, and
wherein the inverse symbol value is less than the cumulative frequency count of the next symbol.
22. The method of claim 21, wherein applying the decoding function to each symbol comprises:
obtaining each symbol based on the inverse symbol value corresponding to each symbol in the inverse symbol table;
obtaining a shifted state value by right-shifting the current state value based on the maximum length of the plurality of prefix codes;
obtaining a first value by left-shifting the shifted state value based on the logarithm of the frequency count for each symbol;
obtaining a total frequency count by left-shifting an integer value of 1 based on the maximum length of the plurality of prefix codes;
obtaining a second value by performing a bitwise AND operation on the current state value and the maximum length, minus 1, of the plurality of prefix codes; and
obtaining an updated state value by subtracting the cumulative frequency count corresponding to each symbol from the sum of the first value and the second value.
23. The method of claim 22, wherein applying the decoding function to each symbol further comprises:
determining whether a difference between the maximum length of the plurality of prefix codes and a most significant set bit of an initial state value corresponding to the compressed bitstream is greater than 0; and
based on determining that the difference is greater than 0:
obtaining a third value by left-shifting an integer value of 1 based on a logarithm of a minimum bit length of a codeword corresponding to the symbol stream;
obtaining a fourth value by adding the difference to the third value minus 1;
obtaining a shifted fourth value by right-shifting the fourth value based on the logarithm of the minimum bit length of the codeword;
determining a number of bits to transfer into the initial state value by left-shifting the shifted fourth value based on the logarithm of the minimum bit length of the codeword;
obtaining additional bits from the compressed bit stream based on the determined number of bits;
obtaining a shifted state value by left-shifting the initial state value based on the determined number of bits; and
obtaining the current state value by adding the shifted state value and the additional bits.
24. The method of claim 23, wherein the at least one processor comprises a plurality of processors,
wherein the decoding function is applied by the plurality of processors to the plurality of symbols in parallel,
wherein, to perform the decoding function, each processor of the plurality of processors is assigned a corresponding initial state value,
wherein after each processor determines the number of bits to transfer into the corresponding initial state value, each processor is allocated a corresponding memory location from which to obtain the additional bits, and
wherein after the additional bits are obtained from the compressed bitstream, the method further comprises determining a corresponding current state value for each processor.
CN202310845805.5A 2022-07-12 2023-07-11 Memory device, compression method of symbol stream and generation method Pending CN117394865A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US63/388,352 2022-07-12
US17/939,643 2022-09-07
US17/939,643 US20240022260A1 (en) 2022-07-12 2022-09-07 Low complexity optimal parallel huffman encoder and decoder

Publications (1)

Publication Number Publication Date
CN117394865A true CN117394865A (en) 2024-01-12

Family

ID=89469035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310845805.5A Pending CN117394865A (en) 2022-07-12 2023-07-11 Memory device, compression method of symbol stream and generation method

Country Status (1)

Country Link
CN (1) CN117394865A (en)

Legal Events

Date Code Title Description
PB01 Publication