WO2023045204A1 - Method and system for generating finite state entropy coding table, medium, and device - Google Patents

Method and system for generating finite state entropy coding table, medium, and device

Info

Publication number
WO2023045204A1
WO2023045204A1 PCT/CN2022/074614 CN2022074614W WO2023045204A1 WO 2023045204 A1 WO2023045204 A1 WO 2023045204A1 CN 2022074614 W CN2022074614 W CN 2022074614W WO 2023045204 A1 WO2023045204 A1 WO 2023045204A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
character
state
occurrence
column
Prior art date
Application number
PCT/CN2022/074614
Other languages
French (fr)
Chinese (zh)
Inventor
吴睿振
秦臻
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023045204A1

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Definitions

  • The present application relates to the technical field of data compression, and in particular to a method and system for generating a finite state entropy coding table.
  • By compression principle, lossless data compression can be divided into algorithms based on data statistics and algorithms based on dictionaries.
  • Algorithms based on data statistics include Shannon-Fano coding, Huffman coding, arithmetic coding, run-length coding, and finite state entropy (FSE) coding; algorithms based on dictionaries include LZ77 (Lempel-Ziv 77) coding and LZ78 (Lempel-Ziv 78) coding.
  • Zstd: Zstandard, a fast lossless compression algorithm
  • Huffman encoding
  • FSE: finite state entropy coding
  • Zstd has better compression performance.
  • Zstd is an open source algorithm that provides 22 compression levels for weighing compression speed and compression rate, and has been widely used in Linux kernel, FreeBSD operating system, and AWS Redshift data warehouse and other fields.
  • Lossless compression implemented in software has the advantages of high flexibility, universality and low cost, but software can only execute sequentially, so the central processing unit (CPU) is occupied for a long time when processing massive data. Compression speed drops sharply as a result, and it is difficult to meet the demand for real-time compression of massive data in specific application fields.
  • Implementing compression in hardware is an effective way to solve these problems. Thanks to the inherent parallelism of hardware, it can improve transmission speed, resource utilization and security.
  • the purpose of this application is to propose a method and system for generating a finite state entropy coding table, so as to solve the problem in the prior art that there is no hardware acceleration solution for finite state entropy coding.
  • the application provides a method for generating a finite state entropy coding table, comprising the following steps:
  • the frequency of occurrence of each character in the data block is obtained based on the proportion of each character in the data block to be encoded;
  • based on the preset maximum state value and the frequency of occurrence of each character, the number of states of the column corresponding to each character in the finite state entropy coding table is obtained, and an empty table of the finite state entropy coding table is formed based on the number of states of each column and the frequency of occurrence of each character;
  • the initial value of each blank is obtained based on the row number of the blank in the empty table and the frequency of occurrence of the corresponding character, and the initial table of the finite state entropy coding table is obtained by filling in the initial value of each blank;
  • the initial table is traversed, and it is judged whether the initial value at row m, column n is greater than the initial value at row m-1, column n;
  • if so, the initial value at row m, column n is used as its temporary value, and it is judged whether this temporary value repeats any initial value arranged before it;
  • if it does not, the temporary value at row m, column n is used as its state value;
  • when the traversal is complete, all state values are obtained and it is judged whether the maximum state value is among them; if so, a state table of the finite state entropy coding table is generated based on all state values.
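  • the method also includes: in response to the initial value at row m, column n not being greater than the initial value at row m-1, column n, using the initial value at row m-1, column n increased by a preset increment value as the temporary value at row m, column n.
  • the method also includes: in response to the temporary value at row m, column n repeating an initial value arranged before it, increasing the temporary value by the preset increment value until it no longer repeats, and using the updated temporary value as the state value at row m, column n.
  • the method also includes: in response to the maximum state value being absent, replacing the largest of all state values with the maximum state value to generate the state table of the finite state entropy coding table.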
  • the number of states in the columns corresponding to each character in the finite state entropy coding table includes:
  • the preset maximum state value is multiplied by the frequency of occurrence of each character to respectively obtain the state quantity of the column corresponding to each character in the finite state entropy coding table.
  • forming an empty table of the finite state entropy encoding table based on the number of states of each character's corresponding column and the frequency of occurrence of each character includes:
  • the number of states of each character is used as the number of blanks in the corresponding column of the finite state entropy coding table, and the columns are ordered based on the frequency of occurrence of each character to form an empty table of the finite state entropy coding table.
  • arranging the order of the columns based on the frequency of occurrence of each character includes: arranging the corresponding columns from left to right in descending order of the frequency of occurrence of each character.
  • obtaining the initial value of each blank based on the number of rows of each blank in the empty table and the frequency of occurrence of the corresponding character includes:
  • the ratio of the number of rows where each space is located in the empty table to the frequency of occurrence of the corresponding character is rounded towards zero to obtain the initial value of each space.
  • traversing the initial table includes:
  • the initial table is traversed in order from left to right and top to bottom.
  • n is an integer greater than or equal to 1.
  • a system for generating a finite state entropy coding table including:
  • the frequency of occurrence obtaining module is configured to obtain the frequency of occurrence of each character in the data block based on the proportion of each character in the data block to be encoded;
  • an empty table obtaining module, configured to obtain the number of states of the column corresponding to each character in the finite state entropy coding table based on the preset maximum state value and the frequency of occurrence of each character, and to form an empty table of the finite state entropy coding table based on the number of states of each column and the frequency of occurrence of each character;
  • an initial table obtaining module, configured to obtain the initial value of each blank based on the row number of the blank in the empty table and the frequency of occurrence of the corresponding character, and to obtain the initial table of the finite state entropy coding table by filling in the initial value of each blank;
  • a first judging module, configured to traverse the initial table and judge whether the initial value at row m, column n is greater than the initial value at row m-1, column n;
  • a second judging module, configured to, in response to the initial value at row m, column n being greater than the initial value at row m-1, column n, take the initial value at row m, column n as its temporary value and judge whether this temporary value repeats any initial value arranged before it;
  • a state value module, configured to use the temporary value at row m, column n as its state value in response to the temporary value not repeating any initial value arranged before it;
  • the third judging module is configured to obtain all the state values of the finite state entropy coding table in response to the completion of the traversal, and judge whether there is a maximum state value among all the state values;
  • a state table generating module configured to generate a state table of a finite state entropy encoding table based on all state values in response to having a maximum state value.
  • a computer-readable storage medium which stores computer program instructions, and implements any one of the above-mentioned methods when the computer program instructions are executed by a processor.
  • a computer device including a memory and a processor, where a computer program is stored in the memory, and when the computer program is executed by the processor, any one of the above-mentioned methods is executed.
  • the method for generating the finite state entropy coding table of the present application uses a simple and convenient data storage form: the table can be stored as a one-dimensional array, with no need for linked-list storage of a binary tree such as a Huffman tree, which can greatly reduce memory use;
  • the generation method of the finite state entropy encoding table of the present application is applicable to the FSE compression and decompression implemented by hardware under the Zstd specification standard, mainly using a comparator and an adder, and the calculation is relatively simple, thereby effectively reducing hardware resource overhead and improving hardware utilization;
  • FIG. 1 is a schematic diagram of a method for generating a finite state entropy coding table provided according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of a data block to be encoded according to an embodiment of the present application
  • Fig. 3 is the schematic diagram of the FSE Table initial table provided according to the embodiment of the present application.
  • Fig. 4 is a schematic diagram of the FSE Table calculation traversal process provided according to the embodiment of the present application.
  • Fig. 5 is the schematic diagram of the FSE Table iterative calculation I provided according to the embodiment of the application.
  • Fig. 6 is the schematic diagram of the FSE Table iterative calculation II provided according to the embodiment of the application.
  • Fig. 7 is the schematic diagram of the FSE Table that traversal is completed according to the embodiment of the present application.
  • Fig. 8 is the schematic diagram of the FSE Table state table provided according to the embodiment of the present application.
  • FIG. 9 is a schematic diagram of a system for generating a finite state entropy coding table according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a computer-readable storage medium for implementing a method for generating a finite state entropy coding table according to an embodiment of the present application
  • FIG. 11 is a schematic diagram of a hardware structure of a computer device implementing a method for generating a finite state entropy coding table according to an embodiment of the present application.
  • FIG. 1 is a schematic diagram of an embodiment of a method for generating a finite state entropy coding table provided by the present application. As shown in Figure 1, the embodiment of the present application includes the following steps:
  • Step S10: obtain the frequency of occurrence of each character in the data block based on the proportion of each character in the data block to be encoded;
  • Step S20: based on the preset maximum state value and the frequency of occurrence of each character, obtain the number of states in the column corresponding to each character in the finite state entropy coding table, and form an empty table of the finite state entropy coding table based on the number of states in each column and the frequency of occurrence of each character;
  • Step S30: obtain the initial value of each blank based on the row number of the blank in the empty table and the frequency of occurrence of the corresponding character, and obtain the initial table of the finite state entropy coding table by filling in the initial value of each blank;
  • Step S40: traverse the initial table, and judge whether the initial value at row m, column n is greater than the initial value at row m-1, column n;
  • Step S50: in response to the initial value at row m, column n being greater than the initial value at row m-1, column n, use the initial value at row m, column n as its temporary value, and judge whether this temporary value repeats any initial value arranged before it;
  • Step S60: in response to the temporary value at row m, column n not repeating any initial value arranged before it, take the temporary value at row m, column n as its state value;
  • Step S70: in response to the completion of the traversal, obtain all state values of the finite state entropy coding table, and determine whether the maximum state value is among them;
  • Step S80: in response to the maximum state value being present, generate a state table of the finite state entropy coding table based on all state values.
  • n is an integer greater than or equal to 1.
  • FSE is an entropy coder of the tANS (tabled asymmetric numeral systems) type, a variant within the asymmetric numeral systems (ANS) family.
  • the method for generating the finite state entropy coding table of the embodiment of the present application uses a simple and convenient data storage form: only a one-dimensional array is needed for storage, with no linked-list storage of a binary tree such as a Huffman tree, which can greatly reduce memory use;
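As a rough illustration of the one-dimensional storage mentioned above, the following sketch flattens a column-per-symbol table into a single array plus a per-symbol (offset, length) pair. The layout, the names `flatten_table` and `column`, and the sample values are assumptions made for illustration; the source does not specify how the array is laid out.

```python
def flatten_table(table: dict[str, list[int]]) -> tuple[list[int], dict[str, tuple[int, int]]]:
    """Store all columns back to back in one list (hypothetical layout)."""
    flat: list[int] = []
    offsets: dict[str, tuple[int, int]] = {}
    for symbol, column_values in table.items():
        # Remember where this symbol's column starts and how long it is.
        offsets[symbol] = (len(flat), len(column_values))
        flat.extend(column_values)
    return flat, offsets


def column(flat: list[int], offsets: dict[str, tuple[int, int]], symbol: str) -> list[int]:
    """Recover one symbol's column from the flat array."""
    start, length = offsets[symbol]
    return flat[start:start + length]


# Hypothetical two-column table, for illustration only.
flat, offsets = flatten_table({"f": [2, 4, 9], "e": [6, 12]})
print(flat)                        # all state values in one array
print(column(flat, offsets, "e"))  # the column for symbol "e"
```

No pointers or tree nodes are needed in this layout, which is the property the text contrasts with a Huffman tree.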
  • the generation method of the finite state entropy coding table in the embodiment of the present application is suitable for hardware-implemented FSE (finite state entropy coding) compression and decompression under the Zstd (fast lossless compression algorithm) standard. It mainly uses comparators and adders and the calculation is relatively simple, which effectively reduces hardware resource overhead and improves hardware utilization; this in turn improves the compression and decompression speed of the Zstd algorithm and helps meet the growing demand for compression performance in specific application fields.
  • All calculations and storage methods involved in the generation method of the finite state entropy encoding table in the embodiment of the present application can not only be implemented as hardware, but also can improve the efficiency of software calculations.
  • The variety of possible implementations makes the application more flexible. Implemented in hardware, it can serve as a hardware acceleration technology for network data storage: it accelerates data compression based on finite state entropy coding, effectively reduces the load on the server CPU (central processing unit), and, by focusing on accelerating data compression, helps improve the performance of data centers.
  • the method further includes: in response to the initial value at row m, column n not being greater than the initial value at row m-1, column n, using the initial value at row m-1, column n increased by a preset increment value as the temporary value at row m, column n.
  • the preset increment value is an integer greater than 0.
  • the method further includes: in response to the temporary value at row m, column n repeating an initial value arranged before it, increasing the temporary value by the preset increment value to obtain an updated temporary value, repeating this until the updated temporary value no longer repeats any initial value arranged before it, and using the updated temporary value as the state value at row m, column n.
  • During the traversal, the value of each blank is compared with the value in the previous row of the same column and checked against the values arranged before it; the operations needed to satisfy both conditions are carried out before the traversal moves on to the next blank.
  • the method further includes: in response to the maximum state value being absent, replacing the largest of all state values with the maximum state value to generate a state table of the finite state entropy coding table.
  • obtaining the number of states in the column corresponding to each character in the finite state entropy coding table based on the preset maximum state value and the frequency of occurrence of each character includes: multiplying the preset maximum state value by the frequency of occurrence of each character to obtain the number of states of the column corresponding to that character.
  • forming an empty table of the finite state entropy coding table based on the number of states of each column and the frequency of occurrence of each character includes: using the number of states of each character as the number of blanks in the corresponding column of the finite state entropy coding table, and ordering the columns based on the frequency of occurrence of each character to form the empty table.
  • arranging the order of the columns based on the frequency of occurrence of each character includes: arranging the corresponding columns from left to right in descending order of the frequency of occurrence of each character.
  • obtaining the initial value of each blank based on the row number of the blank in the empty table and the frequency of occurrence of the corresponding character includes: dividing the row number of the blank by the frequency of occurrence of the corresponding character and rounding the ratio toward zero to get the initial value of the blank.
  • traversing the initial table includes: traversing the initial table in order from left to right and from top to bottom.
  • the generation method of the finite state entropy coding table of an exemplary embodiment of the present application is as follows:
  • Step 1: Count the occurrences of each Symbol (character) in the data block Text to be encoded, as shown in Figure 2, and sort the symbols by frequency of occurrence; the following table can be obtained:
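A minimal sketch of this counting and sorting step, assuming the data block is available as a Python string. The function name `symbol_frequencies` and the alphabetical tie-break are illustrative choices; the source does not say how ties are ordered, and the sample text is hypothetical rather than the data block of Figure 2.

```python
from collections import Counter


def symbol_frequencies(text: str) -> list[tuple[str, int, float]]:
    """Return (symbol, count, frequency) tuples, most frequent symbol first."""
    counts = Counter(text)
    total = len(text)
    # Sort by descending count; ties are broken alphabetically (an assumption).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(symbol, count, count / total) for symbol, count in ordered]


# Hypothetical data block, for illustration only.
for symbol, count, freq in symbol_frequencies("ffffeedcfbfa"):
    print(symbol, count, round(freq, 3))
```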
  • Step 2: The encoding and decoding process of FSE (Finite State Entropy, finite state entropy coding) relies on a core table, the FSE Table. The first task in generating it is to calculate the number of states in the column corresponding to each symbol:
  • the number of states is the product of the maximum number of states and the frequency of occurrence of each symbol.
  • In this example the maximum number of states is taken as 31 (in practical applications, the larger the number of states, the closer the compression effect comes to the limit compression rate).
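The product of 31 and a frequency is generally not an integer, and the text does not say how it is rounded. The sketch below assumes truncation toward zero and works on raw symbol counts so that all arithmetic stays in integers. The function name `state_counts` and the symbol counts are hypothetical; the counts were chosen only to be consistent with the values quoted later in this example.

```python
def state_counts(counts: dict[str, int], max_state: int) -> dict[str, int]:
    """Number of states per symbol: max_state * frequency, truncated (assumed)."""
    total = sum(counts.values())
    # frequency = count / total, so max_state * frequency = max_state * count / total
    return {symbol: (max_state * count) // total for symbol, count in counts.items()}


# Hypothetical counts out of 100 symbols (not taken from the source figures).
counts = {"f": 45, "e": 16, "d": 13, "c": 12, "b": 9, "a": 5}
print(state_counts(counts, 31))
```

With truncation the column sizes need not add up to the maximum number of states; a different rounding rule would give slightly different column sizes.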
  • Step 3: From the number of states in the column of each Symbol in the FSE Table, calculated in the previous step, and the frequency of occurrence of each Symbol, an empty table of determined size and shape can be formed. The value of each blank is then calculated row by row, using the following rule for each cell:
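  • $\mathrm{Val}_{\mathrm{Num\_row},\,x} = \mathrm{round}\big(\mathrm{Num\_row} / P(S_x)\big)$, where Num_row is the row number of the cell, $P(S_x)$ is the frequency of occurrence of the symbol $S_x$ of column x, and round denotes rounding toward zero.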
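The cell rule can be sketched as follows. To avoid floating-point surprises, the frequency is written as count / total, so the truncated ratio row / P becomes the integer division row * total // count. The function name `initial_table`, the symbol counts and the column sizes are hypothetical values used for illustration, picked to be consistent with the numbers quoted in the text.

```python
def initial_table(counts: dict[str, int], n_states: dict[str, int]) -> dict[str, list[int]]:
    """Fill each column with trunc(row / frequency) for rows 1..n_states."""
    total = sum(counts.values())
    return {
        symbol: [(row * total) // counts[symbol] for row in range(1, n_states[symbol] + 1)]
        for symbol in n_states  # dict order = left-to-right column order
    }


# Hypothetical inputs: counts out of 100 symbols and the column sizes for 31 states.
counts = {"f": 45, "e": 16, "d": 13, "c": 12, "b": 9, "a": 5}
n_states = {"f": 13, "e": 4, "d": 4, "c": 3, "b": 2, "a": 1}
for symbol, column_values in initial_table(counts, n_states).items():
    print(symbol, column_values)
```

With these assumed inputs, the first rows of columns e, d and c and the third row of column f come out as the values the text discusses in Step 4.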
  • Step 4 Adjust the values of the elements in the initial table.
  • the entropy coded state table needs to satisfy the following two characteristics: (1) each value in the table is unique (that is, there is no duplication); (2) each column is sorted according to the value from small to large. Now adjust the initial table based on the traversal comparison method, and the traversal order is from left to right and from top to bottom, that is, the schematic diagram of the FSE Table calculation traversal process shown in Figure 4.
  • For example, element 6 in the third row of column f repeats element 6 in the first row of column e, so the element in the third row of column f is replaced by 7; 7 repeats element 7 in the first row of column d, so it is replaced by 8; 8 repeats element 8 in the first row of column c, so it is replaced by 9; 9 repeats no earlier value, so the replacement of the element in the third row of column f is complete.
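The adjustment can be sketched as a single pass over the table that uses only comparisons and additions. This is one reading of the procedure, not a definitive implementation: it assumes the table is updated in place, so each cell is compared with the already adjusted value above it, and "values arranged before it" is taken to mean the cells already visited. The function name `adjust_table` and the initial table are hypothetical, chosen to match the numbers quoted in the text.

```python
def adjust_table(initial: dict[str, list[int]], step: int = 1) -> dict[str, list[int]]:
    """Make every value unique and every column strictly increasing."""
    table = {symbol: list(values) for symbol, values in initial.items()}
    seen: set[int] = set()                  # values fixed earlier in the traversal
    depth = max(len(values) for values in table.values())
    for row in range(depth):                # top to bottom
        for values in table.values():       # left to right
            if row >= len(values):
                continue                    # this column has no such row
            value = values[row]
            if row > 0 and value <= values[row - 1]:
                value = values[row - 1] + step   # keep the column increasing
            while value in seen:
                value += step                    # skip values already used
            values[row] = value
            seen.add(value)
    return table


# Hypothetical initial table (columns ordered by descending frequency).
initial = {
    "f": [2, 4, 6, 8, 11, 13, 15, 17, 20, 22, 24, 26, 28],
    "e": [6, 12, 18, 25],
    "d": [7, 15, 23, 30],
    "c": [8, 16, 25],
    "b": [11, 22],
    "a": [20],
}
print(adjust_table(initial)["f"][2])   # third row of column f: 6 -> 7 -> 8 -> 9
```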
  • Step 5: Place the maximum state value in the FSE Table. The maximum state value, 31 in this example, must be present in the FSE Table. In the table generated so far, 31 does not appear and the largest value is 30, so 30 is replaced with 31, giving the FSE Table state table shown in Figure 8.
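A small sketch of this final step, under the assumption that the table is held as one list per symbol. The function name `place_max_state` and the sample columns are hypothetical. The case where the largest value already exceeds the maximum state value is not discussed in the text; the sketch simply applies the same replacement.

```python
def place_max_state(table: dict[str, list[int]], max_state: int) -> dict[str, list[int]]:
    """Ensure max_state appears in the table by overwriting the largest value."""
    values = [v for column_values in table.values() for v in column_values]
    if max_state in values:
        return {symbol: list(column_values) for symbol, column_values in table.items()}
    largest = max(values)
    # Values are unique after the traversal, so exactly one cell is replaced.
    return {
        symbol: [max_state if v == largest else v for v in column_values]
        for symbol, column_values in table.items()
    }


# Hypothetical columns whose largest value is 30, as in the example above.
print(place_max_state({"d": [7, 15, 23, 30], "a": [20]}, 31))
```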
  • FIG. 9 is a schematic diagram of an embodiment of a system for generating a finite state entropy coding table provided by the present application.
  • A system for generating a finite state entropy coding table includes: an occurrence frequency obtaining module 10, configured to obtain the frequency of occurrence of each character in the data block based on the proportion of each character in the data block to be encoded;
  • an empty table obtaining module 20, configured to obtain the number of states of the column corresponding to each character in the finite state entropy coding table based on the preset maximum state value and the frequency of occurrence of each character, and to form an empty table of the finite state entropy coding table based on the number of states of each column and the frequency of occurrence of each character;
  • an initial table obtaining module 30, configured to obtain the initial value of each blank based on the row number of the blank in the empty table and the frequency of occurrence of the corresponding character, and to obtain the initial table of the finite state entropy coding table by filling in the initial value of each blank;
  • a first judging module 40, configured to traverse the initial table and judge whether the initial value at row m, column n is greater than the initial value at row m-1, column n;
  • the second judging module 50 configured to respond to the initial value of the nth column of the m
  • FIG. 10 shows a schematic diagram of a computer-readable storage medium for implementing a method for generating a finite state entropy encoding table according to an embodiment of the present application.
  • The computer-readable storage medium 3 stores computer program instructions 31.
  • When the computer program instructions 31 are executed by a processor, the method of any one of the above-mentioned embodiments is implemented.
  • The fourth aspect of the embodiment of the present application also provides a computer device, including a memory 402 and a processor 401 as shown in FIG. 11; the memory 402 stores a computer program, and the method of any one of the above embodiments is implemented when the computer program is executed by the processor 401.
  • FIG. 11 is a schematic diagram of a hardware structure of an embodiment of a computer device implementing a method for generating a finite state entropy coding table provided by the present application.
  • the computer device includes a processor 401 and a memory 402, and may also include an input device 403 and an output device 404.
  • the processor 401, the memory 402, the input device 403, and the output device 404 may be connected through a bus or in other ways; connection through a bus is taken as an example in FIG. 11.
  • the input device 403 can receive input numbers or character information, and generate key signal input related to user settings and function control of the generation system of the finite state entropy coding table.
  • the output device 404 may include a display device such as a display screen.
  • the memory 402, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the method for generating the finite state entropy coding table in the embodiment of the present application. The memory 402 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created by using the method for generating the finite state entropy coding table, etc.
  • the memory 402 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage devices.
  • the memory 402 may optionally include memory that is remotely located relative to the processor 401, and these remote memories may be connected to the local module through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the processor 401 executes various functional applications and data processing of the server by running the non-volatile software programs, instructions and modules stored in the memory 402, that is, realizes the generation method of the finite state entropy encoding table in the above method embodiment.
  • nonvolatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM), which can act as external cache memory.
  • RAM is available in various forms such as Synchronous RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
  • Storage devices of the disclosed aspects are intended to include, but are not limited to, these and other suitable types of memory.

Abstract

Provided are a method and a system for generating a finite state entropy coding table. The method comprises: obtaining frequencies of occurrence of characters in a data block to be coded; obtaining a quantity of states of a corresponding column on the basis of a preset maximum state value and the frequencies of occurrence, then forming an empty table of a finite state entropy coding table; on the basis of a row number of each space in the empty table and a frequency of occurrence of a corresponding character, obtaining an initial value of the space, thus obtaining an initial table; traversing the initial table to determine whether an initial value at an m-th row and an n-th column traversed is greater than an initial value at an (m-1)th row and the n-th column; if so, using the initial value at the m-th row and the n-th column as a temporary value thereof, and determining whether said initial value is repeated with a previous initial value; if not, using the temporary value thereof as a state value thereof; after traversing is completed, obtaining all state values, and determining whether a maximum state value exists; if so, generating a state table of the finite state entropy coding table on the basis of all the state values, thereby implementing hardware acceleration of finite state entropy coding.

Description

一种有限状态熵编码表的生成方法、系统、介质及设备A method, system, medium and device for generating a finite state entropy coding table
本申请要求在2021年09月22日提交中国专利局、申请号为202111107199.4、发明名称为“一种有限状态熵编码表的生成方法及系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application submitted to the China Patent Office on September 22, 2021, with the application number 202111107199.4, and the title of the invention is "a method and system for generating a finite state entropy coding table", the entire content of which is by reference incorporated in this application.
Technical Field
The present application relates to the technical field of data compression, and in particular to a method and system for generating a finite state entropy coding table.
Background
With the advent of the big data era, specific application fields such as the Internet of Things and artificial intelligence place ever higher demands on low latency when processing massive data, and lossless data compression is becoming increasingly important. By compression principle, lossless data compression can be divided into statistics-based algorithms and dictionary-based algorithms. Statistics-based algorithms include Shannon-Fano coding, Huffman coding, arithmetic coding, run-length coding and finite state entropy (FSE) coding; dictionary-based algorithms include LZ77 (Lempel-Ziv 77) coding and LZ78 (Lempel-Ziv 78) coding.
To improve general applicability, data compression schemes usually combine two or more compression algorithms. The fast lossless compression algorithm Zstd (Zstandard) is a hybrid compression algorithm composed of LZ77 coding, Huffman coding and FSE. Compared with other compression algorithms (such as Deflate, Bzip2 and Brotli), Zstd offers better compression performance. In addition, Zstd is open source and provides 22 compression levels for trading off compression speed against compression ratio, and it is widely used in the Linux kernel, the FreeBSD operating system, the AWS Redshift data warehouse and other areas.
Software implementations of lossless compression offer high flexibility, general applicability and low cost, but software executes sequentially, so central processing unit (CPU) resources are occupied for long periods when massive data is processed. Compression speed drops sharply, making it difficult to meet the demand for real-time compression of massive data in specific application fields. Hardware implementation is an effective way to solve this problem: thanks to the inherent parallelism of hardware, it can improve transmission speed, resource utilization and security.
A statistical analysis of the three main components of Zstd, namely LZ77, FSE and Huffman, shows that their shares of compression time are in a ratio of about 4:1:1. Although the share of FSE is small, it has a large effect on the performance of Zstd: it is because of FSE that Zstd achieves better compression performance than other hybrid compression algorithms. Hardware acceleration schemes for LZ77 and Huffman coding are relatively mature. FSE, as a newer compression algorithm, combines precision close to that of arithmetic coding with the compression speed of Huffman coding; its re-encoding of symbols can be precise to fractional bits, and updating the state requires no multiplication or division. Studying a hardware acceleration architecture for FSE is therefore important for accelerating the Zstd algorithm as a whole, and is an effective way to meet the needs of specific application fields.
Summary
In view of this, the purpose of the present application is to provide a method and system for generating a finite state entropy coding table, so as to address the lack of hardware acceleration schemes for finite state entropy coding in the prior art.
To this end, the present application provides a method for generating a finite state entropy coding table, comprising the following steps:
obtaining the frequency of occurrence of each character in a data block to be coded, based on the proportion of the data block that the character accounts for;
obtaining the number of states of the column corresponding to each character in the finite state entropy coding table based on a preset maximum state value and the frequency of occurrence of each character, and forming an empty table of the finite state entropy coding table based on the number of states of each character's column and the frequency of occurrence of each character;
obtaining an initial value for each cell of the empty table based on the row number of the cell and the frequency of occurrence of the corresponding character, and filling in the initial values to obtain an initial table of the finite state entropy coding table;
traversing the initial table, and determining whether the initial value at the m-th row and n-th column is greater than the initial value at the (m-1)-th row and n-th column;
in response to the initial value at the m-th row and n-th column being greater than the initial value at the (m-1)-th row and n-th column, using the initial value at the m-th row and n-th column as its temporary value, and determining whether the temporary value at the m-th row and n-th column duplicates an initial value arranged before it;
in response to the temporary value at the m-th row and n-th column not duplicating any initial value arranged before it, using the temporary value at the m-th row and n-th column as its state value;
in response to completion of the traversal, obtaining all state values of the finite state entropy coding table, and determining whether the maximum state value is among them;
in response to the maximum state value being present, generating a state table of the finite state entropy coding table based on all the state values.
In some embodiments, the method further comprises:
in response to the initial value at the m-th row and n-th column being less than or equal to the initial value at the (m-1)-th row and n-th column, using the initial value at the (m-1)-th row and n-th column increased by a preset increment value as the temporary value at the m-th row and n-th column.
In some embodiments, the method further comprises:
in response to the temporary value at the m-th row and n-th column duplicating an initial value arranged before it, increasing the temporary value by the preset increment value to obtain an updated temporary value, repeating until the updated temporary value no longer duplicates any initial value arranged before it, and using the updated temporary value as the state value at the m-th row and n-th column.
In some embodiments, the method further comprises:
in response to the maximum state value being absent, replacing the largest of all the state values with the maximum state value, so as to generate the state table of the finite state entropy coding table.
In some embodiments, obtaining the number of states of the column corresponding to each character in the finite state entropy coding table based on the preset maximum state value and the frequency of occurrence of each character comprises:
multiplying the preset maximum state value by the frequency of occurrence of each character, to obtain the number of states of the column corresponding to each character in the finite state entropy coding table.
In some embodiments, forming the empty table of the finite state entropy coding table based on the number of states of each character's column and the frequency of occurrence of each character comprises:
using the number of states of each character as the number of cells in the corresponding column of the finite state entropy coding table, and ordering the columns according to the frequency of occurrence of each character, so as to form the empty table of the finite state entropy coding table.
In some embodiments, ordering the columns according to the frequency of occurrence of each character comprises:
arranging the columns from left to right in descending order of the frequency of occurrence of the corresponding characters.
In some embodiments, obtaining the initial value of each cell based on the row number of the cell in the empty table and the frequency of occurrence of the corresponding character comprises:
rounding toward zero the ratio of the row number of each cell in the empty table to the frequency of occurrence of the corresponding character, to obtain the initial value of each cell.
In some embodiments, traversing the initial table comprises:
traversing the initial table in left-to-right, top-to-bottom order.
In some embodiments, m is an integer greater than or equal to 2, and n is an integer greater than or equal to 1.
In another aspect, the present application further provides a system for generating a finite state entropy coding table, comprising:
a frequency obtaining module, configured to obtain the frequency of occurrence of each character in a data block to be coded, based on the proportion of the data block that the character accounts for;
an empty table obtaining module, configured to obtain the number of states of the column corresponding to each character in the finite state entropy coding table based on a preset maximum state value and the frequency of occurrence of each character, and to form an empty table of the finite state entropy coding table based on the number of states of each character's column and the frequency of occurrence of each character;
an initial table obtaining module, configured to obtain an initial value for each cell of the empty table based on the row number of the cell and the frequency of occurrence of the corresponding character, and to fill in the initial values to obtain an initial table of the finite state entropy coding table;
a first judging module, configured to traverse the initial table and determine whether the initial value at the m-th row and n-th column is greater than the initial value at the (m-1)-th row and n-th column;
a second judging module, configured to, in response to the initial value at the m-th row and n-th column being greater than the initial value at the (m-1)-th row and n-th column, use the initial value at the m-th row and n-th column as its temporary value, and determine whether the temporary value at the m-th row and n-th column duplicates an initial value arranged before it;
a state value module, configured to, in response to the temporary value at the m-th row and n-th column not duplicating any initial value arranged before it, use the temporary value at the m-th row and n-th column as its state value;
a third judging module, configured to, in response to completion of the traversal, obtain all state values of the finite state entropy coding table and determine whether the maximum state value is among them; and
a state table generating module, configured to, in response to the maximum state value being present, generate a state table of the finite state entropy coding table based on all the state values.
In some embodiments, obtaining the number of states of the column corresponding to each character in the finite state entropy coding table based on the preset maximum state value and the frequency of occurrence of each character comprises:
multiplying the preset maximum state value by the frequency of occurrence of each character, to obtain the number of states of the column corresponding to each character in the finite state entropy coding table.
In some embodiments, forming the empty table of the finite state entropy coding table based on the number of states of each character's column and the frequency of occurrence of each character comprises:
using the number of states of each character as the number of cells in the corresponding column of the finite state entropy coding table, and ordering the columns according to the frequency of occurrence of each character, so as to form the empty table of the finite state entropy coding table.
In some embodiments, ordering the columns according to the frequency of occurrence of each character comprises:
arranging the columns from left to right in descending order of the frequency of occurrence of the corresponding characters.
In some embodiments, obtaining the initial value of each cell based on the row number of the cell in the empty table and the frequency of occurrence of the corresponding character comprises:
rounding toward zero the ratio of the row number of each cell in the empty table to the frequency of occurrence of the corresponding character, to obtain the initial value of each cell.
In some embodiments, m is an integer greater than or equal to 2, and n is an integer greater than or equal to 1.
In yet another aspect, the present application further provides a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement any one of the methods described above.
In still another aspect, the present application further provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, performs any one of the methods described above.
The present application has at least the following beneficial technical effects:
1. In the method for generating a finite state entropy coding table of the present application, data is stored in a simple and convenient form: a one-dimensional array suffices, and no linked storage such as the binary tree of a Huffman tree is needed, which can greatly reduce memory usage;
2. The method for generating a finite state entropy coding table of the present application is suitable for hardware-implemented FSE compression and decompression under the Zstd specification; it mainly uses comparators and adders and the computation is simple, which effectively reduces hardware resource overhead and improves hardware utilization;
3. The compression and decompression speed of the Zstd algorithm is further improved, meeting the growing demand for compression performance in specific application fields.
Description of Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and those of ordinary skill in the art can derive other embodiments from them without creative effort.
FIG. 1 is a schematic diagram of a method for generating a finite state entropy coding table according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a data block to be coded according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the initial FSE Table according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the FSE Table traversal process according to an embodiment of the present application;
FIG. 5 is a schematic diagram of FSE Table iterative calculation I according to an embodiment of the present application;
FIG. 6 is a schematic diagram of FSE Table iterative calculation II according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the FSE Table after traversal is completed according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the FSE Table state table according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a system for generating a finite state entropy coding table according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a computer-readable storage medium implementing the method for generating a finite state entropy coding table according to an embodiment of the present application;
FIG. 11 is a schematic diagram of the hardware structure of a computer device performing the method for generating a finite state entropy coding table according to an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that all uses of "first" and "second" in the embodiments of the present application serve to distinguish two non-identical entities or parameters that share a name; "first" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present application. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or units may include other steps or units inherent to it.
To this end, a first aspect of the embodiments of the present application provides an embodiment of a method for generating a finite state entropy coding table. FIG. 1 is a schematic diagram of an embodiment of the method provided by the present application. As shown in FIG. 1, the embodiment comprises the following steps:
Step S10: obtaining the frequency of occurrence of each character in a data block to be coded, based on the proportion of the data block that the character accounts for;
Step S20: obtaining the number of states of the column corresponding to each character in the finite state entropy coding table based on a preset maximum state value and the frequency of occurrence of each character, and forming an empty table of the finite state entropy coding table based on the number of states of each character's column and the frequency of occurrence of each character;
Step S30: obtaining an initial value for each cell of the empty table based on the row number of the cell and the frequency of occurrence of the corresponding character, and filling in the initial values to obtain an initial table of the finite state entropy coding table;
Step S40: traversing the initial table, and determining whether the initial value at the m-th row and n-th column is greater than the initial value at the (m-1)-th row and n-th column;
Step S50: in response to the initial value at the m-th row and n-th column being greater than the initial value at the (m-1)-th row and n-th column, using the initial value at the m-th row and n-th column as its temporary value, and determining whether the temporary value at the m-th row and n-th column duplicates an initial value arranged before it;
Step S60: in response to the temporary value at the m-th row and n-th column not duplicating any initial value arranged before it, using the temporary value at the m-th row and n-th column as its state value;
Step S70: in response to completion of the traversal, obtaining all state values of the finite state entropy coding table, and determining whether the maximum state value is among them;
Step S80: in response to the maximum state value being present, generating a state table of the finite state entropy coding table based on all the state values.
In the embodiments of the present application, m is an integer greater than or equal to 2, and n is an integer greater than or equal to 1.
FSE is an entropy coding belonging to tANS (table asymmetric numeral systems), a variant of asymmetric numeral systems (ANS). Existing research mainly addresses tANS and uABS (uniform asymmetric binary systems), which are important components of ANS, while little research addresses FSE hardware architectures.
In the method for generating a finite state entropy coding table of the embodiments of the present application, data is stored in a simple and convenient form: a one-dimensional array suffices, and no linked storage such as the binary tree of a Huffman tree is needed, which can greatly reduce memory usage. In addition, the method is suitable for hardware-implemented FSE (finite state entropy coding) compression and decompression under the Zstd (fast lossless compression algorithm) specification; it mainly uses comparators and adders and the computation is simple, which effectively reduces hardware resource overhead and improves hardware utilization. The compression and decompression speed of the Zstd algorithm is further improved, meeting the growing demand for compression performance in specific application fields.
All of the computation and storage involved in the method for generating a finite state entropy coding table of the embodiments of the present application can be implemented in hardware, and can also make software computation more efficient; the variety of possible implementations makes the method flexible to apply. Implemented in hardware, it can serve as a hardware acceleration technique for network data storage, accelerating compression based on finite state entropy coding and effectively reducing the load on the server CPU (central processing unit). It can be dedicated to accelerating data compression and help improve data center performance.
In some embodiments, the method further comprises: in response to the initial value at the m-th row and n-th column being less than or equal to the initial value at the (m-1)-th row and n-th column, using the initial value at the (m-1)-th row and n-th column increased by a preset increment value as the temporary value at the m-th row and n-th column.
In an optional embodiment, the preset increment value is an integer greater than 0.
In some embodiments, the method further comprises: in response to the temporary value at the m-th row and n-th column duplicating an initial value arranged before it, increasing the temporary value by the preset increment value to obtain an updated temporary value, repeating until the updated temporary value no longer duplicates any initial value arranged before it, and using the updated temporary value as the state value at the m-th row and n-th column.
In the above embodiments, the value of every cell reached during the traversal is compared with the value in the previous row of the same column and checked against the values arranged before it for duplicates. The operation corresponding to the comparison result and the operation corresponding to the duplicate check are performed so that both conditions are satisfied, and only then is the next cell traversed.
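The comparison and duplicate-check rules of the embodiments above can be written compactly as follows. The symbols are illustrative and do not appear in the original: v_{m,n} is the initial value at row m, column n; t_{m,n} is the temporary value; s_{m,n} is the state value; \Delta is the preset increment value; and B_{m,n} is the set of values arranged before the cell in traversal order.

$$
t_{m,n} =
\begin{cases}
v_{m,n}, & v_{m,n} > v_{m-1,n} \\
v_{m-1,n} + \Delta, & v_{m,n} \le v_{m-1,n}
\end{cases}
\qquad
s_{m,n} = t_{m,n} + k\Delta, \quad k = \min\{\, j \ge 0 : t_{m,n} + j\Delta \notin B_{m,n} \,\}
$$

Under this reading, a cell keeps its temporary value (k = 0) when that value is not a duplicate, and otherwise moves up in steps of \Delta until it reaches a free value.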
In some embodiments, the method further comprises: in response to the maximum state value being absent, replacing the largest of all the state values with the maximum state value, so as to generate the state table of the finite state entropy coding table.
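A minimal Python sketch of this final check, assuming the state values are held in a flat, non-empty list; the function name `ensure_max_state` is hypothetical and is not taken from the application:

```python
def ensure_max_state(states, max_state):
    """Return a copy of `states` in which `max_state` is guaranteed to appear.

    Assumption: `states` is a non-empty flat list of the state values.
    """
    if max_state in states:
        # The maximum state value is already present: keep the values as they are.
        return list(states)
    adjusted = list(states)
    # Otherwise overwrite the largest existing value with the maximum state value.
    adjusted[adjusted.index(max(adjusted))] = max_state
    return adjusted


print(ensure_max_state([2, 4, 30], 31))
print(ensure_max_state([2, 31, 4], 31))
```

Only an equality test, a maximum search and one overwrite are needed, which seems in keeping with the comparator-based hardware the application has in mind.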
In some embodiments, obtaining the number of states of the column corresponding to each character in the finite state entropy coding table based on the preset maximum state value and the frequency of occurrence of each character comprises: multiplying the preset maximum state value by the frequency of occurrence of each character, to obtain the number of states of the column corresponding to each character in the finite state entropy coding table.
In some embodiments, forming the empty table of the finite state entropy coding table based on the number of states of each character's column and the frequency of occurrence of each character comprises: using the number of states of each character as the number of cells in the corresponding column of the finite state entropy coding table, and ordering the columns according to the frequency of occurrence of each character, so as to form the empty table of the finite state entropy coding table.
In some embodiments, ordering the columns according to the frequency of occurrence of each character comprises: arranging the columns from left to right in descending order of the frequency of occurrence of the corresponding characters.
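The column ordering can be sketched in Python as below. The function name `order_columns` is hypothetical, and the frequencies are those of Table 1 in the worked example; how ties between equal frequencies are broken is not stated in the application, so the sketch simply keeps their input order.

```python
def order_columns(frequencies):
    """Return the symbols sorted by descending frequency (leftmost column first)."""
    # sorted() is stable, so symbols with equal frequency keep their input order.
    return sorted(frequencies, key=lambda sym: frequencies[sym], reverse=True)


freqs = {"a": 0.05, "b": 0.09, "c": 0.12, "d": 0.13, "e": 0.16, "f": 0.45}
print(order_columns(freqs))  # ['f', 'e', 'd', 'c', 'b', 'a']
```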
In some embodiments, obtaining the initial value of each cell based on the row number of the cell in the empty table and the frequency of occurrence of the corresponding character comprises: rounding toward zero the ratio of the row number of each cell in the empty table to the frequency of occurrence of the corresponding character, to obtain the initial value of each cell.
In some embodiments, traversing the initial table comprises: traversing the initial table in left-to-right, top-to-bottom order.
Specifically, the method for generating a finite state entropy coding table according to an exemplary embodiment of the present application is as follows:
Step 1: Count the occurrences of each symbol (character) in the data block Text to be coded, shown in FIG. 2, and sort them, which gives the following table:
Table 1 Character frequency statistics of the data to be compressed
Character                  a      b      c      d      e      f
Number of occurrences      5      9      12     13     16     45
Frequency of occurrence    0.05   0.09   0.12   0.13   0.16   0.45
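Step 1 can be sketched in Python as follows. The content of the data block in FIG. 2 is not reproduced in the text, so the block built here is a hypothetical stand-in that merely has the same per-character counts as Table 1; the function name `symbol_frequencies` is likewise an assumption.

```python
from collections import Counter


def symbol_frequencies(block):
    """Return (symbol, count, frequency) rows sorted by ascending count."""
    counts = Counter(block)
    total = len(block)
    ordered = sorted(counts.items(), key=lambda item: item[1])
    return [(sym, n, n / total) for sym, n in ordered]


# Hypothetical 100-character block with the counts listed in Table 1.
block = "a" * 5 + "b" * 9 + "c" * 12 + "d" * 13 + "e" * 16 + "f" * 45
for row in symbol_frequencies(block):
    print(row)
```

Because the block has exactly 100 characters, each frequency is simply the count divided by 100, which matches the third row of Table 1.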
Step 2: FSE (Finite State Entropy) encoding and decoding rely on a core table, the FSE Table. The first task in generating it is to compute the number of states in the column corresponding to each symbol.
The number of states is the product of the maximum number of states and the frequency of occurrence of each symbol, rounded to an integer. Here the maximum number of states is taken as 31 (in practice, the larger the number of states, the closer the compression comes to the limiting compression ratio).
That is, the number of states of symbol x in the FSE Table is:
$$\mathrm{Num}_{stat\_x} = \left(\mathrm{Sum}_{stat} \cdot P(S_x)\right)_{round}$$
where the subscript round denotes rounding to an integer; the worked values below (for example 1.55 becoming 1) correspond to rounding toward zero.
The number of states in the column of symbol a is then:
$$\mathrm{Num}_{stat\_a} = \left(\mathrm{Sum}_{stat} \cdot P(S_a)\right)_{round} = (31 \times 0.05)_{round} = (1.55)_{round} = 1$$
The numbers of states in the columns of the other symbols b, c, d, e and f are:
$$\mathrm{Num}_{stat\_b} = \left(\mathrm{Sum}_{stat} \cdot P(S_b)\right)_{round} = (31 \times 0.09)_{round} = 2$$
$$\mathrm{Num}_{stat\_c} = \left(\mathrm{Sum}_{stat} \cdot P(S_c)\right)_{round} = (31 \times 0.12)_{round} = 3$$
$$\mathrm{Num}_{stat\_d} = \left(\mathrm{Sum}_{stat} \cdot P(S_d)\right)_{round} = (31 \times 0.13)_{round} = 4$$
$$\mathrm{Num}_{stat\_e} = \left(\mathrm{Sum}_{stat} \cdot P(S_e)\right)_{round} = (31 \times 0.16)_{round} = 4$$
$$\mathrm{Num}_{stat\_f} = \left(\mathrm{Sum}_{stat} \cdot P(S_f)\right)_{round} = (31 \times 0.45)_{round} = 13$$
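The state counts of Step 2 can be checked with a short Python sketch. It assumes rounding toward zero, which is what the worked values imply, and it works from the integer counts of Table 1 rather than from decimal frequencies so that no floating-point error enters. The name `state_counts` is hypothetical.

```python
def state_counts(counts, max_state):
    """Number of states per symbol: max_state * frequency, rounded toward zero.

    Computed as max_state * count // total, which equals truncating
    max_state * (count / total) for non-negative integers.
    """
    total = sum(counts.values())
    return {sym: max_state * n // total for sym, n in counts.items()}

# Occurrence counts from Table 1.
counts = {"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}
num_states = state_counts(counts, 31)

print(num_states)                # {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 4, 'f': 13}
print(sum(num_states.values()))  # 27
```

With these inputs the table has 27 cells in total, fewer than the maximum state value of 31.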
Step 3: The number of states in each symbol's column computed in the previous step, together with the frequency of occurrence of each symbol, determines an empty table of fixed size and shape. The value of each cell is then computed row by row according to the following rule:
$$\mathrm{Val}_{Num\_row,x} = \left(\mathrm{Num\_row} / P(S_x)\right)_{round}$$
For the first row, the element at position (1,f) is 1/0.45 = 2.222…, which rounds toward zero to 2; the element at (1,e) is 1/0.16 = 6.25, which rounds toward zero to 6; (1,d) = 1/0.13 = 7.692…, which rounds toward zero to 7; and so on. In the second row, (2,f) = 2/0.45 = 4.444…, which rounds toward zero to 4, and so on. This produces the initial FSE Table shown in Figure 3.
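The construction of the initial table in Step 3 can be sketched as follows. This is an illustration under stated assumptions: columns are ordered from the most frequent symbol to the least frequent, as in the claims, and the cell value row / P(S_x) is computed as the integer quotient row * total // count, which gives the same rounding toward zero without floating-point error. The name `initial_table` is hypothetical, and the table is held as a dictionary of columns of unequal length.

```python
def initial_table(counts, num_states):
    """Build the initial FSE table as {symbol: [cell values, top to bottom]}."""
    total = sum(counts.values())
    # Most frequent symbol first, so its column is the leftmost one.
    order = sorted(counts, key=lambda sym: -counts[sym])
    return {
        sym: [row * total // counts[sym] for row in range(1, num_states[sym] + 1)]
        for sym in order
    }

counts = {"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}
num_states = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 4, "f": 13}
initial = initial_table(counts, num_states)

print(list(initial))      # ['f', 'e', 'd', 'c', 'b', 'a']
print(initial["f"][:4])   # [2, 4, 6, 8]
print(initial["e"])       # [6, 12, 18, 25]
```

The first row reproduces the values given in the text: 2 for f, 6 for e and 7 for d.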
Step 4: Adjust the values of the elements in the initial table. The state table for entropy coding must satisfy two properties: (1) every value in the table is unique (there are no duplicates); (2) the values in each column are sorted in ascending order. The initial table is adjusted by traversal and comparison, in order from left to right and from top to bottom, as shown in the schematic of the FSE Table traversal in Figure 4.
Suppose the traversal reaches element a_{m,n}, that is, the element at position (m,n) in the table. Condition one is checked and processed first, followed by condition two. The check for condition two must cover every element traversed before a_{m,n}, namely:
Figure PCTCN2022074614-appb-000001
Condition one, check and processing (assuming a preset increment value of 1):
If, for
Figure PCTCN2022074614-appb-000002
where m ≥ 2,
$a_{m,n} \le a_{m-1,n}$ holds,
then $a_{m,n} = a_{m-1,n} + 1$.
Condition two, check and processing (assuming a preset increment value of 1):
If, for
Figure PCTCN2022074614-appb-000003
where
Figure PCTCN2022074614-appb-000004
it holds that
Figure PCTCN2022074614-appb-000005
then
Figure PCTCN2022074614-appb-000006
Applying this step to the present exemplary embodiment, as shown in the schematic of FSE Table iterative calculation I in Figure 5: when the traversal reaches element 6 in the third row of column f, it is found to be larger than element 4 in the row above, so condition two is checked and processed. Element 6 in the third row of column f duplicates element 6 in the first row of column e, so it is replaced with 7. The value 7 duplicates element 7 in the first row of column d, so it is replaced with 8. The value 8 duplicates element 8 in the first row of column c, so it is replaced with 9. The value 9 has no duplicate, so the replacement of the element in the third row of column f is complete.
As shown in the schematic of FSE Table iterative calculation II in Figure 6, when the traversal reaches the fourth row of column f, its value 8 is smaller than the value 9 in the third row of column f. Because every column must be increasing, the value of the third row plus 1 is assigned to the fourth row, that is, the value in the fourth row becomes 9+1=10. The elements traversed before it are then checked against 10; no duplicate is found, so the replacement is complete.
Traversing all elements of the table in this way yields the fully traversed FSE Table shown in Figure 7, in which the values in bold are those that were adjusted.
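The adjustment of Step 4 can be sketched as below. Two points are assumptions drawn from the worked example rather than from an explicit statement: the traversal visits each row from left to right before moving down to the next row (this is the order in which the third row of column f can collide with the first row of column e), and both conditions compare against values that have already been adjusted. The name `adjust_table` is hypothetical.

```python
def adjust_table(initial, increment=1):
    """Make every value unique and every column strictly increasing."""
    columns = list(initial)                 # left-to-right column order
    adjusted = {sym: [] for sym in columns}
    seen = set()                            # values already fixed earlier in the traversal
    depth = max(len(col) for col in initial.values())
    for row in range(depth):                # top to bottom
        for sym in columns:                 # left to right within a row
            if row >= len(initial[sym]):
                continue                    # this column has no cell in this row
            value = initial[sym][row]
            # Condition one: not larger than the cell above, so step past it.
            if row > 0 and value <= adjusted[sym][row - 1]:
                value = adjusted[sym][row - 1] + increment
            # Condition two: bump until no earlier element has the same value.
            while value in seen:
                value += increment
            adjusted[sym].append(value)
            seen.add(value)
    return adjusted

# Initial table of the worked example (rows rounded toward zero).
initial = {
    "f": [2, 4, 6, 8, 11, 13, 15, 17, 20, 22, 24, 26, 28],
    "e": [6, 12, 18, 25],
    "d": [7, 15, 23, 30],
    "c": [8, 16, 25],
    "b": [11, 22],
    "a": [20],
}
adjusted = adjust_table(initial)

print(adjusted["f"][:4])                              # [2, 4, 9, 10]
print(max(v for col in adjusted.values() for v in col))  # 30
```

Under these assumptions the sketch reproduces the two replacements described in the text (6 becomes 9 and 8 becomes 10 in column f) and ends with a largest value of 30. Keeping the fixed values in a set makes each duplicate check constant time instead of a scan over all earlier elements.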
Step 5: Place the maximum state value in the FSE Table. The maximum state value, 31 in this example, must be present in the FSE Table. The FSE Table generated so far does not contain 31, and its largest value is 30, so 30 is replaced with 31, which gives the FSE Table state table shown in Figure 8.
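Step 5 amounts to a single substitution, sketched below. The name `place_max_state` is hypothetical, and the input table is the one obtained by applying the Step 4 rules to the example under a row-by-row traversal, which is an assumption. The sketch does not handle a table whose largest value already exceeds the maximum state value, a case the text does not discuss.

```python
def place_max_state(table, max_state):
    """Ensure max_state appears in the table by replacing the current largest value."""
    values = [v for col in table.values() for v in col]
    if max_state in values:
        # Already present: return an unchanged copy.
        return {sym: list(col) for sym, col in table.items()}
    largest = max(values)
    return {
        sym: [max_state if v == largest else v for v in col]
        for sym, col in table.items()
    }

# Adjusted table of the worked example; its largest value is 30.
adjusted = {
    "f": [2, 4, 9, 10, 13, 14, 17, 19, 21, 24, 27, 28, 29],
    "e": [6, 12, 18, 26],
    "d": [7, 15, 23, 30],
    "c": [8, 16, 25],
    "b": [11, 22],
    "a": [20],
}
state_table = place_max_state(adjusted, 31)

print(state_table["d"])   # [7, 15, 23, 31]
```

Because values are unique after Step 4, exactly one cell changes, and replacing the largest value with a larger one keeps its column in ascending order.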
A second aspect of the embodiments of the present application provides a system for generating a finite state entropy coding table. Figure 9 is a schematic diagram of an embodiment of the system for generating a finite state entropy coding table provided by the present application. As shown in Figure 9, the system includes: an occurrence frequency obtaining module 10, configured to obtain the frequency of occurrence of each character in a data block to be encoded based on the proportion of that character's count in the data block; an empty table obtaining module 20, configured to obtain the number of states of the column corresponding to each character in the finite state entropy coding table based on a preset maximum state value and the frequency of occurrence of each character, and to form an empty table of the finite state entropy coding table based on the number of states of the column corresponding to each character and the frequency of occurrence of each character; an initial table obtaining module 30, configured to obtain an initial value for each blank cell based on the row number of that cell in the empty table and the frequency of occurrence of the corresponding character, and to obtain an initial table of the finite state entropy coding table by filling in the initial values of the blank cells; a first judging module 40, configured to traverse the initial table and determine whether the traversed initial value at the m-th row and n-th column is greater than the initial value at the (m-1)-th row and n-th column; a second judging module 50, configured to, in response to the initial value at the m-th row and n-th column being greater than the initial value at the (m-1)-th row and n-th column, take the initial value at the m-th row and n-th column as its temporary value and determine whether that temporary value duplicates an initial value arranged before it; a state value module 60, configured to, in response to the temporary value at the m-th row and n-th column not duplicating any initial value arranged before it, take that temporary value as the state value of the m-th row and n-th column; a third judging module 70, configured to, in response to completion of the traversal, obtain all state values of the finite state entropy coding table and determine whether the maximum state value is among them; and a state table generation module 80, configured to, in response to the maximum state value being present, generate a state table of the finite state entropy coding table based on all the state values.
A third aspect of the embodiments of the present application provides a computer-readable storage medium. Figure 10 is a schematic diagram of a computer-readable storage medium for implementing the method for generating a finite state entropy coding table according to an embodiment of the present application. As shown in Figure 10, the computer-readable storage medium 3 stores computer program instructions 31. When executed by a processor, the computer program instructions 31 implement the method of any one of the above embodiments.
It should be understood that, where they do not conflict with one another, all the implementations, features and advantages described above for the method for generating a finite state entropy coding table according to the present application apply equally to the system for generating a finite state entropy coding table and to the storage medium according to the present application.
A fourth aspect of the embodiments of the present application provides a computer device, including a memory 402 and a processor 401 as shown in Figure 11. The memory 402 stores a computer program which, when executed by the processor 401, implements the method of any one of the above embodiments.
Figure 11 is a schematic diagram of the hardware structure of an embodiment of a computer device, provided by the present application, that performs the method for generating a finite state entropy coding table. Taking the computer device shown in Figure 11 as an example, the device includes a processor 401 and a memory 402, and may further include an input device 403 and an output device 404. The processor 401, the memory 402, the input device 403 and the output device 404 may be connected by a bus or in other ways; Figure 11 takes connection by a bus as an example. The input device 403 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the system for generating a finite state entropy coding table. The output device 404 may include a display device such as a display screen.
As a non-volatile computer-readable storage medium, the memory 402 can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions or modules corresponding to the method for generating a finite state entropy coding table in the embodiments of the present application. The memory 402 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the method for generating a finite state entropy coding table, and the like. In addition, the memory 402 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 402 optionally includes memory located remotely from the processor 401, and such remote memory may be connected to the local module through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
By running the non-volatile software programs, instructions and modules stored in the memory 402, the processor 401 executes the various functional applications and data processing of the server, that is, implements the method for generating a finite state entropy coding table of the above method embodiments.
Finally, it should be noted that the computer-readable storage medium (for example, memory) herein may be volatile memory or non-volatile memory, or may include both. By way of example and not limitation, non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in many forms, such as synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to include, without being limited to, these and other suitable types of memory.
Those skilled in the art will also appreciate that the various illustrative logical blocks, modules, circuits and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both. To illustrate this interchangeability of hardware and software clearly, various illustrative components, blocks, modules, circuits and steps have been described generally in terms of their functionality. Whether such functionality is implemented as software or as hardware depends on the particular application and the design constraints imposed on the overall system. Those skilled in the art may implement the described functionality in various ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope disclosed by the embodiments of the present application.
The above are exemplary embodiments disclosed in the present application, but it should be noted that various changes and modifications may be made without departing from the scope of the disclosure of the embodiments of the present application as defined by the claims. The functions, steps and/or actions of the method claims according to the disclosed embodiments described herein need not be performed in any particular order. In addition, although elements disclosed in the embodiments of the present application may be described or claimed in the singular, they may also be understood as plural unless explicitly limited to the singular.
It should be understood that, as used herein, the singular form "a" is intended to include the plural form as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein refers to any and all possible combinations of one or more of the associated listed items. The serial numbers of the embodiments disclosed above are for description only and do not indicate the relative merits of the embodiments.
Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the present application (including the claims) is limited to these examples. Within the concept of the embodiments of the present application, the technical features of the above embodiments or of different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments described above which, for brevity, are not given in detail. Therefore, any omission, modification, equivalent replacement or improvement made within the spirit and principles of the embodiments of the present application shall fall within the protection scope of the embodiments of the present application.

Claims (18)

  1. A method for generating a finite state entropy coding table, characterized by comprising the following steps:
    obtaining the frequency of occurrence of each character in a data block to be encoded based on the proportion of that character's count in the data block;
    obtaining the number of states of the column corresponding to each character in the finite state entropy coding table based on a preset maximum state value and the frequency of occurrence of each character, and forming an empty table of the finite state entropy coding table based on the number of states of the column corresponding to each character and the frequency of occurrence of each character;
    obtaining an initial value for each blank cell based on the row number of that cell in the empty table and the frequency of occurrence of the corresponding character, and obtaining an initial table of the finite state entropy coding table by filling in the initial values of the blank cells;
    traversing the initial table, and determining whether the traversed initial value at the m-th row and n-th column is greater than the initial value at the (m-1)-th row and n-th column;
    in response to the initial value at the m-th row and n-th column being greater than the initial value at the (m-1)-th row and n-th column, taking the initial value at the m-th row and n-th column as its temporary value, and determining whether the temporary value at the m-th row and n-th column duplicates an initial value arranged before it;
    in response to the temporary value at the m-th row and n-th column not duplicating any initial value arranged before it, taking the temporary value at the m-th row and n-th column as its state value;
    in response to completion of the traversal, obtaining all state values of the finite state entropy coding table, and determining whether the maximum state value is among all the state values;
    in response to the maximum state value being present, generating a state table of the finite state entropy coding table based on all the state values.
  2. The method according to claim 1, characterized by further comprising:
    in response to the initial value at the m-th row and n-th column being less than or equal to the initial value at the (m-1)-th row and n-th column, taking the initial value at the (m-1)-th row and n-th column increased by a preset increment value as the temporary value at the m-th row and n-th column.
  3. The method according to claim 2, characterized by further comprising:
    in response to the temporary value at the m-th row and n-th column duplicating an initial value arranged before it, increasing the temporary value at the m-th row and n-th column by the preset increment value to give an updated temporary value, until the updated temporary value does not duplicate any initial value arranged before it, and taking the updated temporary value as the state value of the m-th row and n-th column.
  4. The method according to claim 1, characterized by further comprising:
    in response to the maximum state value being absent, replacing the largest of all the state values with the maximum state value to generate the state table of the finite state entropy coding table.
  5. The method according to claim 1, characterized in that obtaining the number of states of the column corresponding to each character in the finite state entropy coding table based on the preset maximum state value and the frequency of occurrence of each character comprises:
    multiplying the preset maximum state value by the frequency of occurrence of each character respectively, to obtain the number of states of the column corresponding to each character in the finite state entropy coding table.
  6. The method according to claim 1, characterized in that forming the empty table of the finite state entropy coding table based on the number of states of the column corresponding to each character and the frequency of occurrence of each character comprises:
    taking the number of states of each character as the number of blank cells in the corresponding column of the finite state entropy coding table, and arranging the order of the columns based on the frequency of occurrence of each character, to form the empty table of the finite state entropy coding table.
  7. The method according to claim 6, characterized in that arranging the order of the columns based on the frequency of occurrence of each character comprises:
    arranging the corresponding columns from left to right in descending order of the frequency of occurrence of each character.
  8. The method according to claim 1, characterized in that obtaining the initial value of each blank cell based on the row number of that cell in the empty table and the frequency of occurrence of the corresponding character comprises:
    rounding toward zero the ratio of the row number of each blank cell in the empty table to the frequency of occurrence of the corresponding character, to obtain the initial value of each blank cell.
  9. The method according to claim 1, characterized in that traversing the initial table comprises:
    traversing the initial table in order from left to right and from top to bottom.
  10. The method according to claim 1, characterized in that m is an integer greater than or equal to 2, and n is an integer greater than or equal to 1.
  11. A system for generating a finite state entropy coding table, characterized by comprising:
    an occurrence frequency obtaining module, configured to obtain the frequency of occurrence of each character in a data block to be encoded based on the proportion of that character's count in the data block;
    an empty table obtaining module, configured to obtain the number of states of the column corresponding to each character in the finite state entropy coding table based on a preset maximum state value and the frequency of occurrence of each character, and to form an empty table of the finite state entropy coding table based on the number of states of the column corresponding to each character and the frequency of occurrence of each character;
    an initial table obtaining module, configured to obtain an initial value for each blank cell based on the row number of that cell in the empty table and the frequency of occurrence of the corresponding character, and to obtain an initial table of the finite state entropy coding table by filling in the initial values of the blank cells;
    a first judging module, configured to traverse the initial table and determine whether the traversed initial value at the m-th row and n-th column is greater than the initial value at the (m-1)-th row and n-th column;
    a second judging module, configured to, in response to the initial value at the m-th row and n-th column being greater than the initial value at the (m-1)-th row and n-th column, take the initial value at the m-th row and n-th column as its temporary value, and determine whether the temporary value at the m-th row and n-th column duplicates an initial value arranged before it;
    a state value module, configured to, in response to the temporary value at the m-th row and n-th column not duplicating any initial value arranged before it, take the temporary value at the m-th row and n-th column as its state value;
    a third judging module, configured to, in response to completion of the traversal, obtain all state values of the finite state entropy coding table, and determine whether the maximum state value is among all the state values; and
    a state table generation module, configured to, in response to the maximum state value being present, generate a state table of the finite state entropy coding table based on all the state values.
  12. The system according to claim 11, characterized in that obtaining the number of states of the column corresponding to each character in the finite state entropy coding table based on the preset maximum state value and the frequency of occurrence of each character comprises:
    multiplying the preset maximum state value by the frequency of occurrence of each character respectively, to obtain the number of states of the column corresponding to each character in the finite state entropy coding table.
  13. The system according to claim 11, characterized in that forming the empty table of the finite state entropy coding table based on the number of states of the column corresponding to each character and the frequency of occurrence of each character comprises:
    taking the number of states of each character as the number of blank cells in the corresponding column of the finite state entropy coding table, and arranging the order of the columns based on the frequency of occurrence of each character, to form the empty table of the finite state entropy coding table.
  14. The system according to claim 13, characterized in that arranging the order of the columns based on the frequency of occurrence of each character comprises:
    arranging the corresponding columns from left to right in descending order of the frequency of occurrence of each character.
  15. The system according to claim 11, characterized in that obtaining the initial value of each blank cell based on the row number of that cell in the empty table and the frequency of occurrence of the corresponding character comprises:
    rounding toward zero the ratio of the row number of each blank cell in the empty table to the frequency of occurrence of the corresponding character, to obtain the initial value of each blank cell.
  16. The system according to claim 11, characterized in that m is an integer greater than or equal to 2, and n is an integer greater than or equal to 1.
  17. A computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the method according to any one of claims 1 to 10.
  18. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, performs the method according to any one of claims 1 to 10.
PCT/CN2022/074614 2021-09-22 2022-01-28 Method and system for generating finite state entropy coding table, medium, and device WO2023045204A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111107199.4A CN113572479B (en) 2021-09-22 2021-09-22 Method and system for generating finite state entropy coding table
CN202111107199.4 2021-09-22

Publications (1)

Publication Number Publication Date
WO2023045204A1 (en) 2023-03-30

Family

ID=78173917

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/074614 WO2023045204A1 (en) 2021-09-22 2022-01-28 Method and system for generating finite state entropy coding table, medium, and device

Country Status (2)

Country Link
CN (1) CN113572479B (en)
WO (1) WO2023045204A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113572479B (en) * 2021-09-22 2021-12-21 苏州浪潮智能科技有限公司 Method and system for generating finite state entropy coding table
CN114513210B (en) * 2022-04-20 2022-08-02 苏州浪潮智能科技有限公司 State selection method, system, storage medium and device for finite state entropy coding
CN115441878A (en) * 2022-08-05 2022-12-06 海飞科(南京)信息技术有限公司 FSE code table rapid establishing method for text compression
CN117155405A (en) * 2023-08-09 2023-12-01 海飞科(南京)信息技术有限公司 Method for quickly establishing tANS coding and decoding conversion table based on gradient descent

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120242517A1 (en) * 2011-03-25 2012-09-27 Samsung Electronics Co., Ltd. Methods of compressing data in storage device
CN110602498A (en) * 2019-09-20 2019-12-20 唐驰鹏 Self-adaptive finite state entropy coding method
US20200326910A1 (en) * 2020-06-23 2020-10-15 Intel Corporation Normalized probability determination for character encoding
CN111787325A (en) * 2020-07-03 2020-10-16 北京博雅慧视智能技术研究院有限公司 Entropy encoder and encoding method thereof
CN112953550A (en) * 2021-03-23 2021-06-11 上海复佳信息科技有限公司 Data compression method, electronic device and storage medium
CN113572479A (en) * 2021-09-22 2021-10-29 苏州浪潮智能科技有限公司 Method and system for generating finite state entropy coding table

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107565971B (en) * 2017-09-07 2020-04-14 华为技术有限公司 Data compression method and device
US11483009B2 (en) * 2019-05-08 2022-10-25 Intel Corporation Self-checking compression

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邢琳 (XING LIN): "有限状态熵编码的硬件加速设计与实现 (Hardware Acceleration Design and Its Implementation for Finite State Entropy)", 中国优秀硕士学位论文全文数据库信息科技辑 (INFORMATION & TECHNOLOGY, CHINA MASTER'S THESES FULL-TEXT DATABASE), no. 2, 15 February 2021 (2021-02-15), XP009544763 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116933734A (en) * 2023-09-15 2023-10-24 山东济矿鲁能煤电股份有限公司阳城煤矿 Intelligent diagnosis method for cutter faults of shield machine
CN116933734B (en) * 2023-09-15 2023-12-19 山东济矿鲁能煤电股份有限公司阳城煤矿 Intelligent diagnosis method for cutter faults of shield machine
CN117171399A (en) * 2023-11-02 2023-12-05 吉林省有继科技有限公司 New energy data optimized storage method based on cloud platform
CN117171399B (en) * 2023-11-02 2024-02-20 云图数据科技(郑州)有限公司 New energy data optimized storage method based on cloud platform

Also Published As

Publication number Publication date
CN113572479B (en) 2021-12-21
CN113572479A (en) 2021-10-29

Similar Documents

Publication Publication Date Title
WO2023045204A1 (en) Method and system for generating finite state entropy coding table, medium, and device
US11755565B2 (en) Hybrid column store providing both paged and memory-resident configurations
CN112292816A (en) Processing core data compression and storage system
KR102535450B1 (en) Data storage method and apparatus, and computer device and storage medium thereof
CN108628898B (en) Method, device and equipment for data storage
Müller et al. Retrieval and perfect hashing using fingerprinting
CN113300715B (en) Data processing method, device, hardware compression equipment and medium
Funasaka et al. Adaptive loss‐less data compression method optimized for GPU decompression
WO2023202149A1 (en) State selection method and system for finite state entropy encoding, and storage medium and device
Zou et al. Performance optimization for relative-error-bounded lossy compression on scientific data
US11693876B2 (en) Efficient shared bulk loading into optimized storage
US11562241B2 (en) Data output method, data acquisition method, device, and electronic apparatus
US20230318621A1 (en) Compression And Decompression In Hardware For Data Processing
US9916335B2 (en) Row, table, and index decompression
CN115438114B (en) Storage format conversion method, system, device, electronic equipment and storage medium
CN115811317A (en) Stream processing method and system based on self-adaptive non-decompression direct calculation
WO2018082245A1 (en) Raster data aggregation method and apparatus, raster data decoupling method and apparatus, and system
CN115905168A (en) Adaptive compression method and compression apparatus, computer device, storage medium
CN112000707B (en) Variable-length sequence matching method, database access method and device
US11397712B2 (en) Rapid and robust predicate evaluation
US9054730B2 (en) Method and system for LZW based decompression
Culpepper et al. Revisiting bounded context block‐sorting transformations
CN111275184B (en) Method, system, device and storage medium for realizing neural network compression
CN108683424B (en) Full-parallel bidirectional recursion pipeline LDPC encoder and method
Xuan et al. The improved variable length counting bloom filter based on buffer

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22871271; Country of ref document: EP; Kind code of ref document: A1)