CN115801019B - Parallel acceleration LZ77 decoding method and device and electronic equipment - Google Patents
Abstract
The invention discloses a parallel accelerated LZ77 decoding method and device, and an electronic device. The method comprises the following steps: identifying and classification-marking input feature data to obtain feature data with type marks; caching a plurality of adjacent feature data at the same address through first-in first-out cache management to obtain cached feature data with type marks; acquiring feature data to be decoded from the cached feature data; and reading and writing data according to the type mark of the feature data to be decoded and the state of prior decoding processing, so as to decode the feature data to be decoded out of order. By optimizing the cache structure and cache management of the feature data and adopting out-of-order processing, the invention improves the processing efficiency within a single decoding cycle and reduces queuing delay during data reads, writes, and decoding, thereby effectively improving overall decoding efficiency.
Description
Technical Field
The invention relates to the technical field of decompression, and in particular to a parallel accelerated LZ77 decoding method and device and an electronic device.
Background
The LZ77 algorithm is widely used as a compression algorithm that offers a good balance between performance and compression ratio. During LZ77 decoding, reference data must be read from stored history data to carry out each decoding task, so LZ77 hardware decoding requires that recent history data be kept available for lookup as a dictionary. The processing of one piece of feature data is regarded as one decoding task, and different decoding tasks have different data-read requirements (such as fetch length and fetch position), which can make the processing times of different decoding tasks differ widely.
In the related art, decoding tasks are processed in a pipeline: the (n+1)-th decoding task is processed only after the n-th decoding task completes. Even if the data of the (n+1)-th decoding task is ready, it must wait for the n-th decoding task to finish before its processing can start, so decoding efficiency is low.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein.
The embodiment of the invention provides a parallel accelerated LZ77 decoding method, a device, and an electronic device that decode the feature data to be decoded by optimizing the cache structure and cache management of the feature data and adopting out-of-order processing, thereby improving the processing efficiency within a single decoding cycle and reducing queuing delay during data reads, writes, and decoding, so that overall decoding efficiency is effectively improved.
In a first aspect, an embodiment of the present invention provides a parallel acceleration LZ77 decoding method, including:
identifying and classification-marking input feature data to obtain feature data with type marks;
caching a plurality of adjacent feature data at the same address through first-in first-out cache management to obtain cached feature data with the type marks;
acquiring feature data to be decoded from the cached feature data;
and reading and writing data according to the type mark of the feature data to be decoded and the state of prior decoding processing, so as to perform out-of-order decoding of the feature data to be decoded.
In a second aspect, an embodiment of the present invention provides a parallel acceleration LZ77 decoding device, including:
a data identification and classification module, configured to identify and classification-mark input feature data to obtain feature data with type marks;
a feature data caching module, configured to cache a plurality of adjacent feature data at the same address through first-in first-out cache management to obtain cached feature data with the type marks;
a feature data processing module, configured to acquire feature data to be decoded from the cached feature data, and to read and write data according to the type mark of the feature data to be decoded and the state of prior decoding processing, so as to perform out-of-order decoding of the feature data to be decoded.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the parallel accelerated LZ77 decoding method of the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium storing computer executable instructions for implementing the parallel accelerated LZ77 decoding method according to the first aspect when executed by a processor.
The embodiment of the invention proceeds as follows. First, the parallel accelerated LZ77 decoding device identifies and classification-marks the input feature data to obtain feature data with type marks, so that the data-read mode can be determined from the type mark. Next, a plurality of adjacent feature data are cached at the same address through first-in first-out cache management to obtain cached feature data with type marks; by optimizing the cache structure and cache management of the feature data, multiple accumulated adjacent feature data can be fetched and decoded in parallel in subsequent processing, improving the processing efficiency within a single decoding cycle and thus effectively improving overall decoding efficiency. Finally, the feature data to be decoded is acquired from the cached feature data, and data is read and written according to its type mark and the state of prior decoding processing, so that the feature data to be decoded is decoded out of order; this reduces queuing delay during data reads, writes, and decoding, further improving overall decoding efficiency.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of a system architecture for performing a parallel accelerated LZ77 decoding method according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of a cache structure of a feature data cache module according to an embodiment of the present invention;
FIG. 3 is a flow chart of a parallel acceleration LZ77 decoding method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the operation of the present invention for out-of-order processing of different types of feature data to be decoded;
FIG. 5 is a flowchart illustrating a specific method of step S340 in FIG. 3 according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating a specific method of step S340 in FIG. 3 in the case where the type tag is the second tag according to another embodiment of the present invention;
FIG. 7 is a flowchart illustrating a specific method of step S340 in FIG. 3 in the case where the type tag is the third tag according to another embodiment of the present invention;
FIG. 8 is a flowchart illustrating a specific method of step S340 in FIG. 3 in the case where the type flag is a fourth flag according to another embodiment of the present invention;
FIG. 9 is a schematic diagram of a parallel acceleration LZ77 decoding device according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent.
It should be noted that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that shown. In the description of the present invention, "several" means one or more, and "a plurality" means two or more. The terms "first" and "second" are used only to distinguish technical features and are not to be construed as indicating or implying relative importance, the number of technical features indicated, or the precedence of the technical features indicated.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
First, several terms involved in the present invention are explained:
LZ77 compression algorithm: a dictionary-based, sliding-window lossless compression algorithm. LZ77 hardware decoding needs to keep recent history data available for lookup as a dictionary.
Absolute data: one type of feature data obtained during LZ77 decoding. It can be represented by a literal (the original text) and output directly as a decoding result.
Relative data: another type of feature data obtained during LZ77 decoding, used to record the relation between newly decoded data and history data. It is represented by two values, distance and length; the real decoded data is obtained by reading the corresponding span of data (hereinafter called reference data) from the stored history data (the dictionary) using the distance and the length. A minimal decoding sketch follows these term definitions.
End mark: another type of feature data obtained during LZ77 decoding, which may be denoted end, used to indicate that decoding has finished.
Dynamic random access memory (Dynamic Random Access Memory, DRAM): a common system memory. DRAM can hold data only for a short time, so the memory control circuit must refresh it at a fixed period to retain the data.
Static random access memory (Static Random-Access Memory, SRAM): a common system memory. SRAM needs no refresh, and its data is retained as long as power is maintained. SRAM is costly.
Dictionary data: the recent history data stored during LZ77 hardware decoding, from which reference data can subsequently be read for lookup. In hardware design there are three schemes for storing dictionary data. The first stores the dictionary data in SRAM; its advantage is that the SRAM can be accessed quickly to read reference data during decoding, and its disadvantage is the high cost of SRAM. The second stores the dictionary data in DRAM; its advantage is that DRAM is cheap, and its disadvantage is that reference data must be read by indirect access to the DRAM during decoding, which takes a long time. The third, balancing cost against access speed, stores the farther dictionary data in DRAM and the nearer dictionary data in SRAM. It is understood that, for a fixed maximum reference distance (the maximum data distance of LZ77 in a typical application such as deflate compression is 32K), the more data stored in SRAM, the smaller the probability that decoding must access the DRAM, and thus the faster the decoding. Therefore, to save hardware cost while preserving decoding efficiency, hardware decoding generally stores near dictionary data in SRAM and far dictionary data in DRAM.
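The three feature-data types defined above behave as in the following minimal C sketch of an LZ77 decoder. It is illustrative only: the token layout, names, and single software loop are assumptions made for exposition, not the patented hardware design.

```c
#include <stdio.h>

typedef enum { TOK_LITERAL, TOK_MATCH, TOK_END } tok_type;

typedef struct {
    tok_type type;
    unsigned char literal;  /* valid for TOK_LITERAL (absolute data)      */
    unsigned distance;      /* valid for TOK_MATCH: how far to look back  */
    unsigned length;        /* valid for TOK_MATCH: bytes to copy         */
} token;

/* Decode tokens into out[]; returns number of bytes written. */
static size_t lz77_decode(const token *toks, size_t n, unsigned char *out)
{
    size_t pos = 0;
    for (size_t i = 0; i < n && toks[i].type != TOK_END; i++) {
        if (toks[i].type == TOK_LITERAL) {
            out[pos++] = toks[i].literal;   /* output the literal directly */
        } else {
            /* relative data: copy length bytes from distance back;
             * byte by byte, since source and destination may overlap */
            for (unsigned j = 0; j < toks[i].length; j++, pos++)
                out[pos] = out[pos - toks[i].distance];
        }
    }
    return pos;
}

int main(void)
{
    token toks[] = {
        { TOK_LITERAL, 'a', 0, 0 },
        { TOK_LITERAL, 'b', 0, 0 },
        { TOK_MATCH,   0,   2, 4 },  /* back 2 bytes, copy 4 */
        { TOK_END,     0,   0, 0 },
    };
    unsigned char out[16];
    size_t n = lz77_decode(toks, 4, out);
    printf("%.*s\n", (int)n, out);   /* prints "ababab" */
    return 0;
}
```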
The processing of one piece of feature data is regarded as one decoding task, and different decoding tasks have different data-read requirements, namely: whether a fetch from memory is needed, the fetch length, the fetch position, and so on. The processing times of different decoding tasks can therefore differ widely. In the related art, decoding tasks are processed in a pipeline: the (n+1)-th decoding task is processed only after the n-th decoding task completes. Even if the data of the (n+1)-th decoding task is ready, it must wait for the n-th decoding task to finish, so decoding efficiency is low.
Based on this, the invention provides a parallel accelerated LZ77 decoding method, a parallel accelerated LZ77 decoding device, an electronic device, and a computer-readable storage medium. First, the parallel accelerated LZ77 decoding device identifies and classification-marks the input feature data to obtain feature data with type marks, so that the data-read mode can be determined from the type mark. Next, a plurality of adjacent feature data are cached at the same address through first-in first-out cache management to obtain cached feature data with type marks; by optimizing the cache structure and cache management of the feature data, multiple accumulated adjacent feature data can be fetched and decoded in parallel in subsequent processing, improving the processing efficiency within a single decoding cycle and effectively improving overall decoding efficiency. Finally, the feature data to be decoded is acquired from the cached feature data, and data is read and written according to its type mark and the state of prior decoding processing, so that the feature data to be decoded is decoded out of order; this reduces queuing delay during data reads, writes, and decoding, further improving overall decoding efficiency.
Embodiments of the present invention will be further described below with reference to the accompanying drawings.
As shown in FIG. 1, the system framework includes: a data identification and classification module 110, a feature data caching module 120, a feature data processing module 130, a first buffer module 140, a first data handling module 150, a second buffer module 160, a second data handling module 170, and a dynamic random access memory 180. The data identification and classification module 110, the feature data caching module 120, and the feature data processing module 130 are communicatively connected in sequence; the first buffer module 140 is communicatively connected to the first data handling module 150, and the second buffer module 160 is communicatively connected to the second data handling module 170. The pair formed by the first buffer module 140 and the first data handling module 150 and the pair formed by the second buffer module 160 and the second data handling module 170 are arranged in parallel between the feature data processing module 130 and the dynamic random access memory 180, and the feature data processing module 130 is communicatively connected to the first data handling module 150 and to the second data handling module 170.
The data identification and classification module 110 is configured to identify and classification-mark the input feature data to obtain feature data with type marks, so that the data-read mode can be determined from the type mark. Specifically, the data identification and classification module 110 receives feature data of mixed types input sequentially, one at a time, preprocesses the feature data (the preprocessing comprises the identification judgment and the classification marking), and outputs the preprocessed feature data in parallel to the feature data caching module 120.
The feature data caching module 120 is configured to cache a plurality of adjacent feature data at the same address through first-in first-out cache management, obtaining cached feature data with type marks. By optimizing the cache structure and cache management of the feature data, multiple accumulated adjacent feature data can be fetched in subsequent processing and decoded in parallel, improving the processing efficiency within a single decoding cycle.
It can be understood that the buffer size of the feature data buffer module 120 may be set to 8×69 bits, or may be set to other values according to the actual data buffer requirement, and the buffer size of the feature data buffer module 120 is not particularly limited in the present invention.
As an example, as shown in FIG. 2, the cache space of the feature data caching module 120 is 8×69 bits, where bits 68-67 store the type mark, bits 66-64 store the number of feature data (0 to 7 can represent eight cases), and the remaining space stores the feature data itself. The type mark takes one of four values, 0, 1, 2, 3, representing four types of feature data. Specifically, the first mark 0 indicates that the feature data is a literal; the second mark 1 indicates that the feature data is a distance and length and the reference data to be read is located in the second buffer module 160; the third mark 2 indicates that the feature data is a distance and length and the reference data to be read is located in the dynamic random access memory 180; and the fourth mark 3 indicates that the feature data is end. Thus, one address of the feature data caching module 120 can store at most eight adjacent literals (8 bits each), or two pairs of distance (16 bits) and length (9 bits), or one end mark (9 bits).
It will be appreciated that, without this storage structure, address 5 in FIG. 2 could store only one literal even if a FIFO were used, and the feature data processing module 130 could process literals only one at a time. With the feature data stored according to the optimized structure of the embodiment of the invention, the feature data processing module 130 can rapidly process multiple accumulated adjacent literals, improving its processing efficiency. For example, when there are N consecutive literals, the feature data processing module 130 can process the N literals within at most two decoding cycles instead of N decoding cycles, improving the efficiency of a single decoding cycle.
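The 69-bit entry of FIG. 2 can be modeled as in the following C sketch. The field widths come from the description; the C packing, the assumption that the 3-bit count encodes "number of items minus one", and the helper name are ours.

```c
#include <stdint.h>

/* One FIFO entry of the feature data caching module (FIG. 2). */
typedef struct {
    uint8_t  tag;      /* bits 68-67: 0 literal, 1 dist0, 2 dist1, 3 end    */
    uint8_t  count;    /* bits 66-64: assumed to encode n-1, so 0..7 -> 1..8 */
    uint64_t payload;  /* bits 63-0 : up to 8 literals, or up to two
                          (distance:16, length:9) pairs, or one end mark    */
} fifo_entry;

/* Pack up to eight adjacent literals into a single entry, so that all of
 * them can be fetched, and later written out, in one access. */
static fifo_entry pack_literals(const uint8_t *lits, unsigned n /* 1..8 */)
{
    fifo_entry e = { 0 /* literal tag */, (uint8_t)(n - 1), 0 };
    for (unsigned i = 0; i < n; i++)
        e.payload |= (uint64_t)lits[i] << (8 * i);
    return e;
}
```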
The feature data processing module 130 is configured to acquire feature data to be decoded from the feature data caching module 120, and to read and write data according to the type mark of the feature data to be decoded and the state of prior decoding processing, so as to decode the feature data to be decoded out of order. Specifically, after the feature data processing module 130 acquires the feature data to be decoded from the feature data caching module 120, the data read-write mode is determined from the type mark of the feature data to be decoded, and the timing of the read and write is determined from the state of prior decoding processing. This realizes out-of-order decoding and reduces queuing delay during data reads, writes, and decoding, further improving overall decoding efficiency. After the decoding result is obtained, it is written into the second buffer module 160. Determining the data read-write mode from the type mark comprises: if the feature data is a literal, it is written directly into the second buffer module 160; if the feature data is a distance and length and the reference data to be read is located in the second buffer module 160, the reference data is read from the second buffer module 160 and then written back to the second buffer module 160; if the feature data is a distance and length and the reference data to be read is located in the dynamic random access memory 180, the corresponding reference data is read from the dynamic random access memory 180 through the first data handling module 150 and the first buffer module 140, and then written into the second buffer module 160; if the feature data is end, decoding finishes.
The first buffer module 140 is configured to buffer the reference data, and the first buffer module 140 adopts SRAM.
The first data handling module 150 is configured to interface the first buffer module 140 and the dynamic random access memory 180, supporting reads of data in the dynamic random access memory 180. Specifically, in response to a read-data request sent by the feature data processing module 130, the first data handling module 150 reads the corresponding reference data from the dynamic random access memory 180, stores it in the first buffer module 140, and then sends a first feedback signal to the feature data processing module 130. In response to the first feedback signal, and when no decoding task is occupying the write interface of the second buffer module 160, the feature data processing module 130 reads the data from the first buffer module 140 and writes the reference data to the second buffer module 160.
The second buffer module 160 is configured to buffer the reference data, and adopts SRAM. The cache size of this SRAM may be, for example, 1024×64 bits, or other values according to the data caching requirement; the present invention does not particularly limit it.
The second data handling module 170 is configured to interface the second buffer module 160 and the dynamic random access memory 180 to support writing data into the dynamic random access memory 180. Specifically, the second data handling module 170 reads data from the second buffer module 160 in response to the data writing request sent by the feature data processing module 130, writes the data into the dynamic random access memory 180, and then sends a second feedback signal to the feature data processing module 130, so that the feature data processing module 130 responds to the second feedback signal to end decoding or continue decoding.
The dynamic random access memory 180 is used to store the reference data and to store the decoding result after decoding.
Note that, the DRAM mentioned later refers to the dynamic random access memory 180.
In an embodiment, multiple parallel groups of the first buffer module 140 and the first data handling module 150 may further be disposed between the feature data processing module 130 and the dynamic random access memory 180. If the adjacent reference data currently to be read is unrelated to the data of earlier DRAM access tasks and decoding tasks, multiple DRAM access tasks can then proceed in parallel, further reducing the influence of DRAM-related access tasks on overall decoding efficiency. That is, by providing several parallel groups of the first buffer module 140 and the first data handling module 150 between the feature data processing module 130 and the dynamic random access memory 180, the embodiment of the invention improves the efficiency of indirectly accessing the DRAM to acquire dictionary data.
The parallel accelerated LZ77 decoding method is executed through the system framework shown in FIG. 1. Storing near dictionary data in a smaller SRAM saves hardware cost; optimizing the cache structure and cache management of the feature data and decoding the feature data out of order improve the processing efficiency within a single decoding cycle and reduce queuing delay during data reads, writes, and decoding, effectively improving overall decoding efficiency. In a specific decoding test scenario, combining the two optimization measures improved decoding efficiency by nearly 50% relative to the unoptimized design.
It will be appreciated by persons skilled in the art that the system architecture shown in the figures is not limiting of the embodiments of the invention and may include more or fewer components than shown, or certain components may be combined, or a different arrangement of components.
The system embodiments described above are merely illustrative, in that the units illustrated as separate components may or may not be physically separate, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
It will be understood by those skilled in the art that the system architecture and application scenarios described in the embodiments of the present invention are intended to describe the technical solution more clearly and do not limit it; those skilled in the art will know that, as the system architecture evolves and new application scenarios appear, the technical solution provided by the embodiments of the present invention applies equally to similar technical problems.
Based on the above system configuration, various embodiments of the parallel acceleration LZ77 decoding method of the present invention are presented below.
In a first aspect, as shown in FIG. 3, the parallel accelerated LZ77 decoding method can be applied to the system framework shown in FIG. 1 and may include, but is not limited to, steps S310 to S340.
Step S310: identifying and classification-marking the input feature data to obtain feature data with type marks.
In this step, the input feature data is identified and classification-marked to obtain feature data with type marks, so that the data-read mode can be determined from the type mark. The input feature data includes absolute data, relative data, and end marks. The type mark includes a first mark, a second mark, a third mark, and a fourth mark. Specifically, the first mark is 0 and marks the feature data as absolute data; the second mark is 1 and marks the feature data as relative data whose reference data to be read is located in the second buffer module; the third mark is 2 and marks the feature data as relative data whose reference data to be read is located in the dynamic random access memory; and the fourth mark is 3 and marks the feature data as an end mark.
In one embodiment, step S310 is further described, and may include, but is not limited to, the following steps:
first, receiving the feature data of mixed types, input sequentially one at a time;
marking the feature data with the first mark when it is identified as absolute data;
marking the feature data with the second mark when it is identified as relative data and the reference data that must be read to decode it is located in the second buffer module, the reference data being used to decode the relative data;
marking the feature data with the third mark when it is identified as relative data and the reference data that must be read to decode it is located in the dynamic random access memory;
marking the feature data with the fourth mark when it is identified as an end mark.
It should be noted that the data to be decoded is first discriminated to obtain the type of the feature data; for relative data, it is then further discriminated whether the reference data to be read lies in the dynamic random access memory or in the second buffer module.
It will be appreciated that the length of each piece of feature data is known: one literal is one byte of data, and relative data can be represented by a distance (abbreviated dist) and a length (abbreviated len), a length-and-distance pair decoding to length bytes of data. Thus, by keeping a running count, the total number of bytes processed so far is known. For a length-and-distance pair, the distance tells how far back to trace the data and whether that trace-back exceeds the range of the second buffer module (whose cache size is fixed); if it does, at least part of the data lies in the DRAM. The calculation can therefore be performed during preprocessing: the storage location of the reference data to be read is judged from the distance, and the location of the reference data for the length-and-distance pair is marked.
Specifically, when the distance value is smaller than the cache-space threshold of the second buffer module, the reference data to be read is determined to lie in the second buffer module, and the distance value is marked dist0, indicating that the reference data is in the second buffer module. When the distance value is larger than the cache-space threshold of the second buffer module, at least part of the reference data to be read is determined to lie in the DRAM, and the distance value is marked dist1, indicating that the reference data involves the dynamic random access memory and that the corresponding reference data must be read from it.
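The preprocessing decision can be sketched in C as follows. The mark values follow FIG. 2; the threshold constant, the function name, and the flat comparison against the buffer capacity are simplifying assumptions for illustration.

```c
#include <stdint.h>

enum { MARK_LITERAL = 0, MARK_DIST0 = 1, MARK_DIST1 = 2, MARK_END = 3 };

/* Capacity of the second buffer module: 1024 x 64 bits = 8192 bytes. */
#define SECOND_BUF_BYTES 8192u

static int classify(int is_end, int is_literal, uint32_t distance)
{
    if (is_end)
        return MARK_END;      /* fourth mark */
    if (is_literal)
        return MARK_LITERAL;  /* first mark */
    if (distance < SECOND_BUF_BYTES)
        return MARK_DIST0;    /* second mark: reference in the second buffer */
    return MARK_DIST1;        /* third mark: reference involves the DRAM */
}
```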
Step S320: and caching a plurality of adjacent characteristic data into the same address through a first-in first-out cache management mode to obtain the cache characteristic data with the type mark.
In the step, a plurality of adjacent characteristic data are cached to the same address through a first-in first-out cache management mode, so that cache characteristic data with type marks are obtained, the cache structure and the cache management mode of the characteristic data are optimized, a plurality of piled adjacent characteristic data can be obtained in subsequent processing to perform decoding processing in parallel, and the processing efficiency in a single decoding processing period is improved.
Step S330: and obtaining the feature data to be decoded from the cache feature data.
In the step, the feature data to be decoded is obtained from the cache feature data, so that a plurality of stacked adjacent feature data can be processed quickly, the processing efficiency in a single decoding processing period is improved, and the overall decoding efficiency is improved. For example, when N consecutive absolute data are stored at the same address, at most two decoding processing cycles are required to process N absolute data, and N decoding processing cycles are not required to process N absolute data one by one.
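A C sketch of why at most two cycles suffice: with a 64-bit write port, up to eight buffered literals land in one or two aligned words depending on alignment. The write primitive, the per-byte-enable scheme, and the little-endian packing are assumptions of this sketch, not the patent's interface.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint8_t sram[8192];               /* models the second buffer */

/* Hypothetical 64-bit write port with per-byte enables (one SRAM beat). */
static void sram_write64(uint32_t word_addr, uint64_t data, uint8_t byte_en)
{
    for (unsigned b = 0; b < 8; b++)
        if (byte_en & (1u << b))
            sram[word_addr * 8 + b] = (uint8_t)(data >> (8 * b));
}

/* Write n (<= 8) literals starting at byte_addr in at most two beats. */
static void write_literals(uint32_t byte_addr, const uint8_t *lits, unsigned n)
{
    unsigned off = byte_addr & 7;        /* misalignment within a word */
    uint8_t lane[16] = {0};
    memcpy(lane + off, lits, n);         /* place bytes into their lanes */

    uint64_t lo, hi;                     /* little-endian host assumed */
    memcpy(&lo, lane, 8);
    memcpy(&hi, lane + 8, 8);

    unsigned end = off + n;              /* one past the last byte lane */
    uint8_t en_lo = (uint8_t)(((end > 8 ? 0x100u : 1u << end) - 1)
                              & ~((1u << off) - 1));
    sram_write64(byte_addr >> 3, lo, en_lo);          /* first beat */
    if (end > 8)                                      /* spill: second beat */
        sram_write64((byte_addr >> 3) + 1, hi,
                     (uint8_t)((1u << (end - 8)) - 1));
}

int main(void)
{
    const uint8_t lits[9] = "ABCDEFGH";
    write_literals(5, lits, 8);          /* crosses a word: two beats */
    printf("%.8s\n", (const char *)sram + 5);  /* prints "ABCDEFGH" */
    return 0;
}
```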
Step S340: and reading and writing data according to the type mark of the characteristic data to be decoded and the prior decoding processing condition so as to perform disordered decoding processing on the characteristic data to be decoded.
In the step, a data read-write mode is determined according to the type mark of the characteristic data to be decoded, and the time for carrying out data read-write is determined according to the prior decoding processing condition, so that the disordered decoding processing is realized, the disordered decoding processing is carried out on the characteristic data to be decoded, the queuing delay during the data read-write and decoding processing is reduced, and the overall decoding efficiency is further effectively improved.
This step is further described with reference to FIG. 4, where relative data (distance and length) is represented in FIG. 4 by "dist+len" in a simplified manner, where dist0 represents the reference data at the second cache module; dist1 indicates that the reference data relates to the DRAM, corresponding reference data needs to be read from the DRAM, if the next decoding task is irrelevant to the data of the current decoding task, the decoding task can be skipped to process the next feature data, if the next decoding task needs to refer to the data of the previous decoding task (the scene probability is smaller), waiting is needed, and otherwise, decoding can be continued. Specifically, as shown in task 2 in fig. 4, the incompletion of task 2 may continue to process other decoding tasks until the data of task 2 is ready to be completed before the insert writing. In addition, as shown in tasks 6-13 of fig. 4, assuming that the bit width of the second buffer module is set to 64 bits, eight clock cycles are originally required to write into the second buffer module one by one, and considering the situation that data is not 8 bytes aligned, at most two clock cycles are required to write into the second buffer module. Therefore, the decoding efficiency of the whole LZ77 decoding is improved by the out-of-order processing mode.
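The scheduling policy of FIG. 4 can be sketched as a repeated pass over outstanding tasks: a task waiting on a DRAM fetch, or on an earlier unfinished result, is skipped rather than stalling everything behind it. The task fields and the pass structure below are illustrative assumptions, not the hardware's actual bookkeeping.

```c
enum { T_LITERAL, T_DIST0, T_DIST1, T_END };

typedef struct {
    int type;        /* one of the four marks above                       */
    int fetch_done;  /* T_DIST1 only: the DRAM reference data has arrived */
    int blocked;     /* reference overlaps an earlier, unfinished result  */
    int done;
} dec_task;

/* One out-of-order pass: complete every task that is not waiting, in any
 * order. Returns 1 if any task made progress (rerun until all are done). */
static int dispatch_pass(dec_task *q, int n)
{
    int progressed = 0;
    for (int i = 0; i < n; i++) {
        if (q[i].done)
            continue;
        if (q[i].type == T_DIST1 && !q[i].fetch_done)
            continue;                 /* skip it, do not stall the rest */
        if (q[i].blocked)
            continue;                 /* rare dependent case: must wait */
        /* literal write, second-buffer copy, or insertion of arrived
         * DRAM data would happen here */
        q[i].done = 1;
        progressed = 1;
    }
    return progressed;
}

int main(void)
{
    dec_task q[3] = {
        { T_LITERAL, 0, 0, 0 },
        { T_DIST1,   0, 0, 0 },   /* waiting for a DRAM fetch */
        { T_DIST0,   0, 0, 0 },
    };
    dispatch_pass(q, 3);          /* tasks 0 and 2 finish; task 1 parked */
    q[1].fetch_done = 1;
    dispatch_pass(q, 3);          /* now task 1 completes too */
    return 0;
}
```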
In this step, determining the data read-write mode from the type mark of the feature data to be decoded comprises: if the feature data is a literal, it is written directly into the second buffer module; if the feature data is a distance and length, the reference data is read back from the dictionary data to obtain the real decoded data — if the distance is dist0, the reference data is read from the second buffer module and then written back to the second buffer module, and if the distance is dist1, the corresponding reference data is read from the dynamic random access memory into the first buffer module through the first data handling module and then written into the second buffer module; if the feature data is end, decoding finishes.
It should be emphasized that although the embodiment of the present invention uses out-of-order decoding to improve decoding efficiency, the accuracy and reliability of the decoding result are unaffected. Because the number of literals or the length value of each decoding task and the current write address of the second buffer module are known, the position of each decoding task's output in the uncompressed data stream can be calculated. For a decoding task that must fetch from the DRAM, the position where its decoding result will be stored is recorded, and from the distance value of the current decoding task it can be determined whether the required reference data is related to the data of an earlier decoding task still fetching from the DRAM. If it is unrelated, then even while that earlier task is incomplete, the accuracy and reliability of the decoding result are guaranteed, provided that enough space is reserved in the second buffer module for the earlier task's pending result and that the decoding results of the different tasks are spliced together correctly.
As an example, further explanation of how the distance value of the current decoding task tells whether the required reference data is related to the data of an earlier decoding task still being slowly fetched from the DRAM:
Suppose the reference data that the n-th decoding task must read from the DRAM will finally be stored at bytes 10000 to 10063 (in decimal), 64 bytes in total. Suppose the (n+1)-th to (n+8)-th decoding tasks are all literals; the locations they occupy are then bytes 10064 to 10071, 8 bytes in total. Suppose the (n+9)-th decoding task has distance = 10 and length = 8, meaning: trace back 10 bytes and take 8 bytes of data, i.e., copy bytes 10062 to 10069 to bytes 10072 to 10079. Clearly the fetch of the (n+9)-th task involves the data of the n-th task; that is, the reference data needed now is related to a decoding task still fetching from the DRAM. The reference data of the n-th task may not yet have returned from the DRAM, so the (n+9)-th task can only be processed after the n-th task finishes its fetch and writes to the second buffer module. By contrast, when the required reference data is determined to be unrelated to any reference data still being fetched from the DRAM, the (n+1)-th, (n+2)-th, and (n+3)-th decoding tasks can be processed out of order without waiting for the n-th task to finish, improving decoding efficiency.
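The overlap test behind this example reduces to an interval intersection, reproduced in the following C sketch; the function name and the inclusive-range convention are ours.

```c
#include <assert.h>
#include <stdint.h>

/* A match written at position wpos reads bytes [wpos-dist, wpos-dist+len).
 * It must wait only if that span intersects the output range (inclusive)
 * of a still-outstanding DRAM fetch. */
static int overlaps_pending(uint64_t wpos, uint32_t dist, uint32_t len,
                            uint64_t pend_lo, uint64_t pend_hi)
{
    uint64_t src_lo = wpos - dist;
    uint64_t src_hi = src_lo + len - 1;
    return src_lo <= pend_hi && pend_lo <= src_hi;
}

int main(void)
{
    /* The n-th task's DRAM fetch will fill bytes 10000..10063. */
    /* Task n+9 writes at 10072 with dist=10, len=8: it reads bytes
     * 10062..10069, touching the pending range, so it must wait. */
    assert(overlaps_pending(10072, 10, 8, 10000, 10063) == 1);
    /* A hypothetical dist=4 match would read 10068..10075, clear of
     * byte 10063, and could therefore run ahead of the n-th task. */
    assert(overlaps_pending(10072, 4, 8, 10000, 10063) == 0);
    return 0;
}
```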
Through steps S310 to S340, the embodiment of the present invention optimizes the cache structure and cache management of the feature data and decodes the feature data to be decoded out of order, improving the processing efficiency within a single decoding cycle and reducing queuing delay during data reads, writes, and decoding, thereby effectively improving overall decoding efficiency. In a specific decoding test scenario, combining the two optimization measures improved decoding efficiency by nearly 50% relative to the unoptimized design.
In one embodiment, step S340 is further described with reference to fig. 5, and step S340 may include, but is not limited to, steps S510 to S530.
Step S510: in case the type flag is the first flag, it is determined that the feature data to be decoded is absolute data.
Step S520: inquiring the occupation condition of the write interface of the second buffer module in the prior decoding processing condition.
Step S530: and under the condition that no decoding task occupies a write interface of the second buffer module, the absolute data is written into the second buffer module as a decoding result.
Through steps S510 to S530, when absolute data is processed and no decoding task occupies the write interface of the second buffer module, the absolute data at the current address is written directly into the second buffer module as the decoding result. When N pieces of absolute data are present at the same address, all N are written to the second buffer module as decoding results together, which improves decoding efficiency.
It will be appreciated that when processing the relative data, it is necessary to read back the reference data from the dictionary data to obtain a true decoding result. At this time, it is necessary to determine whether the storage location of the reference data to be read back is in the second buffer module or the dynamic random access memory according to distance and length. The decoding flow of processing relative data in various scenarios is further described below in conjunction with fig. 6 and 7.
In one embodiment, step S340 is further described with reference to fig. 6, and step S340 may include, but is not limited to, steps S610 to S640.
Step S610: and under the condition that the type mark is a second mark, confirming that the feature data to be decoded is relative data and the reference data to be read is positioned in the second cache module.
Step S620: and inquiring the proceeding condition of a fetch task in the prior decoding processing condition, wherein the fetch task represents the processing procedure of reading the reference data from the dynamic random access memory.
Step S630: and inquiring the occupation condition of the write interface of the second cache module under the condition that no fetch task is carried out or the ongoing fetch task is irrelevant to the reference data which needs to be read currently.
Step S640: and under the condition that the decoding task occupies the write interface of the second buffer module, after the reference data is read from the second buffer module, the reference data is written back to the second buffer module according to the current write address.
Through steps S610 to S640, in the case where the relative data is processed and the reference data to be read is located in the second buffer module, the processing condition of the access task in the previous decoding processing condition is queried first, and in the case where no access task is performed, or in the case where there is an ongoing access task but the ongoing access task is not related to the reference data to be read currently, the occupation condition of the write interface of the second buffer module is queried, and then in the case where no decoding task occupies the write interface of the second buffer module, the reference data is read and written from the second buffer module so as to facilitate the decoding processing of the relative data. If the current fetch task is in progress and is related to the reference data which needs to be read currently, or the decoding task occupies the write interface of the second buffer module, the data read-write can be continued after the end of the previous decoding task, so that the occurrence probability of the condition that the decoding task conflicts and the decoding efficiency is deteriorated is reduced. Thus, under the condition that decoding tasks are not in conflict, out-of-order processing is adopted instead of sequential processing so as to improve decoding efficiency.
In one embodiment, step S640 is further described. In the relative data, the distance and length can know which position in the memory (for storing the decoded data) has the same length of data, and the data can be directly copied and pasted to the current address. Specifically, under the condition that no decoding task occupies a write interface of the second cache module, reading out reference data with length from a certain address of the second cache module according to distance, and writing the reference data into a current write address of the second cache module.
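A minimal C sketch of this dist0 copy, modeling the second buffer module as an 8192-byte ring (the ring masking and the names are our assumptions). A plain memcpy would be wrong when length exceeds distance — the classic LZ77 run case — so the copy proceeds byte by byte.

```c
#include <stdint.h>

#define WBUF_SIZE 8192u                /* 1024 x 64 bits */
static uint8_t wbuf[WBUF_SIZE];

/* Copy `length` bytes from `distance` back to the current write position.
 * Byte order matters: later bytes may re-read bytes written earlier in
 * this same copy, which is exactly how LZ77 encodes repeated runs. */
static void copy_back(uint32_t wpos, uint32_t distance, uint32_t length)
{
    for (uint32_t i = 0; i < length; i++)
        wbuf[(wpos + i) & (WBUF_SIZE - 1)] =
            wbuf[(wpos + i - distance) & (WBUF_SIZE - 1)];
}
```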
In one embodiment, step S340 is further described with reference to fig. 7, and step S340 may include, but is not limited to, steps S710 to S750.
Step S710: in the case that the type flag is the third flag, it is confirmed that the feature data to be decoded is relative data and the reference data to be read is located in the dynamic random access memory.
Step S720: and inquiring the proceeding condition of a fetch task in the prior decoding processing condition, wherein the fetch task represents the processing procedure of reading the reference data from the dynamic random access memory.
Step S730: and under the condition that no fetching task is performed, sending a read data request to control the first data handling module to write the reference data read from the dynamic random access memory into the first cache module.
Step S740: a first feedback signal is received in response to the read data request.
Step S750: and in response to the first feedback signal, under the condition that the decoding task occupies a write interface of the second buffer module, reading the reference data from the first buffer module and writing the reference data into the second buffer module.
Through steps S710 to S750, in the case that the relative data is processed and the reference data to be read is located in the dynamic random access memory, firstly, the processing condition of the fetch task in the previous decoding processing condition is queried, and in the case that the fetch task is not processed, a read data request is sent to control the first data handling module to write the reference data read from the dynamic random access memory into the first buffer module, and then, in the case that the write interface of the second buffer module is occupied by the non-decoding task, the reference data is read from the first buffer module and written into the second buffer module in response to the first feedback signal, so as to facilitate decoding processing of the relative data. If the current access task is in progress, the data reading request is sent after the current access task is finished; if the decoding task occupies the write interface of the second buffer module, the decoding task is waited to finish and then is responded to the first feedback signal to be processed. The occurrence probability of the situation that decoding tasks collide and deteriorate decoding efficiency is reduced. Thus, under the condition that decoding tasks are not in conflict, out-of-order processing is adopted instead of sequential processing so as to improve decoding efficiency.
In an embodiment, the parallel accelerated LZ77 decoding method further includes: performing multiple fetch tasks simultaneously by arranging several parallel groups of first buffer modules and first data handling modules, so that multiple pieces of feature data to be decoded are decoded in parallel. In this way, if adjacent decoding tasks are unrelated to the data of earlier DRAM access tasks and decoding tasks, several DRAM access tasks can proceed in parallel, further reducing the influence of DRAM-related access tasks on overall decoding efficiency. That is, by providing several parallel groups of first buffer modules and first data handling modules between the feature data processing module and the dynamic random access memory, the embodiment of the invention improves the efficiency of indirectly accessing the DRAM to acquire dictionary data.
In one embodiment, it is understood that the buffer size of the second buffer module is fixed, for example 1024×64 bits, i.e., at most 1024×8=8192 bytes. After the second buffer module has accumulated a certain amount of data, the second data handling module copies part of it to the DRAM as a backup. The second buffer module thus always holds the most recent 8192 bytes of data, while the DRAM may hold all of the decoded data, meaning that the second data handling module has backed up all of the second buffer module's data to the DRAM. The calculation performed before a fetch may reveal that a tail portion of the required data lies beyond the data range held in the DRAM, because that portion of the second buffer module has not yet been copied to the DRAM; the reference data to be read then spans the DRAM and the second buffer module. That is, when the span of data to be read is long and begins beyond the retention range of the second buffer module, the fetch must start from the DRAM.
For such a situation, the second data handling module is first allowed to finish moving the current piece of reference data to the DRAM. Then, when no DRAM fetch task is in progress, a read-data request is sent (otherwise the new read-data request is started only after the existing DRAM fetch task completes), so that the first data handling module, in response to the read-data request, stores the corresponding reference data read from the DRAM into the first buffer module and then returns a first feedback signal. In response to the first feedback signal, if no decoding task occupies the write interface of the second buffer module, the reference data is read from the first buffer module and written into the second buffer module.
In another embodiment, when the second data handling module has not kept up, the system waits for it to finish moving the data required by the pending fetch task from the second buffer module to the DRAM, and only then reads the reference data from the DRAM.
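The spanning-fetch condition can be sketched as the following C check. The progress counters, field names, and the flat comparison against the 8192-byte capacity are assumptions made for illustration.

```c
#include <stdint.h>

typedef struct {
    uint64_t total;      /* bytes decoded so far                          */
    uint64_t backed_up;  /* bytes already copied to the DRAM as a backup  */
} progress;

/* Returns 1 if a DRAM fetch of `len` bytes at `dist` back must first wait
 * for the backup to catch up: the reference starts in the DRAM region
 * (beyond the second buffer's 8192-byte reach) but its tail has not yet
 * been copied out of the second buffer module. */
static int must_wait_for_backup(const progress *p, uint32_t dist, uint32_t len)
{
    uint64_t lo = p->total - dist;  /* first referenced byte             */
    uint64_t hi = lo + len;         /* one past the last referenced byte */
    return dist >= 8192u && hi > p->backed_up;
}
```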
Step S340 is further described in conjunction with fig. 8, and step S340 may include, but is not limited to, steps S810 to S850.
Step S810: in the case where the type flag is the fourth flag, the feature data to be decoded is confirmed as an end flag.
Step S820: the progress of the fetch task in the previous decoding processing case is queried.
Step S830: and under the condition that no fetching task is performed, sending a data writing request to control the second data handling module to read the reference data from the second cache module and write the reference data into the dynamic random access memory.
Step S840: a second feedback signal is received in response to the write data request.
Step S850: and responding to the second feedback signal, and ending the decoding processing of the characteristic data to be decoded.
Through steps S810 to S850, when the end mark is processed, the decoding of the feature data to be decoded is ended. Different data-read processes can thus be performed according to the different type marks, improving processing efficiency.
It will be appreciated that the storage size of the second buffer module is fixed, for example 1024×64 bits, storing at most 1024×8=8192 bytes. When the data stored in the second buffer module reaches a certain amount, the feature data processing module sends a write-data request to the second data handling module so that the second data handling module writes the data of the second buffer module into the dynamic random access memory. A second feedback signal responding to the write-data request is then received from the second data handling module, and in response to it the decoding of the current feature data to be decoded ends.
It will be appreciated that decoding may continue or may end after the current decoding task is completed.
In a second aspect, referring to FIG. 9, a parallel accelerated LZ77 decoding apparatus 900 includes: a data identification and classification module 110, a feature data caching module 120, and a feature data processing module 130.
The data identification and classification module 110 is configured to identify and classification-mark the input feature data to obtain feature data with type marks.
The feature data caching module 120 is configured to cache a plurality of adjacent feature data at the same address through first-in first-out cache management to obtain cached feature data with type marks.
The feature data processing module 130 is configured to acquire the feature data to be decoded from the cached feature data, and to read and write data according to the type mark of the feature data to be decoded and the state of prior decoding processing, so as to decode the feature data to be decoded out of order.
According to this embodiment of the second aspect, the parallel accelerated LZ77 decoding apparatus 900 identifies and classification-marks the input feature data with the data identification and classification module 110 to obtain feature data with type marks; then caches a plurality of adjacent feature data at the same address with the feature data caching module 120 through first-in first-out cache management to obtain cached feature data with type marks; and finally acquires the feature data to be decoded from the cached feature data with the feature data processing module 130 and reads and writes data according to its type mark and the state of prior decoding processing, decoding the feature data to be decoded out of order. By optimizing the cache structure and cache management of the feature data and adopting out-of-order processing, the parallel accelerated LZ77 decoding apparatus 900 improves the processing efficiency within a single decoding cycle and reduces queuing delay during data reads, writes, and decoding, thereby effectively improving overall decoding efficiency.
It should be noted that, since the parallel accelerated LZ77 decoding apparatus of this embodiment can implement the parallel accelerated LZ77 decoding method of any of the foregoing embodiments, it has the same technical principle and the same technical effect as that method; to avoid repetition, the description is omitted here.
In a third aspect, referring to FIG. 10, an electronic device 1000 includes: a memory 1020, a processor 1010, and a computer program stored on the memory 1020 and executable on the processor, the processor 1010 implementing the parallel accelerated LZ77 decoding method of the first aspect when executing the computer program.
The processor 1010 and the memory 1020 may be connected by a bus or other means.
The processor 1010 may be implemented by a general-purpose central processing unit, a microprocessor, an application specific integrated circuit, or one or more integrated circuits, etc., and is configured to execute related programs to implement the technical solutions provided by the embodiments of the present invention.
The non-transitory software programs and instructions required to implement the parallel acceleration LZ77 decoding method of the above embodiments are stored in the memory 1020; when executed by the processor 1010, they perform the parallel acceleration LZ77 decoding method of the above embodiments, for example the method steps shown in figs. 3, 5, 6, 7, and 8.
The apparatus and system embodiments described above are merely illustrative; units described as separate components may or may not be physically separate, that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor or controller, for example by a processor in the electronic device embodiment described above, cause that processor to perform the parallel acceleration LZ77 decoding method of the above embodiments, for example the method steps shown in figs. 3, 5, 6, 7, and 8.
Those of ordinary skill in the art will appreciate that all or some of the steps and systems of the methods disclosed above may be implemented as software, firmware, hardware, or a suitable combination thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor; as hardware; or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those skilled in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media.
While the preferred embodiments of the present invention have been described in detail, the invention is not limited to the above embodiments; those skilled in the art can make various equivalent modifications and substitutions without departing from the spirit of the invention, and such equivalent modifications and substitutions are intended to fall within the scope of the invention.
Claims (10)
1. A parallel accelerated LZ77 decoding method, comprising:
performing identification and judgment processing and classification marking processing on input feature data to obtain feature data with type marks;
caching a plurality of adjacent feature data at the same address in a first-in first-out cache management manner to obtain cached feature data with the type marks;
obtaining feature data to be decoded from the cached feature data;
and reading and writing data according to the type mark of the feature data to be decoded and a prior decoding processing condition, so as to perform out-of-order decoding processing on the feature data to be decoded.
2. The parallel acceleration LZ77 decoding method of claim 1, wherein the feature data comprises absolute data, relative data, and an end marker; the type marks comprise a first mark, a second mark, a third mark, and a fourth mark; and the performing identification and judgment processing and classification marking processing on the input feature data to obtain feature data with type marks comprises:
receiving the feature data, which are input one by one in sequence without a guaranteed order;
marking the feature data with the first mark when the feature data is identified and judged to be the absolute data;
marking the feature data with the second mark when the feature data is identified and judged to be the relative data and the reference data that needs to be read for decoding it is located in a second cache module, wherein the reference data is used for decoding the relative data;
marking the feature data with the third mark when the feature data is identified and judged to be the relative data and the reference data that needs to be read for decoding it is located in a dynamic random access memory;
and marking the feature data with the fourth mark when the feature data is identified and judged to be the end marker.
3. The parallel acceleration LZ77 decoding method of claim 2, wherein the reading and writing of data according to the type mark of the feature data to be decoded and the prior decoding processing condition comprises:
determining that the feature data to be decoded is the absolute data when the type mark is the first mark;
querying, in the prior decoding processing condition, the occupation status of a write interface of the second cache module;
and writing the absolute data into the second cache module as a decoding result when no decoding task occupies the write interface of the second cache module.
4. The parallel acceleration LZ77 decoding method of claim 2, wherein the reading and writing of data according to the type mark of the feature data to be decoded and the prior decoding processing condition comprises:
confirming, when the type mark is the second mark, that the feature data to be decoded is the relative data and that the reference data to be read is located in the second cache module;
querying, in the prior decoding processing condition, the progress of a fetch task, wherein the fetch task represents the process of reading reference data from the dynamic random access memory;
querying the occupation status of the write interface of the second cache module when no fetch task exists or the fetch task is unrelated to the reference data that currently needs to be read;
and when no decoding task occupies the write interface of the second cache module, reading the reference data from the second cache module and then writing it back to the second cache module at the current write address.
5. The parallel acceleration LZ77 decoding method of claim 2, wherein the reading and writing of data according to the type mark of the feature data to be decoded and the prior decoding processing condition comprises:
confirming, when the type mark is the third mark, that the feature data to be decoded is the relative data and that the reference data to be read is located in the dynamic random access memory;
querying, in the prior decoding processing condition, the progress of a fetch task, wherein the fetch task represents the process of reading reference data from the dynamic random access memory;
sending, when no fetch task is in progress, a read-data request to control a first data handling module to write the reference data read from the dynamic random access memory into a first cache module;
receiving a first feedback signal in response to the read-data request;
and in response to the first feedback signal, when no decoding task occupies the write interface of the second cache module, reading the reference data from the first cache module and writing it into the second cache module.
6. The parallel acceleration LZ77 decoding method of claim 2, wherein the reading and writing of data according to the type mark of the feature data to be decoded and the prior decoding processing condition comprises:
confirming, when the type mark is the fourth mark, that the feature data to be decoded is the end marker;
querying, in the prior decoding processing condition, the progress of the fetch task;
sending, when no fetch task is in progress, a write-data request to control a second data handling module to read the reference data from the second cache module and write it into the dynamic random access memory;
receiving a second feedback signal in response to the write-data request;
and in response to the second feedback signal, ending the decoding processing of the feature data to be decoded.
7. The parallel acceleration LZ77 decoding method of claim 5, further comprising:
performing a plurality of fetch tasks simultaneously by arranging a plurality of groups of the first cache modules and the first data handling modules in parallel, so as to perform the decoding processing on a plurality of pieces of feature data to be decoded in parallel.
8. A parallel acceleration LZ77 decoding apparatus, comprising:
a data identification and classification module, configured to perform identification and judgment processing and classification marking processing on input feature data to obtain feature data with type marks;
a feature data caching module, configured to cache a plurality of adjacent feature data at the same address in a first-in first-out cache management manner to obtain cached feature data with the type marks;
and a feature data processing module, configured to obtain feature data to be decoded from the cached feature data, and to read and write data according to the type mark of the feature data to be decoded and the prior decoding processing condition, so as to perform out-of-order decoding processing on the feature data to be decoded.
9. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the parallel acceleration LZ77 decoding method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that it stores computer-executable instructions for performing the parallel acceleration LZ77 decoding method according to any one of claims 1 to 7.
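As a companion to the sketch above, the following models the per-mark read/write decisions of claims 3 to 6. It reuses the FeatureData, TypeMark, and MarkedFeature types from the earlier sketch and remains a hedged simplification, not the patented hardware: the write-interface arbitration and the fetch-task tracker are reduced to two flags, a single outstanding fetch and staging buffer are assumed (claim 7 generalizes this to several parallel groups), and dependency tracking between in-flight tasks is left to the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// A decoding task: one marked feature plus the output position assigned
// when it entered the scheduler. Fixing write addresses up front is what
// lets tasks finish out of order while the history fills in stream order.
struct Task {
    MarkedFeature mf;       // type from the earlier sketch
    size_t        out_pos;  // first history byte this task produces
};

// Simplified stand-ins for the hardware resources named in the claims.
struct DecoderState {
    std::vector<uint8_t> history;   // second cache module (on-chip history)
    std::vector<uint8_t> staging;   // first cache module (DRAM staging)
    std::vector<uint8_t> dram;      // history spilled to DRAM
    bool write_port_busy = false;   // write-interface occupation status
    bool fetch_in_flight = false;   // outstanding DRAM fetch task
};

// One attempt at one task: true on completion, false when it must yield
// so another ready task can run first. No bounds checks; a real decoder
// validates every distance/length pair against the available history.
bool process(DecoderState& s, const Task& t) {
    const FeatureData& f = t.mf.data;
    switch (t.mf.mark) {
    case TypeMark::First:   // claim 3: literal needs only a free write port
        if (s.write_port_busy) return false;
        if (s.history.size() < t.out_pos + 1) s.history.resize(t.out_pos + 1);
        s.history[t.out_pos] = f.literal;
        return true;

    case TypeMark::Second: {  // claim 4: copy within the on-chip history
        if (s.write_port_busy) return false;
        if (s.history.size() < t.out_pos + f.length)
            s.history.resize(t.out_pos + f.length);
        size_t src = t.out_pos - f.distance;
        for (uint32_t i = 0; i < f.length; ++i)  // byte-wise copy handles
            s.history[t.out_pos + i] = s.history[src + i];  // overlap
        return true;
    }

    case TypeMark::Third:   // claim 5: reference currently lives in DRAM
        if (!s.fetch_in_flight) {
            // Read-data request: the first data handling module copies the
            // referenced span from DRAM into the first cache module.
            size_t src = s.dram.size() - f.distance;  // modeling shortcut
            s.staging.assign(s.dram.begin() + src,
                             s.dram.begin() + src + f.length);
            s.fetch_in_flight = true;  // first feedback signal pending
            return false;              // yield until the fetch completes
        }
        if (s.write_port_busy) return false;
        if (s.history.size() < t.out_pos + f.length)
            s.history.resize(t.out_pos + f.length);
        std::copy(s.staging.begin(), s.staging.end(),
                  s.history.begin() + t.out_pos);
        s.fetch_in_flight = false;     // first feedback signal consumed
        return true;

    case TypeMark::Fourth:  // claim 6: end marker, spill history to DRAM
        if (s.fetch_in_flight) return false;
        s.dram.insert(s.dram.end(), s.history.begin(), s.history.end());
        return true;
    }
    return false;
}
```

The false returns are the essential point: a task blocked on the write interface or on a DRAM fetch yields instead of stalling the pipeline, so a younger task whose operands are ready can complete first; because each task carries its out_pos from the start, the history buffer still ends up byte-identical to sequential decoding.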