CN212873459U - System for data compression storage - Google Patents

System for data compression storage

Info

Publication number: CN212873459U
Application number: CN202020930026.7U
Authority: CN (China)
Prior art keywords: module, compression, data, output, input
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 李鹏, 王耀杰, 阮肇夏
Current assignee: SZ DJI Technology Co Ltd
Original assignee: SZ DJI Technology Co Ltd
Application filed by SZ DJI Technology Co Ltd; application granted

Abstract

The utility model provides a system (10) for data compression storage, comprising a control unit (11), a programmable logic chip (12), and a double-rate synchronous dynamic random access memory (13). A compression instruction generation module (21) in the programmable logic chip (12) is connected to the control unit (11); the data output end (231) of the on-chip memory (23) is connected to the data input ends (221) of the at least two compression paths (22), the data output ends (222) of the at least two compression paths (22) are connected to the double-rate synchronous dynamic random access memory (13), and the control instruction input ends (223) of the at least two compression paths (22) are connected to the compression instruction generation module (21). When the system compresses user data, the compressed data occupies less space in the double-rate synchronous dynamic random access memory, and because the programmable logic chip comprises at least two compression paths that can compress in parallel, compression efficiency is improved.

Description

System for data compression storage
Technical Field
The present invention relates to the field of data processing, and more particularly, to a system for data compression storage.
Background
In a growing number of scenarios, large amounts of data need to be stored. To make full use of a memory's storage space and store more data, the data is generally compressed before being stored.
However, current data compression systems achieve only a limited degree of compression; that is, even the compressed data occupies a large amount of storage space. Moreover, the larger the compressed data, the more bandwidth the system consumes when reading it from and writing it to the memory.
SUMMARY OF THE UTILITY MODEL
The utility model provides a system for data compression storage that can reduce both the storage space occupied by compressed data and the bandwidth consumed during reading and writing.
The utility model provides a system (10) for data compression storage, comprising: a control unit (11), a programmable logic chip (12), and a double-rate synchronous dynamic random access memory (13); the input end (121) of the programmable logic chip (12) is connected to the control unit (11), and the output end (122) of the programmable logic chip (12) is connected to the double-rate synchronous dynamic random access memory (13);
wherein the programmable logic chip (12) comprises:
a compression instruction generation module (21) connected to the control unit (11);
an on-chip memory (23), a data output (231) of the on-chip memory (23) being connected to at least two compression paths (22);
the data input ends (221) of the at least two compression paths (22) are connected to the on-chip memory (23), the data output ends (222) of the at least two compression paths (22) are connected to the double-rate synchronous dynamic random access memory (13), and the control instruction input ends (223) of the at least two compression paths (22) are connected to the compression instruction generation module (21).
In one implementation, the programmable logic chip (12) further comprises a write arbitration module (24), wherein the data input end (241) of the write arbitration module (24) is connected to the at least two compression paths (22), and the data output end (242) of the write arbitration module (24) is connected to the double-rate synchronous dynamic random access memory (13).
In one implementation, the programmable logic chip (12) further comprises a read arbitration module (25), wherein the control instruction input terminal (251) of the read arbitration module (25) is connected to the at least two compression paths (22), and the control instruction output terminal (252) of the read arbitration module (25) is connected to the on-chip memory (23).
In one implementation, each compression path (22) of the at least two compression paths includes a read feature map module (30), a control instruction input terminal (301) of the read feature map module (30) is connected to the compression instruction generation module (21), and a control instruction output terminal (302) of the read feature map module (30) is connected to the read arbitration module (25).
In one implementation, each compression path (22) of the at least two compression paths includes:
a feature map caching module (31), wherein the input end (311) of the feature map caching module (31) is connected to the on-chip memory (23), and the output end (312) of the feature map caching module (31) is connected to the data compression module (32);
a data compression module (32), wherein the input end (321) of the data compression module (32) is connected to the feature map caching module (31), and the output end (322) of the data compression module (32) is connected to the data packing module (33);
a data packing module (33), wherein an input end (331) of the data packing module (33) is connected to the data compression module (32), and an output end (332) of the data packing module (33) is connected to the length alignment module (34);
a length alignment module (34), wherein an input end (341) of the length alignment module (34) is connected to the data packing module (33), and an output end (342) of the length alignment module (34) is connected to the compressed feature map caching module (35);
a compressed feature map caching module (35), wherein an input end (351) of the compressed feature map caching module (35) is connected to the length alignment module (34), and an output end (352) of the compressed feature map caching module (35) is connected to a compressed feature map writing module (36);
a compressed feature map writing module (36), wherein an input end (361) of the compressed feature map writing module (36) is connected to the compressed feature map caching module (35), and an output end (362) of the compressed feature map writing module (36) is connected to the double-rate synchronous dynamic random access memory (13).
In one implementation, the system further includes a write arbitration module (24), wherein,
the output end (362) of the compression characteristic map writing module (36) is connected to the first input end (2411) of the writing arbitration module (24), and the output end (242) of the writing arbitration module (24) is connected to the double-rate synchronous dynamic random access memory (13).
In one implementation, the feature map caching module (31) further has a second output (313), the length alignment module (34) further has a second input (343), and the second output (313) of the feature map caching module (31) is connected to the second input (343) of the length alignment module (34).
In one implementation, the data compression module (32) further has a second output (323), each compression path (22) of the at least two compression paths further comprises:
a compression header generation module (37), wherein an input end (371) of the compression header generation module (37) is connected to the second output end (323) of the data compression module (32), and an output end (372) of the compression header generation module (37) is connected to the compression header buffer module (38);
a compression header buffer module (38), wherein an input end (381) of the compression header buffer module (38) is connected to the compression header generation module (37), and an output end (382) of the compression header buffer module (38) is connected to a compression header write module (39);
a compression header writing module (39), wherein an input end (391) of the compression header writing module (39) is connected to the compression header caching module (38), and an output end (392) of the compression header writing module (39) is connected to the double-rate synchronous dynamic random access memory (13).
In one implementation, the output (392) of the compressed header write module (39) is connected to the second input (2412) of the write arbitration module (24).
In one implementation, the data compression module (32) includes a scan encoding module (41) and a difference algorithm compression module (42),
the input end (321) of the scan encoding module (41) is connected to the feature map caching module (31), and the output end (324) of the scan encoding module (41) is connected to the difference algorithm compression module (42);
the input end (325) of the difference algorithm compression module (42) is connected to the scan encoding module (41), the first output end (322) of the difference algorithm compression module (42) is connected to the data packing module (33), and the second output end (323) of the difference algorithm compression module (42) is connected to the compression header generation module (37).
In one implementation, the programmable logic chip (12) is a field programmable gate array chip.
In one implementation, the data width of the data lines used for data transmission in the system (10) is any one of: 8 bits, 16 bits, 32 bits, or 64 bits.
In one implementation, the system (10) is in any one of the following states and switches between them: an idle state, a receive command state, a parse command state, and a wait-for-completion state.
It can be seen that the system for data compression storage provided by the utility model can compress the feature map in the on-chip memory and then store it in the double-rate synchronous dynamic random access memory, so that the compressed data occupies less storage space. On the one hand, this reduces the space occupied in the double-rate synchronous dynamic random access memory; on the other hand, it also reduces the bandwidth consumed during reading and writing, saving power. Moreover, the programmable logic chip in the system comprises at least two compression paths that can compress in parallel, thereby improving compression efficiency.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed for the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive labor.
Fig. 1 is a schematic block diagram of a system 10 for data compression storage according to the present invention;
Fig. 2 is a schematic diagram of the system 10 for data compression storage according to the present invention;
Fig. 3 is another schematic diagram of the system 10 for data compression storage according to the present invention;
Fig. 4 is a schematic diagram of another embodiment of the system 10 for data compression storage according to the present invention;
Fig. 5 is a schematic diagram of the compression path 22 according to the present invention;
Fig. 6 is a schematic diagram of the data compression module 32 according to the present invention;
Fig. 7 is a schematic diagram of the state machine of the system 10 of the present invention.
Detailed Description
The technical solution of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
In order to explain the present invention in more detail, specific structures will be presented in the following description. The practice of the invention is, however, not limited to these specific details. The following detailed description of the preferred embodiments is not intended to limit the invention to the particular forms disclosed; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
It is to be understood that the terms "a," "an," and "the" as used herein are intended to describe specific embodiments only and are not to be taken as limiting the invention; the singular forms are intended to include the plural forms as well, unless the context clearly indicates otherwise. When the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terms "upper", "lower", "front", "rear", "left", "right" and the like as used herein are for illustrative purposes only and are not limiting.
Ordinal words such as "first" and "second" are referred to in this application as labels only, and do not have any other meanings, such as a particular order, etc. Also, for example, the term "first component" does not itself imply the presence of "second component", and the term "second component" does not itself imply the presence of "first component".
With the development of artificial intelligence technology, algorithms such as deep learning are involved in more and more fields. One of the cores of deep learning is neural networks, such as convolutional neural networks (CNN). The calculation process of a convolutional neural network generates a large amount of feature map data (or simply, feature maps), and a data compression technique is usually used when the feature map data is written into an external memory of a processor, so that the space occupied in the external memory can be reduced, along with the bandwidth consumed during reading and writing. The external memory may be, for example, a double data rate synchronous dynamic random access memory (DDR SDRAM), DDR for short.
However, convolutional neural networks typically include a large number of convolutional layers, each of which produces a large amount of feature map data. When the DDR is read and written by the large amount of feature map data, precious bandwidth resources of the external memory of the system are consumed, so that other modules (such as CNN or other modules) with large bandwidth requirements cannot access the DDR quickly, and the computing performance is affected. Also, since the access amount to the DDR increases, power consumption further increases.
In order to further reduce the volume of the compressed data, and thereby further reduce the bandwidth and power consumed when reading and writing the DDR, the utility model provides a system for data compression storage. Specifically, the feature map data obtained from the calculation of the convolutional neural network may be located in an on-chip memory; the system aims to compress the feature map data in the on-chip memory and store the compressed data in the double-rate synchronous dynamic random access memory.
Illustratively, the system for data compression storage in the present invention may be as shown in fig. 1, the system 10 comprising: a control unit 11, a programmable logic chip 12, and a double-rate synchronous dynamic random access memory 13. The input end 121 of the programmable logic chip 12 is connected to the control unit 11, and the output end 122 of the programmable logic chip 12 is connected to the double-rate synchronous dynamic random access memory 13.
The programmable logic chip 12 may be a field programmable gate array chip. The programmable logic chip 12 includes a compression instruction generation module 21, at least two compression paths 22, and an on-chip memory 23, as shown in fig. 2.
It should be understood that the number of compression paths 22 in the present invention is at least two, for example three or more, and may be configured according to the performance of the processor and the size of the feature map data to be processed. The system can thus flexibly configure the number of compression paths according to the output rate of the feature map data, flexibly meeting the performance requirements of different processing tasks and improving compression performance. For simplicity of illustration, only two compression paths 22 are shown in fig. 2: a first compression path 22 and a second compression path 22.
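The idea of a configurable number of parallel compression paths can be sketched in software. This is a minimal illustration only: `zlib` stands in for the patent's scan + difference compression, and the function name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

def compress_parallel(feature_maps, num_paths=2):
    """Compress each feature map (bytes) on one of `num_paths` parallel workers.

    Each worker plays the role of one compression path; zlib is only a
    stand-in for the scan-encoding + difference compression of the patent.
    """
    assert num_paths >= 2, "the system uses at least two compression paths"
    with ThreadPoolExecutor(max_workers=num_paths) as pool:
        return list(pool.map(zlib.compress, feature_maps))
```

Increasing `num_paths` lets more feature maps be compressed concurrently, mirroring how the path count is configured to match the feature map output rate.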
Specifically, the compression instruction generation module 21 is connected to the control unit 11. The data output 231 of the on-chip memory 23 is connected to the at least two compression paths 22. Each of the at least two compression paths 22 has a data input 221, a data output 222, and a control instruction input 223; the data inputs 221 of the at least two compression paths 22 are connected to the on-chip memory 23, the data outputs 222 of the at least two compression paths 22 are connected to the double-rate synchronous dynamic random access memory 13, and the control instruction inputs 223 of the at least two compression paths 22 are connected to the compression instruction generation module 21.
The input end 211 of the compression instruction generation module 21 is connected to the control unit 11, and the output end 212 of the compression instruction generation module 21 is connected to the at least two compression paths 22.
Illustratively, the programmable logic chip 12 may further include a write arbitration module 24, as shown in fig. 3, the write arbitration module 24 has a data input 241 and a data output 242, the data input 241 of the write arbitration module 24 is connected to the at least two compression paths 22, and the data output 242 of the write arbitration module 24 is connected to the double-rate synchronous dynamic random access memory 13.
Illustratively, the programmable logic chip 12 may further include a read arbitration module 25, as shown in fig. 3, the read arbitration module 25 has a control command input end 251 and a control command output end 252, the control command input end 251 of the read arbitration module 25 is connected to the at least two compression paths 22, and the control command output end 252 of the read arbitration module 25 is connected to the on-chip memory 23. It is understood that the on-chip memory 23 has a control command input 232, and the control command input 232 of the on-chip memory 23 is connected to the read arbitration module 25.
Illustratively, the programmable logic chip 12 may further include a read command caching module 26 and a read data path identification caching module 27, as shown in fig. 3. The read command caching module 26 is connected to the read arbitration module 25, and the read data path identification caching module 27 is connected to the read arbitration module 25.
The modular structure of each compression path 22 of the at least two compression paths 22 will be described in more detail below in connection with fig. 4 to 6.
Illustratively, each compression path 22 may include a read feature map module 30, a feature map cache module 31, a data compression module 32, a data packing module 33, a length alignment module 34, a compression feature map cache module 35, a compression feature map write module 36, a compression header generation module 37, a compression header cache module 38, and a compression header write module 39.
The control instruction input terminal 301 of the read feature map module 30 is connected to the compression instruction generation module 21, and the control instruction output terminal 302 of the read feature map module 30 is connected to the read arbitration module 25. Specifically, the control instruction input terminal 301 of the read feature map module 30 is connected to the output terminal 212 of the compression instruction generation module 21, and the control instruction output terminal 302 of the read feature map module 30 is connected to the control instruction input terminal 251 of the read arbitration module 25.
The input 311 of the feature map buffer module 31 is connected to the on-chip memory 23, and the output 312 of the feature map buffer module 31 is connected to the data compression module 32.
The input 321 of the data compression module 32 is connected to the feature map buffer module 31, and the output 322 of the data compression module 32 is connected to the data packing module 33. Specifically, the input 321 of the data compression module 32 is connected to the output 312 of the feature map buffer module 31.
The input 331 of the data packing module 33 is connected to the data compression module 32, and the output 332 of the data packing module 33 is connected to the length alignment module 34. Specifically, the input 331 of the data packing module 33 is connected to the output 322 of the data compression module 32.
The input 341 of the length alignment module 34 is connected to the data packing module 33, and the output 342 of the length alignment module 34 is connected to the compression feature map buffering module 35. In particular, the input 341 of the length alignment module 34 is connected to the output 332 of the data packing module 33.
The input 351 of the compressed feature map caching module 35 is connected to the length alignment module 34, and the output 352 of the compressed feature map caching module 35 is connected to the compressed feature map writing module 36. In particular, the input 351 of the compressed feature map caching module 35 is connected to the output 342 of the length alignment module 34.
The input 361 of the compressed feature map writing module 36 is connected to the compressed feature map caching module 35, and the output 362 of the compressed feature map writing module 36 is connected to the double-rate synchronous dynamic random access memory 13. Specifically, the input 361 of the compressed feature map writing module 36 is connected to the output 352 of the compressed feature map caching module 35.
In embodiments that include the write arbitration module 24, the output 362 of the compressed feature map writing module 36 may be coupled to the double-rate synchronous dynamic random access memory 13 via the write arbitration module 24. Specifically, the output 362 of the compressed feature map writing module 36 is coupled to the first input 2411 of the write arbitration module 24, and the output 242 of the write arbitration module 24 is coupled to the double-rate synchronous dynamic random access memory 13. The input 241 of the write arbitration module 24 includes a first input 2411 and a second input 2412.
Referring to fig. 5, the feature map caching module 31 further has a second output 313, the length alignment module 34 further has a second input 343, and the second output 313 of the feature map caching module 31 is connected to the second input 343 of the length alignment module 34.
Referring to fig. 5, the data compression module 32 further has a second output terminal 323.
The input 371 of the compression header generation module 37 is connected to the second output 323 of the data compression module 32, and the output 372 of the compression header generation module 37 is connected to the compression header buffer module 38.
The input terminal 381 of the compression header buffer module 38 is connected to the compression header generation module 37, and the output terminal 382 of the compression header buffer module 38 is connected to the compression header write module 39. Specifically, the input terminal 381 of the compression header buffer module 38 is connected to the output terminal 372 of the compression header generation module 37.
The input 391 of the compression header writing module 39 is connected to the compression header caching module 38, and the output 392 of the compression header writing module 39 is connected to the double-rate synchronous dynamic random access memory 13. Specifically, the input 391 of the compression header writing module 39 is connected to the output 382 of the compression header caching module 38.
In embodiments that include the write arbitration module 24, the output 392 of the compression header writing module 39 may be coupled to the double-rate synchronous dynamic random access memory 13 via the write arbitration module 24. Specifically, the output 392 of the compression header writing module 39 is coupled to the second input 2412 of the write arbitration module 24, and the output 242 of the write arbitration module 24 is coupled to the double-rate synchronous dynamic random access memory 13.
As shown in fig. 6, the data compression module 32 may include a scan encoding module 41 and a difference algorithm compression module 42. The input 321 of the scan coding module 41 is connected to the feature map buffering module 31, and the output 324 of the scan coding module 41 is connected to the difference algorithm compressing module 42. The input 325 of the difference algorithm compression module 42 is connected to the scan encoding module 41, in particular to the output 324 of the scan encoding module 41. The first output 322 of the difference algorithm compression module 42 is connected to the data packing module 33, in particular to the input 331 of the data packing module 33. A second output 323 of the difference algorithm compression module 42 is connected to the compression header generation module 37.
In addition, the data width of the data line for data transmission in the system 10 of the present invention may be any of the following: 8 bits, 16 bits, 32 bits, or 64 bits.
Illustratively, the system 10 of the present invention may be in any one of the following states and switch between them: an idle state, a receive command state, a parse command state, and a wait-for-completion state.
Referring to the state machine shown in fig. 7, the system 10 can switch from the idle state to the receive command state, from the receive command state to the parse command state, from the parse command state to the wait-for-completion state, and from the wait-for-completion state back to the idle state.
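The four-state cycle of fig. 7 can be sketched as a small state machine. The state names and cyclic order follow the description above; the `step()` interface is purely illustrative (the patent does not specify the trigger events).

```python
# Cyclic transition table of the fig. 7 state machine: each state has
# exactly one successor, and the cycle returns to "idle".
TRANSITIONS = {
    "idle": "receive_command",
    "receive_command": "parse_command",
    "parse_command": "wait_for_completion",
    "wait_for_completion": "idle",
}

class CompressionSystemFSM:
    def __init__(self):
        self.state = "idle"

    def step(self):
        """Advance to the next state in the fixed cycle and return it."""
        self.state = TRANSITIONS[self.state]
        return self.state
```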
The system 10 will be described below in connection with the process of compressing profile data.
The compression instruction generation module may distribute compression instructions to the various compression paths. A compression path may read the corresponding feature map data from the on-chip memory according to the compression instruction and then compress it. The read arbitration module may arbitrate the read feature map commands that the at least two compression paths issue for the feature map data in the on-chip memory. The write arbitration module may arbitrate the write requests of the at least two compression paths for writing compressed data into the double-rate synchronous dynamic random access memory.
The compression instruction generation module, which may be denoted as the ENC_INSTR_PROC module, may receive a compression instruction, parse it, and distribute the parsed compression instruction to each compression path. Specifically, when the on-chip memory stores feature map data that needs to be compressed and stored, the control unit may send a compression instruction to the compression instruction generation module. After receiving the compression instruction, the compression instruction generation module may distribute it to each compression path, so that each compression path can read the feature map data from the on-chip memory and compress it. The compression instruction distributed to a given compression path may include: information describing the feature map to be compressed, the base address in the double-rate synchronous dynamic random access memory to which the compressed feature map is output, and the base address in the double-rate synchronous dynamic random access memory to which the compression header information of the feature map is output.
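A per-path compression instruction and its distribution can be sketched as follows. The exact field layout and the round-robin distribution policy are assumptions for illustration; the patent only names the kinds of addresses the instruction carries.

```python
from dataclasses import dataclass

@dataclass
class CompressionInstruction:
    """Hypothetical layout of one parsed per-path compression instruction."""
    path_id: int
    src_base_addr: int     # base address of the feature map in on-chip memory
    dst_fm_base_addr: int  # DDR base address for the compressed feature map
    dst_hdr_base_addr: int # DDR base address for the compression header info
    width: int
    height: int

def distribute(instructions, num_paths):
    """Distribute parsed instructions over the compression paths (round-robin)."""
    paths = [[] for _ in range(num_paths)]
    for i, instr in enumerate(instructions):
        paths[i % num_paths].append(instr)
    return paths
```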
The read arbitration module may be denoted as the FM_RD_ARB module, and the system may further include a read command caching module (which may be denoted as the RD_CMD_FIFO module) and a read data path identification caching module (which may be denoted as the RDATA_ID_FIFO module), both connected to the read arbitration module. The read arbitration module may obtain the read feature map commands issued by the respective compression paths; that is, the read feature map commands issued by the respective compression paths converge there. The read feature map commands of each compression path may be cached in the corresponding read command caching module. The read arbitration module can arbitrate among the read feature map commands in the read command caching modules according to an arbitration rule to obtain an arbitration result. If the arbitration result indicates that the first compression path wins, the read feature map command of the first compression path is sent to the on-chip memory first, and the path identification (ID) of the first compression path is stored in the read data path identification caching module. Later, when the feature map data is returned from the on-chip memory, it is sent to the compression path corresponding to the path identification (ID) stored in the read data path identification caching module. Optionally, the arbitration rule may be a priority mechanism configured according to the compression instruction, a fair polling mechanism, or another arbitration mechanism, not listed here. It is understood that if the arbitration rule is a priority mechanism, the compression processing of a high-priority compression path can be guaranteed first, ensuring the task performance of that path.
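The fair-polling variant of this arbitration can be sketched as follows. Each path has a read-command FIFO; the winner's command is forwarded and its path ID is pushed into the read-data path-identification FIFO so returned data can be routed back. The class and method names are illustrative, not from the patent.

```python
from collections import deque

class ReadArbiter:
    """Fair round-robin arbitration over per-path read-command FIFOs."""

    def __init__(self, num_paths):
        self.cmd_fifos = [deque() for _ in range(num_paths)]  # RD_CMD_FIFO role
        self.rdata_id_fifo = deque()                          # RDATA_ID_FIFO role
        self._next = 0  # round-robin pointer for fair polling

    def push_command(self, path_id, cmd):
        self.cmd_fifos[path_id].append(cmd)

    def arbitrate(self):
        """Return (path_id, command) of the winner, or None if all FIFOs are empty."""
        n = len(self.cmd_fifos)
        for i in range(n):
            pid = (self._next + i) % n
            if self.cmd_fifos[pid]:
                self._next = (pid + 1) % n
                # record the winner's ID so returned data can be routed back
                self.rdata_id_fifo.append(pid)
                return pid, self.cmd_fifos[pid].popleft()
        return None
```

A priority mechanism would differ only in the selection loop: scan paths in priority order instead of advancing the round-robin pointer.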
The write arbitration module, which may be denoted as the FM_WR_ARB module, may obtain the write requests issued by the respective compression paths; that is, the write requests issued by the respective compression paths converge there. Each write request can be arbitrated according to an arbitration rule to obtain an arbitration result. If the arbitration result indicates that the second compression path wins, the write request of the second compression path is sent to the double-rate synchronous dynamic random access memory first; that is, the compressed data obtained by the second compression path is stored in the double-rate synchronous dynamic random access memory. Optionally, the arbitration rule may be a priority mechanism configured according to the compression instruction, a fair polling mechanism, or another arbitration mechanism, not listed here.
The compression path modules may be denoted as ENC_PATH modules. Each compression path module may include a feature map reading module, a feature map caching module, a data compression module, a data packing module, a compression header generation module, a length alignment module, a compression header caching module, a compressed feature map caching module, a compression header writing module, and a compressed feature map writing module.
The feature map reading module, which may be denoted as an RD_FM module, may send a read feature map command for the original feature map in the on-chip memory according to the compression instruction received from the compression instruction generation module. The read feature map command may include, among other things, the width and height of the original feature map to be read and its base address in the on-chip memory. Optionally, when a bypass operation needs to be performed, the feature map reading module is further configured to re-read the original feature map of the current compression from the on-chip memory.
The feature map caching module, which may be denoted as an SRC_FM_FIFO module, may be used to store the original feature map read back from the on-chip memory.
The data compression module may be configured to divide the original feature map in the feature map caching module into a plurality of data units and perform difference compression on each of the data units. Illustratively, the data compression module may include a scan encoding module and a difference algorithm compression module; the scan encoding module may be denoted as a SCAN_DPCM module, and the difference algorithm compression module may be denoted as a RES_ENC module.
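The difference compression performed on each data unit can be illustrated with a minimal DPCM-style sketch: store the first sample, then only the residual between each sample and its predecessor. This is a generic illustration of differential coding, not the patent's exact circuit, and omits the subsequent residual entropy coding a RES_ENC stage would perform.

```python
def dpcm_encode(unit):
    """Difference-encode one data unit: first sample, then successive residuals."""
    if not unit:
        return []
    residuals = [unit[0]]
    for prev, cur in zip(unit, unit[1:]):
        residuals.append(cur - prev)  # small for smooth feature map data
    return residuals

def dpcm_decode(residuals):
    """Invert the encoding by accumulating residuals."""
    out = []
    acc = 0
    for i, r in enumerate(residuals):
        acc = r if i == 0 else acc + r
        out.append(acc)
    return out
```

Differential coding pays off because neighboring feature map values are typically close, so the residual stream contains many small values that a downstream entropy coder can pack tightly.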
A data packing module, which may be denoted as a DATA_PACK module, is used for splicing the data compressed by the data compression module into complete compressed data. Specifically, the fragmented data output by the data compression module is spliced into complete data, for example, data in units of 16 bytes.
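The splicing behavior can be sketched as follows, under the assumption (taken from the 16-byte example above) that the packer emits complete 16-byte words and retains any incomplete tail for the next fragment; the function name and return shape are hypothetical.

```python
WORD = 16  # output granularity in bytes, per the 16-byte example in the text

def pack_fragments(fragments):
    """Splice variable-length compressed fragments into complete 16-byte words.

    Returns (complete_words, leftover_tail); the tail waits for more input.
    """
    buf = bytearray()
    words = []
    for frag in fragments:
        buf.extend(frag)
        while len(buf) >= WORD:          # emit every complete word
            words.append(bytes(buf[:WORD]))
            del buf[:WORD]
    return words, bytes(buf)
```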
A length alignment module, which may be denoted as a LEN_ALIGN module, is used for padding the length of the compressed data spliced by the data packing module to a specific length. Illustratively, it is also used to pad the length of the original feature map to the specific length when a bypass operation needs to be performed; that is, the length of the data to be output may be padded to the specific length. The specific length is related to the chip performance of the double-rate synchronous dynamic random access memory; that is, the specific length may be preset according to the performance of the chip of the double-rate synchronous dynamic random access memory. Illustratively, at the end of the current data unit, the compressed length may be padded to the specific length with invalid data. For example, when the current data unit ends, the length of the compressed data is padded from N×16 B to ceil(N/4)×64 B, where ceil represents rounding up and N is a positive integer. It can be understood that, because the chips of some external memories (such as DDR) work efficiently only when the written data satisfies a certain length, the system provided by the present utility model can, by providing the length alignment module, ensure that the double-rate synchronous dynamic random access memory works more efficiently, thereby also improving the performance of the entire system.
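The padding rule stated above, from N×16 B up to ceil(N/4)×64 B, can be written out directly; the function names are illustrative:

```python
import math

def aligned_length(n_16byte_words):
    """Pad a compressed length of N x 16 bytes up to ceil(N/4) x 64 bytes."""
    assert n_16byte_words > 0
    return math.ceil(n_16byte_words / 4) * 64

def padding_bytes(n_16byte_words):
    """How many invalid bytes are appended at the end of the data unit."""
    return aligned_length(n_16byte_words) - n_16byte_words * 16
```

For example, a data unit whose compressed result is 5×16 B = 80 B is padded with 48 invalid bytes up to 2×64 B = 128 B, matching the 64-byte write granularity the external memory handles efficiently.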
A compression header generation module, which may be denoted as an ENC_HDR_GEN module, is configured to generate, according to the compression instruction received from the compression instruction generation module, compression header information corresponding to the compressed data obtained by the data compression module. Specifically, the compression header information may be generated according to the address information and feature size information in the compression instruction, the length of the compression result in the current clock cycle, whether the current data unit has ended, and the like. On one hand, the generated compression header information can be used to determine whether a bypass of the current data unit is needed; on the other hand, it is used to decompress the compressed data later. It is to be understood that determining whether a bypass is required based on the compression header information is optional rather than necessary; that is, the compressed data and the compression header information may be stored without determining whether a bypass is required.
A compression header caching module, which may be denoted as an ENC_HDR_FIFO module, is used for caching the to-be-output compression header information generated by the compression header generation module.
A compressed feature map caching module, which may be denoted as an ENC_FM_FIFO module, is configured to cache the data to be output. The cached data may be the compressed data after length padding or, during a bypass operation, the original feature map read back from the on-chip memory after length padding.
A compression header writing module, which may be denoted as an ENC_HDR_WR module, performs the writing operation of the compression header information.
A compressed feature map writing module, which may be denoted as an ENC_FM_WR module, performs the data storage operation; specifically, it writes the compressed data in the compressed feature map caching module, or, during a bypass operation, the original feature map read back from the on-chip memory, into the double-rate synchronous dynamic random access memory.
If the data after compression is larger than the original data, that is, the storage space occupied by the compressed data is larger than that occupied by the original data, it is unreasonable to store the compressed data; therefore, a bypass operation is performed and the original data is stored instead. Specifically, the workflow when performing the bypass operation can be briefly described as follows:
a. and recording the bypass information of the bypass operation to the compression header information for use in decompression. It is understood that the compression header information for the compressed data and the compression header information for the original data when the bypass operation was performed may both have different compression identifications. For example, the first compression flag represents compressed data and the second compression flag represents original data.
b. The compressed feature map writing module, i.e., the ENC_FM_WR module, records the base address of the current writing operation, i.e., the address to be written in the double-rate synchronous dynamic random access memory.
c. The modules currently operating in the compression path are reset. For example, the modules currently in operation may include the data compression module and the like.
d. The feature map reading module, i.e., the RD_FM module, resends the read feature map command so as to restart reading the original feature map of the current compression unit. That is, the original feature map is read again from the on-chip memory, and the re-read original feature map may be stored in the feature map caching module.
e. After the original feature map is re-read, it does not pass through the data compression module and the data packing module but goes directly from the feature map caching module to the length alignment module; that is, the bypass mechanism reuses the output of the feature map caching module for length alignment.
f. The compressed feature map writing module overwrites the previously written compressed data with the original feature map and outputs the original feature map to the double-rate synchronous dynamic random access memory.
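The decision at the heart of steps a–f can be sketched as a simple size comparison. This is a hedged illustration: the flag values and the header's dictionary shape are invented for the example, while the rule itself (store the original whenever compression did not shrink it, and mark which form was stored so the decompressor knows) follows the text.

```python
# Illustrative compression identifications; actual encodings are not given in the text.
FLAG_COMPRESSED = 0  # first compression identification: payload is compressed data
FLAG_RAW = 1         # second compression identification: payload is the original (bypass)

def choose_output(original, compressed):
    """Decide between the compressed result and a bypass of the original data.

    Returns (header_info, payload_to_store).
    """
    if len(compressed) >= len(original):
        # Bypass: compression did not help, so store the original and flag it raw.
        return {"flag": FLAG_RAW, "length": len(original)}, original
    return {"flag": FLAG_COMPRESSED, "length": len(compressed)}, compressed
```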
Therefore, by providing the bypass mechanism, the present utility model can ensure that the memory space occupied in the double-rate synchronous dynamic random access memory is smaller.
Through the system for data compression storage of the present utility model, feature map data obtained in a processor through a convolutional neural network or the like can be compressed and then stored in the double-rate synchronous dynamic random access memory.
Additionally, the system 10 for data compression storage in the present utility model may have a number of different states, which may include, but are not limited to: an idle state, a receive instruction state, a parse instruction state, a wait for completion state, and the like. Illustratively, the state switching may be implemented as a state machine as shown in FIG. 7.
The idle state, which may be denoted as the IDLE state, is a state in which the system waits for a compression instruction start signal; after receiving the compression instruction start signal, the system switches to the receive instruction state. The compression instruction start signal may be denoted as instr_strt.
The receive instruction state may be denoted as the RCV_INSTR state. When the system is in this state, the compression instruction is being received until reception is complete. After reception is complete, an instruction ready signal may be output, and the system may switch to the parse instruction state while or after the instruction ready signal is output. The instruction ready signal may be denoted as instr_rdy.
The parse instruction state may be denoted as the PROC_INSTR state. When the system is in this state, the compression instruction received in the receive instruction state is parsed, and compression instructions are distributed to the respective compression paths according to the parsing result. The compression instruction dispatched to each compression path may be denoted as instr_isu. Specifically, the compression instruction distributed to a given compression path may include the parameters of the feature maps to be compressed, their base address in the on-chip memory, and the base addresses at which the compressed feature maps and the corresponding compression header information are output to the double-rate synchronous dynamic random access memory.
Taking the compression instruction distributed to the first compression path as an example, the instruction information included therein may be:
(1) FM_NUM, representing the number of feature maps to be compressed by the first compression path;
(2) FM_WIDTH, representing the width of the feature maps to be compressed by the first compression path;
(3) FM_HIGHT, representing the height of the feature maps to be compressed by the first compression path;
(4) FM_SRAM_BADDR, representing the base address, in the on-chip memory, of the feature maps to be compressed by the first compression path;
(5) FM_SRAM_LEN, representing the inter-map storage interval, in the on-chip memory, of the feature maps to be compressed by the first compression path;
(6) FM_DDR_BADDR, representing the base address in the double-rate synchronous dynamic random access memory to which the feature maps compressed by the first compression path are output;
(7) FM_DDR_LEN, representing the inter-map storage interval, in the double-rate synchronous dynamic random access memory, of the feature maps compressed by the first compression path;
(8) FM_HDR_BADDR, representing the base address in the double-rate synchronous dynamic random access memory to which the compression header information corresponding to the feature maps compressed by the first compression path is output;
(9) FM_HDR_LEN, representing the inter-map storage interval, in the double-rate synchronous dynamic random access memory, of the compression header information corresponding to the feature maps compressed by the first compression path.
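The nine fields enumerated above can be gathered into a sketch of the instruction layout. The class is hypothetical, but the address arithmetic follows naturally from the field definitions: the i-th feature map's location is the base address plus i times the inter-map storage interval.

```python
from dataclasses import dataclass

@dataclass
class CompressInstruction:
    """Sketch of one compression path's instruction (fields per the text)."""
    fm_num: int         # FM_NUM: number of feature maps to compress
    fm_width: int       # FM_WIDTH: feature map width
    fm_hight: int       # FM_HIGHT: feature map height (spelling as in the source)
    fm_sram_baddr: int  # FM_SRAM_BADDR: base address in on-chip memory
    fm_sram_len: int    # FM_SRAM_LEN: inter-map interval in on-chip memory
    fm_ddr_baddr: int   # FM_DDR_BADDR: output base address in DDR
    fm_ddr_len: int     # FM_DDR_LEN: inter-map interval in DDR
    fm_hdr_baddr: int   # FM_HDR_BADDR: header output base address in DDR
    fm_hdr_len: int     # FM_HDR_LEN: header inter-map interval in DDR

    def sram_addr(self, i):
        """On-chip memory address of the i-th feature map to read."""
        return self.fm_sram_baddr + i * self.fm_sram_len

    def ddr_addr(self, i):
        """DDR address at which the i-th compressed feature map is stored."""
        return self.fm_ddr_baddr + i * self.fm_ddr_len

    def hdr_addr(self, i):
        """DDR address at which the i-th map's compression header is stored."""
        return self.fm_hdr_baddr + i * self.fm_hdr_len
```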
The wait for completion state may be denoted as the WAIT_DONE state. When the system is in this state, it may monitor the completion signals of the respective compression paths, and after all compression paths are monitored as complete, it may switch to the idle state. For example, after a compression path completes, an instruction completion signal may be output to the upper-level module. The instruction completion signal may be denoted as instr_done.
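The four states and their transitions can be sketched as a small table-driven state machine. State names and signal names follow the text; treating instr_isu as the trigger that leaves PROC_INSTR is an assumption for illustration, since the text only says the dispatched instructions are denoted instr_isu.

```python
# (current_state, signal) -> next_state; unknown pairs leave the state unchanged.
TRANSITIONS = {
    ("IDLE", "instr_strt"): "RCV_INSTR",       # start signal received
    ("RCV_INSTR", "instr_rdy"): "PROC_INSTR",  # instruction fully received
    ("PROC_INSTR", "instr_isu"): "WAIT_DONE",  # instructions dispatched to paths
    ("WAIT_DONE", "instr_done"): "IDLE",       # all compression paths finished
}

class CompressionFsm:
    """Minimal model of the FIG. 7 state machine."""

    def __init__(self):
        self.state = "IDLE"

    def on_signal(self, signal):
        self.state = TRANSITIONS.get((self.state, signal), self.state)
        return self.state
```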
It can be seen that, by providing the state machine for the system for data compression storage, the present utility model can guarantee the normal operation of the system and ensure the safe and orderly storage of the feature map data.
It should be noted that although the above description has described example embodiments of the present utility model with reference to the accompanying drawings, it is to be understood that the above example embodiments are merely illustrative of particular embodiments of the present utility model and are not intended to limit its scope thereto. Those skilled in the art can make various changes, modifications, variations, substitutions, and the like based on the disclosure of the present utility model without departing from its scope and spirit. All such changes, modifications, variations, substitutions, and the like are intended to be included within the scope of the present utility model as claimed in the appended claims.

Claims (12)

1. A system (10) for compressed storage of data, comprising: a control unit (11), a programmable logic chip (12), and a double-rate synchronous dynamic random access memory (13); the input end (121) of the programmable logic chip (12) is connected to the control unit (11), and the output end (122) of the programmable logic chip (12) is connected to the double-rate synchronous dynamic random access memory (13);
wherein the programmable logic chip (12) comprises:
a compression instruction generation module (21) connected to the control unit (11);
an on-chip memory (23), a data output (231) of the on-chip memory (23) being connected to at least two compression paths (22);
the data input ends (221) of the at least two compression paths (22) are connected to the on-chip memory (23), the data output ends (222) of the at least two compression paths (22) are connected to the double-rate synchronous dynamic random access memory (13), and the control instruction input ends (223) of the at least two compression paths (22) are connected to the compression instruction generation module (21).
2. The system (10) of claim 1, further comprising a write arbitration module (24), a data input (241) of the write arbitration module (24) being coupled to the at least two compression paths (22), a data output (242) of the write arbitration module (24) being coupled to the double rate synchronous dynamic random access memory (13).
3. The system (10) of claim 1, further comprising a read arbitration module (25), a control command input (251) of the read arbitration module (25) being connected to the at least two compression paths (22), and a control command output (252) of the read arbitration module (25) being connected to the on-chip memory (23).
4. The system (10) according to claim 3, wherein each compression path (22) of the at least two compression paths comprises a read profile module (30), wherein a control instruction input (301) of the read profile module (30) is connected to the compression instruction generation module (21), and wherein a control instruction output (302) of the read profile module (30) is connected to the read arbitration module (25).
5. The system (10) of claim 1, wherein each compression path (22) of the at least two compression paths comprises:
the input end (311) of the characteristic map caching module (31) is connected to the on-chip memory (23), and the output end (312) of the characteristic map caching module (31) is connected to the data compression module (32);
the input end (321) of the data compression module (32) is connected to the feature map caching module (31), and the output end (322) of the data compression module (32) is connected to the data packing module (33);
a data packing module (33), wherein an input end (331) of the data packing module (33) is connected to the data compression module (32), and an output end (332) of the data packing module (33) is connected to the length alignment module (34);
a length alignment module (34), wherein an input end (341) of the length alignment module (34) is connected to the data packing module (33), and an output end (342) of the length alignment module (34) is connected to a compression feature map buffer module (35);
a compressed feature map caching module (35), wherein an input end (351) of the compressed feature map caching module (35) is connected to the length alignment module (34), and an output end (352) of the compressed feature map caching module (35) is connected to a compressed feature map writing module (36);
a compression characteristic map writing module (36), wherein an input end (361) of the compression characteristic map writing module (36) is connected to the compression characteristic map buffering module (35), and an output end (362) of the compression characteristic map writing module (36) is connected to the double-rate synchronous dynamic random access memory (13).
6. The system (10) of claim 5, further comprising a write arbitration module (24), wherein,
the output end (362) of the compression characteristic map writing module (36) is connected to the first input end (2411) of the writing arbitration module (24), and the output end (242) of the writing arbitration module (24) is connected to the double-rate synchronous dynamic random access memory (13).
7. The system (10) according to claim 5, wherein the feature map caching module (31) further has a second output (313), the length alignment module (34) further has a second input (343), and the second output (313) of the feature map caching module (31) is connected to the second input (343) of the length alignment module (34).
8. The system (10) of claim 6, wherein the data compression module (32) further has a second output (323), each compression path (22) of the at least two compression paths further comprising:
a compression header generation module (37), wherein an input end (371) of the compression header generation module (37) is connected to the second output end (323) of the data compression module (32), and an output end (372) of the compression header generation module (37) is connected to the compression header buffer module (38);
a compression header buffer module (38), wherein an input end (381) of the compression header buffer module (38) is connected to the compression header generation module (37), and an output end (382) of the compression header buffer module (38) is connected to a compression header write module (39);
a compression header writing module (39), wherein an input end (391) of the compression header writing module (39) is connected to the compression header caching module (38), and an output end (392) of the compression header writing module (39) is connected to the double-rate synchronous dynamic random access memory (13).
9. The system (10) of claim 8, wherein the output (392) of the compressed header write module (39) is connected to the second input (2412) of the write arbitration module (24).
10. The system (10) according to claim 5, wherein the data compression module (32) comprises a scan encoding module (41) and a difference algorithm compression module (42),
the input end (321) of the scanning coding module (41) is connected to the feature map caching module (31), and the output end (324) of the scanning coding module (41) is connected to the difference algorithm compression module (42);
the input end (325) of the difference algorithm compression module (42) is connected to the scan encoding module (41), the first output end (322) of the difference algorithm compression module (42) is connected to the data packing module (33), and the second output end (323) of the difference algorithm compression module (42) is connected to the compression header generation module (37).
11. The system (10) of claim 1, wherein the programmable logic chip (12) is a field programmable gate array chip.
12. The system (10) of claim 1, wherein a data width of a data line used for data transmission in the system (10) is any one of: 8 bits, 16 bits, 32 bits, or 64 bits.
CN202020930026.7U 2020-05-27 2020-05-27 System for data compression storage Active CN212873459U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202020930026.7U CN212873459U (en) 2020-05-27 2020-05-27 System for data compression storage

Publications (1)

Publication Number Publication Date
CN212873459U true CN212873459U (en) 2021-04-02

Family

ID=75205933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202020930026.7U Active CN212873459U (en) 2020-05-27 2020-05-27 System for data compression storage

Country Status (1)

Country Link
CN (1) CN212873459U (en)

Legal Events

Date Code Title Description
GR01 Patent grant