WO2020211000A1 - Device and method for data decompression - Google Patents

Device and method for data decompression

Info

Publication number
WO2020211000A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
register
compression unit
module
storage space
Prior art date
Application number
PCT/CN2019/082993
Other languages
English (en)
French (fr)
Inventor
杨康
李鹏
赵文军
和田祐司
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2019/082993 (WO2020211000A1, zh)
Priority to CN201980005236.5A (CN111279617A, zh)
Publication of WO2020211000A1 (zh)

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6017 Methods or arrangements to increase the throughput

Definitions

  • This application relates to the field of data processing, and more specifically, to a data decompression device and method.
  • The data to be processed is read from the memory and sent to the processing unit for processing.
  • The amount of data involved in the calculation is huge, mainly feature map data and weight data.
  • This data must be read from the memory and sent to the processing unit (for example, a calculation unit (Calculate Unit)) to participate in the calculation.
  • Data bandwidth (Bandwidth) is limited.
  • Prior-art data compression and decompression solutions usually read the compressed data and the compression instructions from a memory separately, and then transmit both to a decompression device for data decompression.
  • In the existing data decompression process, there is a synchronization problem between the compressed data and the compression instructions. If the two are not transmitted in sync, each has to wait for the other to be read from the memory, which adds to the decompression time overhead. If the decompression device performs poorly, for example because its time overhead is large, the benefits of data compression are reduced.
  • The present application provides a data decompression device and method that can reduce the time overhead of data decompression to a certain extent compared with the prior art, thereby improving the efficiency of data decompression.
  • In a first aspect, a data decompression device is provided. The device includes a control module, a cache module, and a decompression module.
  • The control module is configured to load compressed data that includes a compression unit into the cache module. The compression unit includes compression information and non-zero data, and the compression information indicates the location of the non-zero data in the original data corresponding to the compression unit.
  • The control module is also configured to send the initial storage location of the current compression unit in the cache module to the decompression module. The decompression module is configured to read the compression information and non-zero data of the current compression unit from the cache module according to the initial storage location, and to obtain the original data corresponding to the current compression unit accordingly.
  • The compressed data involved in this application consists of compression units, and each compression unit includes non-zero data and compression information.
  • Once the decompression module obtains the non-zero data and compression information of a compression unit, the original data corresponding to that compression unit can be recovered, that is, decompression is achieved.
  • The non-zero data and compression information of a compression unit are stored, transmitted, and decompressed as a whole, namely as the compression unit. Therefore, the data decompression scheme provided in this application does not have the synchronization problem between compressed data and compression instructions, and naturally incurs no time overhead from it. Compared with the prior art, the scheme can therefore effectively reduce the time overhead of data decompression, thereby improving data processing efficiency.
  • In a second aspect, a data processing device is provided. The device includes: a cache module including a first register and a second register, where the storage space of each register is divided into a low R-bit storage space and a high R-bit storage space, and R is a positive integer multiple of 8; a control module configured to load the data to be processed into the cache module and to send the initial storage location of the current data in the cache module to a processing module, where the initial storage location of the data is acquired from the low R-bit storage space of the first register; and the processing module, configured to read the current data from the cache module according to its initial storage location and process it. The control module is also configured to control the first register and the second register to buffer the data to be processed in the following manner: 1) after the data to be processed is loaded into the first register, load the data in the high R-bit storage space of the first register into the low R-bit storage space of the second register, and load the next R bits of to-be-processed data into the high R bit storage of the second
  • The application adopts a ping-pong mechanism to cache compressed data, which ensures the continuity of the to-be-decompressed data provided by the cache module.
  • The condition for updating the compressed data in the first register is that the initial storage location of the current compression unit has moved into the high R-bit storage space of the first register.
  • When the initial storage location of the current compression unit has not moved into the high R-bit storage space of the first register, it is determined that the compressed data in the first register does not need to be updated; when it has moved into the high R-bit storage space of the first register, it is determined that the compressed data in the first register needs to be updated.
  • The initial storage location of the current compression unit is a generated register signal. Therefore, the above judgment logic is relatively simple, which helps to increase the clock frequency at which the data decompression device can run.
  • A method for data decompression is provided. The method includes: loading compressed data that includes a compression unit into a cache module, where the compression unit includes compression information and non-zero data, and the compression information indicates the location of the non-zero data in the original data corresponding to the compression unit; and sending the initial storage location of the current compression unit in the cache module to a decompression module, so that the decompression module reads the compression information and non-zero data of the current compression unit according to the initial storage location and obtains the original data corresponding to the current compression unit accordingly.
  • A method for controlling a data cache module is provided. The cache module includes a first register and a second register.
  • The storage space of each of the first register and the second register is divided into a low R-bit storage space and a high R-bit storage space.
  • R is a positive integer multiple of 8.
  • The control method includes: loading the data to be processed into the cache module; and sending the initial storage location of the current data in the cache module to a processing module, so that the processing module reads the current data from the cache module according to that initial storage location and processes it, where the initial storage location of the data is acquired from the low R-bit storage space of the first register.
  • The first register and the second register are controlled to buffer the data to be processed in the following manner: 1) after the data to be processed is loaded into the first register, load the data in the high R-bit storage space of the first register into the low R-bit storage space of the second register, and load the next R bits of data to be processed into the high R-bit storage space of the second register; 2) when the initial storage location of the current data is in the high R-bit storage space of the first register, load the 2R bits of data in the second register into the first register. Steps 1) and 2) are repeated until the data processing ends.
  • The data decompression scheme provided by the present application does not have the synchronization problem between instructions and data, can reduce time overhead, helps achieve real-time decompression, and ensures the continuity of the data flow.
  • The data decompression solution provided by the present application can also reduce the bandwidth overhead between the decompression device and the external memory, and can also reduce the instruction cache overhead within the decompression device.
  • Figure 1 is a schematic diagram of the data decompression architecture.
  • Fig. 2 is a schematic block diagram of a data decompression device according to an embodiment of the present application.
  • Fig. 3 is a schematic diagram of a compression unit according to an embodiment of the present application.
  • Fig. 4 is a schematic diagram of a compression unit and compressed data according to an embodiment of the present application.
  • Fig. 6 is a schematic flowchart of a method for controlling a cache module according to an embodiment of the present application.
  • Fig. 7 is another schematic flowchart of a method for controlling a cache module according to an embodiment of the present application.
  • Fig. 8 is a schematic block diagram of a decompression module with a pipeline structure according to an embodiment of the present application.
  • Fig. 9 is a schematic block diagram of a data decompression method according to an embodiment of the present application.
  • The embodiments of the present application can be applied to data decompression scenarios.
  • Figure 1 is a schematic diagram of the data decompression architecture.
  • The data stored in the memory is compressed data (that is, data that has undergone data compression).
  • The processing module includes a decompression unit and a processing unit (for example, a calculation unit).
  • The decompression unit decompresses the compressed data read from the memory to obtain the original data corresponding to the compressed data, and the processing unit performs the relevant processing on the original data.
  • Compared with the original data (that is, data that has not undergone data compression), the compressed data requires less storage space in the memory and less data bandwidth (Bandwidth) for transmission from the memory to the processing module. That is, the data can be loaded into the processing module in a shorter time, which can improve data processing efficiency.
  • This application proposes a data decompression solution that can reduce the time overhead of data decompression to a certain extent and improve the efficiency of data decompression, and thus the efficiency of data processing.
  • The control module 210 is used to control the loading of compressed data from the external memory into the cache module 220.
  • The external memory means a storage module outside the data decompression device 200.
  • The control module 210 may be a control circuit.
  • The control module 210 may control the loading of the compressed data into the cache module 220 as follows: the control module 210 reads the compressed data from the external memory and writes it into the cache module 220.
  • The compressed data is composed of compression units.
  • Each compression unit is composed of non-zero data and compression information, as shown in Figure 3.
  • The non-zero data is the data whose value is not zero in the original data corresponding to the compression unit. The compression information indicates the location of the non-zero data in the original data corresponding to the compression unit (equivalently, it indicates the location of the data whose value is zero in that original data).
  • The original data corresponding to each compression unit includes M data items; that is, every M data items of the original data are compressed into one compression unit.
  • M is a positive integer, for example, M is equal to 8.
  • The data precision of the original data is N bits; that is, each data item in the original data is N bits wide.
  • N is a positive integer, for example, N is 8 bits (1 byte).
  • For example, the original data corresponding to each compression unit includes 8 data items, and the data precision is 8 bits.
  • Suppose the original data corresponding to a compression unit is "D, C, 0, 0, 0, B, 0, A". The compressed data obtained by compressing this original data, that is, the compression unit, is "DCBA_11000101".
  • Here "DCBA" is the non-zero data in the original data, and "11000101" is the 8-bit compression information indicating the positions of the non-zero data "DCBA" in the original data "D, C, 0, 0, 0, B, 0, A", which is equivalent to indicating the positions of the data whose value is zero in that original data.
  • The size of a compression unit ranges from 1 byte to 9 bytes. For example, if all 8 data items in the original data corresponding to a compression unit are 0, the size of the compression unit is 1 byte. In the "DCBA_11000101" example above, the size of the compression unit is 5 bytes. If all 8 data items in the original data corresponding to a compression unit are non-zero, the size of the compression unit is 9 bytes.
  • A piece of continuous original data is divided into multiple groups, each corresponding to one compression unit. The groups are compressed separately and then arranged contiguously to form a piece of continuous compressed data, that is, compressed data including multiple compression units.
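The compression described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation claimed in the application. It takes M = 8 and N = 8 from the examples, assumes the compression information is stored as the first byte of the unit (the description later states that the first 8 bits from the initial storage location are the compression information), and assumes that bit i of that byte, counted from the least significant bit, flags the i-th data item in storage order, so that "D, C, 0, 0, 0, B, 0, A" read from right to left yields 11000101. The names `compress_unit` and `compress` are hypothetical.

```python
M = 8  # data items per compression unit (example value from the text)


def compress_unit(values):
    """Pack M one-byte values into [compression info byte] + non-zero bytes."""
    if len(values) != M:
        raise ValueError("a compression unit covers exactly M data items")
    info = 0
    nonzero = []
    for i, value in enumerate(values):
        if value != 0:
            info |= 1 << i  # assumption: bit i marks item i as non-zero
            nonzero.append(value)
    return bytes([info] + nonzero)


def compress(data):
    """Split data into groups of M items and concatenate their compression units."""
    if len(data) % M != 0:
        raise ValueError("length must be a multiple of M in this sketch")
    return b"".join(compress_unit(data[i:i + M]) for i in range(0, len(data), M))


# "D, C, 0, 0, 0, B, 0, A" with item 0 written on the right.
A, B, C, D = 0x0A, 0x0B, 0x0C, 0x0D
unit = compress_unit([A, 0, B, 0, 0, 0, C, D])
print(f"{unit[0]:08b}", len(unit))  # 11000101 5
```

The unit size is one byte of compression information plus one byte per non-zero item, which reproduces the 1-byte, 5-byte, and 9-byte cases given in the text.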
  • The control module 210 is also configured to send the initial storage location of the current compression unit in the cache module 220 to the decompression module 230.
  • The current compression unit described herein means the compression unit currently to be decompressed.
  • The control module 210 determines the initial storage location of the current compression unit in the cache module 220.
  • The device 200 further includes a position determination module (not shown in FIG. 2), which is used to determine the initial storage location of the current compression unit in the cache module 220.
  • The control module 210 is configured to obtain the initial storage location of the current compression unit from the position determination module.
  • The decompression module 230 is configured to read the compression information and non-zero data of the current compression unit from the cache module 220 according to the initial storage location, and to decompress them to obtain the original data corresponding to the current compression unit.
  • The control module 210 sends the initial storage locations to the decompression module 230 one compression unit at a time, in order. The decompression module 230 therefore decompresses the compressed data stored in the first register sequentially at compression-unit granularity.
  • The process by which the decompression module 230 obtains the original data corresponding to the current compression unit is the inverse of the compression process shown in FIG. 4.
  • The original data obtained by the decompression module 230 may be sent to a processing unit, such as a calculation unit, for data operations.
  • The compressed data involved in this application consists of compression units, and each compression unit includes non-zero data and compression information.
  • Once the decompression module obtains the non-zero data and compression information of a compression unit, the original data corresponding to that compression unit can be recovered, that is, decompression is achieved.
  • The non-zero data and compression information of a compression unit are stored, transmitted, and decompressed as a whole, namely as the compression unit. Therefore, the data decompression scheme provided in this application does not have the synchronization problem between compressed data and compression instructions, and naturally incurs no time overhead from it. Compared with the prior art, the scheme can therefore effectively reduce the time overhead of data decompression, thereby improving data processing efficiency.
  • If the compression information in the compression unit is regarded as a compression instruction, then in the embodiments of the present application the compression instruction and the compressed valid data are stored contiguously, transmitted contiguously, and decompressed in sequence. Therefore, there is no synchronization problem between instructions and data, which reduces time overhead, helps achieve real-time decompression, and ensures the continuity of the data flow.
  • Data decompression can be carried out with only one instruction.
  • The data decompression device 200 of the embodiment of the present application receives a decompression instruction that instructs it to decompress a piece of data and also indicates how many compression units the data contains. The device 200 then performs decompression in the manner described in the above embodiment until all compression units have been decompressed.
  • The embodiment of the present application does not require a large number of instructions to assist decompression, which can reduce the bandwidth overhead between the decompression device and the external memory, and can also reduce the instruction cache overhead within the decompression device.
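To illustrate why a single instruction carrying only the number of compression units is enough, here is a small Python sketch. It is an assumption-laden software model, not the hardware described in the application: the function name `decompress_stream` is hypothetical, the compression information is assumed to be the first byte of each unit, and bit i (from the least significant bit) is assumed to flag the i-th data item.

```python
def decompress_stream(buf, num_units, m=8):
    """Decompress num_units consecutive compression units from buf."""
    out = []
    pos = 0  # initial storage location of the current compression unit
    for _ in range(num_units):
        info = buf[pos]  # compression information comes first (assumed layout)
        pos += 1
        for i in range(m):
            if (info >> i) & 1:
                out.append(buf[pos])  # next stored non-zero item
                pos += 1
            else:
                out.append(0)  # position flagged as zero
    return out
```

Because each unit carries its own compression information, the start of the next unit follows from the current one, and no separate per-unit instruction stream has to be fetched or kept in sync with the data.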
  • The cache module 220 includes at least two registers; the control module 210 is configured to control the at least two registers to cache compressed data using a ping-pong mechanism.
  • The storage space of each register included in the cache module 220 is the same.
  • The storage space of each register included in the cache module 220 is 2R bits, where R is a positive integer multiple of 8.
  • For example, R is 128.
  • The first register is used to provide compressed data for the decompression module, and the control module 210 is used to determine the initial storage location of the current compression unit in the first register.
  • The second register is used to load the compressed data from the external memory.
  • The control module 210 is used to control the compressed data in the external memory to be loaded into the second register.
  • The control module 210 is used to control the compressed data in the second register to be loaded into the first register.
  • The above way of caching data can be called a ping-pong mechanism.
  • The cache module 220 adopts a ping-pong mechanism to cache compressed data, which ensures the continuity of the to-be-decompressed data provided by the cache module, thereby improving the efficiency of data decompression.
  • The cache module 220 includes a first register and a second register, and the storage space of each is 2R bits.
  • The initial storage location of the current compression unit is acquired from the storage space of the first register.
  • The control module 210 is used to load 2R bits of compressed data from the external memory into the second register.
  • The 2R bits of compressed data in the second register are loaded into the first register.
  • The first 2R bits of compressed data in the data to be decompressed can be loaded directly into the first register, or loaded into the second register first and then loaded from the second register into the first register.
  • The cache module 220 includes a first register and a second register.
  • The storage space of each of the first register and the second register is 2R bits, where R is a positive integer multiple of 8. For example, R is 128.
  • The storage spaces of the first register and the second register are both divided into a low R-bit storage space and a high R-bit storage space. The initial storage location of the current compression unit is acquired from the low R-bit storage space of the first register.
  • The compressed data in the first register and the second register is buffered and updated as follows. After compressed data is loaded into the first register, the compressed data in the high R-bit storage space of the first register is unconditionally loaded into the low R-bit storage space of the second register. The compressed data stored in the first register is decompressed sequentially, one compression unit at a time. If the initial storage location of the current compression unit moves into the high R-bit storage space of the first register, the 2R bits of compressed data in the second register are loaded into the first register. It should be understood that this is equivalent to moving the compressed data in the high R-bit storage space of the first register to the low R-bit storage space of the first register, and at the same time moving the new R bits of compressed data from the high R-bit storage space of the second register to the high R-bit storage space of the first register.
  • The control module 210 may be configured to execute the process shown in FIG. 6 to control the first register and the second register to buffer compressed data in the above-mentioned manner.
  • The process of FIG. 6 includes step 610 and step 620, which are described below.
  • Step 610: 1) after the compressed data is loaded into the first register, unconditionally load the compressed data in the high R-bit storage space of the first register into the low R-bit storage space of the second register; 2) load the next R bits of compressed data in the external memory into the high R-bit storage space of the second register.
  • Step 620: when the initial storage location of the current compression unit is located in the high R-bit storage space of the first register, load the 2R bits of compressed data in the second register into the first register.
  • Step 610 and step 620 are executed cyclically until the decompression is completed.
  • For the first 2R bits of compressed data, the control module 210 may load the data directly into the first register, or first load it into the second register and then load it from the second register into the first register.
  • In step 610, there may be no restriction on the order of 1) and 2).
  • In step 610, if there is no next R bits of compressed data in the external memory at the current moment, 2) of step 610 is not executed.
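Steps 610 and 620 can be modeled in software roughly as follows. This is a hedged sketch, not the circuit of the application: it assumes R = 128 bits (16 bytes), models each 2R-bit register as a 32-byte `bytearray`, treats the initial storage location as a byte offset into the first register, and pads short reads with zeros. The class `PingPongCache`, its methods, and `decompress_all` are hypothetical names, and the unit layout (compression information byte first, bit i flagging item i) is an assumption as well.

```python
R_BYTES = 16  # R = 128 bits, the example value in the text


class PingPongCache:
    """Two 2R-bit registers fed from an external memory (a bytes object)."""

    def __init__(self, memory):
        self.memory = memory
        self.read_pos = 0  # next unread byte in the external memory
        # The first 2R bits are loaded directly into the first register.
        self.first = self._fetch(2 * R_BYTES)
        self.second = bytearray(2 * R_BYTES)
        self.ptr = 0  # initial storage location of the current unit
        self._step_610()

    def _fetch(self, n):
        chunk = self.memory[self.read_pos:self.read_pos + n]
        self.read_pos += len(chunk)
        return bytearray(chunk.ljust(n, b"\x00"))  # zero padding is an assumption

    def _step_610(self):
        # 1) high R bits of the first register -> low R bits of the second
        self.second[:R_BYTES] = self.first[R_BYTES:]
        # 2) next R bits of external memory -> high R bits of the second,
        #    skipped when the external memory has no more data
        if self.read_pos < len(self.memory):
            self.second[R_BYTES:] = self._fetch(R_BYTES)

    def advance(self, unit_size):
        """Move to the next compression unit and apply step 620 if needed."""
        self.ptr += unit_size
        if self.ptr >= R_BYTES:  # start has moved into the high R bits
            self.first = bytearray(self.second)  # load 2R bits from the second
            self.ptr -= R_BYTES
            self._step_610()


def decompress_all(memory, num_units):
    """Decompress num_units units, reading only from the first register."""
    cache = PingPongCache(memory)
    out = []
    for _ in range(num_units):
        info = cache.first[cache.ptr]
        count = bin(info).count("1")
        stored = iter(cache.first[cache.ptr + 1:cache.ptr + 1 + count])
        out.extend(next(stored) if (info >> i) & 1 else 0 for i in range(8))
        cache.advance(1 + count)
    return out
```

Since a unit starts in the low 16 bytes and is at most 9 bytes long, it always ends within the 32-byte first register in this model, so the reader never has to wait for a refill in the middle of a unit.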
  • The control module 210 may be configured to execute the process shown in FIG. 7 to control the first register and the second register to buffer compressed data in the above-mentioned manner.
  • The flow of FIG. 7 includes steps 710 to 780, which are described below.
  • The decompression instruction may indicate the number of compression units to be decompressed in the block of data.
  • The decompression instruction may also indicate how many 128-bit pieces of compressed data are in this piece of data.
  • Step 720: determine whether there is 128-bit compressed data in the external memory. If yes, go to step 730; if not, go to step 740.
  • Step 760: determine whether the decompression is completed. If so, end the decompression; if not, go to step 770.
  • If the number of compression units that the decompression module 230 has decompressed reaches the number of compression units that need to be decompressed, it is determined that the decompression is completed; otherwise, the decompression is not completed.
  • Step 770: determine whether the initial storage location of the current compression unit is in the upper 128-bit storage space of the first register. If yes, go to step 780; if not, go to step 750.
  • The condition for updating the compressed data in the first register is that the initial storage location of the current compression unit has moved into the high R-bit storage space of the first register.
  • When the initial storage location of the current compression unit has not moved into the high R-bit storage space of the first register, it is determined that the compressed data in the first register does not need to be updated; when it has moved into the high R-bit storage space of the first register, it is determined that the compressed data in the first register needs to be updated.
  • The initial storage location of the current compression unit is a generated register signal. Therefore, the above judgment logic is relatively simple, which helps to increase the clock frequency at which the data decompression device can run.
  • By contrast, if the condition for determining that the compressed data in the first register needs to be updated is whether the initial storage location of the next compression unit is no longer in the first register, the determination logic is more complicated and may limit the clock frequency at which the data decompression device can run.
  • Using the movement of the initial storage location of the current compression unit into the high R-bit storage space of the first register as the condition for determining that the compressed data in the first register needs to be updated helps to improve the efficiency of data decompression.
  • The cache module 220 may also use other feasible ping-pong mechanisms to cache the compressed data.
  • The cache module 220 may include three or more registers to implement a ping-pong mechanism for caching compressed data.
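A small Python sketch can show why this condition is cheap to evaluate. It assumes R = 128 bits and a byte-granular pointer into the 2R-bit (32-byte) first register, so the pointer fits in 5 bits; both the function name `needs_update` and this pointer encoding are assumptions made for illustration, not details confirmed by the application.

```python
R_BITS = 128
R_BYTES = R_BITS // 8  # 16 bytes in each half of the 2R-bit register


def needs_update(cur_ptr):
    """True when the byte pointer lies in the high R bits of the first register.

    With a 32-byte register the pointer takes values 0..31, and the high half
    is exactly the range where the top pointer bit is set, so the check reduces
    to reading one bit of an already registered signal.
    """
    return (cur_ptr // R_BYTES) & 1 == 1
```

Under these assumptions the check needs no addition or comparison against the size of the next compression unit, which is the kind of simplification the text credits for the higher achievable clock frequency.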
  • The implementation of the decompression module 230 is described below.
  • The process by which the decompression module 230 obtains the original data corresponding to a compression unit may include the following steps 1 to 5.
  • Step 1: obtain the initial storage location of the compression unit.
  • Step 2: acquire the compression information of the compression unit from the cache module 220 according to the initial storage location.
  • The first 8 bits of information starting from the initial storage location are the compression information of the compression unit.
  • Step 3: obtain the storage location of the non-zero data of the compression unit in the cache module 220 according to the initial storage location and the compression information of the compression unit.
  • For example, if the compression information of the compression unit indicates that there are 4 non-zero data items in the original data corresponding to the compression unit, then in the storage space of the first register, the second, third, fourth, and fifth 8-bit data starting from the initial storage location are the 4 non-zero data items in the original data corresponding to the compression unit.
  • Step 4: according to the storage location, obtain the non-zero data of the compression unit from the cache module 220.
  • Step 5: according to the non-zero data and the compression information of the compression unit, restore the original data corresponding to the compression unit.
  • For example, the original data corresponding to the compression unit is "D, C, 0, 0, 0, B, 0, A".
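Steps 1 to 5 can be sketched as a single Python function. This is a software illustration under assumptions rather than the module's circuit: `register` stands for the contents of the first register as a byte sequence, `start` is the initial storage location as a byte offset, bit i of the compression information (from the least significant bit) is assumed to flag the i-th data item, and the name `decompress_unit` is hypothetical.

```python
def decompress_unit(register, start, m=8):
    """Recover the m original data items of the unit beginning at `start`."""
    # Step 1: `start` is the initial storage location of the compression unit.
    # Step 2: the first 8 bits from that location are the compression information.
    info = register[start]
    # Step 3: the non-zero data occupies the bytes right after the information byte.
    count = bin(info).count("1")
    locations = range(start + 1, start + 1 + count)
    # Step 4: read the non-zero data from those storage locations.
    nonzero = [register[p] for p in locations]
    # Step 5: place each non-zero item where the compression information has a 1.
    restored = []
    k = 0
    for i in range(m):
        if (info >> i) & 1:
            restored.append(nonzero[k])
            k += 1
        else:
            restored.append(0)
    next_start = start + 1 + count  # initial storage location of the next unit
    return restored, next_start
```

For the "DCBA_11000101" unit, the information byte has four bits set, so the second to fifth bytes from the initial storage location are read as A, B, C, and D, matching the example in the text.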
  • The circuit structure of the decompression module 230 adopts a pipeline structure, and the decompression module 230 is configured to use a pipeline processing mode to decompress and obtain the original data corresponding to the current compression unit.
  • Using a pipeline processing mode for data decompression can effectively increase the operating clock frequency of the data decompression device.
  • The decompression module 230 has a five-stage pipeline structure.
  • The decompression module 230 has 4 input ports A1, A2, A3, and A4, and 1 output port B.
  • Input port A1 is used to receive the compression information of the current compression unit, input port A2 is used to receive the initial storage location of the current compression unit, input port A3 is used to receive the compressed data stored in the first register at the current moment, and input port A4 is used to receive the compressed data stored in the second register at the current moment.
  • The output port B is used to output the original data corresponding to the current compression unit.
  • The decompression module 230 has the following five-stage pipeline structure.
  • The first-stage pipeline is used to obtain the initial storage location of the current compression unit, the first compressed data, and the second compressed data, where the first compressed data is the compressed data stored in the first register at the current moment and the second compressed data is the compressed data stored in the second register at the current moment. It is also used to obtain the compression information of the current compression unit according to the initial storage location, and to send the obtained information to the next pipeline stage.
  • the first stage pipeline register includes the following registers.
  • the register cur_info is used to store the compression information of the previous compression unit and send it to the corresponding register of the next stage of pipeline.
  • the register cur_ptr is used to obtain the initial storage location (in Byte) of the previous compression unit and send it to the corresponding register of the next stage of pipeline.
  • the register nxt_unit is used to obtain the compressed data stored in the second register at the current moment and send it to the corresponding register of the next stage of pipeline.
  • the register cur_info_1dly is used to register the compression information of the compression unit sent by the register cur_info, and send it to the register bytex_0_flag and the register bytex_ptr of the next stage of pipeline.
  • the register cur_ptr_1dly is used to register the initial storage location of the compression unit sent by the register cur_ptr and send it to the corresponding register of the next stage of pipeline.
  • the register nxt_unit_1dly is used to register the compressed data sent by the register nxt_unit and send it to the corresponding register of the next stage of pipeline.
  • the third stage pipeline register includes the following registers.
  • the register cur_unit_2dly is used to register the compressed data sent by the register cur_unit_1dly and send it to the corresponding register of the next stage of pipeline.
  • the register nxt_unit_2dly is used to register the compressed data sent by the register nxt_unit_1dly and send it to the corresponding register of the next stage of pipeline.
  • the fourth-stage pipeline is used to obtain the first data and the second data from the first compressed data and the second compressed data received from the third-stage pipeline, respectively, according to the storage location of the non-zero data of the current compression unit received from the third-stage pipeline.
  • the fourth-stage pipeline is also used to generate indication information for each non-zero data according to the storage location of the non-zero data of the current compression unit, where the indication information indicates whether each non-zero data is located in the first register or the second register; it is also used to send the following information to the next pipeline stage: the first data, the second data, the storage location and indication information of each non-zero data, and the compression information of the current compression unit.
  • the fourth stage pipeline register includes the following registers.
  • the register bytex_0_flag_1dly is used to register the flags of the x data in the compression unit sent by the register bytex_0_flag and send them to the corresponding register of the next stage of pipeline.
  • the register bytex_cur is used to obtain the first data from the compressed data sent by the register cur_unit_2dly according to the storage location sent by the register bytex_ptr (it should be understood that the most significant bit should be ignored), and send it to the relevant register of the next stage of pipeline.
  • the register bytex_nxt is used to obtain the second data from the compressed data sent by the register nxt_unit_2dly according to the storage location sent by the register bytex_ptr (it should be understood that the most significant bit should be ignored), and send it to the relevant register of the next stage of pipeline.
  • the fifth stage pipeline is used to obtain the original data corresponding to the current compression unit according to the information received from the fourth stage pipeline.
  • the decompression module 230 can also be designed as a two-stage, three-stage or four-stage pipeline structure.
  • combining any two adjacent stages of the five-stage pipeline structure in this embodiment into one stage yields a design of the decompression module 230 as a four-stage pipeline structure.
  • combining any three adjacent stages of the five-stage pipeline structure in this embodiment into one stage yields a design of the decompression module 230 as a three-stage pipeline structure.
  • combining any four adjacent stages of the five-stage pipeline structure in this embodiment into one pipeline processing unit yields a design of the decompression module 230 as a two-stage pipeline structure.
  • an embodiment of the present invention also provides a method 900 for data decompression, and the method 900 may be executed by the control module 210 in the foregoing embodiment.
  • the method 900 includes step 910 and step 920.
  • the cache module may be the cache module 220 described in any of the above embodiments.
  • the interpretation of the compression unit and compressed data is as described above.
  • the initial storage location of the current compression unit in the cache module may be obtained from the location determination module.
  • the decompression module may be the decompression module 230 described in any of the above embodiments.
  • the data decompression solution provided by this application does not have the synchronization problem between instructions and data, can reduce time overhead, help realize real-time decompression, and ensure the continuity of data flow.
  • the data decompression solution provided by the present application can also reduce the bandwidth overhead between the decompression device and the external memory, and can also reduce the instruction cache overhead within the decompression device.
  • the cache module includes at least two registers; the method 900 further includes: controlling the at least two registers to use a ping-pong mechanism to cache the compressed data.
  • the cache module has a structure as shown in FIG. 5, and the method 900 may further include the steps as shown in FIG. 6.
  • This application can be applied to scenarios that require data compression-decompression.
  • the data decompression scheme of this application can be applied to hardware accelerators for convolutional neural networks (CNN) and recurrent neural networks (RNN).
  • for example, it can be applied in the form of IP cores and of circuits through which IP cores work cooperatively.
  • an embodiment of the present application further provides a data processing device 1000.
  • the data processing device 1000 includes a control module 1010, a cache module 1020, and a processing module 1030.
  • the cache module 1020 includes a first register and a second register.
  • the storage space of each of the first register and the second register is divided into a low R-bit storage space and a high R-bit storage space, and R is a positive multiple of 8.
  • the control module 1010 is used to load the data to be processed into the cache module 1020, and is also used to send to the processing module 1030 the initial storage location of the current data in the cache module 1020, wherein the initial storage location of data is obtained starting from the low R-bit storage space of the first register.
  • the control module 1010 is further configured to control the first register and the second register to cache the data to be processed in the following way: 1) after the data to be processed is loaded into the first register, load the data in the high R-bit storage space of the first register into the low R-bit storage space of the second register, and load the next R bits of data to be processed into the high R-bit storage space of the second register; 2) when the initial storage location of the current data is in the high R-bit storage space of the first register, load the 2R bits of data in the second register into the first register; repeat step 1) and step 2) until the end of data processing.
  • the solution of this embodiment uses the movement of the initial storage location of the current data into the high R-bit storage space of the first register as the condition for determining that the data in the first register needs to be updated, which helps to improve the efficiency of data processing.
  • the embodiment of the present invention also provides a control method of the data cache module.
  • the data cache module may be the cache module 220 or the cache module 1020 described in the above embodiment in conjunction with FIG. 5.
  • the control method can be executed by the control module 210 or the control module 1010 described in the above embodiment.
  • the control method may include the processing flow shown in FIG. 6 or FIG. 7 above, and in order to avoid repetition, it will not be described in detail here.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), etc.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

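Provided are an apparatus and method for data decompression. The apparatus includes a control module, a cache module, and a decompression module. The control module is configured to load compressed data comprising compression units into the cache module, where each compression unit includes compression information and non-zero data, and the compression information indicates the positions of the non-zero data in the original data corresponding to the compression unit. The control module is further configured to send to the decompression module the initial storage location of the current compression unit in the cache module. The decompression module is configured to read, according to the initial storage location, the compression information and non-zero data of the current compression unit from the cache module, and to obtain from them the original data corresponding to the current compression unit. There is no synchronization problem between compressed data and compression instructions, so the time overhead of data decompression can be reduced to some extent, which can improve the efficiency of data decompression.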

Description

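Apparatus and method for data decompression
Copyright notice
The disclosure of this patent document contains material that is subject to copyright protection. The copyright belongs to the copyright owner. The copyright owner has no objection to the reproduction by anyone of this patent document or this patent disclosure as it appears in the official records and files of the Patent and Trademark Office.
Technical field
This application relates to the field of data processing and, more specifically, to an apparatus and method for data decompression.
Background
In data processing, the data to be processed usually has to be read from memory and sent to a processing unit for processing. For example, in neural network computation the amount of data involved is huge, mainly feature map data and weight data, all of which must be read from memory and sent to a processing unit (for example, a calculate unit) to take part in the computation. It can be understood that, when the data bandwidth is limited, the more data the processing unit obtains per unit time, the higher its utilization, until the processing unit runs at full load.
Data compression techniques have been proposed to improve data processing efficiency. It can be understood that if data is compressed effectively, a block of data takes up less storage space in memory; accordingly, less data bandwidth and less time are needed to load that block of data into the processing unit, which improves data processing efficiency.
After compressed data is read from memory and sent to the processing unit, it must first be decompressed, and the decompressed data is then processed.
In prior-art data compression and decompression schemes, the compressed data and the compression instructions are usually read from memory separately and then transferred to the decompression apparatus for decompression. This decompression process has a synchronization problem between the compressed data and the compression instructions: if the two transfers are not synchronized, each has to wait for the other to be read back from memory, which increases the time overhead of decompression. If the decompression apparatus performs poorly, for example with a large time overhead, the benefit brought by data compression is reduced.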
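Summary
This application provides an apparatus and method for data decompression which, compared with the prior art, can reduce the time overhead of data decompression to some extent and thereby improve the efficiency of data decompression.
In a first aspect, an apparatus for data decompression is provided, which includes a control module, a cache module, and a decompression module. The control module is configured to load compressed data comprising compression units into the cache module, where each compression unit includes compression information and non-zero data, and the compression information indicates the positions of the non-zero data in the original data corresponding to the compression unit. The control module is further configured to send to the decompression module the initial storage location of the current compression unit in the cache module. The decompression module is configured to read, according to the initial storage location, the compression information and non-zero data of the current compression unit from the cache module, and to obtain from them the original data corresponding to the current compression unit.
The compressed data in this application consists of compression units, and each compression unit includes non-zero data and compression information. For a given compression unit, once the decompression module has obtained its non-zero data and compression information, it can obtain the original data corresponding to that unit, that is, it can decompress it. In this application, the non-zero data and the compression information of a compression unit are stored, transferred, and decompressed as a whole, namely as the compression unit. The data decompression scheme of this application therefore has no synchronization problem between compressed data and compression instructions, and hence none of the time overhead caused by that problem. Compared with the prior art, the scheme can thus effectively reduce the time overhead of data decompression and improve data processing efficiency.
In a second aspect, an apparatus for data processing is provided, which includes: a cache module including a first register and a second register, where the storage space of each of the first register and the second register is divided into a low R-bit storage space and a high R-bit storage space, and R is a positive multiple of 8; a control module configured to load data to be processed into the cache module and to send to a processing module the initial storage location of the current data in the cache module, where the initial storage location of data is obtained starting from the low R-bit storage space of the first register; and the processing module, configured to read the current data from the cache module according to its initial storage location and process it. The control module is further configured to control the first register and the second register to cache the data to be processed as follows: 1) after data to be processed is loaded into the first register, load the data in the high R-bit storage space of the first register into the low R-bit storage space of the second register, and load the next R bits of data to be processed into the high R-bit storage space of the second register; 2) when the initial storage location of the current data is in the high R-bit storage space of the first register, load the 2R bits of data in the second register into the first register; and repeat step 1) and step 2) until data processing ends.
This application caches compressed data using a ping-pong mechanism, which ensures that the cache module supplies the data to be decompressed continuously.
In addition, in this application the condition for updating the compressed data in the first register is that the initial storage location of the current compression unit has moved into the high R-bit storage space of the first register. When it has not, it is determined that the compressed data in the first register does not need to be updated; when it has, it is determined that the compressed data in the first register needs to be updated. The initial storage location of the current compression unit is an already generated register signal, so this decision logic is fairly simple, which helps raise the working clock frequency at which the data decompression apparatus can run.
In a third aspect, a method for data decompression is provided, which includes: loading compressed data comprising compression units into a cache module, where each compression unit includes compression information and non-zero data, and the compression information indicates the positions of the non-zero data in the original data corresponding to the compression unit; and sending to a decompression module the initial storage location of the current compression unit in the cache module, so that the decompression module reads the compression information and non-zero data of the current compression unit according to the initial storage location and obtains from them the original data corresponding to the current compression unit.
In a fourth aspect, a control method for a data cache module is provided. The data cache module includes a first register and a second register, the storage space of each of which is divided into a low R-bit storage space and a high R-bit storage space, where R is a positive multiple of 8. The control method includes: loading data to be processed into the cache module; sending to a processing module the initial storage location of the current data in the cache module, so that the processing module reads the current data from the cache module according to its initial storage location and processes it, where the initial storage location of data is obtained starting from the low R-bit storage space of the first register; and controlling the first register and the second register to cache the data to be processed as follows: 1) after data to be processed is loaded into the first register, load the data in the high R-bit storage space of the first register into the low R-bit storage space of the second register, and load the next R bits of data to be processed into the high R-bit storage space of the second register; 2) when the initial storage location of the current data is in the high R-bit storage space of the first register, load the 2R bits of data in the second register into the first register; and repeat step 1) and step 2) until data processing ends.
Therefore, the data decompression scheme of this application has no synchronization problem between instructions and data, can reduce time overhead, helps achieve real-time decompression, and keeps the data stream continuous. In addition, the scheme can reduce the bandwidth overhead between the decompression apparatus and the external memory, and can also reduce the instruction cache overhead inside the decompression apparatus.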
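Brief description of the drawings
FIG. 1 is a schematic diagram of a data decompression architecture.
FIG. 2 is a schematic block diagram of an apparatus for data decompression according to an embodiment of this application.
FIG. 3 is a schematic diagram of a compression unit according to an embodiment of this application.
FIG. 4 is a schematic diagram of compression units and compressed data according to an embodiment of this application.
FIG. 5 is a schematic block diagram of a cache module according to an embodiment of this application.
FIG. 6 is a schematic flowchart of a control method for the cache module according to an embodiment of this application.
FIG. 7 is another schematic flowchart of a control method for the cache module according to an embodiment of this application.
FIG. 8 is a schematic block diagram of a decompression module with a pipeline structure according to an embodiment of this application.
FIG. 9 is a schematic block diagram of a method for data decompression according to an embodiment of this application.
FIG. 10 is a schematic block diagram of a data processing apparatus according to another embodiment of this application.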
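Detailed description
The technical solutions in the embodiments of this application are described below with reference to the drawings.
Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the technical field of this application. The terms used in the specification of this application serve only to describe specific embodiments and are not intended to limit this application.
The embodiments of this application can be applied to data decompression scenarios.
FIG. 1 is a schematic diagram of a data decompression architecture. Data is read from memory and sent to a processing module for processing. The data stored in the memory is compressed data (that is, data that has undergone data compression). The processing module includes a decompression unit and a processing unit (for example, a calculate unit). The decompression unit decompresses the compressed data read from the memory to obtain the corresponding original data, and the processing unit processes that original data.
It should be understood that, compared with the original data (that is, data that has not undergone data compression), compressed data requires less storage space in memory and less data bandwidth to transfer from the memory to the processing module, so the data can be loaded into the processing module in a shorter time, which improves data processing efficiency.
As described above, in the prior art the compressed data and the compression instructions are read from memory separately, and the compression instructions and compressed data are then used to decompress the data and obtain the original data. The prior art has a synchronization problem between the compressed data and the compression instructions: if the two transfers are not synchronized, each has to wait for the other, which increases the time overhead of data decompression, and this overhead may reduce the processing-efficiency benefit brought by data compression.
To address this problem, this application proposes a data decompression scheme that can reduce the time overhead of data decompression to some extent and improve the efficiency of data decompression, thereby improving data processing efficiency.
FIG. 2 is a schematic block diagram of an apparatus 200 for data decompression provided by an embodiment of this application. The apparatus 200 includes a control module 210, a cache module 220, and a decompression module 230.
The control module 210 is configured to control the loading of compressed data from an external memory into the cache module 220. As shown in FIG. 2, the external memory denotes a storage module outside the data decompression apparatus 200.
The control module 210 may be a control circuit.
The control module 210 may control the loading of compressed data into the cache module 220 as follows: the control module 210 reads the compressed data from the external memory and writes it into the cache module 220.
Compressed data denotes the data obtained after original data has undergone data compression.
In the embodiments of this application, compressed data consists of compression units. Each compression unit consists of non-zero data and compression information, as shown in FIG. 3. The non-zero data are the data whose value is not zero in the original data corresponding to the compression unit; the compression information is information indicating the positions of the non-zero data in the original data corresponding to the compression unit.
The original data corresponding to each compression unit includes M data items; in other words, every M data items of the original data are compressed as one compression unit. M is a positive integer, for example M equals 8.
The data precision of the original data is N bits; in other words, every data item in the original data has N-bit precision. N is a positive integer, for example N is 8 bits (1 byte).
As an example, assume that the original data corresponding to each compression unit includes 8 data items with a data precision of 8 bits. If the original data corresponding to a compression unit is "D,C,0,0,0,B,0,A", then compressing it yields the compressed data "DCBA_11000101", that is, the compression unit is "DCBA_11000101". Here "DCBA" is the non-zero data of the original data, and "11000101" is the 8-bit compression information indicating the positions of the non-zero data "DCBA" in the original data "D,C,0,0,0,B,0,A", which equivalently indicates the positions of the zero-valued data in that original data.
It should be understood that if the original data corresponding to a compression unit is 8 bytes, the size of the compression unit ranges from 1 byte to 9 bytes. For example, if all 8 data items of the original data are 0, the compression unit is 1 byte. In the "DCBA_11000101" example above, the compression unit is 5 bytes. If all 8 data items of the original data are non-zero, the compression unit is 9 bytes.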
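The compression unit format described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `compress_unit` is hypothetical, and the byte order is an assumption. The text writes the unit as "DCBA_11000101", and the sketch assumes the compression information occupies the first (lowest-address) byte, that element 0 is the rightmost written element (A), and that bit i of the compression information marks element i as non-zero.

```python
def compress_unit(original):
    """Pack M = 8 one-byte values into [info byte, non-zero bytes...].

    Assumed layout (not stated explicitly in the source): the info byte
    comes first, and bit i of it is 1 when element i is non-zero.
    """
    if len(original) != 8:
        raise ValueError("this sketch assumes M = 8")
    info = 0
    nonzero = bytearray()
    for i, value in enumerate(original):
        if not 0 <= value <= 0xFF:
            raise ValueError("this sketch assumes N = 8 bits")
        if value != 0:
            info |= 1 << i          # bit i marks element i as non-zero
            nonzero.append(value)
    return bytes([info]) + bytes(nonzero)


# The text's example "D,C,0,0,0,B,0,A" is read here with the highest
# element first, so element 0 is A and element 7 is D.
A, B, C, D = 0x0A, 0x0B, 0x0C, 0x0D
unit = compress_unit([A, 0, B, 0, 0, 0, C, D])
```

Under these assumptions the example unit is 5 bytes long, an all-zero group packs into a single info byte, and a group with no zeros packs into 9 bytes, matching the 1 to 9 byte range stated above.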
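FIG. 4 is a schematic diagram of compressing original data to obtain compressed data in the embodiments of this application. A contiguous block of original data is divided into groups, each corresponding to one compression unit; the groups are compressed separately and then laid out contiguously to form a contiguous block of compressed data, that is, compressed data comprising multiple compression units.
The control module 210 is further configured to send to the decompression module 230 the initial storage location of the current compression unit in the cache module 220.
The current compression unit described herein denotes the compression unit that is currently to be decompressed.
Optionally, the control module 210 determines the initial storage location of the current compression unit in the cache module 220.
Optionally, the apparatus 200 further includes a location determination module (not shown in FIG. 2), which is configured to determine the initial storage location of the current compression unit in the cache module 220. The control module 210 is configured to obtain the initial storage location of the current compression unit from the location determination module.
The decompression module 230 is configured to read, according to the initial storage location, the compression information and non-zero data of the current compression unit from the cache module 220, and to decompress them to obtain the original data corresponding to the current compression unit.
It should be understood that the control module 210 sends initial storage locations to the decompression module 230 one compression unit after another, so the decompression module 230 decompresses the compressed data stored in the first register sequentially at the granularity of a compression unit.
It can be understood that the process by which the decompression module 230 obtains the original data corresponding to the current compression unit is the inverse of the compression process shown in FIG. 4.
The original data obtained by the decompression module 230 can be sent to a processing unit, for example a calculate unit, for data operations.
The compressed data in this application consists of compression units, and each compression unit includes non-zero data and compression information. For a given compression unit, once the decompression module has obtained its non-zero data and compression information, it can obtain the original data corresponding to that unit, that is, it can decompress it. In this application, the non-zero data and the compression information of a compression unit are stored, transferred, and decompressed as a whole, namely as the compression unit. The data decompression scheme of this application therefore has no synchronization problem between compressed data and compression instructions, and hence none of the time overhead caused by that problem. Compared with the prior art, the scheme can thus effectively reduce the time overhead of data decompression and improve data processing efficiency.
If the compression information in a compression unit is regarded as the compression instruction, then in the embodiments of this application the compression instruction and the compressed payload data are stored contiguously, transferred contiguously, and decompressed in sequence. There is therefore no synchronization problem between instructions and data, which reduces time overhead, helps achieve real-time decompression, and keeps the data stream continuous.
In addition, it can be understood that in the embodiments of this application data decompression can be carried out with a single instruction. For example, the data decompression apparatus 200 receives one decompression instruction that indicates that a block of data is to be decompressed and how many compression units the block contains; the apparatus 200 then decompresses in the manner described in the above embodiments until all compression units have been decompressed. From this point of view, the embodiments of this application do not need a large number of instructions to assist decompression, which can reduce the bandwidth overhead between the decompression apparatus and the external memory and can also reduce the instruction cache overhead inside the decompression apparatus.
Therefore, the data decompression scheme of this application has no synchronization problem between instructions and data, can reduce time overhead, helps achieve real-time decompression, and keeps the data stream continuous. In addition, the scheme can reduce the bandwidth overhead between the decompression apparatus and the external memory, and can also reduce the instruction cache overhead inside the decompression apparatus.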
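Optionally, in some embodiments, the cache module 220 includes at least two registers, and the control module 210 is configured to control the at least two registers to cache the compressed data using a ping-pong mechanism.
Every register in the cache module 220 has the same storage space. For example, each register in the cache module 220 has a storage space of 2R bits, where R is a positive multiple of 8. For example, R is 128.
Taking a cache module 220 that includes a first register and a second register as an example, the first register supplies compressed data to the decompression module; in other words, the control module 210 determines the initial storage location of the current compression unit in the first register. The second register loads compressed data from the external memory; in other words, the control module 210 controls the loading of compressed data from the external memory into the second register. When the compressed data in the first register needs to be updated, the control module 210 controls the loading of the compressed data in the second register into the first register. This way of caching data can be called a ping-pong mechanism.
It should be understood that, by caching compressed data with a ping-pong mechanism, the cache module 220 can supply the data to be decompressed continuously, which improves the efficiency of data decompression.
Optionally, as a first implementation, the cache module 220 includes a first register and a second register, each with a storage space of 2R bits. The initial storage location of the current compression unit is obtained starting from the storage space of the first register. The control module 210 is configured to load 2R bits of compressed data from the external memory into the second register and, at the current moment, when it is determined that the initial storage location of the next compression unit is no longer in the first register, to load the 2R bits of compressed data in the second register into the first register.
The first 2R bits of the data to be decompressed can be loaded directly into the first register, or loaded into the second register first and then from the second register into the first register.
Optionally, as a second implementation, as shown in FIG. 5, the cache module 220 includes a first register and a second register, each with a storage space of 2R bits, where R is a positive multiple of 8. For example, R is 128. The storage space of each of the first register and the second register is divided into a low R-bit storage space and a high R-bit storage space. The initial storage location of the current compression unit is obtained starting from the low R-bit storage space of the first register.
The compressed data is cached and updated in the first register and the second register as follows: after compressed data is loaded into the first register, the compressed data in the high R-bit storage space of the first register is unconditionally loaded into the low R-bit storage space of the second register; the compressed data stored in the first register is decompressed one compression unit at a time, and if the initial storage location of the current compression unit moves into the high R-bit storage space of the first register, the 2R bits of compressed data in the second register are loaded into the first register. It should be understood that this is equivalent to moving the compressed data in the high R-bit storage space of the first register to the low R-bit storage space of the first register, while moving a new R bits of compressed data from the high R-bit storage space of the second register to the high R-bit storage space of the first register.
The control module 210 can be configured to execute the flow shown in FIG. 6 to control the first register and the second register to cache compressed data in the above manner. The flow of FIG. 6 includes step 610 and step 620, which are described below.
In step 610: 1) after compressed data is loaded into the first register, unconditionally load the compressed data in the high R-bit storage space of the first register into the low R-bit storage space of the second register; 2) load the next R bits of compressed data from the external memory into the high R-bit storage space of the second register.
In step 620, when the initial storage location of the current compression unit is in the high R-bit storage space of the first register, load the 2R bits of compressed data in the second register into the first register.
Steps 610 and 620 are repeated until decompression ends.
It should be noted that the control module 210 can load the first 2R bits of the compressed data to be decompressed directly into the first register, or it can first load those 2R bits into the second register and then load them from the second register into the first register.
In step 610, there need not be any ordering constraint between 1) and 2).
It should be understood that in step 610, if the external memory currently holds no further R bits of compressed data, part 2) of step 610 is not executed.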
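Steps 610 and 620 can be modelled in software as shown below. This is a behavioural sketch under stated assumptions, not the hardware design: the class name `PingPongCache` and its method names are hypothetical, registers are modelled as byte arrays with R = 128 bits (16 bytes), the low half of a register is taken to be its lower byte indices, and the pointer is a byte offset into the first register.

```python
R_BYTES = 16  # R = 128 bits, the example value given in the text


class PingPongCache:
    """Two 2R-bit registers fed from a byte stream (hypothetical model)."""

    def __init__(self, stream):
        self._stream = bytes(stream)
        self._next = 2 * R_BYTES  # offset of the next unread R-bit chunk
        # The first 2R bits go straight into the first register.
        self.reg1 = bytearray(self._pad(self._stream[:2 * R_BYTES], 2 * R_BYTES))
        self.reg2 = bytearray(2 * R_BYTES)
        self.step_610()

    @staticmethod
    def _pad(chunk, size):
        # Zero-fill a short chunk so register widths stay fixed.
        return chunk + bytes(size - len(chunk))

    def step_610(self):
        # 1) high half of reg1 -> low half of reg2, unconditionally
        self.reg2[:R_BYTES] = self.reg1[R_BYTES:]
        # 2) next R bits of external memory -> high half of reg2,
        #    skipped when the external memory has nothing left
        chunk = self._stream[self._next:self._next + R_BYTES]
        if chunk:
            self.reg2[R_BYTES:] = self._pad(chunk, R_BYTES)
            self._next += R_BYTES

    def step_620(self, ptr):
        """Update reg1 if ptr lies in its high half; return the adjusted ptr."""
        if ptr >= R_BYTES:
            self.reg1[:] = self.reg2   # 2R bits from reg2 -> reg1
            ptr -= R_BYTES             # window moved forward by R bits
            self.step_610()
        return ptr
```

With this model the first register always holds a 2R-bit window of the stream and the second register holds the same window advanced by R bits, which is the "equivalent to" shift described in the text. Subtracting `R_BYTES` from the pointer after an update is an assumption about how the pointer is maintained; the source does not spell it out.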
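As an example, as shown in FIG. 7, the control module 210 can be configured to execute the flow shown in FIG. 7 to control the first register and the second register to cache compressed data in the above manner. The flow of FIG. 7 includes steps 710 to 780, which are described below.
Receive a decompression instruction indicating that a block of data is to be decompressed.
For example, the decompression instruction can indicate the number of compression units in the block that need to be decompressed. As another example, the decompression instruction can also indicate how many 128-bit pieces of compressed data the block contains.
710: load 256 bits of compressed data into the first register.
Load the first 256 bits of the data to be decompressed from the external memory into the first register, or load those 256 bits into the second register and then from the second register into the first register.
720: determine whether the external memory still holds 128 bits of compressed data. If yes, go to step 730; if no, go to step 740.
730: load the next 128 bits of compressed data from the external memory into the high 128-bit storage space of the second register.
740: load the compressed data in the high 128-bit storage space of the first register into the low 128-bit storage space of the second register.
750: determine the initial storage location in the first register of the compression unit to be decompressed, and instruct the decompression module 230 to decompress that compression unit.
Instructing the decompression module to decompress the compression unit means that the control module 210 sends the initial storage location of the current compression unit to the decompression module 230, so that the decompression module 230 reads the current compression unit from the first register according to that location, decompresses it, and recovers the original data corresponding to the compression unit.
760: determine whether decompression has finished. If yes, end decompression; if no, go to step 770.
For example, decompression has finished when the number of compression units the decompression module 230 has decompressed reaches the number of compression units that need to be decompressed; otherwise it has not finished.
770: determine whether the initial storage location of the current compression unit is in the high 128-bit storage space of the first register. If yes, go to step 780; if no, go to step 750.
780: load the 256 bits of compressed data in the second register into the first register.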
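The Fig. 7 flow can be exercised end to end with a small software model. The sketch below is an assumption-laden illustration rather than the patented circuit: the function name `decompress_block` is hypothetical, the compression unit layout (info byte first, bit i marking element i) is assumed, and the step 770 check is performed on each unit's start just before it is decoded, so that every unit can be read from the first register alone. The hardware described later also reads from the second register; that detail is deliberately left out here.

```python
R = 16  # bytes per half register; R = 128 bits as in Fig. 7


def decompress_block(stream, num_units):
    """Decompress num_units compression units following the Fig. 7 steps."""
    stream = bytes(stream)
    padded = stream + bytes(4 * R)        # zero padding so slices never run short
    reg1 = bytearray(padded[:2 * R])      # 710: first 256 bits -> first register
    reg2 = bytearray(2 * R)
    fetched = 2 * R                       # bytes already taken from external memory
    ptr, out = 0, []

    def refill():
        nonlocal fetched
        if fetched < len(stream):                     # 720: data left in memory?
            reg2[R:] = padded[fetched:fetched + R]    # 730: next 128 bits -> reg2 high
            fetched += R
        reg2[:R] = reg1[R:]                           # 740: reg1 high -> reg2 low

    refill()
    for _ in range(num_units):            # 760: stop after the requested unit count
        if ptr >= R:                      # 770: start lies in reg1's high half?
            reg1[:] = reg2                # 780: 256 bits from reg2 -> reg1
            ptr -= R
            refill()
        info = reg1[ptr]                  # 750: decode the unit that starts at ptr
        ptr += 1
        for i in range(8):
            if info >> i & 1:
                out.append(reg1[ptr])     # set bit: take the next non-zero byte
                ptr += 1
            else:
                out.append(0)             # cleared bit: restore a zero
    return out
```

Because a unit is at most 9 bytes and always starts in the low half after the check, it never extends past the end of the 32-byte first register in this model. The single `num_units` argument plays the role of the one decompression instruction that states how many compression units the block contains.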
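It should be understood that in the second implementation above, the condition for updating the compressed data in the first register is that the initial storage location of the current compression unit has moved into the high R-bit storage space of the first register. When it has not, it is determined that the compressed data in the first register does not need to be updated; when it has, it is determined that the compressed data in the first register needs to be updated. The initial storage location of the current compression unit is an already generated register signal, so this decision logic is fairly simple, which helps raise the working clock frequency at which the data decompression apparatus can run.
In the first implementation described earlier, determining whether the compressed data in the first register needs to be updated requires judging whether the initial storage location of the next compression unit is no longer in the first register. This decision logic is more complex and may limit the working clock frequency at which the data decompression apparatus can run.
Therefore, the solution of this embodiment uses the movement of the initial storage location of the current compression unit into the high R-bit storage space of the first register as the condition for determining that the compressed data in the first register needs to be updated, which helps improve the efficiency of data decompression.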
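The difference between the two update conditions can be made concrete with a short sketch. Both function names are hypothetical, pointers are assumed to be byte offsets into a 2R-bit first register with R = 128, and the way the first implementation derives the next unit's start (current start, plus one info byte, plus one byte per set bit) is an assumption based on the compression unit format.

```python
R_BYTES = 16  # R = 128 bits


def needs_update_impl1(ptr, info):
    """First implementation: has the NEXT unit's start left the first register?

    The next start must be computed first, which takes an addition and a
    population count of the compression information.
    """
    next_ptr = ptr + 1 + bin(info).count("1")
    return next_ptr >= 2 * R_BYTES


def needs_update_impl2(ptr):
    """Second implementation: is the CURRENT start in the high R bits?

    For ptr in 0..31 this is a test of one pointer bit that already exists
    as a registered signal.
    """
    return bool(ptr & R_BYTES)
```

The second check involves no arithmetic on the critical path, which is one plausible reading of why the text describes its decision logic as simpler and friendlier to a higher clock frequency.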
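It should be understood that, besides the first and second implementations described above, the cache module 220 can cache compressed data using other feasible ping-pong mechanisms. For example, the cache module 220 can include three or more registers to cache compressed data with a ping-pong mechanism.
Implementations of the decompression module 230 are described below.
The process by which the decompression module 230 obtains the original data corresponding to a compression unit can include the following steps ① to ⑤.
Step ①: obtain the initial storage location of the compression unit.
The initial storage location of the compression unit can be obtained from a control signal issued by the control module 210.
Step ②: according to the initial storage location, obtain the compression information of the compression unit from the cache module 220.
Assuming a data precision of 8 bits, the first 8 bits of information starting from the initial storage location in the storage space of the first register are the compression information of the compression unit.
Step ③: according to the initial storage location and the compression information of the compression unit, obtain the storage locations in the cache module 220 of the non-zero data of the compression unit.
Again taking a data precision of 8 bits as an example, assume the compression information indicates that the original data corresponding to the compression unit contains 4 non-zero data items; then, in the storage space of the first register, the 2nd, 3rd, 4th, and 5th 8-bit data items counted from the initial storage location are the 4 non-zero data items of the original data corresponding to the compression unit.
Step ④: according to these storage locations, obtain the non-zero data of the compression unit from the cache module 220.
Step ⑤: according to the non-zero data and the compression information of the compression unit, recover the original data corresponding to the compression unit.
For example, if the compression information of the compression unit is "11000101" and its non-zero data is DCBA, the original data corresponding to the compression unit is "D,C,0,0,0,B,0,A".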
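Steps ① to ⑤ map directly onto a few lines of code. The following is a minimal sketch under assumptions: the function name `decompress_unit` is hypothetical, the buffer stands in for the first register, and the bit order (bit i of the compression information describes element i, with element 0 being A in the text's example) is assumed rather than stated in the source.

```python
def decompress_unit(buffer, start):
    """Return (original, unit_size) for the unit whose info byte is buffer[start].

    `start` is the initial storage location of step 1, in bytes.
    """
    info = buffer[start]                 # step 2: first 8 bits are the compression info
    original = []
    src = start + 1                      # step 3: non-zero data follow the info byte
    for i in range(8):
        if info >> i & 1:
            original.append(buffer[src])  # step 4: fetch the next non-zero byte
            src += 1
        else:
            original.append(0)            # step 5: a cleared bit restores a zero
    return original, src - start


# "11000101" with non-zero data D, C, B, A stored as A, B, C, D after the info byte
buffer = bytes([0xFF, 0b11000101, 0x0A, 0x0B, 0x0C, 0x0D, 0x00])
original, size = decompress_unit(buffer, 1)
```

The returned size (1 plus the number of set bits) is what a caller would add to the current initial storage location to find the next compression unit.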
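Optionally, in some embodiments, the circuit of the decompression module 230 has a pipeline structure, and the decompression module 230 is configured to decompress in a pipelined processing mode to obtain the original data corresponding to the current compression unit.
The decompression module 230 can be built from multiple registers, as shown in FIG. 8, where each box can represent one register.
It should be understood that decompressing data in a pipelined processing mode can effectively raise the working clock frequency of the data decompression apparatus.
Optionally, in some embodiments, as shown in FIG. 8, the decompression module 230 has a five-stage pipeline structure. The decompression module 230 has four input ports A1, A2, A3, and A4 and one output port B. Input port A1 receives the compression information of the current compression unit, input port A2 receives the initial storage location of the current compression unit, input port A3 receives the compressed data stored in the first register at the current moment, and input port A4 receives the compressed data stored in the second register at the current moment. Output port B outputs the original data corresponding to the current compression unit. The decompression module 230 has the following five-stage pipeline structure.
The first pipeline stage is configured to obtain the initial storage location of the current compression unit, first compressed data, and second compressed data, where the first compressed data is the compressed data stored in the first register at the current moment and the second compressed data is the compressed data stored in the second register at the current moment; it is further configured to obtain the compression information of the current compression unit according to the initial storage location, and to send the obtained information to the next pipeline stage.
For example, as shown in FIG. 8, the first-stage pipeline registers include the following registers.
Register cur_info stores the compression information of the current compression unit and sends it to the corresponding register of the next pipeline stage.
Register cur_ptr obtains the initial storage location (in bytes) of the current compression unit and sends it to the corresponding register of the next pipeline stage.
Register cur_unit obtains the compressed data stored in the first register at the current moment and sends it to the corresponding register of the next pipeline stage.
Register nxt_unit obtains the compressed data stored in the second register at the current moment and sends it to the corresponding register of the next pipeline stage.
The second pipeline stage is configured to send to the next pipeline stage the information received from the first pipeline stage.
For example, as shown in FIG. 8, the second-stage pipeline registers include the following registers.
Register cur_info_1dly registers the compression information of the compression unit sent by register cur_info and sends it to registers bytex_0_flag and bytex_ptr of the next pipeline stage.
Register cur_ptr_1dly registers the initial storage location of the compression unit sent by register cur_ptr and sends it to the corresponding register of the next pipeline stage.
Register cur_unit_1dly registers the compressed data sent by register cur_unit and sends it to the corresponding register of the next pipeline stage.
Register nxt_unit_1dly registers the compressed data sent by register nxt_unit and sends it to the corresponding register of the next pipeline stage.
The third pipeline stage is configured to obtain the storage locations of the non-zero data of the current compression unit according to the compression information and the initial storage location of the current compression unit received from the second pipeline stage; it is further configured to send the following information to the next pipeline stage: the storage locations of the non-zero data of the current compression unit, and the first compressed data, the second compressed data, and the compression information of the current compression unit received from the second pipeline stage.
For example, as shown in FIG. 8, the third-stage pipeline registers include the following registers.
Register bytex_0_flag generates, from the compression information of the compression unit sent by register cur_info_1dly, a flag for each of the x data items in the compression unit, and sends these x flags to the corresponding register of the next pipeline stage. Here x denotes the number of data items in a compression unit, and its value can be obtained from the compression information sent by register cur_info_1dly; for example, if the compression information of a compression unit is "11000101", x is 8. The flag of a data item is "0" or "1" and indicates whether that data item is zero or non-zero.
Register bytex_ptr obtains the storage locations in the cache module of the non-zero data of the compression unit, according to the initial storage location of the compression unit sent by register cur_ptr_1dly and the compression information of the compression unit sent by register cur_info_1dly, and sends these storage locations to registers bytex_ptr_msb_1dly, bytex_cur, and bytex_nxt of the next pipeline stage.
Register cur_unit_2dly registers the compressed data sent by register cur_unit_1dly and sends it to the corresponding register of the next pipeline stage.
Register nxt_unit_2dly registers the compressed data sent by register nxt_unit_1dly and sends it to the corresponding register of the next pipeline stage.
The fourth pipeline stage is configured to obtain first data and second data from the first compressed data and the second compressed data received from the third pipeline stage, respectively, according to the storage locations of the non-zero data of the current compression unit received from the third pipeline stage; it is further configured to generate indication information for each non-zero data item according to those storage locations, where the indication information indicates whether the non-zero data item is located in the first register or the second register; and it is further configured to send the following information to the next pipeline stage: the first data, the second data, the storage location and indication information of each non-zero data item, and the compression information of the current compression unit.
For example, as shown in FIG. 8, the fourth-stage pipeline registers include the following registers.
Register bytex_0_flag_1dly registers the flags of the x data items in the compression unit sent by register bytex_0_flag and sends them to the corresponding register of the next pipeline stage.
Register bytex_ptr_msb_1dly generates first storage location information for each non-zero data item, according to the storage location information of the non-zero data of the compression unit sent by register bytex_ptr and the register in which each non-zero data item is located. The first storage location information is 9 bits, and its most significant bit indicates the register in which the non-zero data item is located; for example, a most significant bit of "0" means the non-zero data item is in the first register, and a most significant bit of "1" means it is in the second register. The register sends the first storage location information of each non-zero data item to the next pipeline stage.
Register bytex_cur obtains the first data from the compressed data sent by register cur_unit_2dly according to the storage location sent by register bytex_ptr (it should be understood that the most significant bit is ignored), and sends it to the relevant register of the next pipeline stage.
Register bytex_nxt obtains the second data from the compressed data sent by register nxt_unit_2dly according to the storage location sent by register bytex_ptr (it should be understood that the most significant bit is ignored), and sends it to the relevant register of the next pipeline stage.
The fifth pipeline stage is configured to obtain the original data corresponding to the current compression unit according to the information received from the fourth pipeline stage.
For example, as shown in FIG. 8, the fifth-stage pipeline registers include register Bytex. Register Bytex obtains each non-zero data item from register bytex_cur or register bytex_nxt, according to the flags of the x data items in the compression unit sent by register bytex_0_flag_1dly and the first storage location information of the non-zero data sent by register bytex_ptr_msb_1dly. The same rule as in the fourth pipeline stage applies: for example, when the most significant bit of the first storage location information is "0", the non-zero data item is taken from register bytex_cur, and when it is "1", the non-zero data item is taken from register bytex_nxt. At this point the original data corresponding to the compression unit has been recovered. Register Bytex outputs the original data corresponding to the compression unit to output port B.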
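The data path of stages three to five can be sketched in software as below. Several points are assumptions made only for this illustration and are not confirmed by the source: pointers are byte offsets rather than the 9-bit values mentioned in the text, each register is 32 bytes, the second register is assumed to hold the bytes that directly follow the first register, and the function name `pipeline_decode` is hypothetical. The pass-through delay registers of stages one and two are omitted because they only forward values.

```python
REG_BYTES = 32  # one 2R-bit register with R = 128


def pipeline_decode(cur_info, cur_ptr, cur_unit, nxt_unit):
    """Model stages 3 to 5 of the five-stage pipeline for one compression unit."""
    # Stage 3 (bytex_0_flag, bytex_ptr): one flag per element and the byte
    # position each element would be read from.
    flags = [bool(cur_info >> i & 1) for i in range(8)]
    ptrs, offset = [], cur_ptr + 1          # data start right after the info byte
    for flag in flags:
        ptrs.append(offset)
        if flag:
            offset += 1                     # only non-zero elements occupy a byte

    # Stage 4 (bytex_ptr_msb_1dly, bytex_cur, bytex_nxt): a register-select bit
    # plus one candidate byte from each register, indexed with that bit ignored.
    msb = [p >= REG_BYTES for p in ptrs]
    byte_cur = [cur_unit[p % REG_BYTES] for p in ptrs]
    byte_nxt = [nxt_unit[p % REG_BYTES] for p in ptrs]

    # Stage 5 (Bytex): choose per element; zero elements come from the flag alone.
    return [
        (byte_nxt[i] if msb[i] else byte_cur[i]) if flags[i] else 0
        for i in range(8)
    ]
```

Fetching both candidates in stage four and choosing between them in stage five mirrors the split into bytex_cur and bytex_nxt: the selection reduces to a two-way multiplexer per element, which keeps the work per stage small.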
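Optionally, the decompression module 230 can also be designed with a two-stage, three-stage, or four-stage pipeline structure.
For example, merging any two adjacent stages of the five-stage pipeline structure of this embodiment into one stage yields a four-stage pipeline design of the decompression module 230. Merging any three adjacent stages into one stage yields a three-stage pipeline design. Merging any four adjacent stages into one pipeline processing unit yields a two-stage pipeline design.
As shown in FIG. 9, an embodiment of the present invention further provides a method 900 for data decompression, which can be executed by the control module 210 of the above embodiments. The method 900 includes step 910 and step 920.
910: load compressed data comprising compression units into a cache module, where each compression unit includes compression information and non-zero data, and the compression information indicates the positions of the non-zero data in the original data corresponding to the compression unit.
The cache module can be the cache module 220 described in any of the above embodiments. Compression units and compressed data are as explained above.
920: send to a decompression module the initial storage location of the current compression unit in the cache module, so that the decompression module reads the compression information and non-zero data of the current compression unit according to the initial storage location and obtains from them the original data corresponding to the current compression unit.
Optionally, the initial storage location of the current compression unit in the cache module can be obtained from a location determination module.
The decompression module can be the decompression module 230 described in any of the above embodiments.
As explained above, the data decompression scheme of this application has no synchronization problem between instructions and data, can reduce time overhead, helps achieve real-time decompression, and keeps the data stream continuous. In addition, the scheme can reduce the bandwidth overhead between the decompression apparatus and the external memory, and can also reduce the instruction cache overhead inside the decompression apparatus.
Optionally, in some embodiments, the cache module includes at least two registers, and the method 900 further includes: controlling the at least two registers to cache the compressed data using a ping-pong mechanism.
Caching compressed data with a ping-pong mechanism ensures that the cache module supplies the data to be decompressed continuously.
Optionally, in some embodiments, the cache module has the structure shown in FIG. 5, and the method 900 can further include the steps shown in FIG. 6.
It should be understood that the description of the method embodiments corresponds to the description of the apparatus embodiments, so for parts not described in detail, refer to the apparatus embodiments above.
This application can be applied to scenarios that require data compression and decompression. For example, the data decompression scheme of this application is applicable to hardware accelerators for convolutional neural networks (CNN) and recurrent neural networks (RNN), for example in the form of IP cores and of circuits through which IP cores work cooperatively.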
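As shown in FIG. 10, an embodiment of this application further provides an apparatus 1000 for data processing, which includes a control module 1010, a cache module 1020, and a processing module 1030.
The cache module 1020 includes a first register and a second register, the storage space of each of which is divided into a low R-bit storage space and a high R-bit storage space, where R is a positive multiple of 8.
The control module 1010 is configured to load data to be processed into the cache module 1020, and to send to the processing module 1030 the initial storage location of the current data in the cache module 1020, where the initial storage location of data is obtained starting from the low R-bit storage space of the first register.
The processing module 1030 is configured to read the current data from the cache module 1020 according to the initial storage location of the current data and process it.
The control module 1010 is further configured to control the first register and the second register to cache the data to be processed as follows: 1) after data to be processed is loaded into the first register, load the data in the high R-bit storage space of the first register into the low R-bit storage space of the second register, and load the next R bits of data to be processed into the high R-bit storage space of the second register; 2) when the initial storage location of the current data is in the high R-bit storage space of the first register, load the 2R bits of data in the second register into the first register; and repeat step 1) and step 2) until data processing ends.
Optionally, the control module 1010 can control the first register and the second register to cache data in the manner shown in FIG. 7, with "compressed data" in FIG. 7 replaced by "data", "decompression module" replaced by "processing module", and "decompress" replaced by "process".
Therefore, the solution of this embodiment uses the movement of the initial storage location of the current data into the high R-bit storage space of the first register as the condition for determining that the data in the first register needs to be updated, which helps improve the efficiency of data processing.
It should be understood that this embodiment can be applied to the field of data processing in general and is not limited to data compression and decompression scenarios.
An embodiment of the present invention further provides a control method for a data cache module. The data cache module can be the cache module 220 or the cache module 1020 described in the above embodiments with reference to FIG. 5. The control method can be executed by the control module 210 or the control module 1010 described in the above embodiments. The control method can include the processing flow shown in FIG. 6 or FIG. 7 above; to avoid repetition, it is not described again here.
The embodiments described herein can be independent solutions or can be combined according to their internal logic; all such solutions fall within the protection scope of this application.
The above embodiments can be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, they can be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions described in the embodiments of the present invention are produced wholly or partly. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
In the embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and can be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they can be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application can be integrated into one processing unit, can each exist physically on their own, or two or more units can be integrated into one unit.
The above are only specific implementations of this application, but the protection scope of this application is not limited to them. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. The protection scope of this application shall therefore be subject to the protection scope of the claims.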

Claims (16)

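  1. An apparatus for data decompression, characterized by comprising a control module, a cache module, and a decompression module;
    the control module is configured to load compressed data comprising compression units into the cache module, wherein each compression unit comprises compression information and non-zero data, and the compression information indicates the positions of the non-zero data in the original data corresponding to the compression unit;
    the control module is further configured to send to the decompression module the initial storage location of the current compression unit in the cache module;
    the decompression module is configured to read, according to the initial storage location, the compression information and non-zero data of the current compression unit from the cache module, and to obtain from them the original data corresponding to the current compression unit.
  2. The apparatus according to claim 1, characterized in that the cache module comprises at least two registers;
    the control module is configured to control the at least two registers to cache the compressed data using a ping-pong mechanism.
  3. The apparatus according to claim 2, characterized in that the at least two registers comprise a first register and a second register, the storage space of each of the first register and the second register is divided into a low R-bit storage space and a high R-bit storage space, R is a positive multiple of 8, and the initial storage location of a compression unit is obtained starting from the low R-bit storage space of the first register;
    the control module is configured to control the first register and the second register to cache the compressed data as follows:
    1) after compressed data is loaded into the first register, load the compressed data in the high R-bit storage space of the first register into the low R-bit storage space of the second register, and load the next R bits of compressed data into the high R-bit storage space of the second register;
    2) when the initial storage location of the current compression unit is in the high R-bit storage space of the first register, load the 2R bits of compressed data in the second register into the first register,
    and repeat step 1) and step 2) until decompression ends.
  4. The apparatus according to any one of claims 1 to 3, characterized in that the decompression module has a pipeline structure, and the decompression module is configured to obtain the original data corresponding to the current compression unit in a pipelined processing mode.
  5. The apparatus according to claim 3, characterized in that the decompression module has the following five-stage pipeline structure:
    a first pipeline stage, configured to obtain the initial storage location of the current compression unit, first compressed data, and second compressed data, wherein the first compressed data is the compressed data stored in the first register at the current moment and the second compressed data is the compressed data stored in the second register at the current moment, further configured to obtain the compression information of the current compression unit according to the initial storage location, and further configured to send the obtained information to the next pipeline stage;
    a second pipeline stage, configured to send to the next pipeline stage the information received from the first pipeline stage;
    a third pipeline stage, configured to obtain the storage locations of the non-zero data of the current compression unit according to the compression information and the initial storage location of the current compression unit received from the second pipeline stage, and further configured to send the following information to the next pipeline stage: the storage locations of the non-zero data of the current compression unit, and the first compressed data, the second compressed data, and the compression information of the current compression unit received from the second pipeline stage;
    a fourth pipeline stage, configured to obtain first data and second data from the first compressed data and the second compressed data received from the third pipeline stage, respectively, according to the storage locations of the non-zero data of the current compression unit received from the third pipeline stage, further configured to generate indication information for each non-zero data item according to the storage locations of the non-zero data of the current compression unit, wherein the indication information indicates whether the non-zero data item is located in the first register or the second register, and further configured to send the following information to the next pipeline stage: the first data, the second data, the storage location and indication information of each non-zero data item, and the compression information of the current compression unit;
    a fifth pipeline stage, configured to obtain the original data corresponding to the current compression unit according to the information received from the fourth pipeline stage.
  6. The apparatus according to any one of claims 1 to 5, characterized in that the apparatus further comprises:
    a location determination module, configured to determine the initial storage location of the current compression unit in the cache module;
    the control module is configured to obtain the initial storage location of the current compression unit from the location determination module.
  7. The apparatus according to any one of claims 1 to 6, characterized in that the compression unit corresponds to 8 bytes of original data, and the compression information in the compression unit is 8 bits.
  8. The apparatus according to claim 3, characterized in that R equals 128.
  9. An apparatus for data processing, characterized by comprising:
    a cache module comprising a first register and a second register, wherein the storage space of each of the first register and the second register is divided into a low R-bit storage space and a high R-bit storage space, and R is a positive multiple of 8;
    a control module, configured to load data to be processed into the cache module and further configured to send to a processing module the initial storage location of the current data in the cache module, wherein the initial storage location of data is obtained starting from the low R-bit storage space of the first register;
    the processing module, configured to read the current data from the cache module according to the initial storage location of the current data and process it;
    the control module is further configured to control the first register and the second register to cache the data to be processed as follows:
    1) after data to be processed is loaded into the first register, load the data in the high R-bit storage space of the first register into the low R-bit storage space of the second register, and load the next R bits of data to be processed into the high R-bit storage space of the second register;
    2) when the initial storage location of the current data is in the high R-bit storage space of the first register, load the 2R bits of data in the second register into the first register,
    and repeat step 1) and step 2) until data processing ends.
  10. A method for data decompression, characterized by comprising:
    loading compressed data comprising compression units into a cache module, wherein each compression unit comprises compression information and non-zero data, and the compression information indicates the positions of the non-zero data in the original data corresponding to the compression unit;
    sending to a decompression module the initial storage location of the current compression unit in the cache module, so that the decompression module reads the compression information and non-zero data of the current compression unit according to the initial storage location and obtains from them the original data corresponding to the current compression unit.
  11. The method according to claim 10, characterized in that the cache module comprises at least two registers;
    the method further comprises: controlling the at least two registers to cache the compressed data using a ping-pong mechanism.
  12. The method according to claim 11, characterized in that the at least two registers comprise a first register and a second register, the storage space of each of the first register and the second register is divided into a low R-bit storage space and a high R-bit storage space, R is a positive multiple of 8, and the initial storage location of a compression unit is obtained starting from the low R-bit storage space of the first register;
    controlling the at least two registers to cache the compressed data using a ping-pong mechanism comprises:
    controlling the first register and the second register to cache the compressed data as follows:
    1) after compressed data is loaded into the first register, load the compressed data in the high R-bit storage space of the first register into the low R-bit storage space of the second register, and load the next R bits of compressed data into the high R-bit storage space of the second register;
    2) when the initial storage location of the current compression unit is in the high R-bit storage space of the first register, load the 2R bits of compressed data in the second register into the first register,
    and repeat step 1) and step 2) until decompression ends.
  13. The method according to any one of claims 10 to 12, characterized in that the method further comprises:
    obtaining the initial storage location of the current compression unit in the cache module from a location determination module.
  14. The method according to any one of claims 10 to 13, characterized in that the compression unit corresponds to 8 bytes of original data, and the compression information in the compression unit is 8 bits.
  15. The method according to claim 12, characterized in that R equals 128.
  16. A control method for a data cache module, characterized in that the data cache module comprises a first register and a second register, the storage space of each of the first register and the second register is divided into a low R-bit storage space and a high R-bit storage space, and R is a positive multiple of 8;
    the control method comprises:
    loading data to be processed into the cache module;
    sending to a processing module the initial storage location of the current data in the cache module, so that the processing module reads the current data from the cache module according to the initial storage location of the current data and processes it, wherein the initial storage location of data is obtained starting from the low R-bit storage space of the first register;
    the method further comprises: controlling the first register and the second register to cache the data to be processed as follows:
    1) after data to be processed is loaded into the first register, load the data in the high R-bit storage space of the first register into the low R-bit storage space of the second register, and load the next R bits of data to be processed into the high R-bit storage space of the second register;
    2) when the initial storage location of the current data is in the high R-bit storage space of the first register, load the 2R bits of data in the second register into the first register,
    and repeat step 1) and step 2) until data processing ends.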
PCT/CN2019/082993 2019-04-17 2019-04-17 数据解压缩的装置与方法 WO2020211000A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/082993 WO2020211000A1 (zh) 2019-04-17 2019-04-17 数据解压缩的装置与方法
CN201980005236.5A CN111279617A (zh) 2019-04-17 2019-04-17 数据解压缩的装置与方法

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/082993 WO2020211000A1 (zh) 2019-04-17 2019-04-17 数据解压缩的装置与方法

Publications (1)

Publication Number Publication Date
WO2020211000A1 true WO2020211000A1 (zh) 2020-10-22

Family

ID=71001175

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/082993 WO2020211000A1 (zh) 2019-04-17 2019-04-17 数据解压缩的装置与方法

Country Status (2)

Country Link
CN (1) CN111279617A (zh)
WO (1) WO2020211000A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11792303B1 (en) 2022-09-30 2023-10-17 International Business Machines Corporation Fast clear memory of system memory

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111884658A (zh) * 2020-07-09 2020-11-03 上海兆芯集成电路有限公司 数据解压缩方法、数据压缩方法及卷积运算装置
CN116208170B (zh) * 2023-03-01 2023-10-27 山东华科信息技术有限公司 分布式能源并网监测的数据解压缩系统、方法及其设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101689863A (zh) * 2007-03-15 2010-03-31 线性代数技术有限公司 用于压缩数据的电路和利用该电路的处理器
CN102821275A (zh) * 2011-06-08 2012-12-12 中兴通讯股份有限公司 数据压缩方法及装置、数据解压缩方法及装置
WO2018150024A1 (en) * 2017-02-17 2018-08-23 Cogisen S.R.L. Method for image processing and video compression
CN108880559A (zh) * 2017-05-12 2018-11-23 杭州海康威视数字技术股份有限公司 数据压缩方法、数据解压缩方法、压缩设备及解压缩设备

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3968276B2 (ja) * 2002-07-31 2007-08-29 株式会社東芝 時系列データ圧縮・解凍装置およびその方法
US8847798B2 (en) * 2012-12-17 2014-09-30 Maxeler Technologies, Ltd. Systems and methods for data compression and parallel, pipelined decompression
US10735023B2 (en) * 2017-02-24 2020-08-04 Texas Instruments Incorporated Matrix compression accelerator system and method


Also Published As

Publication number Publication date
CN111279617A (zh) 2020-06-12

Similar Documents

Publication Publication Date Title
WO2020211000A1 (zh) Device and method for data decompression
US10680643B2 (en) Compression scheme with control of search agent activity
KR102499335B1 (ko) Neural network data processing apparatus, method, and electronic device
WO2019041833A1 (zh) Compression device for deep neural networks
US10218382B2 (en) Decompression using cascaded history windows
JP5008106B2 (ja) Data compression apparatus and method
US9176977B2 (en) Compression/decompression accelerator protocol for software/hardware integration
JP2009531976A (ja) High-speed data compression based on set-associative cache mapping techniques
RU2265879C2 (ru) Device and method for extracting data from a buffer and loading it into a buffer
JP2010061518A (ja) Data storage device, data storage method, and program
US10637498B1 (en) Accelerated compression method and accelerated compression apparatus
US10489322B2 (en) Apparatus and method to improve performance in DMA transfer of data
EP2787738B1 (en) Tile-based compression for graphic applications
KR20000017360A (ko) Memory LSI with compressed data input/output function
US10879926B2 (en) Accelerated compression method and accelerated compression apparatus
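TW202344956A (zh) Dynamic clock and voltage scaling (DCVS) lookahead bandwidth voting using feedforward compression ratio
JP2015080149A (ja) Video recording apparatus and video recording method
KR100509009B1 (ko) First-in first-out write/last-in first-out read trace buffer with software and hardware loop compression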
US11126430B2 (en) Vector processor for heterogeneous data streams
US10686467B1 (en) Accelerated compression method and accelerated compression apparatus
US10637499B1 (en) Accelerated compression method and accelerated compression apparatus
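JP2003030129A (ja) Data buffer
TWI692746B (zh) Data caching method for a display driver of a mobile device
WO2017186049A1 (zh) Information processing method and apparatus
CN114710421B (zh) Network connection state maintenance device and method based on data prefetching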

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application Ref document number: 19925362; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase Ref country code: DE
122 Ep: PCT application non-entry in European phase Ref document number: 19925362; Country of ref document: EP; Kind code of ref document: A1