WO2021237510A1 - 数据解压缩的方法、系统、处理器及计算机存储介质 - Google Patents

数据解压缩的方法、系统、处理器及计算机存储介质 Download PDF

Info

Publication number
WO2021237510A1
WO2021237510A1 PCT/CN2020/092608 CN2020092608W WO2021237510A1 WO 2021237510 A1 WO2021237510 A1 WO 2021237510A1 CN 2020092608 W CN2020092608 W CN 2020092608W WO 2021237510 A1 WO2021237510 A1 WO 2021237510A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
code stream
decompression
stream information
header information
Prior art date
Application number
PCT/CN2020/092608
Other languages
English (en)
French (fr)
Inventor
赵尧
赵文军
林蔓虹
陈帅
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to PCT/CN2020/092608 priority Critical patent/WO2021237510A1/zh
Publication of WO2021237510A1 publication Critical patent/WO2021237510A1/zh

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements

Definitions

  • the embodiments of the present invention relate to the field of data processing, and more specifically, to a method, system, processor, and computer storage medium for data decompression.
  • an embodiment of the present invention provides a method for data decompression, including:
  • parsed decompression instruction distribute the channel decoding instruction to each decompression path of the at least two decompression paths;
  • each of the at least two decompression paths obtains its corresponding data to be decompressed, and decompresses the data to be decompressed to obtain decompressed data, where ,
  • the data to be decompressed includes at least compressed data corresponding to one compressed block;
  • a system for data decompression including an instruction parsing module and at least two decompression paths,
  • the instruction parsing module is configured as:
  • parsed decompression instruction distribute a channel decoding instruction to each decompression path of the at least two decompression paths;
  • Each of the at least two decompression paths is configured to:
  • a processor including:
  • a computer storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the method described in the second aspect are implemented.
  • the data decompression method, system, processor, and computer storage medium provided by the embodiments of the present invention can reduce power consumption.
  • Fig. 1 is a schematic diagram of data compression and decompression according to an embodiment of the present invention.
  • Figure 2 is a schematic diagram of a compressed block according to an embodiment of the present invention.
  • Figure 3 is a schematic diagram of compressed data according to an embodiment of the present invention.
  • Figure 4 is a schematic diagram of header information, code stream information and output characteristic diagrams in an embodiment of the present invention.
  • Figure 5 is a schematic block diagram of a decompression system according to an embodiment of the present invention.
  • FIG. 6 is another schematic block diagram of a decompression system according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a variable-length shift register according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of decompression performed by a data decompression module according to an embodiment of the present invention.
  • FIG. 9 is a schematic flowchart of decompressing compressed data by the decompression system according to an embodiment of the present invention.
  • FIG. 10 is a schematic flowchart of a method for data decompression according to an embodiment of the present invention.
  • neural networks such as Convolution Neural Networks (CNN).
  • CNN Convolution Neural Networks
  • a large amount of feature map data will be generated.
  • data compression technology is usually used, which can reduce the space occupied by the external memory. , And can reduce the bandwidth when reading and writing.
  • the external memory may be, for example, a Double Data Rate Synchronous Dynamic Random Access Memory (Double Data Rate Synchronous Dynamic Random Access Memory), or DDR for short.
  • a convolutional neural network generally includes a large number of convolutional layers, and each convolutional layer generates a large amount of feature map data.
  • these large amounts of feature map data are read and written to DDR, they will consume valuable system external memory bandwidth resources, resulting in other modules with high bandwidth requirements (such as CNN or other modules) because they cannot quickly access DDR and affect computing performance. .
  • the feature map calculated by the convolutional neural network can be located in the on-chip memory 10, and then the compression system 20 compresses the feature map in the on-chip memory 10, and stores the compressed data in the external memory.
  • compression may be performed in units of compressed blocks, and compressed data may be formed after at least one compressed block is compressed, and the compressed data may be code stream information.
  • header information corresponding to the code stream information is also generated.
  • a compression block includes M compression groups, and each compression group includes N pixels, and both M and N are positive integers.
  • the code stream information after compressing a compression block includes at least M flag bits, which are used to indicate whether the M compression groups are all zeros. If the M compression groups include a non-all-zero compression group, the bitstream information also includes residual length and residual field (RES). Among them, the flag bit and the residual length can be collectively referred to as the dock field (HDR).
  • HDR dock field
  • one compressed block includes 8 pixels, such as pixels p1 to p8 as shown in FIG. 2.
  • One compression block shown in FIG. 2 includes two compression groups, the first compression group is p1 to p4, and the second compression group is p5 to p8.
  • the compression block shown in Figure 2 includes two compression groups, so the corresponding code stream information includes two flag bits. Referring to Figure 3, the flag "00" indicates that both compression groups are all zeros; "01” indicates that the first compression group is all zeros, and the second compression group is not all zeros; "10” indicates that the first compression group is all zeros. If it is not all zeros, the second compression group is all zeros; "11” means that both compression groups are not all zeros.
  • not all compression groups are all zeros, it also includes residual length and residual fields.
  • the number of residual lengths is consistent with and corresponding to the number of compression groups that are not all zeros.
  • the number of residual length is 1; if there are two "1"s in the flag bit (i.e. case 3), the number of residual length is 2 .
  • the residual length is used to indicate the bit width of each of the subsequent corresponding residual fields.
  • the residual length Rbit1 represents the bit width of each of R0-R3
  • the residual length Rbit2 represents the bit width of each of R4-R7.
  • R0-R7 are obtained by subtracting adjacent pixels (as shown in Figure 2) to obtain D1-D8, and then compressing the difference.
  • the difference compression method can be used to obtain the compression block corresponding to The compressed data.
  • the flag bits corresponding to the two compression groups are both "1".
  • D5 p5-p4
  • D6 p6-p5
  • D7 p7-p5
  • D8 p8-p5.
  • the pixel value of each pixel is an 8-bit signed number.
  • R0 to R7 are obtained on the basis of the difference values D1 to D8.
  • D1 to D8 are signed numbers, and it is assumed that the corresponding unsigned numbers after removing the sign bit are d1 to d8.
  • the value range of the bits is 1-8, and there are 8 cases in total, which can be identified by 3bit Rbit1.
  • the value of Rbit1 ranges from 0 to 7, which corresponds to the value range of bits d1 to d4 from 1 to 8. Remove the first several bits of d1 to d4 and only retain the following (Rbit1+1) bits, and then add the corresponding sign bits of D1 to D4 to obtain R0 to R3.
  • each pixel includes 8 bits, of which 1 bit is a sign bit. Then, in addition to the sign bit, the pixel value of each pixel is represented by 7 bits. Therefore, the residual length of each pixel can occupy 3 bits.
  • case 1 corresponds to a compressed block and the compressed data after compression occupies 2 bits
  • case 2 corresponds to a pixel after compression and the compressed data occupies (5+(Rbit+2) ⁇ 4) bits
  • 3 corresponds to the compressed data after one pixel compression occupies (8+(Rbit1+2) ⁇ 4+(Rbit2+2) ⁇ 4) bits.
  • the header information corresponding to the code stream information is used to include the base address information (shown as row_baddr in FIG. 4) and length information of the code stream information.
  • the header information of a row may include one row base address information and K length information (row_len 0 ⁇ row_len(k-1) as shown in FIG. 4).
  • the row base address information occupies 32 bits
  • the compressed data can be decompressed to obtain the original data after decompression, that is, the decompressed image or the output feature map (Output Feature Map, OFM).
  • the instructions can be flexibly configured, so that the corresponding header information and code stream information can be selected according to the instructions. For example, it can be configured so that the selected header information and code stream information correspond to the non-starting position of the feature map, for example, a specific column or a specific channel, etc., so that the column direction (FM_H) from the compressed data can be supported And channel direction (FM_C) image cropping.
  • the first header information and the first code stream information are passed The stream information can be decompressed into a cropped feature map.
  • the header information base address info_baddr can be configured to read the corresponding code stream information from a certain row of the header information table.
  • the feature map height direction offset FM_H_OFST can be configured, and the corresponding code stream information can be read from a certain row length row_len in the header information of a row.
  • FIG. 4 shows a schematic diagram of header information, code stream information, and decompressed image.
  • the decompressed image may be a feature map.
  • the decompressed image is an output feature map (OFM).
  • the output feature map has three dimensions, namely width, height, and channel.
  • the feature map width FM_W, the feature map height FM_H, and the feature map channel number FM_C are shown.
  • the base address of the feature map is OFM_BADDR, the address offset between image channels is OFM_SEGM_LEN, and the height offset of the feature map is FM_H_OFST.
  • the header information includes multiple tables. For example, for one feature map, one table can correspond to a channel of the feature map, and the address offset between the header information channels is INFO_SEGM_LEN.
  • the base address of the header information is IMFO_BADDR.
  • the header information of a row includes a row base address information and K length information. Taking the first row example shown in the header information table in Figure 4, the row base address information is row_baddr 0, and the K length information is row_len 0 to row_len(k -1).
  • the code stream information can be found according to the base address in the header information, as shown in Figure 4, the starting address row_baddr 0 of the code stream information.
  • the residual information in the code stream information can be extracted, and the decompressed image can be obtained by decompressing the residual information.
  • an embodiment of the present invention provides a decompression system. Still referring to FIG. 1, the decompression system 40 reads the compressed data from the external memory 30. Then, the original data is obtained by decompression, and then the decompressed data is stored in the on-chip memory 10 so that the processor can process the data in the on-chip memory 10.
  • the decompression system 40 for data decompression (hereinafter referred to as the system) at least includes: an instruction parsing module, a read arbitration module, at least two decompression paths, and a write arbitration module, as shown in FIG. 5.
  • the number of decompression paths in the embodiment of the present invention is at least two, for example, it can be 3 or more, which can be specifically configured according to the performance of the processor and the size of the feature map data to be processed. .
  • the embodiment of the present invention can use one system to flexibly instantiate multiple decompression paths (decoding paths, DEC_PATH), so as to be compatible with applications of various data output rates.
  • the number of decompression paths can be flexibly configured according to the processing requirements of the feature map data. When a decompression path cannot meet the task performance requirements, the number of decompression paths can be configured to provide decompression performance, thereby flexibly Meet the performance requirements of different processing tasks.
  • decompression path 1 decompression path 1
  • decompression path 2 decompression path 2.
  • the instruction parsing module may receive the decompression instruction and parse the decompression instruction; then, according to the parsed decompression instruction, the channel decoding instruction may be distributed to each decompression path of the at least two decompression paths.
  • the instruction parsing module reads the decompression instruction from the external memory.
  • Each of the at least two decompression paths can obtain respective corresponding data to be decompressed according to the received channel decoding instruction, and decompress the data to be decompressed to obtain decompressed data, where the data to be decompressed is at least Including compressed data corresponding to a compressed block; and performing write operations on the decompressed data.
  • the read arbitration module and the write arbitration module are respectively used to arbitrate the read command and the write data command from at least two decompression paths to ensure effective and correct data transmission.
  • the instruction parsing module which can be expressed as an INSTR_PROC (instruction process) module, can receive a decompression instruction, and then can analyze the decompression instruction, and distribute the channel decoding instruction to each decompression path according to the result of the analysis.
  • INSTR_PROC instruction process
  • the processor may send a decompression instruction to the instruction parsing module.
  • the instruction parsing module receives the decompression instruction, it can correspondingly distribute the channel decoding instruction to each decompression path, so that each decompression path reads the compressed data from the external memory and performs decompression.
  • the channel decoding instruction (denoted as instr_cfgs) distributed to a certain decompression path may include: the base address of the header information (info_baddr), the length of the header information (info_len), the width of the output feature map (fm_w), and the output feature map The height of the output feature map (fm_h), the offset of the output feature map in the height direction (fm_h_ofst), the base address of the output feature map (ofm_baddr), etc.
  • the instruction parsing module may distribute the decompression task for the one-time decompression of the feature map with the dimension FM_W ⁇ FM_H ⁇ FM_C to at least two decompression paths (such as P) according to the parsed decompression instruction. For example, one way is to allocate one channel to multiple different decompression paths.
  • the channel decoding instructions distributed to the decompression path may include all necessary information for decompressing the feature map of a channel.
  • the read arbitration module which can be expressed as the RD_ARB (read arbiter) module, can arbitrate the header information read command and/or the code stream information read command from at least two decompression paths, and send the command that wins the arbitration first (for example, , Read command).
  • the read arbitration module sends a read command to the external memory according to the result of the arbitration winning.
  • the arbitration rules used for arbitration may include a pre-configured priority mechanism or a fair polling mechanism.
  • the read arbitration module determines the read command (header information read command and/or code stream information read command) that wins the arbitration according to the arbitration rules, and records the ID of the winning read command (for example, the solution of its source). Compression path ID), send the winning read command to the data bus (DATA_BUS), and after receiving the data return from the external memory, according to the recorded ID of the winning read command, send the received return data to the corresponding The decompression path of the ID.
  • the arbitration rule is a priority mechanism, the compression processing of the compression path with a high priority can be guaranteed first, and the task performance of the compression path with the priority can be ensured.
  • the write arbitration module which can be expressed as a WR_ARB (write arbiter) module, can arbitrate the write data commands from at least two decompression paths, and send the command that wins the arbitration first.
  • WR_ARB write arbiter
  • the arbitration rules used for arbitration may include a pre-configured priority mechanism or a fair polling mechanism.
  • the write arbitration module determines the data write command that wins the arbitration according to the arbitration rule, and first writes the decompressed data corresponding to the data write command that wins to the on-chip memory through the data bus (DATA_BUS).
  • each decompression path can include header information loading module, code stream information loading module, header information cache module, code stream information cache module, variable length shift register module, data Decompression module and data packaging module. It should be noted that, in order to simplify the illustration, only the decompression path 1 is shown in FIG. 6. For the other decompression paths (such as decompression path 2) of the at least two decompression paths, the included module structure and the implemented functions They are all similar and will not be listed in detail.
  • the header information loading module can be expressed as the INFO_LOAD (Header information load) module, also known as the header information load parsing module, which can determine the base address (info_baddr) of the header information and the length of the header information (info_len) based on the channel decoding instruction, and then Send the header information read command.
  • the header information read command may include the base address of the header information and the length of the header information. It can be understood that the header information read command is sent to the read arbitration module so that the read arbitration module arbitrates the header information read commands from at least two decompression modules.
  • the header information loading module After the header information loading module obtains the read back header information (also called header information data), the header information can be parsed to determine the base address of the code stream information (represented as ifm_baddr) and the length of the code stream information (ifm_len ). The base address of the code stream information and the length of the code stream information are sent to the code stream information loading module, so that the code stream information loading module sends the code stream information read command and obtains the code stream information.
  • the base address of the code stream information represented as ifm_baddr
  • the length of the code stream information ifm_len
  • the header information loading module determines the base address of the code stream information (ifm_baddr) and the length of the code stream information (ifm_len) based on the header information, and is divided into the following two situations:
  • h st is the start line of the decompression header information in FIG. 4, which is used to indicate the start of the height direction.
  • a row of the header information table corresponds to the row number h of the original image before compression.
  • ifm_baddr is the starting address of the code stream (that is, the base address of the code stream information), which means that the corresponding code stream information is read from a certain row of the header information table.
  • fm_h_ofst is the offset of the decompression starting line corresponding to h st.
  • the length of the code stream information is:
  • the header information of a row includes a row base address information and K length information, and the row base address information is expressed as The row length information is expressed as row_len h .
  • the length of the code stream information is:
  • the non-first-line header information refers to the first second-line header information of the shaded part.
  • the base address of the code stream information means that the start address of the code stream information corresponding to the second row of the shaded part is the row_baddr of the row header information.
  • h refers to the serial number of the header row_len. As long as h is still within the effective height range, that is, h ⁇ (h st +fm_h_ofst+fm_h), the length of row_len_h parsing code stream information can be accumulated.
  • the two-line header information can be configured to merge and output Load the module to the header information, merge and update it to:
  • the code stream information loading module can be expressed as the IFM_LOAD (Input feature map load) module, which can receive the base address of the code stream information and the length of the code stream information from the header information loading module, and send the code stream information read command accordingly to obtain the code Stream information.
  • IFM_LOAD Input feature map load
  • the base address of the code stream information and the length of the code stream information can be converted into a code stream information read command that meets the maximum burst length and address alignment requirements, and sent to the read arbitration module for reading
  • the arbitration module arbitrates the code stream information read commands from at least two decompression modules.
  • the header information cache module can be expressed as an INFO_FIFO module, which can be used to store the header information read back from the external memory by the header information loading module. That is, after the header information loading module receives the header information corresponding to the header information read command sent from the read arbitration module, it buffers the received header information in the INFO_FIFO module.
  • the code stream information buffer module can be expressed as the IFM_FIFO module, which can be used to store the code stream information read back from the external memory by the code stream information loading module. That is, after the code stream information loading module receives the code stream information corresponding to the code stream information read command sent from the read arbitration module, it buffers the received code stream information in the IFM_FIFO module.
  • variable length shift register module can be expressed as a VLSR (Veri-Length shift register) module, which is used to read the code stream information from the code stream information buffer module and buffer it in the form of a shift register.
  • VLSR Very-Length shift register
  • the code stream information to be cached can be first written into the highest register space in the L register spaces, and the register identifier of the highest register space is set It is the first identification; when the registration identification in the lower-order storage space is the second identification, the code stream information located in the high-order storage space is shifted to the low-order storage space.
  • the first flag indicates that the corresponding storage space has available code stream information cached, and the second flag indicates that the corresponding storage space does not have available code stream information cached.
  • the first identifier Take the first identifier as 1 and the second identifier as 0 as an example.
  • VLSR including L register spaces
  • the L register spaces are 0th, 1st,..., L-1th
  • a character string of L bits can also be used to identify L storage spaces together, for example, the identification is initialized to a character string composed of "L zeros".
  • the code stream information of BW bits is read from the code stream information buffer module, buffered in the L-1th register space, and the identifier of the L-1th register space is changed to 1. Since the identification of the L-2th register space is 0, then the bit stream information of the BW bits in the L-1th register space can be shifted to the L-2th register space, and the L-1th register space The identifier of each register space becomes 0, and the identifier of the L-2 th register space becomes 1. In this way, as long as the identifier of the lower-order register space is 0, shift to the lower-order position until the lowest register space (that is, the 0th register space).
  • the dock field pointer of the register (represented as bit_ofst_cur_hdr) can be used to indicate the starting position of the current dock field in the VLSR.
  • bit_ofst_cur_hdr When the data of a frame (one line) is decompressed, the initial position of the dock field pointer is 0. It can be understood that according to the above description of compressed data in conjunction with the parts of FIGS. 2 and 3, the dock field pointer indicates the starting position of the compressed data of the current compressed block in the VLSR.
  • bit_ofst_cur_res bit_ofst_cur_hdr+bit_len_cur_hdr.
  • bit_ofst_cur_res bit_ofst_cur_hdr+bit_len_cur_hdr.
  • the registration identification of the lowest register space of the register is the first identification (such as 1)
  • the dock field (HDR) and residual field (RES) at the current dock field pointer (bit_ofst_cur_hdr) are valid
  • the dock field (HDR) and residual field (RES) at the current dock field pointer (bit_ofst_cur_hdr) can be sent to the data decompression module, so that the data decompression module can decompress them.
  • VLSR can be updated and the dock field pointer can be updated, so that after the compressed data of the first compressed block is decompressed, the compressed data of the second compressed block is continued to be decompressed, where , The second compressed block is the next compressed block adjacent to the first compressed block.
  • the updating process may include: updating the current dock field pointer (bit_ofst_cur_hdr) to bit_len_nxt_hdr.
  • bit_ofst_cur_hdr updated pointer offset
  • bit_len_nxt_hdr bus bit width
  • the bit stream information in the register can be shifted and the identifier updated accordingly, and the bit stream information buffer module can continue to read the BW bit
  • BW bus bit width
  • FIG. 7 shows only two register spaces. The one on the right is the low register space (LSB), and the one on the left is the high register space. (MSB).
  • the code stream information from the code stream information buffer module (IFM_FIFO) is first cached in the highest register space. When the lower register space is identified as 0, it shifts to the lower position, as shown by the two dotted arrows at the top in Figure 7. Show.
  • FIG. 7 also shows the current position of the dock field pointer (bit_ofst_cur_hdr) and the current starting position of the residual field (bit_ofst_cur_res).
  • the dock field pointer (bit_ofst_nxt_hdr) of the determined next compressed block is also shown.
  • the current dock field pointer can be updated, specifically to bit_len_nxt_hdr. Since bit_len_nxt_hdr is greater than BW, as shown in Figure 7, it is located in the upper register space on the left. Therefore, at this time, it can be updated by shifting the upper register space on the left to the lower register space on the right.
  • IMM_FIFO code stream information buffer module
  • the data decompression module denoted as the DATA_DEC (data decoder) module, can be used to obtain the compressed data corresponding to a compressed block from the VLSR and decompress it.
  • DATA_DEC data decoder
  • the code stream information corresponding to a compressed block can be obtained from the position of the dock field pointer of the register, including the dock field (HDR) and the residual field (RES). And further decompress to obtain the original data of M ⁇ N pixels.
  • the residual length can be used to obtain the N residual fields of the corresponding compression group, and the N residual fields can be expanded by digits (such as adding 0 after the sign bit) to obtain the difference. Value, and then by summing with the previous pixel value, N pixel values are restored.
  • the data decompression module is described in conjunction with the situation 3 in Fig. 2 and Fig. 3.
  • the length of the 4 residual fields in the first compression group can be calculated by using the residual length Rbit1, and then the corresponding residual data is selected, which are R0, R1, R2, and R3 in sequence.
  • the sign bit of R0 to R7 is expanded (the sign bit in R0 to R7 is expanded at the high bits of R0 to R7) to become 9-bit pixel difference values D1 to D8.
  • the data selector performs residual selection and sign bit expansion based on Rbit1 and residual RES1 (ie R0 ⁇ R3) to obtain D1 ⁇ D4; the data selector performs based on Rbit2 and residual RES2 (ie R4 ⁇ R7) and performs Residual error selection and sign bit expansion get D5 ⁇ D8.
  • p0 is the last pixel value in the compressed block before the compressed block, which can be obtained from the initial pixel register (reg_pi).
  • the algorithm for decompression performed by the data decompression module in the embodiment of the present invention such as which number is added to which number, etc., depends on the algorithm used for data compression, that is, the process of decompression and compression. It should be corresponding.
  • the data packing module can be expressed as a DATA_PACKER module, which is used to pack the decompressed data obtained by the data decompression module to match the bus bit width. Further, a data write command can be sent to the write arbitration module, so that the write arbitration module arbitrates the write data commands from at least two decompression modules, and writes the data after the data is packed into the on-chip memory.
  • the header information loading module may send an instruction end signal (for example, expressed as instr_done_path) to the instruction analysis module, so as to further obtain information for the next channel from the instruction analysis module Decompression instructions and so on.
  • an instruction end signal for example, expressed as instr_done_path
  • the decompression system includes at least two decompression paths, and the number can be flexibly configured as required to be compatible with applications of various data output rates and meet different processing tasks. Performance requirements.
  • the decompression system 40 can decompress the data after the compression system uses a difference method to recover the original data, which is used in the subsequent processing of the data by the processor. Since the compressed data occupies a small space, the bandwidth when reading data between the processor and the external memory can be reduced, the number of times of reading the external memory can be reduced, and the power consumption can be reduced.
  • the process of decompressing compressed data by the decompression system 40 in the embodiment of the present invention may be as shown in FIG. 9.
  • the instruction parsing module may receive the decompression instruction, parse the decompression instruction, and according to the resolved decompression instruction, distribute the channel decoding instruction to each decompression path of the at least two decompression paths.
  • Each decompression path executes a data decompression operation after receiving a channel decoding instruction, and writes the decompressed data into the on-chip memory. After completing the decompression of the data of one channel, continue to decompress the data of the next channel until the decompression process of all the data indicated by the decompression instruction is completed.
  • a decompression path to perform a data decompression operation includes: after the header information loading module parses the channel decoding instruction, sends a header information read command, reads the header information accordingly, and determines the base address and code stream information by parsing the header information Length information.
  • the code stream information loading module sends the code stream information read command according to the analysis result of the header information by the header information loading module, and reads the code stream information accordingly.
  • the code stream information is analyzed by length in VLSR and decompressed by the data decompression module.
  • the data packing module packs the decompressed data and sends a write data command.
  • FIG. 10 Exemplarily, another schematic flowchart of a method for data decompression according to an embodiment of the present invention is shown in FIG. 10, which includes:
  • S102 Distribute a channel decoding instruction to each of the at least two decompression paths according to the parsed decompression instruction;
  • each of the at least two decompression paths obtains respective corresponding data to be decompressed, and decompresses the data to be decompressed to obtain decompressed data, where the decompression is
  • the data includes at least compressed data corresponding to one compressed block;
  • S101 and S102 may be executed by the instruction parsing module of the decompression system.
  • S101 may include: when the data processing unit of the processor needs to obtain data from the on-chip memory, it may receive a decompression instruction from the processor, and determine the content contained in the decompression instruction through analysis, for example, the decompression instruction may Including: the number of feature maps to be decompressed, the base address of the compressed data corresponding to these numbers of feature maps in the external memory and the storage interval between images, the base address and the storage interval between the images when these numbers of feature maps are output to the on-chip memory, The width, height, number of channels, etc. of each feature map data.
  • each feature map can be obtained one by one.
  • decompression can be performed channel by channel to obtain the feature map. For example, for feature map A, if the number of channels is FM_C, you can first perform S103 and S104 for the first channel number, and then perform S103 and S104 for the second channel, ... until the FM_C channel is completed.
  • S102 may include: determining the channel of the feature map currently to be decompressed (such as the first channel of feature map A) according to the resolved decompression instruction, and correspondingly determining each of the at least two decompression paths The channel decoding instruction of the decompression path. Subsequently, the corresponding channel decoding instructions are distributed to each decompression path.
  • the decompression task of one channel can be allocated to at least two decompression paths for execution, and the degree of parallelism is improved, and the original data of one channel of the feature map can be obtained as soon as possible for data processing of the processor.
  • the channel decoding instruction sent to a decompression path may include: the width and height of the feature map to be obtained by the decompression path, the base address and length information of the header information of the compressed data of the feature map in the external memory, After decompression, it will be written into the base address of the on-chip memory, etc.
  • each decompression path can execute S103 and S104 according to the received channel decoding instruction.
  • the header information read command and the code stream information read command can be sent in sequence, thereby obtaining the header information and the code stream information.
  • the method may further include: arbitrating the header information read commands from at least two decompression paths, and preferentially sending the header information read command that wins the arbitration.
  • the arbitration rules used during arbitration may include a pre-configured priority mechanism or a fair polling mechanism.
  • the data to be decompressed in S103 includes header information and code stream information.
  • S103 may include: sending a header information read command according to the received channel decoding instruction; acquiring header information; sending a code stream information read instruction according to the header information; acquiring code stream information.
  • the received channel decoding instruction can be parsed to obtain the base address of the header information and the length of the header information. Then send a header information read command, where the header information read command includes the base address of the header information and the length of the header information. Further, header information corresponding to the header information read command can be received. It is understandable that the base address of the header information here refers to the base address of the header information in the external memory.
  • the header information can be parsed to determine the base address of the code stream information and the length of the code stream information. Then send the code stream information read command, where the code stream information read command includes the base address of the code stream information and the length of the code stream information. Further, the code stream information corresponding to the code stream information read command can be received.
  • the base address of the code stream information here refers to the base address of the code stream information in the external memory.
  • the code stream information read command may be a code stream information read command that meets the maximum burst length and address alignment requirements after conversion according to the configuration of the bus. That is to say, first, according to the configuration of the bus, it is converted into a code stream information read command that meets the maximum burst length and address alignment requirements before sending.
  • the header information loading module parse the received channel decoding instruction, send the header information read command, obtain the header information, and parse the header information to determine the base address of the code stream information and the code stream information length. It can be executed by the code stream information loading module: send the code stream information read command, and obtain the code stream information.
  • one line of header information may include a base address and K pieces of length information.
  • the header information is the first line header information of the feature map, the base address of the code stream information and the code stream length are based on the first line header information.
  • the base address of the identification and the offset in the height direction of the feature map are determined; if the header information is the non-first line header information of the feature map, the base address of the code stream information is the base address identified by the first line header information.
  • a line of header information includes a line base address information (represented as ) And K length information (row_len h ). So:
  • the length of the code stream information is:
  • the length of the code stream information is:
  • the two-line header information can be configured to merge and output Load the module to the header information, merge and update it to:
  • the base address of the code stream information and the length of the code stream information can be determined.
  • the base address of the code stream information and the length of the code stream information are continuous in the address space.
  • the code stream information is obtained by combining multiple pieces of first code stream information.
  • the multiple first code stream information includes current code stream information and previous code stream information of the current code stream information.
  • decompressing the data to be decompressed in S103 may include: buffering code stream information in a register, where the code stream information includes compressed data corresponding to at least one compressed block; reading compressed data corresponding to one compressed block from the register And perform decompression to obtain each pixel in the compressed block.
  • the register may be a variable-length shift register, and the register may include L register spaces, and each register space has BW bits. Buffering the code stream information in the register may include: first writing the code stream information into the highest register space in the L register spaces, and setting the register identifier of the highest register space as the first identifier; when there is a lower register space When the registration identifier of is the second identifier, the code stream information is shifted to the lower register space.
  • all L storage spaces initially have a second identifier.
  • the first identifier is 1 and the second identifier is 0.
  • other forms may also be used to represent the first identifier and the second identifier, such as 1 and 0, and 0 and 1.
  • an L-bit character string may be used to represent the identifiers of L storage spaces, etc., which is not limited in the present invention.
  • reading the compressed data corresponding to a compressed block from the register and performing decompression may include: starting from the position of the current dock field pointer of the register, reading the code stream information corresponding to a compressed block and performing decompression.
  • the decompression starts from the position of the current dock field pointer.
  • the position of the second compressed data corresponding to the second compressed block is determined according to the length of the first compressed data, and the current dock field pointer is shifted to The position of the second compressed data corresponding to the second compressed block so as to decompress the second compressed data corresponding to the second compressed block.
  • one compression block includes M compression groups, and each compression group includes N pixels.
  • the compressed data corresponding to a compressed block includes a dock field, and the dock field includes at least M flag bits, which are used to indicate whether the M compressed groups are all zeros. If the second flag bit in the M flag bits indicates that the second compression group of the M compression groups is not all zeros, then the compressed data corresponding to one compression block also includes the residual length and the residual error corresponding to the second flag bit Field.
  • the residual field corresponding to the second flag bit includes N signed bit residuals, and the length of each signed bit residual excluding the sign bit is the residual length.
  • the first compression group When decompressing compressed data, if the first flag bit in the M flag bits indicates that the first compression group of the M compression groups is all zeros, then the first compression group is decompressed into N zero pixels; also That is, the pixel values of the N pixels included in the first compression group are all 0 (for example, 000000000, the first 0 is the sign bit). If the second flag bit in the M flag bits indicates that the second compression group of the M compression groups is not all zeros, the compressed data corresponding to a compressed block also includes the residual length and the residual field corresponding to the second flag bit , The second compression group is decompressed into N non-all-zero pixels according to the residual length and residual field corresponding to the second flag bit.
  • the position of the residual field can be determined according to the number of flag bits indicating non-all zeros in the M flag bits and the bit width of the residual length; starting from the position of the residual field, the number of bits (such as Q A) residual field; each residual field in the number of residual fields is decompressed into N pixels.
  • the residual field corresponding to the second compression group can be obtained, and the residual field includes N Sign bit residual.
  • Decompress the residual field into N pixels including: according to the N signed bit residuals, through the sign bit extension, determine the corresponding N signed pixel difference; determine the difference according to the N signed pixel difference The difference between the N pixels and the initial pixel; the pixel value of the N pixels is determined according to the difference between the N pixels and the initial pixel.
  • the initial pixel refers to: if the position of the first unsigned bit pixel is the first position of a line of the feature map, the initial pixel is zero; otherwise, the initial pixel is the last pixel located before N pixels.
  • the data decompression module is described in conjunction with the situation 3 in Fig. 2 and Fig. 3.
  • Rbit2 can be used to calculate the length of the 4 residual fields in the second compression group and select the residual data, which are R4, R5, R6, and R7 in sequence.
  • the sign bit of R0 to R7 is expanded (the sign bit in R0 to R7 is expanded at the high bits of R0 to R7) to become 9-bit pixel difference values D1 to D8.
  • p0 is the last pixel value in the compressed block before the compressed block, which can be obtained from the initial pixel register (reg_pi).
  • the algorithm for decompression performed by the data decompression module in the embodiment of the present invention such as which number is added to which number, etc., depends on the algorithm used for data compression, that is, the process of decompression and compression. It should be corresponding.
  • the decompressed data may be data-packed to match the bus bit width; and the data after the data-packing may be written into.
  • executing the write operation may include: sending a data write command so as to write the decompressed data to the on-chip memory.
  • the write operation can be performed on the base address of the on-chip memory according to the feature map contained in the decompression instruction.
  • the method may further include: arbitrating data write commands from at least two decompression paths, and preferentially sending the data write command that wins the arbitration.
  • the arbitration rules used during arbitration may include a pre-configured priority mechanism or a fair polling mechanism.
  • the method for data decompression provided by the embodiment of the present invention can decompress the data after the compression system is compressed by the difference method to restore the original data, which is used in the subsequent processing of the data by the processor. Since the compressed data occupies a small space, the bandwidth when reading data between the processor and the external memory can be reduced, the number of times of reading the external memory can be reduced, and the power consumption can be reduced.
  • the system for data decompression in the embodiment of the present invention can be implemented on a processor, for example, it can be a processor of various devices such as a computer, a server, a workstation, a mobile terminal, and a pan/tilt.
  • the decompressed data obtained by the processor through decompression may be feature map data.
  • the feature map data may be generated during the execution of the convolutional neural network, and may continue to be used in other iterations of the convolutional neural network. Operation etc.
  • an embodiment of the present invention also provides a processor, and the processor may include an on-chip memory and a decompression system 40.
  • the processor may also include a compression system 20.
  • the decompression system 40 may be as shown in FIG. 5 or FIG. 6.
  • the processor may include a central processing unit (Central Processing Unit, CPU) or other forms of processing units with data processing capabilities and/or instruction execution capabilities, such as Field-Programmable Gate Array (Field-Programmable Gate Array). , FPGA) or Advanced RISC (Reduced Instruction Set Computer) Machine (ARM), etc., and the processor may include other components to perform various desired functions.
  • CPU Central Processing Unit
  • FPGA Field-Programmable Gate Array
  • ARM Advanced RISC
  • the processor may include other components to perform various desired functions.
  • characteristic map means that the decompression system of the embodiment of the present invention will compress
  • the data after the data is decompressed may have three dimensions of width, height, and channel, or alternatively may have two dimensions of width and height.
  • the embodiment of the present invention also provides a computer storage medium on which a computer program is stored.
  • the computer program is executed by the processor, the steps of the method for data decompression shown above can be realized.
  • the computer storage medium is a computer-readable storage medium.
  • the computer or processor executes the steps of the method shown in FIG. 9 or FIG. 10.
  • the computer or processor executes the following steps: receiving a decompression instruction, and parsing the decompression instruction; according to the parsed decompression instruction, Each decompression path of the at least two decompression paths distributes channel decoding instructions; according to the received channel decoding instruction, each decompression path of the at least two decompression paths obtains its corresponding data to be decompressed, and Decompress the to-be-decompressed data to obtain decompressed data, where the to-be-decompressed data includes at least one compressed data corresponding to a compressed block; a write operation is performed on the decompressed data.
  • the computer storage medium may include, for example, the memory card of a smart phone, the storage component of a tablet computer, the hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disk read-only memory ( CD-ROM), USB memory, or any combination of the above storage media.
  • the computer-readable storage medium may be any combination of one or more computer-readable storage media.
  • an embodiment of the present invention also provides a computer program product, which contains instructions, which when executed by a computer, cause the computer to execute the steps of the method for data decompression shown in FIG. 9 or FIG. 10 .
  • the computer when the instruction is executed by the computer, the computer is caused to execute: receive the decompression instruction, and parse the decompression instruction; according to the parsed decompression instruction, send to each of the at least two decompression paths
  • the decompression path distributes channel decoding instructions; according to the received channel decoding instructions, each decompression path of the at least two decompression paths obtains its corresponding data to be decompressed, and decompresses the data to be decompressed Obtain decompressed data, where the data to be decompressed includes at least compressed data corresponding to one compressed block; a write operation is performed on the decompressed data.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), etc.
  • the decompression system provided by the embodiment of the present invention includes at least two decompression paths, and the number can be flexibly configured as required to be compatible with applications of various data output rates and meet the performance requirements of different processing tasks.
  • the decompression system can decompress the data after the compression system uses a difference method to recover the original data, which is used in the subsequent processing of the data by the processor. Since the compressed data occupies a small space, the bandwidth when reading data between the processor and the external memory can be reduced, the number of times of reading the external memory can be reduced, and the power consumption can be reduced.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative, for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components can be combined or It can be integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processor, or each unit may exist alone physically, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

一种数据解压缩的方法、系统、处理器及计算机存储介质。该方法包括:接收并解析解压缩指令(S101);根据已解析的解压缩指令,向至少两个解压缩路径中的各个解压缩路径分发通道解码指令(S102);根据接收到的通道解码指令,至少两个解压缩路径中的每个解压缩路径分别获取各自对应的待解压数据,并对待解压数据进行解压缩得到解压缩后的数据(S103);针对解压缩后的数据执行写入操作(S104)。该方法能够将压缩系统采用差值方法压缩之后的数据进行解压缩以恢复原始数据,用于处理器后续对数据的处理过程。由于压缩数据的占用空间小,从而能够减小处理器与外部存储器之间进行数据读取时的带宽,减小了对外部存储器进行读取的次数,进而能够降低功耗。

Description

数据解压缩的方法、系统、处理器及计算机存储介质 技术领域
本发明实施例涉及数据处理领域,并且更具体地,涉及一种用于数据解压缩的方法、系统、处理器及计算机存储介质。
背景技术
在越来越多的场景中,需要将大量的数据进行存储。为了充分利用存储器的存储空间,为了存储更多的数据,一般会将数据进行压缩以后再进行存储。
在将数据进行压缩存储之后,需要有对应的解压缩算法实现数据的还原,如果解压缩的方式不对应,则无法恢复出原始数据,影响针对数据的处理过程。
发明内容
第一方面,本发明实施例提供了一种用于数据解压缩的方法,包括:
接收解压缩指令,并解析所述解压缩指令;
根据已解析的解压缩指令,向至少两个解压缩路径中的各个解压缩路径分发通道解码指令;
根据接收到的通道解码指令,所述至少两个解压缩路径中的每个解压缩路径分别获取各自对应的待解压数据,并对所述待解压数据进行解压缩得到解压缩后的数据,其中,所述待解压数据至少包括与一个压缩块对应的压缩数据;
针对所述解压缩后的数据执行写入操作。
第二方面,提供了一种用于数据解压缩的系统,包括指令解析模块和至少两个解压缩路径,
所述指令解析模块,被配置为:
接收解压缩指令,并解析所述解压缩指令;
根据已解析的解压缩指令,向所述至少两个解压缩路径中的各个解压缩路径分发通道解码指令;
所述至少两个解压缩路径中的每个解压缩路径,被配置为:
根据接收到的通道解码指令,获取对应的待解压数据,并对所述待解压数据进行解压缩得到解压缩后的数据,其中,所述待解压数据至少包括与一个压缩块对应的压缩数据;
针对所述解压缩后的数据执行写入操作。
第三方面,提供了一种处理器,包括:
片上存储器,以及
上述第二方面所述的用于数据解压缩的系统。
第四方面,提供了一种计算机存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现上述第二方面所述方法的步骤。
本发明实施例提供的数据解压缩的方法、系统、处理器及计算机存储介质,能够降低功耗。
附图说明
为了更清楚地说明本发明实施例的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本发明实施例的数据压缩与解压缩的一个示意图。
图2是本发明实施例的一个压缩块的示意图;
图3是本发明实施例的压缩数据的一个示意图;
图4是本发明实施例中的头信息、码流信息和输出特征图的示意图;
图5是本发明实施例的解压缩系统的一个示意框图;
图6是本发明实施例的解压缩系统的另一个示意框图;
图7是本发明实施例的变长移位寄存器的一个示意图;
图8是本发明实施例的数据解压缩模块进行解压缩的一个示意图;
图9是本发明实施例的解压缩系统对压缩数据进行解压缩的一个示意性流程图;
图10是本发明实施例的用于数据解压缩的方法的一个示意性流程图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动的前提下所获得的所有其他实施例,都属于本发明保护的范围。
随着人工智能技术的发展,在越来越多的领域中都会涉及深度学习等算法。深度学习的核心之一是神经网络,例如卷积神经网络(Convolution Neural Networks,CNN)。在卷积神经网络的计算过程中,会产生大量的特征图数据,在将这些特征图数据写入到处理器的外部存储器,通常会使用数据压缩技术,如此能够减小对外部存储器的占用空间,且能够降低读写时的带宽。其中,外部存储器可以诸如为双倍速率同步动态随机存取存储器(Double Data Rate Synchronous Dynamic Random Access Memory),或简称为DDR。
但是,卷积神经网络一般会包括大量的卷积层,每个卷积层都会产生大量的特征图数据。这些大量的特征图数据对DDR进行读写操作时,会消耗宝贵的系统外部存储器带宽资源,从而导致其他的带宽需求大的模块(如CNN或其他模块)因为无法快速访问DDR而影响了计算性能。并且,由于对DDR的访问量增大,会进一步地导致功耗更高。
为了进一步地减小压缩后的数据量,进一步减小读写DDR时的带宽进而降低功耗,考虑到神经网络卷积层输出的特征图中零的情况以及相邻特征图之间数值接近的特点,可以在数据压缩过程中,通过考虑相邻像素之间的差值进行压缩存储。具体地,如图1所示,卷积神经网络计算所得到的特征图可以位于片上存储器10,然后压缩系统20将片上存储器10中的特征图进行压缩,并将压缩之后的数据存储在外部存储器30。
在压缩时,可以以压缩块为单位进行压缩,将至少一个压缩块进行压缩之后可以形成压缩数据,压缩数据可以为码流信息。另外,在进行压缩形成压缩数据时,还生成了与码流信息对应的头信息。
其中,一个压缩块包括M个压缩组,每个压缩组包括N个像素,M和N均为正整数。对一个压缩块进行压缩后的码流信息至少包括M个标志位,用于表示M个压缩组是否是全零的。如果M个压缩组中包括非全零压缩组,那么码流信息还包括残差长度和残差字段(RES)。其中,可以将标志位和残差长度统称为码头字段(HDR)。
作为一例,假设M=2,N=4。那么一个压缩块包括8个像素,例如为如图2所示的像素p1至p8。图2中所示的一个压缩块包括2个压缩组,第一个压缩组为p1至p4,第二个压缩组为p5至p8。
图2中所示的压缩块包括两个压缩组,因此对应的码流信息包括两个标志位。参照图3,标志位“00”表示两个压缩组都是全零的;“01”表示第一个压缩组全零,第二个压缩组非全零;“10”表示第一个压缩组非全零,第二个压缩组全零;“11”表示两个压缩组都非全零。
另外,如果不是所有的压缩组都全零,则还包括残差长度和残差字段,残差长度的数量与非全零的压缩组的数量一致且对应,针对图2中的例子,参照图3,如果标志位中一个“1”(即情形2),则残差长度的数量为1个;如果标志位中两个“1”(即情形3),则残差长度的数量为2个。其中,残差 长度用于表示其后对应的残差字段中每一个的位宽。以图3的情形3为例,残差长度Rbit1表示R0-R3中每个的位宽,残差长度Rbit2表示R4-R7中每个的位宽。而,R0-R7是通过对相邻像素作差(如图2)得到D1-D8后,再对差进行压缩后的得到的。
具体地,参照图2,如果第一个压缩组(即p1至p4)非全零且第二个压缩组(即p5至p8)非全零,则可以采用差值压缩方法得到该压缩块对应的压缩数据。其中,与两个压缩组对应的标志位均为“1”。
可以首先通过作差得到D1至D8。具体地,将一个压缩组的首个像素减去该像素的前一个像素得到第一个差值,将一个压缩组的其余像素减去该首个像素得到其余差值。针对第一个压缩组(即p1至p4):D1=p1-p0,D2=p2-p1,D3=p3-p1,D4=p4-p1,其中,p0是p1的前一个像素,p0可以属于上一个压缩组,其可以被存储在寄存器中。针对第二个压缩组(即p5至p8):D5=p5-p4,D6=p6-p5,D7=p7-p5,D8=p8-p5。其中,每个像素的像素值为8比特有符号数。
进一步地,在差值D1至D8的基础上得到R0至R7。具体地,D1至D8是有符号数,假设去掉符号位后对应的无符号数为d1至d8。针对第一个压缩组,通过判断d1至d4最少需要多少个比特位表示,比特位的数值范围为1~8,一共8种情况,可以用3bit的Rbit1进行标识。Rbit1的数值为0~7,其分别对应d1至d4比特位的数值范围1~8。去掉d1至d4的前面若干位只保留后面的(Rbit1+1)个比特位,然后再加上对应的D1至D4的符号位,从而得到R0至R3。类似地,针对第二个压缩组,判断d5至d8最少需要多少个比特位,即(Rbit2+1);去掉d5至d8的前面若干位只保留后面的(Rbit2+1)个比特位,然后再加上对应的D5至D8的符号位,从而得到R4至R7。
示例性地,对于图2和图3所示的例子,假设各个像素包括8比特,其中,1比特为符号位。那么,除符号位之外,各个像素的像素值用7比特表示,因此,每个像素的残差长度可以占3比特。在图3所示的各个情形中, 情形1对应一个压缩块压缩之后的压缩数据占2比特;情形2对应一个像素压缩之后的压缩数据占(5+(Rbit+2)×4)比特;情形3对应一个像素压缩之后的压缩数据占(8+(Rbit1+2)×4+(Rbit2+2)×4)比特。
与码流信息对应的头信息用于包括码流信息的基地址信息(如图4中的row_baddr所示)和长度信息。示例性地,一行的头信息可以包括一个行基地址信息和K个长度信息(如图4所示的row_len 0~row_len(k-1))。举例来说,行基地址信息占用32比特,长度信息占用12(即K=12)个8比特。
本发明实施例中,可以对压缩数据(码流信息)进行解压缩,从而得到解压之后的原始数据,即解压图像或称为输出特征图(Output Feature Map,OFM)。可理解,可以对指令进行灵活配置,从而根据指令选择对应的头信息和码流信息。举例来说,可以配置使得选择的头信息和码流信息对应特征图的非起始位置,例如,某个特定列或某个特定通道等,如此能够支持从压缩数据中进行列方向(FM_H)和通道方向(FM_C)的图像裁剪。在一个实施方式中,若第一头信息和第一码流信息不是对应于特征图的起始地址而是对应于特征图的需要裁剪的边界,则通过该第一头信息和该第一码流信息可以解压缩出经过裁剪的特征图。在一个实施方式中,对于列方向(FM_H)图像裁剪,可以配置头信息基地址info_baddr从头信息表格的某一行开始读取对应的码流信息。同时,可以配置特征图高度方向偏移FM_H_OFST,从一行头信息中某一个行长度row_len开始读取对应的码流信息。对于通道方向的图像裁剪,可以配置头信息基地址info_baddr从任意一张头信息表格(对应任意一个OFM通道),开始读取码流信息,从而解压缩通道方向裁剪的特征图。
示例性地,如图4示出了头信息、码流信息和解压图像的示意图。
解压图像可以为特征图,在一个实施方式中,解压图像为输出特征图(OFM)。输出特征图具有三个维度,分别为宽、高、通道,在图4示出为特征图宽度FM_W、特征图高度FM_H和特征图通道数FM_C。并且特征图 的基地址为OFM_BADDR,图像通道间的地址偏移为OFM_SEGM_LEN,特征图高度方向偏移为FM_H_OFST。
头信息包括多个表格,示例性地,针对一个特征图而言,一个表格可以对应一个特征图的通道,并且头信息通道间的地址偏移为INFO_SEGM_LEN。头信息的基地址为IMFO_BADDR。一行的头信息包括一个行基地址信息和K个长度信息,以图4中头信息的表格所示的第一行为例,包括行基地址信息为row_baddr 0,K个长度信息为row_len 0至row_len(k-1)。可以根据头信息中的基地址找到码流信息,如图4中码流信息的起始地址row_baddr 0。另外,可理解,根据头信息中的基地址找到的码流信息之后,可以提取码流信息中的残差信息,并且通过解压缩残差信息可以得到解压图像。
为了对如图1中压缩系统20进行压缩后的数据进行解压缩读取,本发明实施例提供了一种解压缩系统,仍参照图1,解压缩系统40从外部存储器30读取压缩数据,然后进行解压缩得到原始数据,随后将解压缩后的数据存入片上存储器10,以便处理器对片上存储器10中的数据进行处理。
本发明实施例中,用于数据解压缩的解压系统40(后简称为系统)至少包括:指令解析模块、读仲裁模块、至少两个解压缩路径以及写仲裁模块,如图5所示。
应当理解的是,本发明实施例中的解压缩路径的数量为至少两个,例如可以为3个甚至更多个,具体地可以依据处理器的性能、待处理的特征图数据的大小进行配置。
这样,本发明实施例能够使用一个系统灵活地实例化多个解压缩路径(decoding path,DEC_PATH),从而兼容各种数据输出速率的应用。具体地可以根据对特征图数据的处理需求来灵活地配置解压缩路径的数量,当一个解压缩路径无法满足任务性能需求时,通过配置解压缩路径的数量来提供解 压缩的性能,从而灵活地满足不同的处理任务的性能需求。
为了简化示意,图5中仅示出了两个解压缩路径,分别为解压缩路径1和解压缩路径2。
示例性地,指令解析模块可以接收解压缩指令,并解析该解压缩指令;随后可以根据已解析的解压缩指令,向至少两个解压缩路径中的各个解压缩路径分发通道解码指令。在一个实施方式中,指令解析模块从外部存储器读取解压缩指令。至少两个解压缩路径中的各个解压缩路径可以根据接收到的通道解码指令,分别获取各自对应的待解压数据,并对待解压数据进行解压缩得到解压缩后的数据,其中,待解压数据至少包括与一个压缩块对应的压缩数据;以及针对解压缩后的数据执行写入操作。
示例性地,读仲裁模块和写仲裁模块分别用于对来自至少两个解压缩路径的读取命令和写数据命令进行仲裁,以保证数据的有效正确传输。
指令解析模块,可以表示为INSTR_PROC(instruction process)模块,可以接收解压缩指令,随后可以对该解压缩指令进行解析,并根据解析的结果,将通道解码指令分发到各个解压缩路径。
具体地,当处理器的数据处理单元需要从片上存储器获取数据时,可以由处理器向该指令解析模块发送解压缩指令。该指令解析模块接收到解压缩指令后,可以对应地向各个解压缩路径分发通道解码指令,以便各个解压缩路径从外部存储器读取压缩数据并进行解压缩。其中,分发给某个解压缩路径的通道解码指令(表示为instr_cfgs)可以包括:头信息的基地址(info_baddr)、头信息的长度(info_len)、输出特征图的宽度(fm_w)、输出特征图的高度(fm_h)、输出特征图在高度方向的偏移(fm_h_ofst)、输出特征图的基地址(ofm_baddr)等。
示例性地,指令解析模块可以根据已解析的解压缩指令,将针对维度为 FM_W×FM_H×FM_C的特征图的一次解压的解压缩任务分发给至少两个解压缩路径(如P个)。例如,一种方式为将一个通道(channel)分配给多个不同的解压缩路径。并且,分发到解压缩路径的通道解码指令可以包含解压一个通道(channel)的特征图的所有必要信息。
读仲裁模块,可以表示为RD_ARB(read arbiter)模块,可以对来自至少两个解压缩路径的头信息读取命令和/或码流信息读取命令进行仲裁,并优先发送仲裁胜出的命令(例如,读命令)。在一个实施方式中,读仲裁模块依据仲裁胜出的结果,发送读命令发送至外部存储器。
其中,进行仲裁所使用的仲裁规则可以包括预先配置的优先级机制或者公平轮询机制。
示例性地,读仲裁模块根据仲裁规则确定仲裁胜出的读取命令(头信息读取命令和/或码流信息读取命令),并记录该胜出的读取命令的ID(例如其来源的解压缩路径的ID),向数据总线(DATA_BUS)发送该胜出的读取命令,并在从外部存储器接收数据返回后,根据所记录的胜出的读取命令的ID,将接收的返回数据发送至对应ID的解压缩路径。可理解,如果仲裁规则是优先级机制,那么可以优先保障高优先级的压缩路径的压缩处理,确保该优先级的压缩路径的任务性能。
写仲裁模块,可以表示为WR_ARB(write arbiter)模块,可以对来自至少两个解压缩路径的写数据命令进行仲裁,并优先发送仲裁胜出的命令。
其中,进行仲裁所使用的仲裁规则可以包括预先配置的优先级机制或者公平轮询机制。
示例性地,写仲裁模块根据仲裁规则确定仲裁胜出的写数据命令,并先将该胜出的写数据命令对应的解压缩数据通过数据总线(DATA_BUS)写入 到片上存储器。
解压缩路径,可以表示为DEC_PATH,参照图6,每个解压缩路径可以包括头信息装载模块、码流信息装载模块、头信息缓存模块、码流信息缓存模块、变长移位寄存器模块、数据解压缩模块和数据打包模块。应当注意的是,为了简化示意,图6中仅示出了解压缩路径1,对于至少两个解压缩路径中的其他解压缩路径(如解压缩路径2),所包含的模块结构以及实现的功能等都是类似的,不再详细列出。
头信息装载模块,可以表示为INFO_LOAD(Header information load)模块,也称为头信息装载解析模块,可以基于通道解码指令,确定头信息的基地址(info_baddr)和头信息的长度(info_len),随后发送头信息读取命令。其中,头信息读取命令可以包括头信息的基地址以及头信息的长度。可理解,该头信息读取命令发送到读仲裁模块以便读仲裁模块对来自至少两个解压缩模块的头信息读取命令进行仲裁。
在头信息装载模块获取读回的头信息(也称为头信息数据)之后,可以根据对该头信息进行解析以确定码流信息的基地址(表示为ifm_baddr)和码流信息的长度(ifm_len)。并将码流信息的基地址和码流信息的长度发送至码流信息装载模块,以便码流信息装载模块发送码流信息读取命令并获取码流信息。
其中,头信息装载模块基于头信息确定码流信息的基地址(ifm_baddr)和码流信息的长度(ifm_len)分为下述两种情况:
情况一:
当解析首行头信息时,需要考虑解压数据在高度方向的偏移(fm_h_ofst),从而确定码流信息的基地址为:
Figure PCTCN2020092608-appb-000001
其中,h st是图4中的解压缩头信息起始行,用于表示高度方向的起始。也就是说,头信息表格的某一行对应的压缩前原图的行序数h。ifm_baddr是码流的起始地址(即,码流信息的基地址),其是指从头信息表格的某一行开始读取对应的码流信息。fm_h_ofst为解压缩起始行对应于h st的偏移。
相应地,码流信息的长度为:
ifm_len=∑ fm_h_ofst≤h<krow_len h
其中,一行的头信息包括一个行基地址信息和K个长度信息,该行基地址信息表示为
Figure PCTCN2020092608-appb-000002
该行长度信息表示为row_len h
情况二:
当解析非首行头信息时,确定码流信息的基地址为:
ifm_baddr=row_baddr;
相应地,码流信息的长度为:
Figure PCTCN2020092608-appb-000003
在图4中,非首行头信息是指阴影部分的起始的第二行头信息。码流信息的基地址是指,阴影部分第二行对应的码流信息的起始地址就是该行头信息的row_baddr。h指的是头信息row_len的序号。只要h还在有效高度的范围内,即h<(h st+fm_h_ofst+fm_h),就可以累加row_len_h解析码流信息的长度。
另外,可以理解的是,当两行头信息解析得到的码流信息的基地址和长度在地址空间上是连续的(即ifm_baddr last+ifm_len last=ifm_baddr cur),可配置将两行头信息合并后输出给头信息装载模块,合并更新为:
ifm_baddr cur=ifm_baddr last,ifm_len cur=ifm_len last+ifm_len cur
这样,能够通过一个读取命令完成对原本需要两次才能获取的两个码流信息,节省了处理操作,减少了对带宽的占用,提升了效率。
码流信息装载模块,可以表示为IFM_LOAD(Input feature map load)模块,可以从头信息装载模块接收码流信息的基地址和码流信息的长度,并据此发送码流信息读取命令,获取码流信息。
具体地,可以根据数据总线的配置,将码流信息的基地址和码流信息的长度转换为满足最大猝发长度和地址对齐要求的码流信息读取命令,并发送至读仲裁模块,以便读仲裁模块对来自至少两个解压缩模块的码流信息读取命令进行仲裁。
头信息缓存模块,可以表示为INFO_FIFO模块,可以用于存储头信息装载模块从外部存储器读回的头信息。也就是,头信息装载模块从读仲裁模块接收到与其发送的头信息读取命令对应的头信息后,将接收到的头信息缓存在该INFO_FIFO模块。
码流信息缓存模块,可以表示为IFM_FIFO模块,可以用于存储码流信息装载模块从外部存储器读回的码流信息。也就是,码流信息装载模块从读仲裁模块接收到与其发送的码流信息读取命令对应的码流信息后,将接收到的码流信息缓存在该IFM_FIFO模块。
变长移位寄存器模块,可以表示为VLSR(Veri-Length shift register)模块,用于从码流信息缓存模块读取码流信息并以移位寄存器的形式进行缓存。
VLSR可以包括L个寄存空间,且每个寄存空间具有总线位宽(BW)个比特,其中,L为正整数,且本实施例对BW不具体限定,例如BW=16或BW=128等等。
在将码流信息缓存模块中的码流信息缓存在VLSR中时,可以将待缓存的码流信息首先写入L个寄存空间中的最高位寄存空间,并将最高位寄存空间的寄存标识置为第一标识;当存在更低位寄存空间的寄存标识为第二标识 时,将位于高位寄存空间的码流信息向低位寄存空间进行移位。其中,第一标志指示对应的寄存空间缓存有可用的码流信息,第二标志指示对应的寄存空间没有缓存有可用的码流信息。
以第一标识为1,第二标识为0为例进行阐述。对于包括L个寄存空间的VLSR,假设L个寄存空间为第0个、第1个、…、第L-1个,可以表示为vlsr_data l(l=0,1,…,L-1)。并且可以首先将每个寄存空间都初始化为:每个寄存空间的标识都为0。可选地,也可以使用L比特的一个字符串来一起标识L个寄存空间,如将标识初始化为“L个0”构成的字符串。
随后,从码流信息缓存模块读取BW比特的码流信息,缓存在第L-1个寄存空间,并将第L-1个寄存空间的标识变为1。由于第L-2个寄存空间的标识为0,那么此时在第L-1个寄存空间中的BW比特的码流信息可以移位到第L-2个寄存空间,并将第L-1个寄存空间的标识变为0,同时将第L-2个寄存空间的标识变为1。这样,只要有更低位的寄存空间的标识为0,则向更低位进行移位,直到最低位寄存空间(即第0个寄存空间)。
其中,寄存器的码头字段指针(表示为bit_ofst_cur_hdr)可以用于指示当前的码头字段在VLSR中的起始位置。当开始一帧(一行)数据解压缩时,码头字段指针的初始化位置为0。可理解,根据上述结合图2-图3部分关于压缩数据的描述可知,码头字段指针指示的是当前的压缩块的压缩数据在VLSR中的起始位置。
这里将沿用上述假设,即一个压缩块包括M个压缩组,每个压缩组包括N个像素。那么可以从码头字段指针处开始读取M个比特,即读取码头字段中的标志位,确定M个标志位中1的个数,并通过计算得出当前码头字段的长度(单位为比特):bit_len_cur_hdr=M+[标志位中1的个数]×残差长度的位宽。其中,“M”表示标志位所占用的比特数,“[标志位中1的个数]×残差长度的位宽”表示残差长度所占用的比特数。为了下文描述方便,可以假设标志位中1的个数为Q,Q≤M。
在计算得出当前码头字段的长度(bit_len_cur_hdr)之后,可以确定当前残差字段的起始位置(也称为当前残差字段指针)为:bit_ofst_cur_res=bit_ofst_cur_hdr+bit_len_cur_hdr。并且可以对其中的残差长度(即[标志位中1的个数]×残差长度的位宽)进行解析,从而确定每个压缩组残差字段的长度为:
Figure PCTCN2020092608-appb-000004
其中,
Figure PCTCN2020092608-appb-000005
表示Q个残差长度中第q个的解析结果。进而,可以得出当前压缩块的压缩数据中的残差字段的总长度为:
Figure PCTCN2020092608-appb-000006
并且,可以确定下一个压缩块的压缩数据的起始位置,即下一个压缩块的码头字段指针为:bit_ofst_nxt_hdr=bit_ofst_cur_res+bit_len_cur_res。
结合上述关于寄存器的标识,当寄存器的最低位寄存空间的寄存标识为第一标识(如1),且当前的码头字段指针(bit_ofst_cur_hdr)处的码头字段(HDR)和残差字段(RES)有效,则可以将当前的码头字段指针(bit_ofst_cur_hdr)处的码头字段(HDR)和残差字段(RES)发送至数据解压缩模块,以便数据解压缩模块对其进行解压缩。
另外,可理解,可以通过对VLSR进行更新并对码头字段指针进行更新,从而使得在完成对第一压缩块的压缩数据的解压缩之后,继续对第二压缩块的压缩数据进行解压缩,其中,第二压缩块是与第一压缩块相邻的下一个压缩块。
其中,更新的过程可以包括:将当前的码头字段指针(bit_ofst_cur_hdr)更新为bit_len_nxt_hdr。并且,如果更新的指针偏移(bit_len_nxt_hdr)大于总线位宽(BW),那么可以将寄存器中的码流信息进行移位并相应地更新标识,并从码流信息缓存模块继续读取BW bit的码流信息存入第L-1个寄存空间。否则这L个寄存空间中的码流信息和标识保持不变。
下面将结合图7所示来举例说明寄存器。图7中示出BW的一个典型值为16B,为了简化示例,图7中仅示出了两个寄存空间,其中位于右侧的为 低位寄存空间(LSB),位于左侧的为高位寄存空间(MSB)。来自码流信息缓存模块(IFM_FIFO)的码流信息首先缓存在最高位寄存空间,当更低位的寄存空间标识为0时,向低位进行移位,如图7中位于上方的两个虚线箭头所示。另外,作为一例,图7中还示出了当前的码头字段指针(bit_ofst_cur_hdr)的位置以及当前残差字段的起始位置(bit_ofst_cur_res)。进一步地,还示出了所确定的下一个压缩块的码头字段指针(bit_ofst_nxt_hdr)。并且,可理解,当完成对图7中当前的码头字段指针(bit_ofst_cur_hdr)处的码流数据的解压缩之后,可以将当前的码头字段指针进行更新,具体地更新至bit_len_nxt_hdr。由于bit_len_nxt_hdr大于BW,如图7中其位于左侧的高位寄存空间中,因此,此时可以通过将为位于左侧的高位寄存空间向右侧的低位寄存空间进行移位以进行更新,从而可以继续从码流信息缓存模块(IFM_FIFO)读取码流信息存在左侧的高位寄存空间中。
数据解压缩模块,表示为DATA_DEC(data decoder)模块,可以用于从VLSR中获取一个压缩块对应的压缩数据,并进行解压缩。
具体地,可以从寄存器的码头字段指针的位置处开始,获取一个压缩块对应的码流信息,包括码头字段(HDR)和残差字段(RES)。并进一步进行解压缩得到M×N个像素的原始数据。
示例性地,针对一个压缩组而言,可以利用残差长度获取对应的压缩组的N个残差字段,将N个残差字段进行位数拓展(如在其符号位之后加0)得到差值,然后再通过与前一像素值求和,恢复出N个像素值。
以M=2,N=4为例,结合图2和图3中的情形3,来描述数据解压缩模块。如图8所示,首先可以利用残差长度Rbit1计算出第一个压缩组中的4个残差字段的长度,随后选择对应的残差数据,依次为R0、R1、R2和R3。类似地,可以利用残差长度Rbit2计算出第二个压缩组中的4个残差字段的 长度并选择残差数据,依次为R4、R5、R6和R7。
随后,将R0~R7进行符号位拓展(在R0~R7的高位拓展R0~R7中的符号位)成为9bit的像素差值D1~D8。参照图8,数据选择器根据Rbit1和残差RES1(即R0~R3)并进行残差选择和符号位拓展得到D1~D4;数据选择器根据Rbit2和残差RES2(即R4~R7)并进行残差选择和符号位拓展得到D5~D8。
随后,将同压缩组内的像素差值除第一个之外的每个都与第一个相加,得到同初始像素的差值D’1~D’8:
{D’1,D’2,D’3,D’4}={D1,D2+D1,D3+D1,D4+D1},
{D’5,D’6,D’7,D’8}={D5,D6+D5,D7+D5,D8+D5}。
随后,根据上述D’1到D’8恢复出像素值p1至p8。具体地,将D’1到D’4同初始像素p0相加,将D’5到D’8同像素p4相加,计算p1=D’1+p0,p2=D’2+p0,p3=D’3+p0,p4=D’4+p0以及p5=D’5+p4,p6=D’6+p4,p7=D’7+p4,p8=D’8+p4。其中,p0为该压缩块之前的压缩块中的最后一个像素值,可以从初始像素寄存器(reg_pi)中获取。
如此,便可以通过解压缩得到一个压缩块中的所有像素值。并且,还可以将该压缩块的最后一个像素值,即p8,存入到初始像素寄存器(reg_pi)中,以便用于下一个压缩块的解压缩。
应当理解的是,本发明实施例中的数据解压缩模块进行解压缩的算法,例如哪个数与哪个数相加等,依赖于数据压缩所使用的算法,也就是说,解压缩与压缩的过程应该是对应的。
示例性地,数据解压缩模块还可以对解压缩得到像素的个数进行计数(表示为pix_cnt),当该计数达到一行时,即pix_cnt=FM_W时,将初始像素寄存器(reg_pi)和像素个数计数(pix_cnt)均复位为0。
可见,通过上述解压缩系统40,能够逐行地实现对压缩数据的解压缩过 程,因此可以称为行模式的特征图解压缩系统。
数据打包模块,可以表示为DATA_PACKER模块,其用于将数据解压缩模块得到的解压缩后的数据进行数据打包,以匹配总线位宽。进一步地,还可以向写仲裁模块发送写数据命令,以便写仲裁模块对来自至少两个解压缩模块的写数据命令进行仲裁,并将数据打包之后的数据写入到片上存储器。
另外,应当注意的是,尽管在上述图5和图6中示出了部分箭头以表示信号、信息、数据、命令、等传输的方向,但这仅仅是示意性的。例如,当完成对特征图的一个通道(channel)的数据解压缩之后,头信息装载模块可以向指令解析模块发送指令结束信号(例如表示为instr_done_path),以便进一步从指令解析模块获取针对下一个通道的解压缩指令等。
基于上述关于解压缩系统40的描述可知,本发明实施例提供的解压缩系统包括至少两个解压缩路径,能够根据需要灵活配置数量,以兼容各种数据输出速率的应用,满足不同的处理任务的性能需求。并且,该解压缩系统40能够将压缩系统采用差值方法压缩之后数据进行解压缩以恢复原始数据,用于处理器后续对数据的处理过程。由于压缩数据的占用空间小,从而能够减小处理器与外部存储器之间进行数据读取时的带宽,减小了对外部存储器进行读取的次数,进而能够降低功耗。
示例性地,本发明实施例中的解压缩系统40对压缩数据进行解压缩的过程可以如图9所示。具体地,指令解析模块可以接收解压缩指令,解析解压缩指令,并根据已解析的解压缩指令,向至少两个解压缩路径中的各个解压缩路径分发通道解码指令。每一个解压缩路径均在接收到通道解码指令后,执行数据解压缩操作,并将解压之后的数据写入片上存储器。在完成对一个通道的数据解压缩之后,继续对下一个通道的数据进行解压缩,直到完成对 解压缩指令所指示的所有数据的解压缩过程。其中,一个解压缩路径执行数据解压缩操作包括:头信息装载模块解析通道解码指令后,发送头信息读取命令,相应地读取头信息,通过对头信息进行解析确定码流信息的基地址和长度信息。码流信息装载模块根据头信息装载模块对头信息的解析结果,发送码流信息读取命令,相应地读取码流信息。码流信息在VLSR中被长度解析,并由数据解压缩模块进行解压缩。数据打包模块将解压缩后的数据进行打包,并发送写数据命令。
示例性地,本发明实施例的用于数据解压缩的方法的另一示意性流程图如图10所示,其中包括:
S101,接收解压缩指令,并解析该解压缩指令;
S102,根据已解析的解压缩指令,向至少两个解压缩路径中的各个解压缩路径分发通道解码指令;
S103,根据接收到的通道解码指令,至少两个解压缩路径中的每个解压缩路径分别获取各自对应的待解压数据,并对待解压数据进行解压缩得到解压缩后的数据,其中,待解压数据至少包括与一个压缩块对应的压缩数据;
S104,针对解压缩后的数据执行写入操作。
示例性地,S101和S102可以由解压缩系统的指令解析模块执行。示例性地,S101可以包括:当处理器的数据处理单元需要从片上存储器获取数据时,可以从处理器接收解压缩指令,并通过解析确定解压缩指令中所包含的内容,例如解压缩指令可以包括:待解压的特征图的数量,这些数量的特征图对应的压缩数据在外部存储器的基地址以及图间存储间隔,这些数量的特征图输出到片上存储器时的基地址以及图间存储间隔,每个特征图数据的宽度、高度、通道数,等等。
假设解压缩指令包括需要解压得到多张特征图,那么可以逐一地得到各个特征图。针对一个特征图,示例性地,可以逐通道地进行解压缩以得到特 征图。例如,对于特征图A,如果其通道数为FM_C,可以先针对第一个通道数执行S103和S104,随后再针对第二个通道执行S103和S104,…,直到完成第FM_C个通道。
示例性地,S102可以包括:根据已解析的解压缩指令,确定当前待解压的特征图的通道(如特征图A的第一个通道),并相应地确定至少两个解压缩路径中的各个解压缩路径的通道解码指令。随后,向各个解压缩路径分发对应的通道解码指令。
这样,可以将一个通道的解压任务分配给至少两个解压缩路径进行执行,提高并行度,能够尽快地得出特征图的一个通道的原始数据用于处理器的数据处理。
示例性地,发送至一个解压缩路径的通道解码指令可以包括:该解压缩路径待得到的特征图的宽度和高度,该特征图的压缩数据的头信息在外部存储器的基地址和长度信息,解压后将要写入片上存储器的基地址等。
在此之后,每一个解压缩路径都可以根据接收到的通道解码指令执行S103和S104。其中,为了获取待解压数据,可以依次发送头信息读取命令和码流信息读取命令,从而获取头信息和码流信息。
示例性地,该方法还可以包括:对来自至少两个解压缩路径的头信息读取命令进行仲裁,并优先发送仲裁胜出的头信息读取命令。对来自至少两个解压缩路径的码流信息读取命令进行仲裁,并优先发送仲裁胜出的码流信息读取命令。其中,在进行仲裁时所使用的仲裁规则可以包括预先配置的优先级机制或者公平轮询机制等。
下面将结合一个解压缩路径(例如图6中所示的解压缩路径1)进行解压缩的具体过程来描述S103。
S103中的待解压数据包括头信息和码流信息,S103可以包括:根据接收到的通道解码指令发送头信息读取命令;获取头信息;根据头信息发送码 流信息读取指令;获取码流信息。
具体地,可以对接收到的通道解码指令进行解析,从而得到头信息的基地址以及头信息的长度。随后再发送头信息读取命令,其中该头信息读取命令包括头信息的基地址以及头信息的长度。进一步地,可以接收与该头信息读取命令对应的头信息。可理解,此处的头信息的基地址是指该头信息在外部存储器的基地址。
具体地,在获取头信息后,可以对头信息进行解析,以确定码流信息的基地址以及码流信息的长度。随后再发送码流信息读取命令,其中,该码流信息读取命令包括码流信息的基地址和码流信息的长度。进一步地,可以接收与该码流信息读取命令对应的码流信息。可理解,此处的码流信息的基地址是指该码流信息在外部存储器的基地址。并且,可选地,码流信息读取命令可以是根据总线的配置转换后的满足最大猝发长度和地址对齐要求的码流信息读取命令。也就是说,首先根据总线的配置转换为满足最大猝发长度和地址对齐要求的码流信息读取命令之后再进行发送。
结合图6,可以由头信息装载模块执行:对接收到的通道解码指令进行解析,发送头信息读取命令,获取头信息,对头信息进行解析,以确定码流信息的基地址以及码流信息的长度。可以由码流信息装载模块执行:发送码流信息读取命令,并获取码流信息。
示例性地,头信息的一行可以包括一个基地址和K个长度信息。在对头信息进行解析已确定码流信息的基地址和码流信息的长度信息时:如果头信息是特征图的首行头信息,则码流信息的基地址以及码流长度是基于首行头信息所标识的基地址以及特征图的高度方向的偏移所确定的;如果头信息是特征图的非首行头信息,则码流信息的基地址是首行头信息所标识的基地址。
头信息的一行包括一个行基地址信息(表示为
Figure PCTCN2020092608-appb-000007
)和K个长度信息(row_len h)。那么:
情况一:
当解析首行头信息时,需要考虑解压数据在高度方向的偏移(fm_h_ofst),从而确定码流信息的基地址为:
Figure PCTCN2020092608-appb-000008
相应地,码流信息的长度为:
ifm_len=∑ fm_h_ofst≤h<krow_len h
情况二:
当解析非首行头信息时,确定码流信息的基地址为:
ifm baddr=row baddr
相应地,码流信息的长度为:
ifm_len=∑ h<fm_h_ofst+fm_hrow_len h
另外,可以理解的是,当两行头信息解析得到的码流信息的基地址和长度在地址空间上是连续的(即ifm_baddr last+ifm_len last=ifm_baddr cur),可配置将两行头信息合并后输出给头信息装载模块,合并更新为:
ifm_baddr cur=ifm_baddr last,ifm_len cur=ifm_len last+ifm_len cur
这样,能够通过一个读取命令完成对原本需要两次才能获取的两个码流信息,节省了处理操作,减少了对带宽的占用,提升了效率。
在另一实施方式中,根据头信息,可以确定码流信息的基地址和码流信息的长度。其中,所述码流信息的基地址和所述码流信息的长度在地址空间上是连续的。并且,所述码流信息是对多个第一码流信息合并而得到的。其中,多个第一码流信息包括当前码流信息和当前码流信息的上一条码流信息。
示例性地,S103中对待解压数据进行解压缩可以包括:将码流信息缓存在寄存器中,该码流信息包括至少一个压缩块对应的压缩数据;从寄存器中读取一个压缩块对应的压缩数据并进行解压缩,得到该一个压缩块中的各个像素。
其中,寄存器可以为变长移位寄存器,该寄存器可以包括L个寄存空间,每个寄存空间具有BW比特。将码流信息缓存在寄存器中可以包括:将码流信息首先写入L个寄存空间中的最高位寄存空间,并将最高位寄存空间的寄存标识置为第一标识;当存在更低位寄存空间的寄存标识为第二标识时,将码流信息向低位寄存空间进行移位。
可理解,L个寄存空间初始时都具有第二标识,作为一例,第一标识为1,第二标识为0。可理解,也可以采用其他的形式表示第一标识和第二标识,例如1和0,0和1。或者,可以用L位的字符串来表示L个寄存空间的标识等,本发明对此不限定。
其中,从寄存器中读取一个压缩块对应的压缩数据并进行解压缩,可以包括:从寄存器的当前码头字段指针的位置开始,读取一个压缩块对应的码流信息,并进行解压缩。
具体地,当L个寄存空间中的最低位寄存空间具有第一标识(如1)时,开始从当前码头字段指针的位置处进行解压缩。
并且,在完成对第一压缩块对应的第一压缩数据的解压缩之后,根据第一压缩数据的长度确定第二压缩块对应的第二压缩数据的位置,并将当前码头字段指针移位到第二压缩块对应的第二压缩数据的位置,以便对第二压缩块对应的第二压缩数据进行解压缩。
示例性地,可以假设一个压缩块包括M个压缩组,每个压缩组包括N个像素。一个压缩块对应的压缩数据包括码头字段,码头字段至少包括M个标志位,用于表示M个压缩组是否为全零。如果M个标志位中的第二标志位指示M个压缩组中的第二压缩组为非全零,那么一个压缩块对应的压缩数据还包括与第二标志位对应的残差长度以及残差字段。其中,与第二标志位对应的残差字段包括N个有符号位残差,每个有符号位残差除去符号位之外的长度为该残差长度。
在对压缩数据进行解压缩时,如果M个标志位中的第一标志位指示M个压缩组中的第一压缩组为全零,则将第一压缩组解压缩为N个零像素;也就是说,第一压缩组包括的N个像素的像素值都为0(例如000000000,第一个0为符号位)。如果M个标志位中的第二标志位指示M个压缩组中的第二压缩组为非全零,一个压缩块对应的压缩数据还包括与第二标志位对应的残差长度以及残差字段,则根据与第二标志位对应的残差长度以及残差字段将第二压缩组解压缩为N个非全零像素。
具体地,可以根据M个标志位中指示非全零的标志位的数量,以及残差长度的位宽,确定残差字段的位置;从残差字段的位置开始,读取数量个(如Q个)残差字段;将数量个残差字段中的每个残差字段解压缩为N个像素。
如果M个标志位中的第二标志位指示M个压缩组中的第二压缩组为非全零,那么可以获取与该第二压缩组对应的残差字段,该残差字段包括N个有符号位残差。将该残差字段解压缩为N个像素,包括:根据N个有符号位残差,通过符号位扩展,确定对应的N个有符号数像素差值;根据N个有符号数像素差值确定N个像素同初始像素的差值;根据N个像素同初始像素的差值确定N个像素的像素值。
其中,初始像素是指:如果第一个无符号位像素的位置是特征图的一行的第一个位置,则初始像素为零;否则初始像素为位于N个像素之前的最后一个像素。
以M=2,N=4为例,结合图2和图3中的情形3,来描述数据解压缩模块。首先可以利用Rbit1计算出第一个压缩组中的4个残差字段的长度,随后选择对应的残差数据,依次为R0、R1、R2和R3。类似地,可以利用Rbit2计算出第二个压缩组中的4个残差字段的长度并选择残差数据,依次为R4、R5、R6和R7。
随后,将R0~R7进行符号位拓展(在R0~R7的高位拓展R0~R7中的符号位)成为9bit的像素差值D1~D8。
随后,将同压缩组内的像素差值除第一个之外的每个都与第一个相加,得到同初始像素的差值D’1~D’8:
{D’1,D’2,D’3,D’4}={D1,D2+D1,D3+D1,D4+D1},
{D’5,D’6,D’7,D’8}={D5,D6+D5,D7+D5,D8+D5}。
随后,根据上述D’1到D’8恢复出像素值p1至p8。具体地,将D’1到D’4同初始像素p0相加,将D’5到D’8同像素p4相加,计算p1=D1+p0,p2=D2+p0,p3=D3+p0,p4=D4+p0以及p5=D5+p4,p6=D6+p4,p7=D7+p4,p8=D8+p4。其中,p0为该压缩块之前的压缩块中的最后一个像素值,可以从初始像素寄存器(reg_pi)中获取。
如此,便可以通过解压缩得到一个压缩块中的所有像素值。并且,还可以将该压缩块的最后一个像素值,即p8,存入到初始像素寄存器(reg_pi)中,以便用于下一个压缩块的解压缩。
应当理解的是,本发明实施例中的数据解压缩模块进行解压缩的算法,例如哪个数与哪个数相加等,依赖于数据压缩所使用的算法,也就是说,解压缩与压缩的过程应该是对应的。
示例性地,S104中,可以将解压缩后的数据进行数据打包,以匹配总线位宽;将数据打包之后的数据执行写入操作。
其中,执行写入操作可以包括:发送写数据命令,以便将解压缩之后的数据写入到片上存储器。具体地,可以根据解压缩指令中所包含的特征图在片上存储器的基地址进行写入操作。
示例性地,该方法还可以包括:对来自至少两个解压缩路径的写数据命令进行仲裁,并优先发送仲裁胜出的写数据命令。其中,在进行仲裁时所使用的仲裁规则可以包括预先配置的优先级机制或者公平轮询机制等。
由此可见,本发明实施例提供的用于数据解压缩的方法能够将压缩系统采用差值方法压缩之后数据进行解压缩以恢复原始数据,用于处理器后续对数据的处理过程。由于压缩数据的占用空间小,从而能够减小处理器与外部存储器之间进行数据读取时的带宽,减小了对外部存储器进行读取的次数,进而能够降低功耗。
应当理解的是,本发明实施例的用于数据解压缩的系统能够实现在处理器上,例如可以是计算机、服务器、工作站、移动终端、云台等各种设备的处理器上。并且,处理器通过解压缩得到的解压缩的数据可以是特征图数据,该特征图数据例如可以是在执行卷积神经网络的过程中生成的,并且可以继续用于卷积神经网络的其他迭代操作等。
示例性地,本发明实施例还提供了一种处理器,该处理器可以包括片上存储器以及解压缩系统40。示例性地,处理器还可以包括压缩系统20。其中,解压缩系统40可以如图5或图6所示。
本发明实施例中,处理器可以包括中央处理单元(Central Processing Unit,CPU)或者具有数据处理能力和/或指令执行能力的其它形式的处理单元,例如现场可编程门阵列(Field-Programmable Gate Array,FPGA)或进阶精简指令集机器(Advanced RISC(Reduced Instruction Set Computer)Machine,ARM)等,并且处理器可以包括其他组件以执行各种期望的功能。
应当理解的是,本发明实施例中的“特征图”、“特征图数据”、“原始特征图”等术语在没有相反指示的前提下,是指经本发明实施例的解压缩系统将压缩数据进行解压缩之后的数据,其可以具有宽度、高度、通道三个维度,或者可选地可以具有宽度和高度两个维度。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结 合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
另外,本发明实施例还提供了一种计算机存储介质,其上存储有计算机程序。当所述计算机程序由处理器执行时,可以实现前述所示的用于数据解压缩的方法的步骤。例如,该计算机存储介质为计算机可读存储介质。例如,计算机程序指令在被计算机或处理器运行时使计算机或处理器执行如图9或图10等所示的方法的步骤。
在一个实施例中,所述计算机程序指令在被计算机或处理器运行时使计算机或处理器执行以下步骤:接收解压缩指令,并解析所述解压缩指令;根据已解析的解压缩指令,向至少两个解压缩路径中的各个解压缩路径分发通道解码指令;根据接收到的通道解码指令,所述至少两个解压缩路径中的每个解压缩路径分别获取各自对应的待解压数据,并对所述待解压数据进行解压缩得到解压缩后的数据,其中,所述待解压数据至少包括与一个压缩块对应的压缩数据;针对所述解压缩后的数据执行写入操作。
计算机存储介质例如可以包括智能电话的存储卡、平板电脑的存储部件、个人计算机的硬盘、只读存储器(ROM)、可擦除可编程只读存储器(EPROM)、便携式紧致盘只读存储器(CD-ROM)、USB存储器、或者上述存储介质的任意组合。计算机可读存储介质可以是一个或多个计算机可读存储介质的任意组合。
另外,本发明实施例还提供了一种计算机程序产品,其包含指令,当该指令被计算机所执行时,使得计算机执行上述如图9或图10所示的用于数据解压缩的方法的步骤。
在一个实施例中,当该指令被计算机所执行时,使得计算机执行:接收解压缩指令,并解析所述解压缩指令;根据已解析的解压缩指令,向至少两 个解压缩路径中的各个解压缩路径分发通道解码指令;根据接收到的通道解码指令,所述至少两个解压缩路径中的每个解压缩路径分别获取各自对应的待解压数据,并对所述待解压数据进行解压缩得到解压缩后的数据,其中,所述待解压数据至少包括与一个压缩块对应的压缩数据;针对所述解压缩后的数据执行写入操作。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其他任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如数字视频光盘(digital video disc,DVD))、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
可见,本发明实施例提供的解压缩系统包括至少两个解压缩路径,能够根据需要灵活配置数量,以兼容各种数据输出速率的应用,满足不同的处理任务的性能需求。并且,该解压缩系统能够将压缩系统采用差值方法压缩之后数据进行解压缩以恢复原始数据,用于处理器后续对数据的处理过程。由于压缩数据的占用空间小,从而能够减小处理器与外部存储器之间进行数据读取时的带宽,减小了对外部存储器进行读取的次数,进而能够降低功耗。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理器中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (40)

  1. 一种用于数据解压缩的方法,其特征在于,包括:
    接收解压缩指令,并解析所述解压缩指令;
    根据已解析的解压缩指令,向至少两个解压缩路径中的各个解压缩路径分发通道解码指令;
    根据接收到的通道解码指令,所述至少两个解压缩路径中的每个解压缩路径分别获取各自对应的待解压数据,并对所述待解压数据进行解压缩得到解压缩后的数据,其中,所述待解压数据至少包括与一个压缩块对应的压缩数据;
    针对所述解压缩后的数据执行写入操作。
  2. 根据权利要求1所述的方法,其特征在于,所述待解压数据包括头信息和码流信息,
    所述根据接收到的通道解码指令获取待解压数据,包括:
    所述根据接收到的通道解码指令发送头信息读取命令;
    获取所述头信息,所述头信息至少包括所述码流信息的基地址;
    根据所述头信息发送码流信息读取命令;
    获取所述码流信息。
  3. 根据权利要求2所述的方法,其特征在于,所述根据接收到的通道解码指令发送头信息读取命令,包括:
    对所述通道解码指令进行解析,得到所述头信息的基地址和所述头信息的长度;
    发送所述头信息读取命令,所述头信息读取命令包括所述头信息的基地址和所述头信息的长度。
  4. 根据权利要求2或3所述的方法,其特征在于,根据所述头信息发送码流信息读取命令,包括:
    根据所述头信息确定所述码流信息的基地址和所述码流信息的长度;
    发送所述码流信息读取命令,所述码流信息读取命令包括所述码流信息的基地址和所述码流信息的长度;
    其中,所述码流信息的基地址和所述码流信息的长度在地址空间上是连 续的,并且所述码流信息是对多个第一码流信息合并而得到的。
  5. 根据权利要求4所述的方法,其特征在于,
    如果所述头信息是特征图的首行头信息,则所述码流信息的基地址以及所述码流信息的长度是基于所述首行头信息所标识的基地址以及所述特征图的高度方向的偏移所确定的;
    如果所述头信息是特征图的非首行头信息,则所述码流信息的基地址是所述首行头信息所标识的基地址。
  6. 根据权利要求2至5中任一项所述的方法,其特征在于,还包括:
    根据仲裁规则,对来自所述至少两个解压缩路径的头信息读取命令进行仲裁,和/或,对来自所述至少两个解压缩路径的码流信息读取命令进行仲裁,并优先发送仲裁胜出的命令。
  7. 根据权利要求2至6中任一项所述的方法,其特征在于,所述码流信息读取命令是根据总线的配置转换后的满足最大猝发长度和地址对齐要求的码流信息读取命令。
  8. 根据权利要求1至7中任一项所述的方法,其特征在于,对所述待解压数据进行解压缩得到解压缩后的数据,包括:
    将所述码流信息缓存在寄存器中,所述码流信息包括至少一个压缩块对应的压缩数据;
    从所述寄存器中读取一个压缩块对应的压缩数据并进行解压缩,得到所述压缩块中的各个像素。
  9. 根据权利要求8所述的方法,其特征在于,所述寄存器为变长移位寄存器且包括L个寄存空间,其中每个寄存空间具有总线位宽BW个比特,L为正整数;
    将所述码流信息缓存在寄存器中,包括:
    将所述码流信息首先写入所述L个寄存空间中的最高位寄存空间,并将所述最高位寄存空间的寄存标识置为第一标识;
    当存在更低位寄存空间的寄存标识为第二标识时,将所述码流信息向低位寄存空间进行移位。
  10. 根据权利要求9所述的方法,其特征在于,当所述L个寄存空间中 的最低位寄存空间的寄存标识为第一标识时,从所述最低位寄存空间开始进行解压缩。
  11. 根据权利要求9或10所述的方法,其特征在于,所述第一标识为1,所述第二标识为0。
  12. 根据权利要求8至11中任一项所述的方法,其特征在于,将一个压缩块对应的压缩数据进行解压缩,包括:
    从当前码头字段指针的位置开始,对一个压缩块对应的压缩数据进行解压缩。
  13. 根据权利要求12所述的方法,其特征在于,所述一个压缩块包括M个压缩组,每个压缩组包括N个像素,
    所述一个压缩块对应的压缩数据包括码头字段,所述码头字段至少包括M个标志位,用于表示所述M个压缩组是否为全零。
  14. 根据权利要求13所述的方法,其特征在于,
    如果所述M个标志位中的第一标志位指示所述M个压缩组中的第一压缩组为全零,则将所述第一压缩组解压缩为N个零像素;
    如果所述M个标志位中的第二标志位指示所述M个压缩组中的第二压缩组为非全零,所述一个压缩块对应的压缩数据还包括与所述第二标志位对应的残差长度以及残差字段,则根据所述与所述第二标志位对应的残差长度以及残差字段将所述第二压缩组解压缩为N个非全零像素。
  15. 根据权利要求14所述的方法,其特征在于,
    与所述第二标志位对应的残差字段包括N个有符号位残差,每个有符号位残差除去符号位之外的长度为所述残差长度。
  16. 根据权利要求14或15所述的方法,其特征在于,
    根据所述M个标志位中指示非全零的标志位的数量,以及所述残差长度的位宽,确定所述残差字段的位置;
    从所述残差字段的位置开始,读取所述数量个残差字段;
    将所述数量个残差字段中的每个残差字段解压缩为N个像素。
  17. 根据权利要求16所述的方法,其特征在于,一个残差字段包括N个有符号位残差,
    将所述一个残差字段解压缩为N个像素,包括:
    根据所述N个有符号位残差,通过符号位扩展,确定对应的N个有符号数像素差值;
    根据所述N个有符号数像素差值确定N个像素同初始像素的差值;
    根据所述N个像素同初始像素的差值,确定所述N个像素的像素值。
  18. 根据权利要求17所述的方法,其特征在于,所述N个无符号位残差包括第一个无符号位残差和其他N-1个无符号位残差,
    根据所述N个无符号位残差,确定N个无符号位像素,包括:
    将所述其他N-1个无符号位残差都与所述第一个无符号位残差相加得到N个二级残差,其中,所述二级残差中的第一个二级残差等于所述第一个无符号位残差;
    根据所述N个二级残差中的所述第一个二级残差以及初始像素,确定所述N个无符号位像素中的第一个无符号位像素;
    根据所述N个二级残差中的其余N-1个二级残差以及所述第一个无符号位像素,确定其余N-1个无符号位像素。
  19. 根据权利要求18所述的方法,其特征在于,
    如果所述第一个无符号位像素的位置是特征图的一行的第一个位置,则所述初始像素为零;否则所述初始像素为位于所述N个像素之前的最后一个像素。
  20. 根据权利要求12至19中任一项所述的方法,其特征在于,
    在完成对第一压缩块对应的第一压缩数据的解压缩之后,根据所述第一压缩数据的长度确定第二压缩块对应的第二压缩数据的位置,并将所述当前码头字段指针移位到所述第二压缩块对应的第二压缩数据的位置,以便对所述第二压缩块对应的第二压缩数据进行解压缩。
  21. 根据权利要求1至20中任一项所述的方法,其特征在于,针对所述解压缩后的数据执行写入操作,包括:
    将所述解压缩后的数据进行数据打包,以匹配总线位宽;
    将所述数据打包之后的数据执行写入操作。
  22. 根据权利要求21所述的方法,其特征在于,在执行写入操作的过程 中,还包括:
    根据仲裁规则,对来自所述至少两个解压缩路径的写数据命令进行仲裁,并优先发送仲裁胜出的命令。
  23. 根据权利要求6或22所述的方法,其特征在于,所述仲裁规则包括预先配置的优先级机制或者公平轮询机制。
  24. 一种用于数据解压缩的系统,其特征在于,包括指令解析模块和至少两个解压缩路径,
    所述指令解析模块,被配置为:
    接收解压缩指令,并解析所述解压缩指令;
    根据已解析的解压缩指令,向所述至少两个解压缩路径中的各个解压缩路径分发通道解码指令;
    所述至少两个解压缩路径中的每个解压缩路径,被配置为:
    根据接收到的通道解码指令,获取对应的待解压数据,并对所述待解压数据进行解压缩得到解压缩后的数据,其中,所述待解压数据至少包括与一个压缩块对应的压缩数据;
    针对所述解压缩后的数据执行写入操作。
  25. 根据权利要求24所述的系统,其特征在于,所述每个解压缩路径包括:头信息装载模块、码流信息装载模块、数据解压缩模块,
    所述头信息装载模块,被配置为:
    根据接收到的通道解码指令发送头信息读取命令;
    获取与所述头信息读取命令对应的头信息;
    对所述头信息进行解析,确定码流信息的基地址和码流信息的长度信息;
    所述码流信息装载模块,被配置为:
    从所述头信息装载模块获取所述码流信息的基地址和所述码流信息的长度信息;
    发送码流信息读取命令;
    获取与所述码流信息读取命令对应的码流信息;
    所述数据解压缩模块,被配置为:
    对所述码流信息进行解压缩以得到解压缩后的数据。
  26. 根据权利要求25所述的系统,其特征在于,所述头信息装载模块,被具体配置为:
    如果所述头信息是特征图的首行头信息,则所述码流信息的基地址是基于所述首行头信息所标识的基地址以及所述特征图的高度方向的偏移所确定的;
    如果所述头信息是特征图的非首行头信息,则所述码流信息的基地址是所述首行头信息所标识的基地址。
  27. 根据权利要求25或26所述的系统,其特征在于,所述码流信息读取命令是根据总线的配置转换后的满足最大猝发长度和地址对齐要求的码流信息读取命令。
  28. 根据权利要求25至27中任一项所述的系统,其特征在于,所述每个解压缩路径还包括变长移位寄存器模块,
    所述数据解压缩模块,被具体配置为:
    从所述变长移位寄存器模块获取一个压缩块对应的压缩数据;
    对所获取的一个压缩块对应的压缩数据进行解压缩,得到所述压缩块中的各个像素。
  29. 根据权利要求28所述的系统,其特征在于,所述每个解压缩路径还包括头信息缓存模块和码流信息缓存模块,
    所述头信息缓存模块,被配置为存储所述头信息装载模块所获取的所述头信息;
    所述码流信息缓存模块,被配置为存储所述码流信息装载模块所获取的所述码流信息;
    其中,所述码流信息缓存模块中的所述码流信息缓存到所述变长移位寄存器模块中,以便用于所述数据解压缩模块进行解压缩。
  30. 根据权利要求29所述的系统,其特征在于,所述变长移位寄存器包括L个寄存空间,其中每个寄存空间具有总线位宽BW个比特,L为正整数;
    所述变长移位寄存器模块,被配置为:
    将来自所述码流信息缓存模块的所述码流信息首先写入所述L个寄存空间中的最高位寄存空间,并将所述最高位寄存空间的寄存标识置为第一标 识;
    当存在更低位寄存空间的寄存标识为第二标识时,将所述码流信息向低位寄存空间进行移位。
  31. 根据权利要求30所述的系统,其特征在于,所述第一标识为1,所述第二标识为0。
  32. 根据权利要求30或31所述的系统,其特征在于,所述数据解压缩模块,被具体配置为:
    当所述L个寄存空间中的最低位寄存空间的寄存标识为第一标识时,从当前码头字段指针的位置开始,对一个压缩块对应的压缩数据进行解压缩。
  33. 根据权利要求32所述的系统,其特征在于,所述数据解压缩模块,还被配置为:
    在完成对第一压缩块对应的第一压缩数据的解压缩之后,根据所述第一压缩数据的长度确定第二压缩块对应的第二压缩数据的位置,并将所述当前码头字段指针移位到所述第二压缩块对应的第二压缩数据的位置,以便对所述第二压缩块对应的第二压缩数据进行解压缩。
  34. 根据权利要求25至33中任一项所述的系统,其特征在于,所述每个解压缩路径还包括数据打包模块,被配置为:将所述解压缩后的数据进行打包,并发送写数据命令。
  35. 根据权利要求34所述的系统,其特征在于,所述数据打包模块,被配置为:
    将所述数据解压缩模块得到的所述解压缩后的数据进行数据打包,以匹配总线位宽;
    发送写数据命令,以便将所述打包后的数据写入到片上存储器。
  36. 根据权利要求24至35中任一项所述的系统,其特征在于,所述系统还包括读仲裁模块,被配置为:对来自所述至少两个解压缩路径的头信息读取命令和/或码流信息读取命令进行仲裁,并优先发送仲裁胜出的命令。
  37. 根据权利要求24至36中任一项所述的系统,其特征在于,所述系统还包括写仲裁模块,被配置为:对来自所述至少两个解压缩路径的写数据命令进行仲裁,并优先发送仲裁胜出的写数据命令。
  38. 根据权利要求36或37所述的系统,其特征在于,进行仲裁所使用的仲裁规则包括预先配置的优先级机制或者公平轮询机制。
  39. 一种处理器,其特征在于,包括:
    片上存储器;以及
    如权利要求24至38中任一项所述的系统。
  40. 一种计算机存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现权利要求1至23中任一项所述方法的步骤。
PCT/CN2020/092608 2020-05-27 2020-05-27 数据解压缩的方法、系统、处理器及计算机存储介质 WO2021237510A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/092608 WO2021237510A1 (zh) 2020-05-27 2020-05-27 数据解压缩的方法、系统、处理器及计算机存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/092608 WO2021237510A1 (zh) 2020-05-27 2020-05-27 数据解压缩的方法、系统、处理器及计算机存储介质

Publications (1)

Publication Number Publication Date
WO2021237510A1 true WO2021237510A1 (zh) 2021-12-02

Family

ID=78745226

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092608 WO2021237510A1 (zh) 2020-05-27 2020-05-27 数据解压缩的方法、系统、处理器及计算机存储介质

Country Status (1)

Country Link
WO (1) WO2021237510A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023169109A1 (zh) * 2022-03-11 2023-09-14 上海华兴数字科技有限公司 挖掘机控制方法、装置及挖掘机
CN117097346A (zh) * 2023-10-19 2023-11-21 深圳大普微电子股份有限公司 一种解压器及数据解压方法、系统、设备、计算机介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102473093A (zh) * 2009-06-30 2012-05-23 英特尔公司 对多个通道中的紧缩数据解压缩
US20120183079A1 (en) * 2009-07-30 2012-07-19 Panasonic Corporation Image decoding apparatus, image decoding method, image coding apparatus, and image coding method
CN103428462A (zh) * 2013-08-29 2013-12-04 中安消技术有限公司 一种多通道音视频处理方法和装置
CN107094257A (zh) * 2017-05-14 2017-08-25 华中科技大学 一种对卫星图像数据实时解压缩的系统
CN109308192A (zh) * 2017-07-28 2019-02-05 苹果公司 用于执行存储器压缩的系统和方法
CN110795497A (zh) * 2018-08-02 2020-02-14 阿里巴巴集团控股有限公司 分布式存储系统中的协同压缩

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102473093A (zh) * 2009-06-30 2012-05-23 英特尔公司 对多个通道中的紧缩数据解压缩
US20120183079A1 (en) * 2009-07-30 2012-07-19 Panasonic Corporation Image decoding apparatus, image decoding method, image coding apparatus, and image coding method
CN103428462A (zh) * 2013-08-29 2013-12-04 中安消技术有限公司 一种多通道音视频处理方法和装置
CN107094257A (zh) * 2017-05-14 2017-08-25 华中科技大学 一种对卫星图像数据实时解压缩的系统
CN109308192A (zh) * 2017-07-28 2019-02-05 苹果公司 用于执行存储器压缩的系统和方法
CN110795497A (zh) * 2018-08-02 2020-02-14 阿里巴巴集团控股有限公司 分布式存储系统中的协同压缩

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023169109A1 (zh) * 2022-03-11 2023-09-14 上海华兴数字科技有限公司 挖掘机控制方法、装置及挖掘机
CN117097346A (zh) * 2023-10-19 2023-11-21 深圳大普微电子股份有限公司 一种解压器及数据解压方法、系统、设备、计算机介质
CN117097346B (zh) * 2023-10-19 2024-03-19 深圳大普微电子股份有限公司 一种解压器及数据解压方法、系统、设备、计算机介质

Similar Documents

Publication Publication Date Title
US11705924B2 (en) Low-latency encoding using a bypass sub-stream and an entropy encoded sub-stream
US9158686B2 (en) Processing system and method including data compression API
US10366467B1 (en) Method and apparatus for accessing compressed data and/or uncompressed data of image frame in frame buffer
US10810763B2 (en) Data compression using entropy encoding
US11463102B2 (en) Data compression method, data decompression method, and related apparatus, electronic device, and system
US10044370B1 (en) Lossless binary compression in a memory constrained environment
WO2021237510A1 (zh) 数据解压缩的方法、系统、处理器及计算机存储介质
US9894371B2 (en) Video decoder memory bandwidth compression
EP3783891B1 (en) Image data decompression
CN108377394B (zh) 视频编码器的图像数据读取方法、计算机装置及计算机可读存储介质
CN103914404A (zh) 一种粗粒度可重构系统中的配置信息缓存装置及压缩方法
JP6647340B2 (ja) 改善されたファイルの圧縮及び暗号化
CN113613289B (zh) 一种蓝牙数据传输方法、系统及通信设备
US10103747B1 (en) Lossless binary compression in a memory constrained environment
JP3860081B2 (ja) 画像処理装置及び画像処理方法
TWI545961B (zh) 固定壓縮倍率的影像壓縮方法、影像解壓縮方法及其電子裝置
WO2021237513A1 (zh) 数据压缩存储的系统、方法、处理器及计算机存储介质
JP2004509528A (ja) DCTインタフェースのためのRGB色空間とYCrCb色空間との間のデータ転送
JP2011151572A (ja) 画像データ処理装置およびその動作方法
CN115185865A (zh) 基于芯片的数据传输方法、设备及存储介质
CN114422801A (zh) 优化视频压缩控制逻辑的方法、系统、设备和存储介质
WO2021237518A1 (zh) 数据存储的方法、装置、处理器及计算机存储介质
JP4578444B2 (ja) ゼロ画素カウント回路
WO2019227447A1 (zh) 数据处理方法和处理电路
WO2013149045A1 (en) Processing system and method including data compression api

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20938414

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20938414

Country of ref document: EP

Kind code of ref document: A1