WO2022041906A1 - Data compression method and compression apparatus - Google Patents

Data compression method and compression apparatus

Info

Publication number
WO2022041906A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
memory space
length
compressed
compression
Application number
PCT/CN2021/097764
Other languages
English (en)
Chinese (zh)
Inventor
苏毅
刘中全
姚建业
周文
周保文
周慧
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2022041906A1


Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3059 Digital compression and data reduction techniques where the original information is represented by a subset or similar information, e.g. lossy compression
    • H03M7/3064 Segmenting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Definitions

  • The embodiments of the present application relate to the field of data processing, and in particular to a data compression method and a compression device.
  • The traditional compression method is software compression: the central processing unit (CPU) of the device calls a compression program, writes the data to be compressed into the device memory, and then compresses it to obtain the compressed data.
  • CPU: central processing unit
  • When the data to be compressed includes both data that needs to be compressed and associated data that does not, the CPU compresses all of it, associated data included.
  • The associated data disrupts the patterns and characteristics of the original data, which lowers the compression ratio the algorithm achieves; in addition, compressing the associated data leaves unneeded content in the compressed data, which lowers the compression ratio further.
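A small experiment makes the effect concrete. The Python sketch below uses the standard zlib module (DEFLATE, an LZ77-based compressor) as a stand-in for the device's compression algorithm, with synthetic data: repeated text blocks, each followed by 8 pseudo-random bytes standing in for associated data such as a DIF field. The data, block sizes, and the helper name build_inputs are illustrative assumptions, not values from the application.

```python
import random
import zlib


def build_inputs(text_block: bytes, n_blocks: int, assoc_len: int, seed: int = 0):
    """Return (text_only, mixed) byte strings.

    text_only concatenates n_blocks copies of text_block.
    mixed follows each copy with assoc_len pseudo-random bytes that stand in
    for associated data (for example a DIF field) that need not be compressed.
    """
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    text_only = bytearray()
    mixed = bytearray()
    for _ in range(n_blocks):
        text_only += text_block
        mixed += text_block
        mixed += bytes(rng.randrange(256) for _ in range(assoc_len))
    return bytes(text_only), bytes(mixed)


text_only, mixed = build_inputs(b"temperature=21.5;" * 30, n_blocks=64, assoc_len=8)
size_text_only = len(zlib.compress(text_only))  # compress the text data only
size_mixed = len(zlib.compress(mixed))          # compress text and associated data together
print(f"text only: {size_text_only} bytes, mixed: {size_mixed} bytes")
```

The pseudo-random associated bytes cannot be compressed and must be carried through as literals, so the mixed input yields a clearly larger output than the text alone, even though only the text needed compressing.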
  • Embodiments of the present application provide a data compression method and a compression device that can improve the compression ratio.
  • In this application, division refers to logical division: the compression apparatus distinguishes two parts of the data or of the memory space and performs different operations on each part.
  • A first aspect of the embodiments of the present application provides a data compression method applied to a compression device, including:
  • when a user or an external entity issues a compression task, the compression device receives an instruction for data compression;
  • the instruction carries an identifier of the data to be compressed, and the compression device determines the data to be compressed from that identifier; alternatively, when the compression device starts compression on its own initiative, it also obtains the identifier of the data to be compressed;
  • The compression device divides the data to be compressed into a plurality of data blocks.
  • Each data block includes a first part of the data and a second part of the data.
  • The first part of the data is the data that needs to be compressed, also called the text data block; the second part is data that does not need to be compressed and is associated with the text data block, also called the associated data block;
  • the compression device compresses the first part of the data without compressing the second part, to obtain compressed data that includes only the result of compressing the first part.
  • The second part of the data may be a DIF field.
  • The compression apparatus may distinguish the first part of the data from the second part as follows:
  • the compression device obtains a first starting address, a first data length, and a second data length, where the first starting address is the starting address of the data to be compressed, the first data length is the size of the first part of the data, and the second data length is the size of the second part of the data;
  • the compression apparatus divides the data to be compressed into a plurality of data blocks according to the first starting address, the first data length, and the second data length.
  • The compression apparatus may obtain the first starting address, the first data length, and the second data length from first description information of the data to be compressed.
  • The first description information may indicate the size of each block in the data to be compressed.
  • The compression apparatus may divide the data to be compressed by using a counter, as follows:
  • the compression device locates the start of the data to be compressed from the first starting address and reads the data, while the counter records a first length for the data read; when the first length equals the first data length, the data read is a text data block;
  • the compression device then skips memory space in the data to be compressed, while the counter records a second length for the skipped memory space; when the second length equals the second data length, the skipped data is an associated data block.
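The counter-based division can be sketched in Python as follows, assuming the storage device's memory is modelled as a byte string and that the total length of the data to be compressed is known. The function name split_with_counter and its (address, length) output for skipped blocks are hypothetical choices for this illustration.

```python
def split_with_counter(memory, first_start, total_len, first_data_len, second_data_len):
    """Logically divide the data to be compressed using a byte counter.

    memory:          bytes-like stand-in for the storage device's address space
    first_start:     first starting address of the data to be compressed
    total_len:       total length of the data to be compressed (assumed known)
    first_data_len:  first data length, the size of each text data block
    second_data_len: second data length, the size of each associated data block

    Returns (text_blocks, skipped): the text data blocks that were read, and
    (address, length) pairs for the associated data blocks that were skipped.
    """
    if first_data_len <= 0 or second_data_len < 0:
        raise ValueError("invalid block lengths")
    text_blocks, skipped = [], []
    addr, end = first_start, first_start + total_len
    while addr < end:
        block_start = addr
        first_length = 0  # counter for the read phase
        while first_length < first_data_len and addr < end:
            addr += 1
            first_length += 1
        text_blocks.append(bytes(memory[block_start:addr]))
        skip_start = addr
        second_length = 0  # counter for the skip phase; these bytes are not read
        while second_length < second_data_len and addr < end:
            addr += 1
            second_length += 1
        if second_length:
            skipped.append((skip_start, second_length))
    return text_blocks, skipped


# Two 4-byte text blocks, each followed by a 2-byte associated block, at address 2.
memory = b"--" + b"AAAA" + b"dd" + b"BBBB" + b"ee"
text_blocks, skipped = split_with_counter(memory, 2, 12, 4, 2)
print(text_blocks, skipped)  # [b'AAAA', b'BBBB'] [(6, 2), (12, 2)]
```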
  • The compression apparatus may also divide the data to be compressed by means of memory mapping.
  • The compression apparatus may also divide the data to be compressed by calculating the start and end addresses of the data blocks.
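Dividing by start and end addresses replaces the byte counter with arithmetic: with fixed lengths, the text data of block i begins at the first starting address plus i times the sum of the first and second data lengths. A minimal Python sketch, assuming fixed lengths; the function name block_addresses is hypothetical, and the 512-byte block with an 8-byte DIF is only an example layout.

```python
def block_addresses(first_start, total_len, first_data_len, second_data_len):
    """Return [start, end) address pairs of every text and associated data block.

    Block i's text data starts at first_start + i * (first_data_len + second_data_len),
    so every boundary is computed directly instead of being counted byte by byte.
    """
    stride = first_data_len + second_data_len
    end_of_data = first_start + total_len
    n_blocks = -(-total_len // stride)  # ceiling division
    text_ranges, assoc_ranges = [], []
    for i in range(n_blocks):
        text_start = first_start + i * stride
        text_end = min(text_start + first_data_len, end_of_data)
        assoc_end = min(text_end + second_data_len, end_of_data)
        text_ranges.append((text_start, text_end))
        if assoc_end > text_end:
            assoc_ranges.append((text_end, assoc_end))
    return text_ranges, assoc_ranges


# Example layout: three 512-byte text data blocks, each followed by an 8-byte DIF,
# starting at address 0x1000.
text_ranges, assoc_ranges = block_addresses(0x1000, 3 * 520, 512, 8)
print(text_ranges)   # [(4096, 4608), (4616, 5128), (5136, 5648)]
print(assoc_ranges)  # [(4608, 4616), (5128, 5136), (5648, 5656)]
```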
  • When every text data block has the same fixed length, a single first data length can represent the length of every text data block.
  • Using one first data length for all text data blocks avoids processing multiple first data lengths, which saves computing resources.
  • Alternatively, the length of the text data blocks may take multiple values, in which case there may be two or more first data lengths.
  • When every associated data block has the same fixed length, a single second data length can represent the length of every associated data block.
  • Using one second data length for all associated data blocks avoids processing multiple second data lengths, which saves computing resources.
  • Alternatively, the length of the associated data blocks may take multiple values, in which case there may be two or more second data lengths.
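When lengths vary, the device has to track one length per block instead of a single value. The hypothetical helper below assumes the first and second data lengths are supplied as lists with one entry per data block, and checks that they add up to the data length.

```python
def split_variable(data: bytes, first_lengths: list[int], second_lengths: list[int]):
    """Split data whose text and associated blocks have per-block lengths.

    Assumption: first_lengths[i] and second_lengths[i] give the lengths of the
    text data block and the associated data block of block i.
    """
    if len(first_lengths) != len(second_lengths):
        raise ValueError("need one first/second length pair per data block")
    if sum(first_lengths) + sum(second_lengths) != len(data):
        raise ValueError("block lengths do not add up to the data length")
    text_blocks, assoc_blocks = [], []
    pos = 0
    for l1, l2 in zip(first_lengths, second_lengths):
        text_blocks.append(data[pos:pos + l1])   # first part: will be compressed
        pos += l1
        assoc_blocks.append(data[pos:pos + l2])  # second part: left uncompressed
        pos += l2
    return text_blocks, assoc_blocks


text_blocks, assoc_blocks = split_variable(b"aaa1bbbbb22", [3, 5], [1, 2])
print(text_blocks, assoc_blocks)  # [b'aaa', b'bbbbb'] [b'1', b'22']
```

With a single fixed value, the loop would use the same pair of lengths every time, which is why one length per kind saves the bookkeeping shown here.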
  • The compression apparatus may also divide the processed memory space, which is used to store the processed data, as follows:
  • the compression device divides the processed memory space into a plurality of memory space blocks, each including a first memory space and a second memory space; the first memory space stores a compressed data block, and the second memory space stores second associated data that is associated with that compressed data block;
  • the compression device writes the compressed data into the first memory spaces and skips the second memory spaces to obtain the processed data, which contains the compressed data and also reserves the second memory spaces for storing the second associated data.
  • The processed data, with its second memory spaces already vacated, is therefore obtained directly: the storage device does not need to copy the compressed data to free up the second memory spaces, which reduces the latency of the compression process and improves the compression bandwidth of the storage device.
  • The second associated data may be a DIF field.
  • The compression apparatus may distinguish the first memory space from the second memory space according to a second starting address, a first memory space length, and a second memory space length, as follows:
  • the compression device obtains the second starting address, the first memory space length, and the second memory space length, where the second starting address is the starting address of the processed data, the first memory space length is the size of the first memory space, and the second memory space length is the size of the second memory space;
  • the compression device divides the processed memory space used to store the processed data into a plurality of memory space blocks according to the second starting address, the first memory space length, and the second memory space length.
  • The compression apparatus may obtain the second starting address, the first memory space length, and the second memory space length from second description information of the processed data; the second description information may indicate the size of each block in the processed memory space.
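The same arithmetic applies to the processed memory space. Assuming fixed first and second memory space lengths, the Python sketch below computes how many memory space blocks a given amount of compressed data needs and where each first and second memory space starts. The function name, the example addresses, and the choice to count the last memory space block at full size even when its first memory space is only partly filled are assumptions.

```python
def plan_output_layout(second_start, compressed_len, first_space_len, second_space_len):
    """Plan the processed memory space for compressed_len bytes of compressed data.

    Returns the start address of every first memory space (compressed data),
    the start address of every second memory space (reserved, e.g. for a second
    DIF field block), and the total size of the memory space blocks used.
    """
    n_blocks = (compressed_len + first_space_len - 1) // first_space_len  # ceiling division
    stride = first_space_len + second_space_len
    first_spaces = [second_start + i * stride for i in range(n_blocks)]
    second_spaces = [addr + first_space_len for addr in first_spaces]
    # Assumption: every memory space block is counted at full size.
    return first_spaces, second_spaces, n_blocks * stride


first_spaces, second_spaces, total = plan_output_layout(0x2000, 1100, 512, 8)
print(first_spaces)   # [8192, 8712, 9232]
print(second_spaces)  # [8704, 9224, 9744]
print(total)          # 1560
```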
  • The compression apparatus may divide the processed memory space by using a counter, as follows:
  • the compression device locates the start of the processed memory space from the second starting address and writes the compressed data into it, while the counter records a third length for the data written; when the third length equals the first memory space length, the memory space written is a first memory space and the data written is a compressed data block;
  • the compression device then skips memory space in the processed memory space, while the counter records a fourth length for the skipped memory space; when the fourth length equals the second memory space length, the skipped memory space is a second memory space.
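The counter-based write can be sketched in Python as below. The processed memory space is modelled as a bytearray whose index 0 is the second starting address, and zlib stands in for the device's compressor; both are assumptions for illustration, as is the function name. The writer never touches the second memory spaces, so second associated data can later be written into them in place, without moving the compressed data.

```python
import zlib


def write_with_reserved_gaps(compressed: bytes, out_buf: bytearray,
                             first_space_len: int, second_space_len: int) -> int:
    """Write compressed data into the first memory spaces of out_buf and skip
    the second memory spaces, using the third and fourth length counters.

    Returns the offset just past the last byte written or skipped.
    """
    if first_space_len <= 0:
        raise ValueError("first memory space length must be positive")
    dst = 0  # offset from the second starting address
    src = 0
    while src < len(compressed):
        third_length = 0  # bytes written into the current first memory space
        while third_length < first_space_len and src < len(compressed):
            if dst >= len(out_buf):
                raise ValueError("processed memory space is too small")
            out_buf[dst] = compressed[src]
            dst += 1
            src += 1
            third_length += 1
        if third_length < first_space_len:
            break  # the compressed data ended inside this first memory space
        # Skip the second memory space: counting the fourth length up to
        # second_space_len only advances the write position; nothing is written.
        fourth_length = 0
        while fourth_length < second_space_len:
            dst += 1
            fourth_length += 1
    return dst


text = b"".join(b"block %04d;" % i for i in range(100))
compressed = zlib.compress(text)
out_buf = bytearray(b"\xff" * 4096)  # 0xFF marks bytes the writer never touched
end = write_with_reserved_gaps(compressed, out_buf, first_space_len=64, second_space_len=8)
```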
  • The compression apparatus may also divide the processed memory space by memory mapping.
  • The compression apparatus may also divide the processed memory space by calculating the start and end addresses of the memory space blocks.
  • When every first memory space has the same fixed length, a single first memory space length can represent the length of every first memory space.
  • Using one first memory space length for all first memory spaces avoids processing multiple first memory space lengths, which saves computing resources.
  • Alternatively, the length of the first memory space may take multiple values, in which case there may be two or more first memory space lengths.
  • When every second memory space has the same fixed length, a single second memory space length can represent the length of every second memory space.
  • Using one second memory space length for all second memory spaces avoids processing multiple second memory space lengths, which saves computing resources.
  • Alternatively, the length of the second memory space may take multiple values, in which case there may be two or more second memory space lengths.
  • The compression apparatus may be coupled to the storage device, with both the data to be compressed and the first description information stored in the storage device; the compression apparatus retrieves the first description information from the storage device and obtains the first starting address, the first data length, and the second data length from it.
  • The compression apparatus may be coupled to the storage device, with both the processed data and the second description information stored in the storage device; the compression device retrieves the second description information from the storage device and obtains the second starting address, the first memory space length, and the second memory space length from it.
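The application does not define how description information is encoded. Purely as an illustration, the sketch below assumes a hypothetical 16-byte record (a little-endian 64-bit starting address followed by two 32-bit lengths) and decodes it with Python's struct module; the same record shape could carry either the first or the second description information.

```python
import struct
from typing import NamedTuple

# Hypothetical record layout (not specified in the application):
# little-endian u64 starting address, then two u32 lengths.
DESCRIPTION = struct.Struct("<QII")


class Description(NamedTuple):
    start_address: int  # first (or second) starting address
    first_length: int   # first data length (or first memory space length)
    second_length: int  # second data length (or second memory space length)


def read_description(storage, offset: int) -> Description:
    """Fetch one description record from a bytes-like storage image and decode it."""
    return Description(*DESCRIPTION.unpack_from(storage, offset))


# A storage image holding first description information at offset 64.
storage = bytearray(128)
DESCRIPTION.pack_into(storage, 64, 0x1000, 512, 8)
first_description = read_description(storage, 64)
print(first_description)  # Description(start_address=4096, first_length=512, second_length=8)
```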
  • The compression apparatus may be located in the storage device and coupled to the central chip of the storage device, with both the data to be compressed and the processed data stored in the storage device.
  • The compression device may be located in the storage device and integrated into the central chip of the storage device, with both the data to be compressed and the processed data stored in the storage device.
  • The compression device may be a software program stored in the storage device that instructs the storage device to perform the corresponding operations, with both the data to be compressed and the processed data stored in the storage device.
  • Alternatively, both the data to be compressed and the processed data may be stored in the compression device.
  • The second part of the data may be check data for the first part of the data.
  • The second part of the data may be description data for the first part of the data.
  • The second associated data may be check data for the compressed data block.
  • The second associated data may be description data for the compressed data block.
  • A second aspect of the embodiments of the present application provides a compression device, including:
  • one or more processors, a memory, an input/output device, and a bus;
  • the one or more processors, the memory, and the input/output device are connected to the bus;
  • the one or more processors are configured to perform the following steps:
  • dividing the data to be compressed into at least two data blocks, where each data block includes a first part of the data and a second part of the data;
  • the first part of the data is the data that needs to be compressed, also called the text data block, and the second part of the data is data that does not need to be compressed;
  • the second part of the data is associated with the text data block and is also called the associated data block;
  • the compression device is configured to perform the method of the first aspect.
  • A third aspect of the embodiments of the present application provides a compression device, including:
  • a first division unit, configured to divide the data to be compressed into at least two data blocks, each including a first part of the data and a second part of the data, where the first part is the data that needs to be compressed, also called the text data block;
  • the second part is data that does not need to be compressed and is associated with the text data block, also called the associated data block;
  • a compression unit, configured to compress the first part of the data without compressing the second part, to obtain compressed data.
  • The compression device is configured to perform the method of the first aspect.
  • A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a program that, when executed by a computer, performs the method of the first aspect.
  • A fifth aspect of the embodiments of the present application provides a computer program product that, when executed on a computer, causes the computer to perform the method of the first aspect.
  • FIG. 1 is a schematic flowchart of a data compression method in the prior art.
  • FIG. 2 is a schematic structural diagram of a compression device according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a data compression method according to an embodiment of the present application.
  • FIG. 4 is another schematic flowchart of the data compression method according to an embodiment of the present application.
  • FIG. 5 is another schematic flowchart of the data compression method according to an embodiment of the present application.
  • FIG. 6 is another schematic structural diagram of the compression device according to an embodiment of the present application.
  • An embodiment of the present application provides a compression device that can be used for data compression.
  • The compression device may be located inside or outside a storage device, or may exist as software through which the storage device performs the corresponding functions.
  • The compression device provided by the embodiments of the present application can improve the compression ratio of the compression process and reduce the memory and compression bandwidth of the storage device consumed during compression.
  • Within the data to be compressed, the data that needs to be compressed may be split into multiple blocks by data that does not need to be compressed.
  • A block consisting of data that needs to be compressed is called a text data block.
  • A block consisting of data that does not need to be compressed is called an associated data block; text data blocks and associated data blocks may alternate in the data to be compressed.
  • An associated data block is data associated with a text data block. The association may exist from the point of view of the compression device or only from the point of view of the data source; in the latter case, the associated data block may appear unrelated to the text data as far as the compression device is concerned.
  • For example, for a data source such as a sensor, data A and data B are related, but the compression device does not know the relationship between them; it only knows that data A needs to be compressed and data B does not.
  • Data B is then the associated data. For instance, A may be the displacement of a moving object and B its instantaneous speed: the two are collected at the same time, but computing the average speed requires only the displacement A, not the instantaneous speed B.
  • Data B is thus associated data of data A. From the point of view of the compression device, the associated data may also be related to the text data.
  • The associated data may be check data used to verify the integrity of a text data block, or other data related to the text data, such as description data describing a text data block; this is not specifically limited here.
  • These are only examples of associated data and do not limit it: any non-text data that appears in the data to be compressed, that is, any data that does not need to be compressed, can be called associated data.
  • An associated data block is collected together with its text data block, and the associated data may be smaller or much smaller than the text data block. For example, when the associated data block is check data for the text data block, it may be only a few bytes, while the text data block may be hundreds of kilobytes or larger. The associated data may also be similar to, the same as, or larger than the text data in size; this is not specifically limited here.
  • The size relationships described above are only examples; the specific values and units do not limit either kind of data or the relationship between their sizes.
  • In the prior-art method, the compression device reads the data to be compressed; when that data includes associated data blocks, they are read as well.
  • The compression device then compresses the data to be compressed; any associated data blocks are compressed along with it, so the compressed data includes compressed associated data.
  • The compression device copies the compressed data to the storage device.
  • Because the compressed data includes the compressed associated data, the resulting processed data includes it too.
  • When the data to be compressed contains both text data that needs to be compressed and associated data that does not, the compression device therefore compresses the associated data as well. The associated data disrupts the patterns and characteristics of the original text data, lowering the compression ratio of the algorithm, and compressing it also leaves unneeded content in the compressed data, which lowers the compression ratio further.
  • The present application provides a data compression method and a compression device to improve the compression ratio.
  • The data to be compressed is located in a storage device, which also stores the processed data obtained by processing the data to be compressed.
  • The data to be compressed includes a plurality of data blocks, comprising text data blocks and associated data blocks.
  • Each text data block is associated with a first associated data block.
  • The text data block is referred to as the first part of the data, and the first associated data block as the second part of the data.
  • The processed data may also be divided into multiple blocks, including compressed data blocks, with memory space reserved after each compressed data block to store the second associated data for that block.
  • The memory space used to store the processed data is called the processed memory space; within it, the memory space used to store compressed data blocks is called the first memory space, and the memory space used to store second associated data blocks is called the second memory space.
  • If the processed data does not need second associated data, it may contain no second memory space and instead consist of one contiguous segment of compressed data; this is not specifically limited here.
  • The compression device may divide the data to be compressed into a plurality of data blocks, that is, distinguish text data blocks from first associated data blocks, using the first starting address (the storage location of the data to be compressed), the first data length (the size of a text data block), and the second data length (the size of an associated data block).
  • Likewise, the compression device may divide the processed memory space into multiple memory space blocks, that is, distinguish first memory spaces from second memory spaces, using the second starting address (the storage location of the processed data), the first memory space length (the size of a first memory space), and the second memory space length (the size of a second memory space).
  • Here, division means logical division: the compression device can distinguish the text data from the first check data, or the first memory space from the second memory space, and perform different operations on the two parts.
  • The associated data may be a data integrity field (DIF, also rendered as data consistency protection), which is used to verify the integrity of the data.
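The application does not specify the DIF format. A widely used layout is T10 protection information appended to each 512-byte block: an 8-byte field made of a 2-byte guard tag (a CRC-16 of the block, polynomial 0x8BB7), a 2-byte application tag, and a 4-byte reference tag. The Python sketch below builds and checks such a field under that assumption; the function names are hypothetical, and it only illustrates why a DIF is associated data that belongs beside, not inside, the compressed text.

```python
import struct


def crc16_t10dif(data: bytes) -> int:
    """CRC-16/T10-DIF computed bit by bit: polynomial 0x8BB7, initial value 0,
    no input/output reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_dif(block: bytes, app_tag: int, ref_tag: int) -> bytes:
    """Build an 8-byte DIF: guard tag (CRC of the block), application tag,
    reference tag, all big-endian (assumed T10-style layout)."""
    return struct.pack(">HHI", crc16_t10dif(block), app_tag, ref_tag)


def verify_dif(block: bytes, dif: bytes) -> bool:
    """Check only the guard tag; real targets may also check the other tags."""
    guard, _app_tag, _ref_tag = struct.unpack(">HHI", dif)
    return guard == crc16_t10dif(block)


sector = bytes(range(256)) * 2        # one 512-byte text data block
dif = make_dif(sector, app_tag=0, ref_tag=7)
corrupted = b"\x01" + sector[1:]      # first byte changed from 0x00 to 0x01
print(verify_dif(sector, dif), verify_dif(corrupted, dif))  # True False
```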
  • DIF: data integrity field (data consistency protection)
  • In this embodiment, the associated data may also be another field used to check the integrity of a block, such as a custom integrity-check field or an integrity-check field defined by another protocol; it may also be data serving other purposes, such as description data describing the text data. This is not specifically limited here.
  • The embodiments use the DIF field only as an example of an associated data block: every DIF field can be replaced by other associated data, and every memory space used to store a DIF field can instead store other associated data.
  • The role of that memory space changes with the associated data it holds.
  • Statements that the DIF field verifies data integrity, or that custom data describes data information, describe the role of the associated data and do not restrict it.
  • Dividing the data to be compressed and the processed memory space is performed by a compression device.
  • The compression device may be a hardware device located inside or outside the storage device, a software program stored in the storage device that directs the storage device to perform the corresponding operations, or a storage device that itself implements the division and compression functions.
  • The storage device 200 in this embodiment of the present application may include a processor 201, a memory 202, a processor 203, and a communication bus 204, where processor 203 is the compression device described above.
  • Storage device 200 may include multiple processors, such as processor 201 and processor 203 shown in FIG. 2.
  • Processor 201 is the control center of the storage device.
  • Processor 201 may be a central processing unit (CPU) with one or more CPU cores, such as CPU1 and CPU2 shown in FIG. 2.
  • Processor 201 may also be an application-specific integrated circuit (ASIC), or one or more integrated circuits, for example one or more digital signal processors (DSP) or one or more field-programmable gate arrays (FPGA).
  • Processor 201 performs the functions of storage device 200 by running or executing software programs stored in memory 202 and calling data stored in memory 202.
  • Memory 202 may store the data to be compressed and the processed data. The data to be compressed may include alternating text data blocks and first DIF field blocks, each first DIF field block being used to verify the integrity of its text data block.
  • The processed memory space holding the processed data may include alternating first and second memory spaces: the first memory spaces store compressed data blocks, and the second memory spaces store second DIF field blocks, which are used to verify the integrity of the compressed data blocks.
  • Memory 202 may also store a software program that implements the solution of the present application, whose execution is controlled by processor 201 or processor 203.
  • Memory 202 may be a static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), or another form of virtual memory.
  • Memory 202 may exist independently and be connected to processor 201 through the communication bus 204, or may be integrated with processor 201; this is not specifically limited here.
  • Memory 202 may include multiple parts: the part storing the data to be compressed may be a read-only memory (ROM) or any of the forms above, and the part storing the processed data may be any of the forms above.
  • ROM: read-only memory
  • Processor 201 may invoke processor 203 through an interface to perform the corresponding operations; Table 1 describes one form of this interface:
  • The parameters in the embodiments of the present application may be of integer type int or character type char, as shown in Table 1, or of other types such as long integer type long; this is not specifically limited here.
  • The parameter inBuf, also called the first starting address, and the parameter outBuf, also called the second starting address, give the storage locations in memory 202 of the data to be compressed and of the processed data, respectively.
  • The parameter inContinuity describes the composition of the data to be compressed in memory 202.
  • It has two sub-parameters. The first is the first data length: the length of a single text data block that needs to be compressed, that is, of the first part of the data.
  • The second is the second data length: the length of a single first DIF field block, that is, of the second part of the data.
  • The parameter outContinuity describes the composition of the processed memory space in memory 202; since the processed data is stored in the processed memory space, this parameter also reflects the composition of the processed data.
  • It has two sub-parameters. The first is the first memory space length: the length of a single compressed data block in the processed data, that is, of a first memory space. The second is the second memory space length: the length of a single second DIF field block in the processed data, that is, of a second memory space.
  • The second DIF field block is used to verify the integrity of the compressed data block.
  • The interface may also provide other parameters, including but not limited to inLen, outBufLen, and compressParams:
  • inLen is the length of the data to be compressed;
  • outBufLen is the size of the memory space available for storing the processed data;
  • compressParams describes the compression parameters.
  • compressParams may include, but is not limited to, the compression window length.
  • The compression window length is the length of the encoded region in the sliding window of the LZ compression algorithm; when other compression algorithms are used, the interface may provide their own compression parameters, which is not specifically limited here.
  • the parameters provided by the interface of the present application include but are not limited to the above parameters, and the interface may not provide the above parameters, which is not limited here.
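Table 1 itself is not reproduced in this text. As a rough, hypothetical illustration, the interface parameters described above could be modelled in Python as follows; the class names, field names and the `body_length` helper are assumptions, and addresses are treated as integer offsets rather than real pointers.

```python
from dataclasses import dataclass, field


@dataclass
class Continuity:
    """Hypothetical container for inContinuity / outContinuity.

    first_len:  length of one first part (text data block or first memory space)
    second_len: length of one second part (DIF field block or second memory space)
    """
    first_len: int
    second_len: int

    @property
    def stride(self) -> int:
        # Distance between the starts of two consecutive blocks.
        return self.first_len + self.second_len


@dataclass
class CompressRequest:
    """Hypothetical mirror of the interface parameters named in the text."""
    in_buf: int                  # inBuf: first starting address (as an offset)
    in_len: int                  # inLen: length of the data to be compressed
    in_continuity: Continuity    # inContinuity
    out_buf: int                 # outBuf: second starting address (as an offset)
    out_buf_len: int             # outBufLen: usable size for the processed data
    out_continuity: Continuity   # outContinuity
    compress_params: dict = field(default_factory=dict)  # compressParams, e.g. window length

    def body_length(self) -> int:
        # Total length of the text data blocks, i.e. the part that gets compressed.
        full, rest = divmod(self.in_len, self.in_continuity.stride)
        return full * self.in_continuity.first_len + min(rest, self.in_continuity.first_len)
```

For example, eight 512-byte text data blocks each followed by an 8-byte DIF field give `in_len = 4160` and a body length of 4096 bytes.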
  • the first data length and the second data length in the parameter inContinuity can be two fixed values, indicating the lengths of every text data block and every first DIF field block in the data to be compressed, respectively.
  • alternatively, the first data length in the parameter inContinuity can include a length value for each piece of first-part data, or for only some of them; that is, the first data length may take multiple values, which is not specifically limited here.
  • the second data length is similar to the first data length, and details are not repeated here.
  • the first memory space length and the second memory space length in the parameter outContinuity can be two fixed values, indicating the lengths of every compressed data block and every second DIF field block in the processed data, respectively.
  • alternatively, the first memory space length in the parameter outContinuity may include a length value for each first memory space, or for only some of them; that is, the first memory space length may take multiple values, which is not specifically limited here.
  • the second memory space length is similar to the first memory space length, and details are not repeated here.
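The fixed-value and per-block forms of these lengths can be normalized into one list of (first, second) pairs. The following Python helper is a hypothetical sketch of that step; the name `block_lengths` and the convention of an `int` for a fixed value versus a sequence for per-block values are assumptions, not part of the described interface.

```python
from itertools import repeat
from typing import List, Sequence, Tuple, Union

Length = Union[int, Sequence[int]]


def block_lengths(first: Length, second: Length, count: int) -> List[Tuple[int, int]]:
    """Return (first_len, second_len) for each of `count` blocks.

    An int is treated as one fixed length shared by every block; a sequence
    is treated as a separate length for each block, in order.
    """
    firsts = repeat(first, count) if isinstance(first, int) else first
    seconds = repeat(second, count) if isinstance(second, int) else second
    pairs = list(zip(firsts, seconds))[:count]
    if len(pairs) < count:
        raise ValueError("a per-block length list is shorter than the block count")
    return pairs
```

The same helper serves inContinuity (text data block and first DIF field block lengths) and outContinuity (first and second memory space lengths).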
  • the processor 203 may be configured to divide the data to be compressed into text data blocks and first DIF field blocks, and is also responsible for reading the text data blocks and compressing them; this keeps the first DIF field blocks out of the compression and thereby increases the compression ratio.
  • the processor 203 may obtain the parameter inBuf, that is, the first starting address, and the parameter inContinuity, that is, the first data length and the second data length, through the above-mentioned interface, read the text data blocks of the data to be compressed accordingly, and compress them to obtain the compressed data.
  • the processor 203 may read the data by direct memory access (DMA), or by any new reading method that may appear in the future; any method that can read data from the memory 202 into the compression device may be used, which is not specifically limited here.
  • the processor 203 can also be used to divide the processed memory space into first memory spaces and second memory spaces, and is also responsible for writing out the compressed data, so that space is reserved in the processed data for storing the second DIF field blocks.
  • the processor 203 may obtain the parameter outBuf, that is, the second starting address, and the parameter outContinuity, that is, the first memory space length and the second memory space length, through the above-mentioned interface; determine the first memory spaces and second memory spaces in the processed data according to outBuf and outContinuity; write the compressed data into the first memory spaces; and leave the second memory spaces empty for storing the second DIF field blocks.
  • the processor 203 may include multiple compression cores, such as compression core 1 and compression core 2 shown in FIG. 2. Each core independently processes a different compression task, that is, each core performs the complete set of compression and writing tasks, and different compression cores process different data to be compressed.
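A software analogy of several compression cores, each handling a whole task independently, is a worker pool. In this hypothetical Python sketch, threads stand in for compression cores and zlib stands in for the cores' compression algorithm; neither choice comes from the text.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_task(data: bytes) -> bytes:
    # One complete task on one "core": compress one whole piece of data to be compressed.
    return zlib.compress(data)


def run_on_cores(tasks: List[bytes], cores: int = 2) -> List[bytes]:
    # Each worker stands in for one compression core; different workers handle
    # different tasks independently, and pool.map keeps results in task order.
    with ThreadPoolExecutor(max_workers=cores) as pool:
        return list(pool.map(compress_task, tasks))
```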
  • the processor 203 may have the same physical form as the processor 201 , or may have a different physical form from the processor 201 .
  • the processor 203 may be a computing unit integrated in the central chip, or may exist in other forms, such as a processing chip with computing capability coupled to the central chip, or a hardware device with computing capability that is connected to the storage device through a physical interface and located in the compression device, which is not specifically limited here.
  • the central chip may include the processor 201 .
  • the processor 203 may be an acceleration card, a co-processor, a graphics processing unit (GPU), a neural-network processing unit (NPU), or the like.
  • one or more processors 201 may be configured, and one or more processors 203 may also be configured.
  • the compression device may also be a software program stored in the memory 202, and the actions performed by the processor 203 are performed by the processor 201 according to the software program; or the compression device is the above storage device, and the processor 203 is a module specially used by the storage device for compression.
  • the function of the processor 203 may be implemented by the processor 201 or a special processor 203, which is not specifically limited here.
  • both the processor 201 and the processor 203 can access the memory 202 through the communication bus 204, read the software programs and data stored in the memory 202, and write data into the memory 202.
  • when the processor 201 receives an instruction for a compression task, it can obtain the location identifier and the block length according to the continuity information of the data to be processed carried in the instruction.
  • the location identifier includes the first starting address and the second starting address;
  • the block length includes the first data length, the second data length, the first memory space length and the second memory space length;
  • the processor 201 then sends a compression instruction, the location identifier and the block length to the processor 203 through the interface, to instruct the processor 203 to process the data to be processed accordingly.
  • the location identifier and block length may be issued to the processor 203 along with the compression instruction, or may be stored in the memory 202, and the processor 203 retrieves from the memory 202 after receiving the compression instruction, which is not specifically limited here.
  • the processor 203 is specifically used in scenarios that require data transmission and storage, and may specifically be a cloud server or a cloud service storage device, or other storage devices, such as servers used in networks, big data, storage, etc. or other storage devices, which are not specifically limited here.
  • the processor 203 can be applied to data compression or communication, and can also be applied to other scenarios, such as video encoding or other data processing scenarios, which are not specifically limited here.
  • the communication bus 204 may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus can be divided into an address bus, a data bus, a control bus and so on. For ease of presentation, the bus is drawn as a single thick line in FIG. 2, but this does not mean that there is only one bus or one type of bus.
  • the device structure shown in FIG. 2 does not constitute a limitation on the storage device, and may include more or less components than shown, or combine some components, or arrange different components.
  • the compression apparatus may also be located outside the storage device; specifically, it may be a hardware device coupled to the storage device through a physical interface. The hardware device is located inside the compression apparatus and includes the processor 203, which reads and writes data through the physical interface and the communication bus 204 of the storage device, and implements the functions of the above-mentioned processor 203 located inside the storage device.
  • the processor 203 can read data by remote direct memory access (RDMA), which is not specifically limited here.
  • the compression device determines the text data blocks to be compressed according to the first data length, the second data length and the first starting address, and reads only the text data blocks, not the first associated data blocks (that is, the above-mentioned first DIF field blocks). The compressed data therefore does not include the first associated data blocks, which prevents them from disrupting the regularity and characteristics of the original data and lowering the compression ratio of the algorithm; keeping the first associated data blocks out of the compression also keeps the input data from growing, which further improves the compression ratio.
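To make the compression-ratio argument concrete, the following Python sketch compresses the same body data with and without interleaved DIF-like fields, using zlib's DEFLATE as a stand-in LZ compressor. The 512-byte block and 8-byte field sizes are assumptions, and the guard value is a truncated CRC-32 placeholder, not a real T10 DIF guard tag.

```python
import struct
import zlib

BLOCK = 512  # assumed text data block length
DIF = 8      # assumed DIF field length (guard, application tag, reference tag)


def make_interleaved(body: bytes) -> bytes:
    # Hypothetical on-disk layout: each 512-byte block is followed by an
    # 8-byte DIF-like field whose values change from block to block.
    out = bytearray()
    for i in range(0, len(body), BLOCK):
        block = body[i:i + BLOCK]
        out += block
        # Placeholder guard (low 16 bits of CRC-32), app tag 0, reference tag = block index.
        out += struct.pack(">HHI", zlib.crc32(block) & 0xFFFF, 0, i // BLOCK)
    return bytes(out)


body = b"status=OK;" * 1024            # highly regular text data, 10240 bytes
with_dif = zlib.compress(make_interleaved(body))
body_only = zlib.compress(body)
print(len(body_only), len(with_dif))  # the body-only output is expected to be smaller
```

The varying fields break the long repeated matches the LZ stage relies on and add bytes of their own, which is the effect the text attributes to compressing the first associated data blocks.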
  • the compression apparatus uses the first memory space length, the second memory space length and the second starting address to determine the first memory space for storing the compressed data and the second memory space for storing the second associated data block (that is, the second DIF field block). The compressed data is written into the first memory space and the second memory space is left free for the second associated data block, so the storage device can directly obtain from the memory 202 processed data in which the second memory space is already free. The storage device does not need to copy the compressed data to make room for the second memory space, which reduces the latency of the compression process and improves the compression bandwidth of the storage device.
  • the compression device may be located inside the storage device, or may not be located inside the storage device, which will be described separately below:
  • the compression device is inside the storage device.
  • the data compression method provided by an embodiment of the present application will be described in detail.
  • the method is applied to the storage device 200 shown in FIG. 2 .
  • the processor 201 and the processor 203 are used as examples for description.
  • both the processor 201 and the processor 203 are located in a storage device.
  • the method may include the following steps.
  • the processor 201 receives the compression instruction:
  • the processor 201 may receive the compression instruction directly, or the compression instruction may arrive through other channels; for example, when another device sends a compression request to the storage device 200, the processor 201 may also receive the compression instruction, which is not specifically limited here.
  • the processor 201 obtains the location description information and the segment length:
  • the compression instruction includes the segment length, which includes the length of the text data block in the data to be processed, the length of the first DIF field block used to verify the text data block, and the length of the compressed data block in the processed data.
  • the processor 201 may determine the location description information according to the memory space of the memory 202, that is, the storage location of the data to be compressed and the storage location of the processed data.
  • when the compression device starts compression on its own initiative, the processor 201 does not receive a compression instruction, that is, step S301 is omitted; in this case the compression device still obtains the location description information and the segment length.
  • the processor 201 obtains the data to be processed:
  • the processor 201 copies the data to be processed from external storage into the memory 202. The processor may also obtain the data to be processed from elsewhere; for example, when the compression instruction in step S301 is sent from outside together with the data to be processed, the processor 201 may save that data, which is not specifically limited here.
  • the processor 201 may call the processor 203, and send a compression instruction to the processor 203, so that the processor 203 performs corresponding processing on the data to be processed. Specifically, calling the processor 203 needs to pass through the interface, and the interface provides the parameter inBuf, the parameter outBuf, the parameter inContinuity and the parameter outContinuity.
  • the above parameters have been described in the embodiment shown in FIG. 2 and will not be repeated here.
  • the processor 203 obtains the first starting address, the first data length and the second data length:
  • the processor 203 receives the compression instruction issued by the processor 201, and the instruction may include the location description information and the segment length from step S302. The processor 203 may obtain the parameter inBuf, that is, the first starting address, from the storage location of the data to be compressed in the location description information; obtain the first data length from the length of the text data block in the segment length; and obtain the second data length from the length of the first DIF field block in the segment length. Together, the first data length and the second data length form the parameter inContinuity.
  • the processor 203 may also obtain the above information without the compression instruction.
  • for example, the processor 203 may retrieve the location description information and the segment length from the memory 202, and then obtain the first starting address, the first data length and the second data length from them.
  • in this case, the memory 202 stores the location description information and the segment length.
  • the processor 203 may directly use the storage location of the data to be compressed as the parameter inBuf, or perform data type conversion on the storage location of the data to be compressed, which is not specifically limited here.
  • the method of obtaining the parameter inContinuity is similar to that of the parameter inBuf, which will not be repeated here.
  • the processor 203 obtains the data to be compressed:
  • the processor 203 can determine the data to be compressed according to the compression instruction issued by the processor 201, that is, the data to be processed stored in the memory 202 is treated as the data to be compressed; the instruction may also carry the location of the data and other descriptive information.
  • the processor 203 reads the text data blocks:
  • the processor 203 may determine the starting address of the data to be compressed in the memory 202 according to the parameter inBuf.
  • the processor 203 starts a counter and initializes the value of the counter to 0.
  • the processor 203 reads the data to be compressed in the memory 202 into the cache memory of the processor 203, and the counter synchronously records the length of the read data.
  • when the value of the counter reaches the first data length in the parameter inContinuity, the counter is reset to 0; the data read by the processor 203 up to this point is a text data block.
  • the processor 203 skips the length of the first DIF field block in the parameter inContinuity, and the data skipped by the processor 203 at this time is the first DIF field block.
  • the processor 203 continues reading text data blocks and skipping first DIF field blocks, obtaining continuous data to be compressed that does not include any first DIF field block; this continuous data contains all the text data blocks, that is, the complete body data.
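The counter-driven read in step S307 can be sketched in Python as below. The sketch assumes the memory 202 is modelled as a `bytes` object and that the first and second data lengths are fixed; the function name is hypothetical, and a real device would move data in DMA bursts rather than byte by byte.

```python
def gather_text_blocks(memory: bytes, in_buf: int, in_len: int,
                       first_len: int, second_len: int) -> bytes:
    """Read text data blocks and skip first DIF field blocks (step S307 sketch)."""
    if first_len <= 0:
        raise ValueError("first data length must be positive")
    out = bytearray()
    counter = 0      # bytes read (or skipped) in the current part of the block
    reading = True   # True inside a text data block, False inside a DIF field block
    for pos in range(in_buf, in_buf + in_len):
        if reading:
            out.append(memory[pos])        # copy one byte of body data
            counter += 1
            if counter == first_len:       # a whole text data block has been read
                counter = 0
                reading = second_len == 0  # start skipping unless there is no DIF field
        else:
            counter += 1                   # skip one byte of the first DIF field block
            if counter == second_len:      # the whole DIF field block has been skipped
                counter, reading = 0, True
    return bytes(out)
```

For example, `gather_text_blocks(b"AAAAxxBBBByy", 0, 12, 4, 2)` returns `b"AAAABBBB"`.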
  • the names of the parameters are only examples, and are not intended to limit the parameters. For parameters with the same meaning, they should all be the parameters referred to in the embodiments of the present application.
  • the processor 203 may also determine the text data blocks in other ways, for example, by memory mapping or by calculating the start and end addresses of the data blocks, which is not specifically limited here.
  • the processor 203 compresses the continuous data to be compressed to obtain the compressed data, which is stored in the cache memory of the processor 203.
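The compression window length carried in compressParams corresponds, for an LZ77-family compressor, to the size of the sliding window. As an assumption-labelled stand-in, this Python sketch uses zlib's DEFLATE, whose window is set through `wbits` and must be a power of two between 512 bytes and 32 KiB; the text does not say which LZ variant the processor 203 uses, and `compress_with_window` is a hypothetical name.

```python
import zlib


def compress_with_window(data: bytes, window_len: int = 32 * 1024, level: int = 6) -> bytes:
    """Compress with a DEFLATE (LZ77-based) sliding window of `window_len` bytes.

    zlib supports power-of-two windows from 512 B (wbits=9) to 32 KiB (wbits=15).
    """
    wbits = window_len.bit_length() - 1
    if window_len != 1 << wbits or not 9 <= wbits <= 15:
        raise ValueError("window length must be a power of two between 512 and 32768")
    comp = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()
```

A smaller window lowers memory use in the compressor but can only find repeats that lie within that many bytes of the current position.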
  • the processor 203 obtains the second starting address, the first memory space length and the second memory space length:
  • the compression instruction that the processor 203 receives from the processor 201 may include the location description information and the segment length. The processor 203 may obtain the parameter outBuf, that is, the second starting address, from the storage location of the processed data in the location description information; obtain the first memory space length from the length of the compressed data block in the segment length; and obtain the second memory space length from the length of the second DIF field block in the segment length.
  • the first memory space length and the second memory space length together form the parameter outContinuity.
  • the processor 203 may also obtain the above information without the compression instruction, similarly to step S305 above, and details are not repeated here.
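Steps S305 and S309 both map description information onto interface parameters. A minimal Python sketch, assuming the location description information and segment length arrive as the hypothetical `LocationInfo` and `SegmentLength` tuples below and that no data type conversion is needed:

```python
from typing import NamedTuple


class LocationInfo(NamedTuple):
    src_addr: int   # storage location of the data to be compressed
    dst_addr: int   # storage location of the processed data


class SegmentLength(NamedTuple):
    text_block: int        # length of a text data block
    dif_in: int            # length of a first DIF field block
    compressed_block: int  # length of a compressed data block (first memory space)
    dif_out: int           # length of a second DIF field block (second memory space)


def to_interface_params(loc: LocationInfo, seg: SegmentLength) -> dict:
    # Use the storage locations directly as inBuf/outBuf and pair the
    # lengths into inContinuity/outContinuity.
    return {
        "inBuf": loc.src_addr,
        "inContinuity": (seg.text_block, seg.dif_in),
        "outBuf": loc.dst_addr,
        "outContinuity": (seg.compressed_block, seg.dif_out),
    }
```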
  • the processor 203 writes the compressed data into the memory 202 to obtain the processed data.
  • the processor 203 may determine the starting address of the processed data in the memory 202 according to the parameter outBuf.
  • the processor 203 writes the compressed data into the memory 202, and the counter synchronously records the length of the written data.
  • when the value of the counter reaches the first memory space length in the parameter outContinuity, the counter is reset to 0; the memory space written by the processor 203 up to this point is a first memory space.
  • the processor 203 then skips a memory space whose length is the second DIF field block length given in the parameter outContinuity; the memory space skipped by the processor 203 at this time is the second memory space.
  • the processor 203 continues to write the compressed data in the first memory space and skips the second memory space to obtain processed data including the compressed data blocks and the second memory space.
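Step S310 mirrors step S307. This hypothetical Python sketch writes the compressed data into first memory spaces and leaves the second memory spaces untouched; it computes each destination offset directly instead of using a byte counter, which has the same effect when the lengths are fixed.

```python
def scatter_compressed(compressed: bytes, out: bytearray, out_buf: int,
                       first_len: int, second_len: int) -> int:
    """Write compressed data into first memory spaces, skipping second ones.

    Returns the number of (possibly partially filled) first memory spaces used.
    """
    if first_len <= 0:
        raise ValueError("first memory space length must be positive")
    slot = 0
    for start in range(0, len(compressed), first_len):
        chunk = compressed[start:start + first_len]
        dst = out_buf + slot * (first_len + second_len)  # start of this first memory space
        if dst + len(chunk) > len(out):
            raise ValueError("processed memory space (outBufLen) is too small")
        out[dst:dst + len(chunk)] = chunk                # second memory space is not touched
        slot += 1
    return slot
```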
  • the processor 203 may also determine the first memory spaces in other ways, for example, by memory mapping or by calculating the start and end addresses of the memory blocks, which is not specifically limited here.
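The alternative of calculating start and end addresses can be expressed as a closed-form layout: with fixed lengths, block `i` starts at `base + i * (first + second)`. A hypothetical Python helper that applies equally to inBuf/inContinuity and outBuf/outContinuity:

```python
from typing import List, Tuple

Range = Tuple[int, int]  # [start, end) address range


def block_ranges(base: int, total_len: int, first_len: int,
                 second_len: int) -> List[Tuple[Range, Range]]:
    """Compute the [start, end) ranges of each first part and second part."""
    stride = first_len + second_len
    end = base + total_len
    ranges = []
    for start in range(base, end, stride):
        first = (start, min(start + first_len, end))       # text block / first memory space
        second = (first[1], min(start + stride, end))      # DIF block / second memory space
        ranges.append((first, second))
    return ranges
```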
  • step S309 may be performed after any one of steps S304 to S308, or may be performed simultaneously with step S305, as long as it is performed after step S304 and before step S310, which is not specifically limited here.
  • the identifier representing the data location may also be a parameter in other forms.
  • the parameters inBuf and outBuf are used as examples, which are not specifically limited here.
  • the identifier representing the length of the data block may also be a parameter in other forms.
  • the parameters inContinuity and outContinuity are used as examples, which are not specifically limited here.
  • the data compression process may also be performed synchronously with writing the compressed data into the memory 202 (step S310).
  • for example, each time the compression process produces a character, that character is written into the memory 202. Alternatively, reading the text data blocks into the cache memory of the processor 203 (step S307) can also be performed synchronously with steps S308 and S310: for example, the processor 203 first calculates how much text data is needed to obtain one compressed data block, reads text data of that length, compresses it to obtain a whole compressed data block directly, and then writes that whole compressed data block into the memory 202; this is not specifically limited here.
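The first synchronous variant, writing compressed output as soon as it is produced, can be sketched with an incremental compressor. In this hypothetical Python sketch, `SlotWriter` places each emitted byte into the next free position of the first memory spaces, and zlib's `compressobj` stands in for the LZ compressor.

```python
import zlib


class SlotWriter:
    """Place bytes into first memory spaces as soon as they are produced."""

    def __init__(self, out: bytearray, out_buf: int, first_len: int, second_len: int):
        self.out = out
        self.base = out_buf
        self.first_len = first_len
        self.stride = first_len + second_len
        self.written = 0  # bytes of compressed data written so far

    def write(self, chunk: bytes) -> None:
        for byte in chunk:
            slot, offset = divmod(self.written, self.first_len)
            # Second memory spaces are skipped because each slot advances by the full stride.
            self.out[self.base + slot * self.stride + offset] = byte
            self.written += 1


def compress_streaming(body: bytes, writer: SlotWriter, piece: int = 256) -> None:
    comp = zlib.compressobj()
    for i in range(0, len(body), piece):
        writer.write(comp.compress(body[i:i + piece]))  # may emit zero or more bytes
    writer.write(comp.flush())
```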
  • the processor 203 generates a DIF field and writes it into the second memory space.
  • the processed data obtained in step S310 includes a second memory space for placing the second DIF field.
  • the processor 203 may generate a second DIF field and fill it into the second memory space.
  • the action of generating the second DIF field and filling the second memory space can be completed not only by the processor 203 but also by the processor 201, which is not specifically limited here.
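The text does not fix the format of the second DIF field. Assuming the common 8-byte T10 protection information layout (2-byte guard tag computed with CRC-16/T10-DIF, 2-byte application tag, 4-byte reference tag), step S311 could be sketched in Python as follows; using the slot index as the reference tag is also an assumption, and all function names are hypothetical.

```python
import struct


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16/T10-DIF (poly 0x8BB7, init 0, no reflection, no final XOR)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def make_dif(block: bytes, ref_tag: int, app_tag: int = 0) -> bytes:
    # Assumed T10 PI layout: guard tag, application tag, reference tag (big-endian).
    return struct.pack(">HHI", crc16_t10dif(block), app_tag, ref_tag & 0xFFFFFFFF)


def fill_dif(out: bytearray, out_buf: int, n_slots: int, first_len: int, second_len: int) -> None:
    # Fill each second memory space with the DIF of the first memory space before it.
    if second_len != 8:
        raise ValueError("this sketch assumes an 8-byte second memory space")
    stride = first_len + second_len
    for slot in range(n_slots):
        start = out_buf + slot * stride
        block = bytes(out[start:start + first_len])
        out[start + first_len:start + stride] = make_dif(block, slot)
```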
  • step S310 can also be performed synchronously with step S308; for example, each time the processor 203 loads a second memory space, the processor 203 or the processor 201 fills that second memory space with the second DIF field corresponding to its first memory space, which is not specifically limited here.
  • the processor 203 may be a computing unit or computing module integrated in the central chip, or may exist in other forms, such as a processing chip with computing capability coupled to the central chip, which is not specifically limited here.
  • the central chip may include the processor 201 .
  • the specific form of the processor 203 is as described in the embodiment shown in FIG. 2, and details are not repeated here.
  • in another embodiment, the compression device is the storage device itself, that is, the processor 203 is the processor 201, which is located inside the storage device.
  • the method may include the following steps:
  • S401-S403: similar to steps S301-S303 in the embodiment shown in FIG. 3, except that the actions are performed by the processor 203; details are not repeated here.
  • the processor 203 obtains the first starting address, the first data length and the second data length according to the compression instruction issued by the user or the outside world, or the parameters obtained when the compression is started by itself.
  • This step is similar to step S305 in the embodiment shown in FIG. 3 , and details are not repeated here.
  • S405-S409: similar to steps S307-S311 in the embodiment shown in FIG. 3, with all actions of the processor 201 performed by the processor 203. In step S408, the second starting address, the first memory space length and the second memory space length are obtained from the compression instruction issued by the user or the outside world, or from the parameters obtained when compression is started on the device's own initiative; details are not repeated here.
  • the compression device is not inside the storage device.
  • with reference to FIG. 3, a data compression method provided by another embodiment of the present application is described.
  • the method is applied to the storage device 200 shown in FIG. 2 .
  • the processor 201 is located in the storage device, and the processor 203 is located outside the storage device.
  • the processor 203 is coupled to the storage device through a physical interface.
  • the specific steps are similar to the previous embodiment shown in FIG. 3 , and details are not repeated here.
  • Fig. 5 is another schematic flow chart of the method shown in Fig. 3, which clearly shows the data composition of the data to be compressed, the processed data and the continuous data to be compressed, and the processing process of the data to be compressed by the compression device is as follows:
  • This step is the same as step S305 to step S307 in the embodiment of FIG. 3 , and details are not repeated here.
  • This step is the same as step S308 in the embodiment of FIG. 3 , and details are not repeated here.
  • S503 Divide the first memory space and the second memory space, copy the compressed data to the first memory space, and reserve the second memory space for the second DIF field:
  • This step is the same as steps S309 to S310 in the embodiment of FIG. 3, and details are not repeated here.
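Putting steps S501 to S503 together, an end-to-end sketch of the Fig. 5 flow in Python might look like the following. It assumes fixed lengths, models the processed memory space as a fresh buffer starting at offset 0, and uses zlib as a stand-in compressor; `compress_fig5` is a hypothetical name, and generating the second DIF fields (step S311) is left out.

```python
import zlib


def compress_fig5(memory: bytes, in_buf: int, in_len: int, in_cont: tuple,
                  out_buf_len: int, out_cont: tuple) -> tuple:
    """Sketch of S501-S503: skip DIF on input, compress, leave gaps on output.

    in_cont  = (first data length, second data length)
    out_cont = (first memory space length, second memory space length)
    Returns (processed data, number of first memory spaces used).
    """
    a, b = in_cont
    end = in_buf + in_len
    # S501: read only the text data blocks into one continuous buffer.
    body = b"".join(memory[s:min(s + a, end)] for s in range(in_buf, end, a + b))
    # S502: compress the continuous data.
    compressed = zlib.compress(body)
    # S503: write into first memory spaces, reserving second memory spaces.
    c, d = out_cont
    out = bytearray(out_buf_len)
    slots = 0
    for i in range(0, len(compressed), c):
        dst = slots * (c + d)
        chunk = compressed[i:i + c]
        if dst + len(chunk) > out_buf_len:
            raise ValueError("outBufLen too small")
        out[dst:dst + len(chunk)] = chunk
        slots += 1
    return bytes(out), slots
```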
  • another structure of the compression device includes:
  • a first dividing unit 601 configured to divide the data to be compressed into at least two data blocks, the data blocks including a first part of data and a second part of data;
  • the compressing unit 602 is configured to compress the first part of the data, ignore the second part of the data, and obtain compressed data.
  • a second dividing unit 603, configured to divide the processed memory space into at least two memory space blocks, where the memory space blocks include a first memory space and a second memory space;
  • a writing unit 604, configured to write the compressed data into the first memory space, ignore the second memory space, and obtain processed data;
  • the first storage unit 605 is configured to store the data to be compressed.
  • the first dividing unit 601 may include a first obtaining subunit 6011 and a first dividing subunit 6012 .
  • the first obtaining subunit 6011 is specifically configured to obtain a first starting address, a first data length, and a second data length, where the first starting address indicates the starting address of the data to be compressed, and the first data length indicates the the size of the first part of the data, the second data length represents the size of the second part of the data;
  • the first dividing subunit 6012 is specifically configured to divide the data to be compressed into at least two data blocks according to the first starting address, the first data length and the second data length.
  • the first obtaining subunit 6011 may include a first obtaining module 60111.
  • the first obtaining module 60111 is specifically configured to obtain the first start address, the first data length and the second data length according to the first description information of the data to be compressed.
  • the second dividing unit 603 may include a second acquiring subunit 6031 and a second dividing subunit 6032 .
  • the second obtaining subunit 6031 is specifically configured to obtain a second starting address, a first memory space length and a second memory space length, where the second starting address represents the starting address of the processed memory space, the first memory space length represents the size of the first memory space, and the second memory space length represents the size of the second memory space;
  • the second dividing subunit 6032 is specifically configured to divide the processed memory space into at least two memory space blocks according to the second starting address, the first memory space length and the second memory space length.
  • the second obtaining subunit 6031 may include a second obtaining module 60311 .
  • the second obtaining module 60311 is specifically configured to obtain the second starting address, the first memory space length and the second memory space length according to the second description information of the processed data.
  • the compression apparatus may perform the operations performed by the compression apparatus in the foregoing embodiments shown in FIG. 3 to FIG. 5 , and details are not described herein again.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • the technical solutions of the present application essentially, or the part that contributes to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Abstract

Embodiments of the present application disclose a data compression method and compression apparatus, used to increase the compression ratio. The method of the embodiments of the present application comprises: by means of continuity description information and a location identifier, dividing data to be compressed into main-text data blocks that need to be compressed and first associated data blocks that do not need to be compressed; and compressing only the main-text data blocks, and not the first associated data blocks, to obtain compressed data.
PCT/CN2021/097764 2020-08-31 2021-06-01 Data compression method and compression apparatus WO2022041906A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010897258.1A CN114124103A (zh) 2020-08-31 2020-08-31 Data compression method and compression apparatus
CN202010897258.1 2020-08-31

Publications (1)

Publication Number Publication Date
WO2022041906A1 (fr) 2022-03-03

Family

ID=80352556

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097764 WO2022041906A1 (fr) 2020-08-31 2021-06-01 Data compression method and compression apparatus

Country Status (2)

Country Link
CN (1) CN114124103A (fr)
WO (1) WO2022041906A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
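  • US20030012275A1 *; priority date 2001-06-25; publication date 2003-01-16; International Business Machines Corporation; Multiple parallel encoders and statistical analysis thereof for encoding a video sequence
  • CN101088084A *; priority date 2003-12-29; publication date 2007-12-12; Venturi Wireless, Inc.; Reusable compression objects
  • CN101957836A *; priority date 2010-09-03; publication date 2011-01-26; Tsinghua University; Configurable real-time transparent compression method in a file system
  • CN111384961A *; priority date 2018-12-28; publication date 2020-07-07; Shanghai Cambricon Information Technology Co., Ltd.; Data compression and decompression apparatus and data compression method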

Also Published As

  • CN114124103A; publication date 2022-03-01

Legal Events

  • Code 121: EP: The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21859750; country of ref document: EP; kind code of ref document: A1.
  • Code NENP: Non-entry into the national phase. Ref country code: DE.
  • Code 122: EP: PCT application non-entry in European phase. Ref document number: 21859750; country of ref document: EP; kind code of ref document: A1.