WO2023284851A1 - Data compression model training method and device, and storage medium - Google Patents

Data compression model training method and device, and storage medium Download PDF

Info

Publication number
WO2023284851A1
WO2023284851A1 PCT/CN2022/105929
Authority
WO
WIPO (PCT)
Prior art keywords
data
data block
redundant
compression
possibility
Prior art date
Application number
PCT/CN2022/105929
Other languages
English (en)
French (fr)
Inventor
白智德
白志得
哈米德
黄坤
殷燕
Original Assignee
深圳智慧林网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳智慧林网络科技有限公司
Publication of WO2023284851A1

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/0604 - Improving or facilitating administration, e.g. storage management
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 - Organizing or formatting or addressing of data
    • G06F 3/064 - Management of blocks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0668 - Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 - In-line storage system
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the technical field of data processing, and in particular to a data compression model training method and device, and a storage medium.
  • Huffman and run-length algorithms tend to find pure redundancy, meaning they notice a piece of data (such as a feature of text) and then try to find, within a larger block of data, as much duplicated data identical to that piece as possible.
  • those algorithms perform reasonably well up to a point, but their main problem is that they have reached a compression bottleneck: none of these redundancy-based algorithms can discover new ways of generating redundancy.
  • the present application provides a data compression model training method and device, and a storage medium to provide high-ratio data block compression.
  • a data compression model training method, comprising: reading a data block of a set size; analyzing the possibility of adding redundancy to the data block; determining the index number of a function that generates redundant data in the data block; and
  • using the function corresponding to the index number to generate redundant data in the data block.
  • the analyzing of the possibility of adding redundancy to the data block includes:
  • analyzing the possibility of adding redundancy to the data block according to the data type of the data block.
  • the method also includes:
  • a first heat map is generated, the first heat map including m-bit-long high-value numbers that are redundant in the data block, where m is a positive integer.
  • the method also includes:
  • the redundant data is stored in the data block.
  • the method also includes:
  • the number of compressed data blocks is predicted according to a probability prediction algorithm.
  • the method also includes:
  • a second heat map is generated, the second heat map including n-bit-long high-value numbers in the data block, where n < m and n is a positive integer.
  • the method also includes:
  • the data block containing redundant data is deleted.
  • a data compression model training device comprising:
  • an analysis unit, for analyzing the possibility of adding redundancy to the data block;
  • a determining unit, configured to determine the index number of a function that generates redundant data in the data block; and
  • a first generating unit, configured to generate redundant data in the data block using the function corresponding to the index number.
  • the analysis unit is configured to analyze the possibility of adding redundancy in the data block according to the data type of the data block.
  • the device further includes:
  • the second generating unit is configured to generate a first heat map, where the first heat map includes m-bit-long high-value numbers that are redundant in the data block, where m is a positive integer.
  • the device further includes:
  • a storage unit configured to store the redundant data in the data block.
  • the device further includes:
  • the prediction unit is configured to predict the number of compressed data blocks according to a probability prediction algorithm.
  • the device further includes:
  • a compression unit, configured to compress a set number of data blocks, the data blocks originating from one or more files;
  • a third generating unit, configured to generate a second heat map, the second heat map including n-bit-long high-value numbers in the data block, where n < m and n is a positive integer.
  • the storage unit is further configured to delete the data block containing redundant data when it is detected that the data block containing redundant data is not suitable for permanent storage.
  • a data compression model training device including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the computer program, the method described in the first aspect or any one of its implementations is carried out.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method as described in the first aspect or any one of the implementations of the first aspect is implemented.
  • Fig. 1 is a schematic flow chart of a data compression model training method provided by the embodiment of the present application
  • Fig. 2 is a schematic flow chart of another data compression model training method provided by the embodiment of the present application.
  • FIG. 3 is a schematic diagram of a data compression model training system provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a data compression model training device provided in an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of another data compression model training device provided by an embodiment of the present application.
  • This application provides a data compression training scheme that differs from traditional compression algorithms, which directly address redundant data.
  • by analyzing the possibility of adding redundancy to a data block, a corresponding algorithm is used to generate redundant data in the data block, so a compression ratio beyond what was previously possible can be obtained, improving compression performance.
  • the method may include the following steps:
  • a redundancy generator algorithm (RGA) is responsible for generating redundancy by manipulating a data block in a way that increases the number of duplicate values in the block. Unlike traditional compression algorithms, which directly look for redundant data, RGA creates as many redundant data blocks as possible.
  • the general purpose of RGA is to hand the data over to other, related compression algorithms in order to provide compression ratios beyond what is currently possible. RGA can read a specific block of data of any size, analyzing the possibility of adding redundancy in several smaller parts of the data.
  • RGA utilizes a list of basic and advanced mathematical formulas to figure out how to create more redundant data in a given data block.
  • A) RGA data type, which is a number between 0 and 8 (3 bits) indicating how well redundancy can be generated in a given data block. The higher the value, the better RGA generates redundancy for the corresponding data block.
  • Heat map, which is actually a heat-map-like map showing the redundancy produced between regions of a given data block. For example, in a given data block, RGA can detect n-bit-long numbers with greater redundancy than m-bit-long numbers. This generates a heat map of the high-value numbers that are more redundant in the data block.
  • whether an actual data block is worth storing is determined by an artificial intelligence algorithm that scans previously recorded data in the RGA section to see whether incoming actual data blocks would add to the RGA inventory.
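For concreteness, the per-block RGA record sketched in the bullets above can be pictured as a small data structure. The following is a minimal illustration in Python; the field names and types are assumptions made only for illustration, since the application does not specify a concrete layout.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class RGARecord:
    """Hypothetical per-block RGA record (field names are assumed)."""
    data_type: int                 # 0..8 (3 bits per the text): how well
                                   # redundancy is generated; higher is better
    heatmap: Dict[int, float] = field(default_factory=dict)
                                   # region offset -> redundancy intensity
    function_index: int = 0        # index of the redundancy-generating function
    stored: bool = False           # set by the AI-driven check deciding whether
                                   # the block would grow the RGA inventory
```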
  • This application provides a data compression training method that, unlike traditional compression algorithms that directly address redundant data, analyzes the possibility of adding redundancy to a data block and uses a corresponding algorithm to generate redundant data in the block, obtaining a compression ratio beyond what was previously possible and improving compression performance.
  • FIG. 2 it is a schematic flowchart of another data compression model training method provided by the embodiment of the present application.
  • Model training is responsible for recording the compression process carried out in various ways. Its ultimate goal is for the trained model to increase the overall performance and compression ratio of the entire compression and decompression process.
  • This method can be applied to the data compression model training system shown in Figure 3, which comprises:
  • the bulk same-type data analyzer (BSTDA) 301;
  • the probability predictor algorithm (PPA) 302;
  • the redundancy generator algorithm (RGA) 303;
  • the temporary data storage (cache) 304; and
  • the permanent data storage 305.
  • the method may include the following steps:
  • a set number of data blocks may be compressed through BSTDA, and a second heat map may be generated.
  • BSTDA is an algorithm that applies RGA and PPA not to one specific piece of data but to a large number of data blocks, where each of those data blocks belongs to an independent or non-independent file.
  • unlike traditional compression techniques, in which every operation takes place within a single file, BSTDA tends to study, analyze, and train on a large number of files of the same specific format.
  • BSTDA is most useful when dealing with big data and large volumes of data compression, which can greatly improve compression efficiency.
  • data from BSTDA indicates that data in files of the same type shares the same compression parameters.
  • A) data type, i.e., data from a bitmap (BMP) file.
  • the second heat map, which is actually a heat-map-like map showing the concentration of a given value (binary or hexadecimal) as normally distributed across files of the same file format.
  • BSTDA can detect n-digit-long high-value numbers at the beginning of most .mp4 files (excluding their headers). This generates a heat map of high-value numbers that are denser at the beginning of the data block.
  • Data storage: if required, the incoming data is stored in actual data blocks.
  • whether an actual data block is worth storing is determined by an artificial intelligence algorithm that scans previously recorded data in the BSTDA section to see whether incoming actual data blocks would add to the BSTDA inventory.
  • the PPA algorithm may be used to predict the number of compressed data blocks.
  • PPA is an algorithm that predicts how many data blocks are likely to be compressed using RGA. It uses a series of large-length variables and often stores its findings as a new training pattern for the next input data.
  • a PPA is like a living organism that monitors incoming chunks of data and accumulates knowledge of how it can perform better next time.
  • the main purpose of this algorithm is to ensure that, at the next compression, the data can be consumed by the computer in less time and with fewer resources.
  • the PPA data type, which is a number between 0 and 8 (3 bits), indicates how well probability prediction works on a given data block. The higher the value, the better PPA's probability prediction for the corresponding data block.
  • Data storage: if required, the incoming data is stored in actual data blocks.
  • whether an actual data block is worth storing is determined by an artificial intelligence algorithm that scans previously recorded data in the RGA section to see whether incoming actual data blocks would add to the RGA inventory.
  • RGA is responsible for generating redundancy by manipulating a data block in a way that increases the number of duplicate values in the block. Unlike traditional compression algorithms, which directly look for redundant data, RGA creates as many redundant data blocks as possible.
  • the general purpose of RGA is to hand the data over to other, related compression algorithms in order to provide compression ratios beyond what is currently possible. RGA can read a specific block of data of any size, analyzing the possibility of adding redundancy in several smaller parts of the data.
  • RGA utilizes a list of basic and advanced mathematical formulas to figure out how to create more redundant data in a given data block.
  • A) RGA data type, which is a number between 0 and 8 (3 bits) indicating how well redundancy can be generated in a given data block. The higher the value, the better RGA generates redundancy for the corresponding data block.
  • the first heat map, which is actually a heat-map-like map, shows the redundancy produced between regions of a given data block.
  • RGA can detect n-bit-long numbers with greater redundancy than m-bit-long numbers. This generates a heat map of the high-value numbers that are more redundant in the data block.
  • whether an actual data block is worth storing is determined by an artificial intelligence algorithm that scans previously recorded data in the RGA section to see whether incoming actual data blocks would add to the RGA inventory.
  • step S209: detect whether the data block containing redundant data is suitable for permanent storage; if yes, go to step S210, otherwise go to step S211.
  • the permanent data storage part is used to store the data described in the data storage parts of BSTDA, PPA, and RGA, i.e., the data selected by the artificial-intelligence-driven algorithm.
  • this data is used to keep a record of the actual data so that the next compression process can be completed by the set of compression algorithms described in this application.
  • the temporary data storage (cache) section is used to store data that must first be analyzed and then transformed into other values. As described in the data storage sections of BSTDA, PPA, and RGA, an artificial intelligence algorithm detects whether the actual data is suitable for storage; if it is determined that the actual data will not be stored, it should be deleted.
  • compression algorithms described in this application may exist in parallel with other compression algorithms, or may be combined in whole or in part, depending on the input data.
  • This compression algorithm is designed to provide high ratio compression of data blocks.
  • This application provides a data compression training method that, unlike traditional compression algorithms that directly address redundant data, analyzes the possibility of adding redundancy to a data block and uses a corresponding algorithm to generate redundant data in the block, obtaining compression ratios beyond what was previously possible and improving compression performance; compressing large amounts of data or big data also improves compression efficiency.
  • the data compression model training device includes corresponding hardware structures and/or software modules for performing various functions.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software, with reference to the units and method steps of the examples described in the embodiments disclosed in the present application. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.
  • the device 400 may include:
  • a reading unit 44 configured to read a data block of a set size
  • An analysis unit 45 configured to analyze the possibility of adding redundancy in the data block
  • a determining unit 46 configured to determine an index number of a function that generates redundant data in the data block
  • the first generating unit 47 is configured to generate redundant data in the data block by using the function corresponding to the index number.
  • the analysis unit 45 is configured to analyze the possibility of adding redundancy in the data block according to the data type of the data block.
  • the device further includes:
  • the second generating unit 48 is configured to generate a first heat map, where the first heat map includes m-bit-long high-value numbers that are redundant in the data block, where m is a positive integer.
  • the device further includes:
  • the storage unit 49 is configured to store the redundant data in the data block.
  • the device further includes:
  • the prediction unit 43 is configured to predict the number of compressed data blocks according to a probability prediction algorithm.
  • the device further includes:
  • a compression unit 41 configured to compress a set number of data blocks, the data blocks originating from one or more files;
  • the third generating unit 42 is configured to generate a second heat map, the second heat map including n-bit-long high-value numbers in the data block, where n < m and n is a positive integer.
  • the storage unit 49 is further configured to delete the data block containing redundant data when it is detected that the data block containing redundant data is not suitable for permanent storage.
  • the compression unit 41, the third generation unit 42, the prediction unit 43, the second generation unit 48 and the storage unit 49 are optional units, which are indicated and connected by dotted lines in the figure.
  • one or more of the above units or units may be implemented by software, hardware or a combination of both.
  • the software exists in the form of computer program instructions and is stored in the memory, and the processor can be used to execute the program instructions and realize the above method flow.
  • the processor can be built into a system on chip (system on chip, SoC) or ASIC, or it can be an independent semiconductor chip.
  • besides the cores used to execute software instructions for computation or processing, the processor may further include necessary hardware accelerators, such as a field programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit implementing dedicated logic operations.
  • the hardware can be any one or any combination of a CPU, a microprocessor, a digital signal processing (DSP) chip, a microcontroller unit (MCU), an artificial intelligence processor, an ASIC, an SoC, an FPGA, a PLD, a dedicated digital circuit, a hardware accelerator, or non-integrated discrete devices, which can run the necessary software or execute the above method flows without depending on software.
  • a data compression model training device provided by an embodiment of the present application differs from traditional compression algorithms that directly address redundant data: by analyzing the possibility of adding redundancy to a data block and using the corresponding algorithm to generate redundant data in the block, it can obtain a compression ratio beyond what was previously possible, improving compression performance.
  • the device 500 may include:
  • An input device 51, an output device 52, a memory 53 and a processor 54 (the number of processors 54 in the device may be one or more, one processor is taken as an example in FIG. 5).
  • the input device 51, the output device 52, the memory 53, and the processor 54 may be connected through a bus or in other ways; connection through a bus is taken as an example in FIG. 5.
  • processor 54 is used to perform the following steps:
  • a function corresponding to the index number is used to generate redundant data in the data block.
  • the processor 54 performs the step of analyzing the possibility of adding redundancy in the data block, including:
  • the possibility of adding redundancy in the data block is analyzed.
  • processor 54 is also configured to perform the following steps:
  • a first heat map is generated, the first heat map including m-bit-long high-value numbers that are redundant in the data block, where m is a positive integer.
  • processor 54 is also configured to perform the following steps:
  • the redundant data is stored in the data block.
  • processor 54 is also configured to perform the following steps:
  • the number of compressed data blocks is predicted.
  • processor 54 is also configured to perform the following steps:
  • the second heat map includes n-bit-long high-value numbers in the data block, where n < m and n is a positive integer.
  • processor 54 is also configured to perform the following steps:
  • the data block containing redundant data is deleted.
  • processor in the embodiments of the present application may be a central processing unit (central processing unit, CPU), and may also be other general processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), field programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, transistor logic devices, hardware components or any combination thereof.
  • a general-purpose processor can be a microprocessor, or any conventional processor.
  • a data compression model training device provided by an embodiment of the present application differs from traditional compression algorithms that directly address redundant data: by analyzing the possibility of adding redundancy to a data block and using the corresponding algorithm to generate redundant data in the block, it can obtain a compression ratio beyond what was previously possible, improving compression performance.
  • the method steps in the embodiments of the present application may be implemented by means of hardware, or may be implemented by means of a processor executing software instructions.
  • Software instructions can be composed of corresponding software modules, and software modules can be stored in random access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, electrically erasable programmable read-only memory, registers, hard disk, removable hard disk, CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and storage medium can be located in the ASIC.
  • the ASIC may be located in the data compression device.
  • the processor and the storage medium can also exist in the data compression device as discrete components.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product comprises one or more computer programs or instructions. When the computer program or instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are executed in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, a base station, user equipment or other programmable devices.
  • the computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrating one or more available media.
  • the available medium may be a magnetic medium, such as a floppy disk, a hard disk, or a magnetic tape; it may also be an optical medium, such as a digital video disk; and it may also be a semiconductor medium, such as a solid state disk.
  • words such as “first” and “second” are used to distinguish the same or similar items with basically the same function and effect.
  • words such as “first” and “second” do not limit the quantity or execution order, nor do they imply that the items are necessarily different.
  • words such as “exemplary” or “for example” are used as examples, illustrations, or explanations. Any embodiment or design scheme described as “exemplary” or “for example” in the embodiments of the present application shall not be interpreted as more preferred or advantageous than other embodiments or design schemes.
  • the use of words such as “exemplary” or “such as” is intended to present related concepts in a concrete manner for easy understanding.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

This application discloses a data compression model training method and device, and a storage medium. The method includes: reading a data block of a set size; analyzing the possibility of adding redundancy to the data block; determining the index number of a function that generates redundant data in the data block; and using the function corresponding to the index number to generate redundant data in the data block. Unlike traditional compression algorithms that directly address redundant data, the scheme of this application analyzes the possibility of adding redundancy to a data block and uses a corresponding algorithm to generate redundant data in the block, so that a compression ratio beyond what was previously possible can be obtained, improving compression performance.

Description

Data compression model training method and device, and storage medium
This application claims priority to Chinese patent application No. 202110812042.5, entitled "Data compression model training method and device, and storage medium", filed with the China National Intellectual Property Administration on July 16, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of data processing, and in particular to a data compression model training method and device, and a storage medium.
Background
Currently existing compression techniques are algorithms derived from classical information theory. As a result, compression, and lossless compression in particular, comes down to finding and removing redundant data in a file. Traditional compression algorithms, even the newer ones that use AI and ML, all focus on redundancy: the more redundancy is found, the better the compression ratio.
For example, Huffman and run-length algorithms tend to find pure redundancy, meaning they tend to notice a piece of data (for example, a feature of text) and then find, in a larger block of data, as much duplicated data identical to that piece as possible. Those algorithms perform well to some extent, but their main problem is that they have reached a compression bottleneck, and none of those redundancy-based algorithms can discover new ways of generating redundancy.
Existing methods are all based on removing or reducing the redundancy present in selected data blocks. Besides focusing on existing redundancy rather than generating more redundancy, the essential problem of traditional compression algorithms is that they all consider data blocks of a fixed size or of a somewhat variable size, or consider only all of the many data blocks contained within a single file. Furthermore, most traditional compression algorithms only check for redundancy in small data blocks, namely powers of two (i.e., 4, 8, 16, 32, 64, 128, 256 bytes).
Relying only on finding existing redundancy in small blocks of data limits the performance of those traditional compression algorithms.
Summary
This application provides a data compression model training method and device, and a storage medium, to provide high-ratio compression of data blocks.
In a first aspect, a data compression model training method is provided, the method including:
reading a data block of a set size;
analyzing the possibility of adding redundancy to the data block;
determining the index number of a function that generates redundant data in the data block;
using the function corresponding to the index number to generate redundant data in the data block.
In one possible implementation, analyzing the possibility of adding redundancy to the data block includes:
analyzing the possibility of adding redundancy to the data block according to the data type of the data block.
In another possible implementation, the method further includes:
generating a first heat map, the first heat map including m-bit-long high-value numbers that are redundant in the data block, where m is a positive integer.
In yet another possible implementation, the method further includes:
storing the redundant data in the data block.
In yet another possible implementation, the method further includes:
predicting the number of compressed data blocks according to a probability prediction algorithm.
In yet another possible implementation, the method further includes:
compressing a set number of data blocks, the data blocks coming from one or more files;
generating a second heat map, the second heat map including n-bit-long high-value numbers in the data block, where n < m and n is a positive integer.
In yet another possible implementation, the method further includes:
deleting a data block containing redundant data when it is detected that the data block containing redundant data is not suitable for permanent storage.
In a second aspect, a data compression model training device is provided, the device including:
a reading unit, configured to read a data block of a set size;
an analysis unit, configured to analyze the possibility of adding redundancy to the data block;
a determining unit, configured to determine the index number of a function that generates redundant data in the data block;
a first generating unit, configured to use the function corresponding to the index number to generate redundant data in the data block.
In one possible implementation, the analysis unit is configured to analyze the possibility of adding redundancy to the data block according to the data type of the data block.
In another possible implementation, the device further includes:
a second generating unit, configured to generate a first heat map, the first heat map including m-bit-long high-value numbers that are redundant in the data block, where m is a positive integer.
In yet another possible implementation, the device further includes:
a storage unit, configured to store the redundant data in the data block.
In yet another possible implementation, the device further includes:
a prediction unit, configured to predict the number of compressed data blocks according to a probability prediction algorithm.
In yet another possible implementation, the device further includes:
a compression unit, configured to compress a set number of data blocks, the data blocks coming from one or more files;
a third generating unit, configured to generate a second heat map, the second heat map including n-bit-long high-value numbers in the data block, where n < m and n is a positive integer.
In yet another possible implementation, the storage unit is further configured to delete a data block containing redundant data when it is detected that the data block containing redundant data is not suitable for permanent storage.
In a third aspect, a data compression model training device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the method described in the first aspect or any implementation of the first aspect is carried out.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the method described in the first aspect or any implementation of the first aspect is carried out.
The data compression model training scheme of this application has the following beneficial effects:
unlike traditional compression algorithms that directly address redundant data, by analyzing the possibility of adding redundancy to a data block and using a corresponding algorithm to generate redundant data in the block, a compression ratio beyond what was previously possible can be obtained, improving compression performance.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a data compression model training method provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of another data compression model training method provided by an embodiment of this application;
FIG. 3 is a schematic diagram of a data compression model training system provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of a data compression model training device provided by an embodiment of this application;
FIG. 5 is a schematic structural diagram of another data compression model training device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
This application provides a data compression training scheme. Unlike traditional compression algorithms that directly address redundant data, it analyzes the possibility of adding redundancy to a data block and uses a corresponding algorithm to generate redundant data in the block, so that a compression ratio beyond what was previously possible can be obtained, improving compression performance.
As shown in FIG. 1, which is a schematic flowchart of a data compression model training method provided by an embodiment of this application, the method may include the following steps:
S101. Read a data block of a set size.
S102. Analyze the possibility of adding redundancy to the data block.
S103. Determine the index number of a function that generates redundant data in the data block.
S104. Use the function corresponding to the index number to generate redundant data in the data block.
The redundancy generator algorithm (RGA) is responsible for generating redundancy by manipulating a data block in a way that increases the number of duplicate values in the block. Unlike traditional compression algorithms that directly look for redundant data, RGA creates as many redundant data blocks as possible.
The general purpose of RGA is to hand the data over to other, related compression algorithms, in order to provide compression ratios beyond what is currently possible. RGA can read a specific data block of any size and analyze the possibility of adding redundancy in several smaller parts of the data.
RGA uses a list of basic and advanced mathematical formulas to work out how to create more redundant data in a given data block.
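As a minimal sketch of steps S101 to S104, the Python fragment below reads a fixed-size block, scores the possibility of adding redundancy, chooses a function index, and applies a redundancy-generating transform. The scoring heuristic, the index-selection policy, and the delta transform are all illustrative assumptions; the application leaves the concrete choices to the trained model.

```python
import io

def analyze_possibility(block: bytes) -> float:
    # S102 (assumed heuristic): fewer distinct byte values suggests more room
    # to create duplicate values, so map that to a score in [0, 1]
    return 1.0 - len(set(block)) / 256.0

def generate_redundancy(block: bytes) -> bytes:
    # S104 (illustrative stand-in for an indexed RGA function): delta-encode
    # the block so slowly varying regions become runs of duplicate values
    if not block:
        return block
    out = bytearray([block[0]])
    for prev, cur in zip(block, block[1:]):
        out.append((cur - prev) & 0xFF)
    return bytes(out)

def train_step(stream, block_size: int = 4096):
    block = stream.read(block_size)            # S101: read a set-size block
    if not block:
        return None
    possibility = analyze_possibility(block)   # S102
    index = 8 if possibility > 0.5 else 10     # S103: assumed selection policy
    return index, generate_redundancy(block)   # S104

# e.g. train_step(io.BytesIO(b"aaaabbbbcccc" * 400)) returns a chosen index
# and a transformed block dominated by zero bytes, i.e. duplicate values.
```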
The data stored for this part has the following properties:
A) RGA data type, which is a number between 0 and 8 (3 bits) indicating how redundancy is generated in a given data block. The higher the value, the better RGA generates redundancy for the corresponding data block.
B) Heat map, which is actually a heat-map-like map showing the redundancy produced between regions of a given data block. For example, in a given data block, RGA can detect n-bit-long numbers whose redundancy is greater than that of m-bit-long numbers. This generates a heat map of the high-value numbers that are more redundant in the data block.
C) The index numbers of the RGA functions used to generate redundancy in a given data block are shown in Table 1 below:
Table 1
Function name | Index number
Bernoulli numbers Bn | 0
Euler numbers | 1
Riemann zeta function, or Euler-Riemann zeta function, ζ(s) | 2
Gamma function | 3
Polygamma function of order m | 4
Polylogarithm | 5
Binomial coefficients | 6
Fibonacci numbers | 7
Factorial | 8
Primization | 9
Greatest common divisor (GCD) | 10
Least common multiple (LCM) | 11
Of course, the correspondence between the redundancy-generating RGA functions and their index numbers is not limited to Table 1, and this application places no restriction on that correspondence. In addition, more RGA functions may be included.
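Table 1 can be carried in code as a simple mapping from index numbers to functions. The sketch below implements the entries that Python's standard library covers directly and keeps the rest as named placeholders; which concrete routine the application intends for each entry is not specified, so this mapping is illustrative only.

```python
import math

# Index numbers follow Table 1. Callables are provided where the standard
# library makes the function trivial; string entries are placeholders.
RGA_FUNCTIONS = {
    0: "Bernoulli numbers Bn",
    1: "Euler numbers",
    2: "Riemann zeta function ζ(s)",
    3: math.gamma,                                         # gamma function
    4: "Polygamma function of order m",
    5: "Polylogarithm",
    6: math.comb,                                          # binomial coefficient
    7: lambda n: round(((1 + 5**0.5) / 2) ** n / 5**0.5),  # Fibonacci numbers
    8: math.factorial,                                     # factorial
    9: "Primization",
    10: math.gcd,                                          # greatest common divisor
    11: math.lcm,                                          # least common multiple (Python 3.9+)
}

# e.g. RGA_FUNCTIONS[8](6) == 720 and RGA_FUNCTIONS[10](12, 18) == 6
```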
D) Data storage: when necessary, the redundant data is stored in actual data blocks. Whether an actual data block is worth storing is determined by an artificial intelligence algorithm that scans the data previously recorded in the RGA section to see whether the incoming actual data block would add to the RGA inventory.
This application provides a data compression training method. Unlike traditional compression algorithms that directly address redundant data, it analyzes the possibility of adding redundancy to a data block and uses a corresponding algorithm to generate redundant data in the block, so that a compression ratio beyond what was previously possible can be obtained, improving compression performance.
FIG. 2 is a schematic flowchart of another data compression model training method provided by an embodiment of this application.
Model training is responsible for recording the compression process carried out in various ways. Its ultimate goal is for the trained model to increase the overall performance and compression ratio of the entire compression and decompression process.
This method can be applied to the data compression model training system shown in FIG. 3, which includes:
1. The bulk same-type data analyzer (BSTDA) 301;
2. The probability predictor algorithm (PPA) 302;
3. The redundancy generator algorithm (RGA) 303;
4. The temporary data storage (cache) 304;
5. The permanent data storage 305.
Specifically, the method may include the following steps:
S201. Compress a set number of data blocks, the data blocks coming from one or more files.
S202. Generate a second heat map, the second heat map including n-bit-long high-value numbers in the data block, where n < m and n is a positive integer.
Specifically, the set number of data blocks can be compressed through BSTDA, and the second heat map can be generated. BSTDA is an algorithm that applies RGA and PPA not to one specific piece of data but to a large number of data blocks, where each of those data blocks belongs to an independent or non-independent file.
Unlike traditional compression techniques, in which every operation takes place within a single file, BSTDA tends to study, analyze, and train on a large number of files of the same specific format.
BSTDA is more useful when dealing with big data and large volumes of data compression, which can greatly improve compression efficiency.
Data from BSTDA indicates that data in files of the same type shares the same compression parameters.
The data stored for this part has the following properties:
A) Data type, i.e., data from a bitmap (BMP) file.
B) Index, i.e., the index data/values for each file format.
C) Second heat map, which is actually a heat-map-like map showing the concentration of a given value (binary or hexadecimal) as normally distributed across files of the same file format. For example, BSTDA can detect n-bit-long high-value numbers at the beginning of most .mp4 files (excluding their headers). This generates a heat map of high-value numbers that are denser at the beginning of the data block.
D) Data storage: if required, the incoming data is stored in actual data blocks. Whether an actual data block is worth storing is determined by an artificial intelligence algorithm that scans the data previously recorded in the BSTDA section to see whether the incoming actual data block would add to the BSTDA inventory.
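One simple way to realize the second heat map is a per-offset histogram of byte values collected across many files of the same format. The aggregation below is an assumption made for illustration; the application does not prescribe how the map is built.

```python
from collections import Counter
from typing import List

def second_heatmap(paths: List[str], prefix_len: int = 64) -> List[Counter]:
    """Assumed BSTDA aggregation: for each offset within the first
    prefix_len bytes, count byte values across all same-format files."""
    heat = [Counter() for _ in range(prefix_len)]
    for path in paths:
        with open(path, "rb") as f:
            data = f.read(prefix_len)
        for offset, byte in enumerate(data):
            heat[offset][byte] += 1
    return heat  # heat[offset][value] = frequency across the file set

# Dense high values at the start of most files of one format (the .mp4
# example above) would appear as large counts for values >= 0xF0 at small
# offsets, e.g. second_heatmap(["a.mp4", "b.mp4"])[0].most_common(3).
```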
S203. Predict the number of compressed data blocks according to a probability prediction algorithm.
Specifically, the PPA algorithm can be used to predict the number of compressed data blocks.
PPA is an algorithm that predicts how many data blocks are likely to be compressed using RGA. It uses a series of large-length variables and often stores its findings as a new training pattern for the next input data.
PPA is like a living organism: it monitors incoming data blocks and accumulates knowledge of how it can perform better next time. The main purpose of this algorithm is to ensure that, at the next compression, the data can be consumed by the computer in less time and with fewer resources.
The data stored for this part has the following properties:
A) PPA data type, which is a number between 0 and 8 (3 bits) indicating how well probability prediction works on a given data block. The higher the value, the better PPA's probability prediction for the corresponding data block.
B) Data storage: if required, the incoming data is stored in actual data blocks. Whether an actual data block is worth storing is determined by an artificial intelligence algorithm that scans the data previously recorded in the RGA section to see whether the incoming actual data block would add to the RGA inventory.
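As one concrete, simplified reading of PPA, the sketch below scores each incoming block's compressibility on the 0-to-8 scale and keeps a running record per data type for the next input. Both the entropy heuristic and the update rule are assumptions; the application only states that PPA stores its findings as a new training pattern.

```python
import math
from collections import Counter

class PPA:
    """Sketch of a probability predictor; scoring and update rules assumed."""

    def __init__(self):
        self.history = {}  # data type -> running average score

    def score(self, block: bytes) -> int:
        # low byte entropy -> easy to compress -> high score on the 0..8 scale
        n = len(block)
        entropy = -sum((c / n) * math.log2(c / n)
                       for c in Counter(block).values()) if n else 0.0
        return round(8 * (1 - entropy / 8))

    def observe(self, data_type: str, block: bytes) -> int:
        # store the finding as a "training pattern" for the next input
        s = self.score(block)
        prev = self.history.get(data_type, float(s))
        self.history[data_type] = 0.9 * prev + 0.1 * s
        return s

# e.g. PPA().observe("bmp", b"\x00" * 4096) returns 8 (trivially compressible).
```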
S204. Read a data block of a set size.
S205. Analyze, according to the data type of the data block, the possibility of adding redundancy to the data block.
S206. Determine the index number of a function that generates redundant data in the data block.
S207. Use the function corresponding to the index number to generate redundant data in the data block.
S208. Generate a first heat map, the first heat map including m-bit-long high-value numbers that are redundant in the data block, where m is a positive integer.
RGA is responsible for generating redundancy by manipulating a data block in a way that increases the number of duplicate values in the block. Unlike traditional compression algorithms that directly look for redundant data, RGA creates as many redundant data blocks as possible.
The general purpose of RGA is to hand the data over to other, related compression algorithms, in order to provide compression ratios beyond what is currently possible. RGA can read a specific data block of any size and analyze the possibility of adding redundancy in several smaller parts of the data.
RGA uses a list of basic and advanced mathematical formulas to work out how to create more redundant data in a given data block.
The data stored for this part has the following properties:
A) RGA data type, which is a number between 0 and 8 (3 bits) indicating how redundancy is generated in a given data block. The higher the value, the better RGA generates redundancy for the corresponding data block.
B) First heat map, which is actually a heat-map-like map showing the redundancy produced between regions of a given data block. For example, in a given data block, RGA can detect n-bit-long numbers whose redundancy is greater than that of m-bit-long numbers. This generates a heat map of the high-value numbers that are more redundant in the data block.
C) The index numbers of the RGA functions used to generate redundancy in a given data block are as shown in Table 1 above.
Of course, the correspondence between the redundancy-generating RGA functions and their index numbers is not limited to Table 1, and this application places no restriction on that correspondence. In addition, more RGA functions may be included.
D) Data storage: when necessary, the redundant data is stored in actual data blocks. Whether an actual data block is worth storing is determined by an artificial intelligence algorithm that scans the data previously recorded in the RGA section to see whether the incoming actual data block would add to the RGA inventory.
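The comparison behind the first heat map (n-bit-long numbers more redundant than m-bit-long ones) can be approximated by measuring how often fixed-width chunks repeat at two widths. Measuring widths in whole bytes and using a simple repeat fraction are simplifying assumptions made for illustration.

```python
from collections import Counter

def repetition_rate(block: bytes, width: int) -> float:
    """Fraction of width-byte chunks that repeat an earlier chunk."""
    chunks = [block[i:i + width]
              for i in range(0, len(block) - width + 1, width)]
    if not chunks:
        return 0.0
    repeats = sum(c - 1 for c in Counter(chunks).values())
    return repeats / len(chunks)

def first_heatmap(block: bytes, n: int = 2, m: int = 4) -> dict:
    """Assumed comparison: n-width versus m-width redundancy, n < m.
    A higher rate at the smaller width marks those regions as 'hotter'."""
    return {"n_width_rate": repetition_rate(block, n),
            "m_width_rate": repetition_rate(block, m)}

# e.g. first_heatmap(b"\xAB\xCD" * 1024) reports a high 2-byte repetition
# rate, since the same 2-byte number recurs throughout the block.
```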
S209. Detect whether the data block containing redundant data is suitable for permanent storage. If yes, proceed to step S210; otherwise, proceed to step S211.
S210. Store the redundant data in the data block.
The permanent data storage part is used to store the data described in the data storage parts of BSTDA, PPA, and RGA, i.e., the data selected by the artificial-intelligence-driven algorithm.
This data is used to keep a record of the actual data, so that the next compression process can be completed by the set of compression algorithms described in this application.
S211. Delete the data block containing the redundant data.
The temporary data storage (cache) part is used to store data that must first be analyzed and then transformed into other values. As described in the data storage sections of BSTDA, PPA, and RGA, an artificial intelligence algorithm detects whether the actual data is suitable for storage; if it is determined that the actual data will not be stored, it should be deleted.
While the artificial-intelligence-driven algorithm decides whether data should be stored permanently, the temporary data storage (cache) is where the artificial-intelligence-driven algorithm, together with the other algorithms related to this compression technique, stores and analyzes the actual data.
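Steps S209 to S211 reduce to a keep-or-delete decision over the cached blocks. The application attributes this decision to an AI-driven algorithm that checks whether a block would add to the inventory; the novelty test below (keep only blocks whose fingerprint is not yet in permanent storage) is a stand-in assumption for that algorithm.

```python
import hashlib
from typing import Dict

def route_block(block: bytes, permanent: Dict[str, bytes]) -> bool:
    """S209-S211 sketch: keep a block only if it would grow the inventory."""
    key = hashlib.sha256(block).hexdigest()
    if key not in permanent:       # S209: suitable for permanent storage?
        permanent[key] = block     # S210: store the redundant data
        return True
    return False                   # S211: caller deletes it from the cache
```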
The compression algorithms described in this application may exist in parallel with other compression algorithms, or may be merged with them in whole or in part, depending on the input data. This compression algorithm is designed to provide high-ratio compression of data blocks.
This application provides a data compression training method. Unlike traditional compression algorithms that directly address redundant data, it analyzes the possibility of adding redundancy to a data block and uses a corresponding algorithm to generate redundant data in the block, so that a compression ratio beyond what was previously possible can be obtained, improving compression performance;
moreover, by compressing large amounts of data or big data, compression efficiency is improved.
It can be understood that, in order to implement the functions in the above embodiments, the data compression model training device includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that, in combination with the units and method steps of the examples described in the embodiments disclosed in this application, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.
As shown in FIG. 4, which is a schematic structural diagram of a data compression model training device provided by this application, the device 400 may include:
a reading unit 44, configured to read a data block of a set size;
an analysis unit 45, configured to analyze the possibility of adding redundancy to the data block;
a determining unit 46, configured to determine the index number of a function that generates redundant data in the data block;
a first generating unit 47, configured to use the function corresponding to the index number to generate redundant data in the data block.
In one possible implementation, the analysis unit 45 is configured to analyze the possibility of adding redundancy to the data block according to the data type of the data block.
In another possible implementation, the device further includes:
a second generating unit 48, configured to generate a first heat map, the first heat map including m-bit-long high-value numbers that are redundant in the data block, where m is a positive integer.
In yet another possible implementation, the device further includes:
a storage unit 49, configured to store the redundant data in the data block.
In yet another possible implementation, the device further includes:
a prediction unit 43, configured to predict the number of compressed data blocks according to a probability prediction algorithm.
In yet another possible implementation, the device further includes:
a compression unit 41, configured to compress a set number of data blocks, the data blocks coming from one or more files;
a third generating unit 42, configured to generate a second heat map, the second heat map including n-bit-long high-value numbers in the data block, where n < m and n is a positive integer.
In yet another possible implementation, the storage unit 49 is further configured to delete a data block containing redundant data when it is detected that the data block containing redundant data is not suitable for permanent storage.
For the specific implementation of the above units, reference may be made to the corresponding descriptions of FIG. 1 to FIG. 3, which are not repeated here.
The compression unit 41, the third generating unit 42, the prediction unit 43, the second generating unit 48, and the storage unit 49 are optional units, which are shown and connected with dotted lines in the figure.
It should be noted that one or more of the above units may be implemented in software, hardware, or a combination of both. When any of the above units is implemented in software, the software exists in the form of computer program instructions stored in a memory, and a processor can be used to execute the program instructions to realize the above method flows. The processor may be built into a system on chip (SoC) or an ASIC, or may be an independent semiconductor chip. In addition to cores used to execute software instructions for computation or processing, the processor may further include necessary hardware accelerators, such as a field programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit implementing dedicated logic operations.
When the above units are implemented in hardware, the hardware may be any one or any combination of a CPU, a microprocessor, a digital signal processing (DSP) chip, a microcontroller unit (MCU), an artificial intelligence processor, an ASIC, an SoC, an FPGA, a PLD, a dedicated digital circuit, a hardware accelerator, or non-integrated discrete devices, which can run the necessary software or execute the above method flows without depending on software.
According to the data compression model training device provided by the embodiments of this application, unlike traditional compression algorithms that directly address redundant data, by analyzing the possibility of adding redundancy to a data block and using the corresponding algorithm to generate redundant data in the block, a compression ratio beyond what was previously possible can be obtained, improving compression performance.
As shown in FIG. 5, which is a schematic structural diagram of another data compression model training device provided by this application, the device 500 may include:
an input device 51, an output device 52, a memory 53, and a processor 54 (the number of processors 54 in the device may be one or more; one processor is taken as an example in FIG. 5). In some embodiments of this application, the input device 51, the output device 52, the memory 53, and the processor 54 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 5.
The processor 54 is configured to perform the following steps:
reading a data block of a set size;
analyzing the possibility of adding redundancy to the data block;
determining the index number of a function that generates redundant data in the data block;
using the function corresponding to the index number to generate redundant data in the data block.
In one possible implementation, the processor 54 performing the step of analyzing the possibility of adding redundancy to the data block includes:
analyzing the possibility of adding redundancy to the data block according to the data type of the data block.
In another possible implementation, the processor 54 is further configured to perform the following step:
generating a first heat map, the first heat map including m-bit-long high-value numbers that are redundant in the data block, where m is a positive integer.
In yet another possible implementation, the processor 54 is further configured to perform the following step:
storing the redundant data in the data block.
In yet another possible implementation, the processor 54 is further configured to perform the following step:
predicting the number of compressed data blocks according to a probability prediction algorithm.
In yet another possible implementation, the processor 54 is further configured to perform the following steps:
compressing a set number of data blocks, the data blocks coming from one or more files;
generating a second heat map, the second heat map including n-bit-long high-value numbers in the data block, where n < m and n is a positive integer.
In yet another possible implementation, the processor 54 is further configured to perform the following step:
deleting a data block containing redundant data when it is detected that the data block containing redundant data is not suitable for permanent storage.
It can be understood that the processor in the embodiments of this application may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. A general-purpose processor may be a microprocessor or any conventional processor.
According to the data compression model training device provided by the embodiments of this application, unlike traditional compression algorithms that directly address redundant data, by analyzing the possibility of adding redundancy to a data block and using the corresponding algorithm to generate redundant data in the block, a compression ratio beyond what was previously possible can be obtained, improving compression performance.
The method steps in the embodiments of this application may be implemented by hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in random access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, electrically erasable programmable read-only memory, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be a component of the processor. The processor and the storage medium may be located in an ASIC. In addition, the ASIC may be located in the data compression device. Of course, the processor and the storage medium may also exist as discrete components in the data compression device.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are executed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a base station, user equipment, or another programmable device. The computer programs or instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer programs or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium, such as a floppy disk, hard disk, or magnetic tape; an optical medium, such as a digital video disc; or a semiconductor medium, such as a solid-state drive.
In the various embodiments of this application, unless otherwise specified or logically conflicting, the terms and/or descriptions in different embodiments are consistent and may be cited by one another, and the technical features in different embodiments may be combined to form new embodiments according to their inherent logical relationships.
It should be understood that, in the description of this application, unless otherwise specified, "/" indicates an "or" relationship between the associated objects; for example, A/B may mean A or B, where A and B may be singular or plural. In the description of this application, unless otherwise specified, "multiple" means two or more than two. "At least one of the following" or similar expressions refers to any combination of these items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple. In addition, to describe the technical solutions of the embodiments of this application clearly, words such as "first" and "second" are used in the embodiments of this application to distinguish identical or similar items with substantially the same functions and effects. Those skilled in the art will understand that words such as "first" and "second" do not limit the quantity or execution order, nor do they imply that the items are necessarily different. Meanwhile, in the embodiments of this application, words such as "exemplary" or "for example" are used to serve as examples, illustrations, or explanations. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as more preferred or advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner for easy understanding.
It can be understood that the various numerical designations involved in the embodiments of this application are merely distinctions made for ease of description and are not intended to limit the scope of the embodiments of this application. The sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic.

Claims (10)

  1. A data compression model training method, characterized in that the method comprises:
    reading a data block of a set size;
    analyzing the possibility of adding redundancy to the data block;
    determining the index number of a function that generates redundant data in the data block;
    using the function corresponding to the index number to generate redundant data in the data block.
  2. The method according to claim 1, characterized in that the analyzing of the possibility of adding redundancy to the data block comprises:
    analyzing the possibility of adding redundancy to the data block according to the data type of the data block.
  3. The method according to claim 1 or 2, characterized in that the method further comprises:
    generating a first heat map, the first heat map including m-bit-long high-value numbers that are redundant in the data block, where m is a positive integer.
  4. The method according to claim 1, characterized in that the method further comprises:
    storing the redundant data in the data block.
  5. The method according to claim 1, characterized in that the method further comprises:
    predicting the number of compressed data blocks according to a probability prediction algorithm.
  6. The method according to claim 1, characterized in that the method further comprises:
    compressing a set number of data blocks, the data blocks coming from one or more files;
    generating a second heat map, the second heat map including n-bit-long high-value numbers in the data block, where n < m and n is a positive integer.
  7. The method according to any one of claims 1, 2, and 4 to 6, characterized in that the method further comprises:
    deleting the data block containing redundant data when it is detected that the data block containing redundant data is not suitable for permanent storage.
  8. A data compression model training device, characterized in that the device comprises:
    a reading unit, configured to read a data block of a set size;
    an analysis unit, configured to analyze the possibility of adding redundancy to the data block;
    a determining unit, configured to determine the index number of a function that generates redundant data in the data block;
    a first generating unit, configured to use the function corresponding to the index number to generate redundant data in the data block.
  9. A data compression model training device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, the method according to any one of claims 1 to 7 is implemented.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1 to 7 is implemented.
PCT/CN2022/105929 2021-07-16 2022-07-15 Data compression model training method and device, and storage medium WO2023284851A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110812042.5 2021-07-16
CN202110812042.5A CN113687773B (zh) 2021-07-16 Data compression model training method and device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023284851A1 (zh)

Family

ID=78577319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/105929 WO2023284851A1 (zh) 2021-07-16 2022-07-15 Data compression model training method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN113687773B (zh)
WO (1) WO2023284851A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113687773B (zh) 2021-07-16 2023-08-11 深圳智慧林网络科技有限公司 Data compression model training method and device, and storage medium
CN117313562B (zh) 2023-11-30 2024-02-27 西华大学 Logic table compression method suitable for airborne collision avoidance systems


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5764357A (en) * 1996-04-12 1998-06-09 Vlsi Technology, Inc. Zero-run-length encoder with shift register
US7093182B2 (en) * 1999-08-02 2006-08-15 Inostor Corporation Data redundancy methods and apparatus
CN100390790C * 2002-05-10 2008-05-28 甲骨文国际公司 Method and mechanism for storing and accessing data and improving the performance of database query language statements
US9418450B2 (en) * 2006-08-31 2016-08-16 Ati Technologies Ulc Texture compression techniques
CN102831245B * 2012-09-17 2015-08-05 洛阳翔霏机电科技有限责任公司 Real-time data storage and reading method for a relational database
US10037245B2 (en) * 2016-03-29 2018-07-31 International Business Machines Corporation Raid system performance enhancement using compressed data and byte addressable storage devices
US10809941B2 (en) * 2019-03-11 2020-10-20 International Business Machines Corporation Multi-tiered storage

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030065656A1 (en) * 2001-08-31 2003-04-03 Peerify Technology, Llc Data storage system and method by shredding and deshredding
CN112994701A * 2019-12-02 2021-06-18 阿里巴巴集团控股有限公司 Data compression method and apparatus, electronic device, and computer-readable medium
CN113055017A * 2019-12-28 2021-06-29 华为技术有限公司 Data compression method and computing device
CN112506880A * 2020-12-18 2021-03-16 深圳智慧林网络科技有限公司 Data processing method and related device
CN113687773A * 2021-07-16 2021-11-23 深圳智慧林网络科技有限公司 Data compression model training method and device, and storage medium

Also Published As

Publication number Publication date
CN113687773A (zh) 2021-11-23
CN113687773B (zh) 2023-08-11

Similar Documents

Publication Publication Date Title
WO2023284851A1 (zh) Data compression model training method and device, and storage medium
US9235651B2 (en) Data retrieval apparatus, data storage method and data retrieval method
US7587401B2 (en) Methods and apparatus to compress datasets using proxies
US10817474B2 (en) Adaptive rate compression hash processor
US8515882B2 (en) Efficient storage of individuals for optimization simulation
EP3051700A1 (en) Hardware efficient fingerprinting
Zou et al. Performance optimization for relative-error-bounded lossy compression on scientific data
Knorr et al. ndzip-gpu: efficient lossless compression of scientific floating-point data on GPUs
Barbarioli et al. Hierarchical residual encoding for multiresolution time series compression
US20230076729A2 (en) Systems, methods and devices for eliminating duplicates and value redundancy in computer memories
Sarangi et al. Canonical huffman decoder on fine-grain many-core processor arrays
US12001237B2 (en) Pattern-based cache block compression
Krishna et al. On Compressing Time-Evolving Networks
Kim et al. Low-overhead compressibility prediction for high-performance lossless data compression
CN113659992B (zh) Data compression method and device, and storage medium
Li et al. An efficient and fast VLIW compression scheme for stream processor
CN105843837B (zh) Hardware-efficient Rabin fingerprinting
Ozsoy Culzss-bit: A bit-vector algorithm for lossless data compression on gpgpus
Wu et al. Accelerating a lossy compression method with fine-grained parallelism on a gpu
US11868644B1 (en) Techinques for tracking frequently accessed memory
Wu et al. CAMP: A new bitmap index for data retrieval in traffic archival
Khairi et al. Design of a Huffman Data Encoder Architecture.
Kundeti PaKman+: Fast Distributed Sequence Assembly with a Concurrent K-Mer Counting Algorithm
Sarangi Hardware Architectures for Lossless Compression
TWI401614B (zh) Data conversion method and data conversion device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22841484

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE